WorldWideScience

Sample records for audio feature space

  1. Emotion-based Music Retrieval on a Well-reduced Audio Feature Space

    DEFF Research Database (Denmark)

    Ruxanda, Maria Magdalena; Chua, Bee Yong; Nanopoulos, Alexandros

    2009-01-01

    Music expresses emotion. A number of features extracted from audio influence the perceived emotional expression of music. These audio features generate a high-dimensional space in which music similarity retrieval can be performed effectively with respect to human perception of music emotion... on a number of dimensionality reduction algorithms, including both classic and novel approaches. The paper clearly shows which dimensionality reduction techniques applied to the considered audio feature space can preserve, on average, the accuracy of emotion-based music retrieval...

  2. Audio feature extraction using probability distribution function

    Science.gov (United States)

    Suhaib, A.; Wan, Khairunizam; Aziz, Azri A.; Hazry, D.; Razlan, Zuradzman M.; Shahriman A., B.

    2015-05-01

    Voice recognition has been one of the popular applications in the robotics field. It is also known to be used recently for biometric and multimedia information retrieval systems. This technology is attained from successive research on audio feature extraction analysis. The Probability Distribution Function (PDF) is a statistical method which is usually used as one of the processes in complex feature extraction methods such as GMM and PCA. In this paper, a new method for audio feature extraction is proposed that uses only the PDF itself as the feature extraction method for speech analysis. Certain pre-processing techniques are performed prior to the proposed feature extraction. Subsequently, the PDF values for each frame of sampled voice signals obtained from a number of individuals are plotted. From the experimental results obtained, it can be seen visually from the plotted data that each individual's voice has comparable PDF values and shapes.
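
    The abstract describes using the per-frame probability distribution of the sampled signal directly as the feature. A minimal sketch of that general idea (not the authors' exact pipeline; frame length, bin count and the synthetic test signal are assumptions) could look like this:

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Split a 1-D signal into overlapping frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def frame_pdfs(x, bins=64):
    """Empirical amplitude distribution (normalized histogram) per frame."""
    frames = frame_signal(x)
    edges = np.linspace(-1.0, 1.0, bins + 1)
    return np.stack([np.histogram(f, bins=edges, density=True)[0] for f in frames])

# toy usage with a synthetic "voice-like" signal
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
signal = 0.3 * np.sin(2 * np.pi * 150 * t) + 0.05 * rng.standard_normal(t.size)
print(frame_pdfs(signal).shape)  # (n_frames, bins): one distribution per frame
```

    Plotting these per-frame distributions for different speakers then amounts to comparing the resulting rows, as the abstract suggests.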

  3. On the Use of Memory Models in Audio Features

    DEFF Research Database (Denmark)

    Jensen, Karl Kristoffer

    2011-01-01

    Audio feature estimation is potentially improved by including higher-level models. One such model is the Short Term Memory (STM) model. A new paradigm of audio feature estimation is obtained by adding the influence of notes in the STM. These notes are identified when the perceptual spectral flux...

  4. Classifying laughter and speech using audio-visual feature prediction

    NARCIS (Netherlands)

    Petridis, Stavros; Asghar, Ali; Pantic, Maja

    2010-01-01

    In this study, a system that discriminates laughter from speech by modelling the relationship between audio and visual features is presented. The underlying assumption is that this relationship is different between speech and laughter. Neural networks are trained which learn the audio-to-visual and

  5. Music Genre Classification Using MIDI and Audio Features

    Science.gov (United States)

    Cataltepe, Zehra; Yaslan, Yusuf; Sonmez, Abdullah

    2007-12-01

    We report our findings on using MIDI files and audio features from MIDI, separately and combined together, for MIDI music genre classification. We use McKay and Fujinaga's 3-root and 9-leaf genre data set. In order to compute distances between MIDI pieces, we use normalized compression distance (NCD). NCD uses the compressed length of a string as an approximation to its Kolmogorov complexity and has previously been used for music genre and composer clustering. We convert the MIDI pieces to audio and then use the audio features to train different classifiers. The MIDI and audio-from-MIDI classifiers alone achieve much smaller accuracies than those reported by McKay and Fujinaga, who used a number of domain-based MIDI features rather than NCD for their classification. Combining the MIDI and audio-from-MIDI classifiers improves accuracy and comes closer to, but still falls short of, McKay and Fujinaga's results. The best root genre accuracies achieved using MIDI, audio, and their combination are 0.75, 0.86, and 0.93, respectively, compared to 0.98 of McKay and Fujinaga. Successful classifier combination requires diversity of the base classifiers. We achieve diversity by using a certain number of seconds of the MIDI file, different sample rates and sizes for the audio file, and different classification algorithms.
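
    The normalized compression distance mentioned above has a simple closed form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length. A minimal sketch using zlib as the compressor (the byte strings below are toy stand-ins, not MIDI data):

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance with zlib as the compressor C(.)."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# toy usage: more similar byte strings yield a smaller distance
a = b"C4 E4 G4 C5 " * 200
b = b"C4 E4 G4 B4 " * 200
c = b"F#2 A2 D3 F#3 " * 200
print(ncd(a, b), ncd(a, c))
```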

  6. Music Genre Classification Using MIDI and Audio Features

    Directory of Open Access Journals (Sweden)

    Abdullah Sonmez

    2007-01-01

    Full Text Available We report our findings on using MIDI files and audio features from MIDI, separately and combined together, for MIDI music genre classification. We use McKay and Fujinaga's 3-root and 9-leaf genre data set. In order to compute distances between MIDI pieces, we use normalized compression distance (NCD). NCD uses the compressed length of a string as an approximation to its Kolmogorov complexity and has previously been used for music genre and composer clustering. We convert the MIDI pieces to audio and then use the audio features to train different classifiers. The MIDI and audio-from-MIDI classifiers alone achieve much smaller accuracies than those reported by McKay and Fujinaga, who used a number of domain-based MIDI features rather than NCD for their classification. Combining the MIDI and audio-from-MIDI classifiers improves accuracy and comes closer to, but still falls short of, McKay and Fujinaga's results. The best root genre accuracies achieved using MIDI, audio, and their combination are 0.75, 0.86, and 0.93, respectively, compared to 0.98 of McKay and Fujinaga. Successful classifier combination requires diversity of the base classifiers. We achieve diversity by using a certain number of seconds of the MIDI file, different sample rates and sizes for the audio file, and different classification algorithms.

  7. Survey of compressed domain audio features and their expressiveness

    Science.gov (United States)

    Pfeiffer, Silvia; Vincent, Thomas

    2003-01-01

    We give an overview of existing audio analysis approaches in the compressed domain and incorporate them into a coherent formal structure. After examining the kinds of information accessible in an MPEG-1 compressed audio stream, we describe a coherent approach to determine features from them and report on a number of applications they enable. Most of them aim at creating an index to the audio stream by segmenting the stream into temporally coherent regions, which may be classified into pre-specified types of sounds such as music, speech, speakers, animal sounds, sound effects, or silence. Other applications centre around sound recognition such as gender, beat or speech recognition.

  8. Turkish Music Genre Classification using Audio and Lyrics Features

    Directory of Open Access Journals (Sweden)

    Önder ÇOBAN

    2017-05-01

    Full Text Available Music Information Retrieval (MIR) has become a popular research area in recent years. In this context, researchers have developed music information systems to find solutions for such major problems as automatic playlist creation, hit song detection, and music genre or mood classification. Meta-data information, lyrics, or melodic content of music are used as feature resources in previous works. However, lyrics are not often used in MIR systems, and the number of works in this field is insufficient, especially for Turkish. In this paper, firstly, we have extended our previously created Turkish MIR (TMIR) dataset, which comprises Turkish lyrics, by including the audio file of each song. Secondly, we have investigated the effect of using audio and textual features together or separately on automatic Music Genre Classification (MGC). We have extracted textual features from lyrics using different feature extraction models such as word2vec and the traditional Bag of Words. We have conducted our experiments with the Support Vector Machine (SVM) algorithm and analysed the impact of feature selection and different feature groups on MGC. We have considered lyrics-based MGC as a text classification task and also investigated the effect of the term weighting method. Experimental results show that textual features can be as effective as audio features for Turkish MGC, especially when a supervised term weighting method is employed. We achieved the highest success rate of 99.12% by using both audio and textual features together.
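
    A minimal sketch of the lyrics-as-text-classification setup described above, using plain TF-IDF weighting in place of the supervised term weighting the authors found best (the lyric snippets and genre labels are invented placeholders, not the TMIR data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# placeholder corpus; the real TMIR lyrics and labels are not reproduced here
lyrics = ["gel gör beni aşk neyledi", "hadi bakalım kolay gelsin", "dağlar dağlar kurban olam"]
genres = ["arabesque", "pop", "folk"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(lyrics, genres)
print(model.predict(["gör beni dağlar"]))
```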

  9. Simple Solutions for Space Station Audio Problems

    Science.gov (United States)

    Wood, Eric

    2016-01-01

    Throughout this summer, a number of different projects were supported relating to various NASA programs, including the International Space Station (ISS) and Orion. The primary project that was worked on was designing and testing an acoustic diverter which could be used on the ISS to increase sound pressure levels in Node 1, a module that does not have any Audio Terminal Units (ATUs) inside it. This acoustic diverter is not intended to be a permanent solution to providing audio to Node 1; it is simply intended to improve conditions while more permanent solutions are under development. One of the most exciting aspects of this project is that the acoustic diverter is designed to be 3D printed on the ISS, using the 3D printer that was set up earlier this year. Because of this, no new hardware needs to be sent up to the station, and no extensive hardware testing needs to be performed on the ground before sending it to the station. Instead, the 3D part file can simply be uploaded to the station's 3D printer, where the diverter will be made.

  10. Audio-video feature correlation: faces and speech

    Science.gov (United States)

    Durand, Gwenael; Montacie, Claude; Caraty, Marie-Jose; Faudemay, Pascal

    1999-08-01

    This paper presents a study of the correlation of features automatically extracted from the audio stream and the video stream of audiovisual documents. In particular, we were interested in finding out whether speech analysis tools could be combined with face detection methods, and to what extent they should be combined. A generic audio signal partitioning algorithm was first used to detect Silence/Noise/Music/Speech segments in a full-length movie. A generic object detection method was applied to the keyframes extracted from the movie in order to detect the presence or absence of faces. The correlation between the presence of a face in the keyframes and of the corresponding voice in the audio stream was studied. A third stream, which is the script of the movie, is warped onto the speech channel in order to automatically label faces appearing in the keyframes with the name of the corresponding character. We naturally found that extracted audio and video features were related in many cases, and that significant benefits can be obtained from the joint use of audio and video analysis methods.

  11. Analytical Features: A Knowledge-Based Approach to Audio Feature Generation

    Directory of Open Access Journals (Sweden)

    Pachet François

    2009-01-01

    Full Text Available We present a feature generation system designed to create audio features for supervised classification tasks. The main contribution to feature generation studies is the notion of analytical features (AFs), a construct designed to support the representation of knowledge about audio signal processing. We describe the most important aspects of AFs, in particular their dimensional type system, on which pattern-based random generators, heuristics, and rewriting rules are based. We show how AFs generalize or improve previous approaches used in feature generation. We report on several projects using AFs for difficult audio classification tasks, demonstrating their advantage over standard audio features. More generally, we propose analytical features as a paradigm to bring raw signals into the world of symbolic computation.

  12. Music Genre Classification Using MIDI and Audio Features

    OpenAIRE

    Cataltepe Zehra; Yaslan Yusuf; Sonmez Abdullah

    2007-01-01

    We report our findings on using MIDI files and audio features from MIDI, separately and combined together, for MIDI music genre classification. We use McKay and Fujinaga's 3-root and 9-leaf genre data set. In order to compute distances between MIDI pieces, we use normalized compression distance (NCD). NCD uses the compressed length of a string as an approximation to its Kolmogorov complexity and has previously been used for music genre and composer clustering. We convert the MIDI pieces to a...

  13. Audio segmentation using Flattened Local Trimmed Range for ecological acoustic space analysis

    Directory of Open Access Journals (Sweden)

    Giovany Vega

    2016-06-01

    Full Text Available The acoustic space in a given environment is filled with footprints arising from three processes: biophony, geophony and anthrophony. Bioacoustic research using passive acoustic sensors can result in thousands of recordings. An important component of processing these recordings is to automate signal detection. In this paper, we describe a new spectrogram-based approach for extracting individual audio events. Spectrogram-based audio event detection (AED) relies on separating the spectrogram into background (i.e., noise) and foreground (i.e., signal) classes using a threshold such as a global threshold, a per-band threshold, or one given by a classifier. These methods are either too sensitive to noise, designed for an individual species, or require prior training data. Our goal is to develop an algorithm that is not sensitive to noise, does not need any prior training data and works with any type of audio event. To do this, we propose: (1) a spectrogram filtering method, the Flattened Local Trimmed Range (FLTR) method, which models the spectrogram as a mixture of stationary and non-stationary energy processes and mitigates the effect of the stationary processes, and (2) an unsupervised algorithm that uses the filter to detect audio events. We measured the performance of the algorithm using a set of six thoroughly validated audio recordings and obtained a sensitivity of 94% and a positive predictive value of 89%. These sensitivity and positive predictive values are very high, given that the validated recordings are diverse and obtained from field conditions. The algorithm was then used to extract audio events in three datasets. Features of these audio events were plotted and showed the unique aspects of the three acoustic communities.
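
    The FLTR filter itself is not reproduced here; as a point of reference, a minimal sketch of the simpler per-band-threshold style of spectrogram AED that the abstract contrasts with (window sizes, the 6 dB margin and the synthetic tone are assumptions):

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import label

def detect_events(x, fs, margin_db=6.0):
    """Crude spectrogram AED: per-band median threshold, then connected regions."""
    f, t, S = spectrogram(x, fs=fs, nperseg=512, noverlap=256)
    S_db = 10 * np.log10(S + 1e-12)
    mask = S_db > (np.median(S_db, axis=1, keepdims=True) + margin_db)
    regions, n = label(mask)  # connected foreground blobs = candidate events
    return regions, n

fs = 22050
x = 0.01 * np.random.randn(2 * fs)
x[fs // 2:fs // 2 + 2000] += np.sin(2 * np.pi * 3000 * np.arange(2000) / fs)  # synthetic "call"
_, n_events = detect_events(x, fs)
print("candidate event regions:", n_events)
```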

  14. Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features

    Directory of Open Access Journals (Sweden)

    Petar S. Aleksic

    2002-11-01

    Full Text Available We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs) supported by the MPEG-4 standard for the visual representation of speech. We also describe a robust and automatic algorithm we have developed to extract FAPs from visual data, which does not require hand labeling or extensive training procedures. Principal component analysis (PCA) was performed on the FAPs in order to decrease the dimensionality of the visual feature vectors, and the derived projection weights were used as visual features in the audio-visual automatic speech recognition (ASR) experiments. Both single-stream and multistream hidden Markov models (HMMs) were used to model the ASR system, integrate audio and visual information, and perform relatively large-vocabulary (approximately 1000 words) speech recognition experiments. The experiments use clean audio data and audio data corrupted by stationary white Gaussian noise at various SNRs. The proposed system reduces the word error rate (WER) by 20% to 23% relative to audio-only speech recognition WERs at various SNRs (0–30 dB) with additive white Gaussian noise, and by 19% relative to the audio-only speech recognition WER under clean audio conditions.
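
    The dimensionality-reduction step on the FAP vectors is essentially a PCA projection; a minimal sketch with a random placeholder matrix standing in for per-frame FAPs (the 68-dimensional size and 6 retained components are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
faps = rng.standard_normal((500, 68))      # placeholder: n_frames x n_FAPs

pca = PCA(n_components=6)                  # keep a handful of projection weights
visual_features = pca.fit_transform(faps)  # these would feed the audio-visual HMMs
print(visual_features.shape, round(pca.explained_variance_ratio_.sum(), 3))
```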

  15. Feature Selection in Hierarchical Feature Spaces

    OpenAIRE

    Ristoski, Petar; Paulheim, Heiko

    2014-01-01

    Feature selection is an important preprocessing step in data mining, which has an impact on both the runtime and the result quality of the subsequent processing steps. While there are many cases where hierarchic relations between features exist, most existing feature selection approaches are not capable of exploiting those relations. In this paper, we introduce a method for feature selection in hierarchical feature spaces. The method first eliminates redundant features along paths in the hier...

  16. Environment Recognition for Digital Audio Forensics Using MPEG-7 and MEL Cepstral Features

    Science.gov (United States)

    Muhammad, Ghulam; Alghathbar, Khalid

    2011-07-01

    Environment recognition from digital audio for forensics applications is a growing area of interest. However, compared to other branches of audio forensics, it is less researched. In particular, less attention has been given to detecting the environment from files in which foreground speech is present, which is a typical forensics scenario. In this paper, we perform several experiments focusing on the problems of environment recognition from audio, particularly for forensics applications. Experimental results show that the task is easier when audio files contain only environmental sound than when they contain both foreground speech and background environment. We propose a full set of MPEG-7 audio features combined with mel-frequency cepstral coefficients (MFCCs) to improve the accuracy. In the experiments, the proposed approach significantly increases the recognition accuracy of environmental sound even in the presence of a high amount of foreground human speech.
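
    A minimal sketch of the MFCC part of such a feature set, computed with librosa on a synthetic clip (the clip, the 13-coefficient choice and the mean/std pooling are assumptions; the MPEG-7 descriptors are not shown):

```python
import numpy as np
import librosa

sr = 16000
y = 0.1 * np.random.randn(sr).astype(np.float32)   # one-second placeholder recording

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)                   # frame-wise MFCCs
clip_vector = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # clip-level statistics
print(clip_vector.shape)  # 26-dimensional descriptor to feed an environment classifier
```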

  17. Extraction, Mapping, and Evaluation of Expressive Acoustic Features for Adaptive Digital Audio Effects

    DEFF Research Database (Denmark)

    Holfelt, Jonas; Csapo, Gergely; Andersson, Nikolaj Schwab

    2017-01-01

    This paper describes the design and implementation of a real-time adaptive digital audio effect with an emphasis on using expressive audio features that control effect parameters. Research in adaptive digital audio effects is covered, along with studies about expressivity and important... perceptual sound descriptors for communicating emotions. This project aimed to exploit sounds as expressive indicators to create novel sound transformations. A test was conducted to see if guitar players could differentiate between an adaptive and a non-adaptive version of a digital audio effect...

  18. Turkish Music Genre Classification using Audio and Lyrics Features

    OpenAIRE

    Önder ÇOBAN

    2017-01-01

    Music Information Retrieval (MIR) has become a popular research area in recent years. In this context, researchers have developed music information systems to find solutions for such major problems as automatic playlist creation, hit song detection, and music genre or mood classification. Meta-data information, lyrics, or melodic content of music are used as feature resources in previous works. However, lyrics are not often used in MIR systems, and the number of works in this field is insufficient...

  19. Extraction Of Audio Features For Emotion Recognition System Based On Music

    Directory of Open Access Journals (Sweden)

    Kee Moe Han

    2015-08-01

    Full Text Available Music is the combination of melody, linguistic information, and the vocalist's emotion. Since music is a work of art, analyzing emotion in music by computer is a difficult task. Many approaches have been developed to detect the emotions included in music, but the results are not satisfactory because emotion is very complex. In this paper, evaluations of audio features extracted from music files are presented. The extracted features are used to classify the different emotion classes of the vocalists. Musical feature extraction is done using the Music Information Retrieval (MIR) toolbox. A database of 100 music clips is used to classify the emotions perceived in the music clips. Music may contain many emotions according to the vocalist's mood, such as happy, sad, nervous, bored, peaceful, etc. In this paper, the audio features related to the emotions of the vocalists are extracted for use in an emotion recognition system based on music.

  20. Music preferences based on audio features, and its relation to personality

    OpenAIRE

    Dunn, Greg

    2009-01-01

    Recent studies have summarized reported music preferences by genre into four broadly defined categories, which relate to various personality characteristics. Other research has indicated that genre classification is ambiguous and inconsistent. This ambiguity suggests that research relating personality to music preferences based on genre could benefit from a more objective definition of music. This problem is addressed by investigating how music preferences linked to objective audio features r...

  1. Feature space analysis of MRI

    Science.gov (United States)

    Soltanian-Zadeh, Hamid; Windham, Joe P.; Peck, Donald J.

    1997-04-01

    This paper presents the development and performance evaluation of an MRI feature space method. The method is useful for: identification of tissue types; segmentation of tissues; and quantitative measurements on tissues, to obtain information that can be used in decision making (diagnosis, treatment planning, and evaluation of treatment). The steps of the work accomplished are as follows: (1) Four T2-weighted and two T1-weighted images (before and after injection of Gadolinium) were acquired for ten tumor patients. (2) Images were analyzed by two image analysts according to the following algorithm. The intracranial brain tissues were segmented from the scalp and background. The additive noise was suppressed using a multi-dimensional non-linear edge-preserving filter which preserves partial volume information on average. Image nonuniformities were corrected using a modified lowpass filtering approach. The resulting images were used to generate and visualize an optimal feature space. Cluster centers were identified on the feature space. Then images were segmented into normal tissues and different zones of the tumor. (3) Biopsy samples were extracted from each patient and were subsequently analyzed by the pathology laboratory. (4) Image analysis results were compared to each other and to the biopsy results. Pre- and post-surgery feature spaces were also compared. The proposed algorithm made it possible to visualize the MRI feature space and to segment the image. In all cases, the operators were able to find clusters for normal and abnormal tissues. Also, clusters for different zones of the tumor were found. Based on the clusters marked for each zone, the method successfully segmented the image into normal tissues (white matter, gray matter, and CSF) and different zones of the lesion (tumor, cyst, edema, radiation necrosis, necrotic core, and infiltrated tumor). The results agreed with those obtained from the biopsy samples. Comparison of pre- to post-surgery and radiation

  2. Interoperability: Voice and Audio Standards for Space Missions

    OpenAIRE

    Peinado, Osvaldo Luis

    2016-01-01

    This paper describes the setup of voice communication in a space mission context, points out special requirements and operational approaches, and defines the transmission, coding, interface, and quality parameters needed for space mission support. It provides system designers with a subset of the larger industry set of standards from which to choose, depending on the application and purpose of the voice system.

  3. An Analysis of Audio Features to Develop a Human Activity Recognition Model Using Genetic Algorithms, Random Forests, and Neural Networks

    Directory of Open Access Journals (Sweden)

    Carlos E. Galván-Tejada

    2016-01-01

    Full Text Available This work presents a human activity recognition (HAR) model based on audio features. The use of sound as an information source for HAR models represents a challenge because sound wave analyses generate very large amounts of data. However, feature selection techniques may reduce the amount of data required to represent an audio signal sample. Some of the audio features that were analyzed include Mel-frequency cepstral coefficients (MFCC). Although MFCC are commonly used in voice and instrument recognition, their utility within HAR models is yet to be confirmed, and this work validates their usefulness. Additionally, statistical features were extracted from the audio samples to generate the proposed HAR model. The size of the information necessary to form a HAR model impacts directly on the accuracy of the model. This problem was also tackled in the present work; our results indicate that we are capable of recognizing a human activity with an accuracy of 85% using the proposed HAR model. This means that minimal computational costs are needed, thus allowing portable devices to identify human activities using audio as an information source.
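
    A minimal sketch of the classification stage on pre-computed clip-level audio descriptors, using a random forest as in the abstract (the feature matrix and activity labels are random placeholders, so the score is chance level; it only illustrates the pipeline shape):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))   # rows = clips, columns = MFCC/statistical descriptors
y = rng.integers(0, 5, size=200)     # five hypothetical activity labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```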

  4. Estimation of violin bowing features from Audio recordings with Convolutional Networks

    DEFF Research Database (Denmark)

    Perez-Carillo, Alfonso; Purwins, Hendrik

    2017-01-01

    ...However, the acquisition process usually involves the use of expensive sensing systems and complex setups that are generally intrusive in practice. An alternative to direct acquisition is through the analysis of the audio signal. So-called indirect acquisition has many advantages, including the simplicity... and low cost of the acquisition and its nonintrusive nature. The main challenge is designing robust detection algorithms to be as accurate as the direct approaches. In this paper, we present an indirect acquisition method to estimate violin bowing controls from audio signal analysis based on training...

  5. Populating the Mix Space: Parametric Methods for Generating Multitrack Audio Mixtures

    Directory of Open Access Journals (Sweden)

    Alex Wilson

    2017-12-01

    Full Text Available The creation of multitrack mixes by audio engineers is a time-consuming activity and creating high-quality mixes requires a great deal of knowledge and experience. Previous studies on the perception of music mixes have been limited by the relatively small number of human-made mixes analysed. This paper describes a novel “mix-space”, a parameter space which contains all possible mixes using a finite set of tools, as well as methods for the parametric generation of artificial mixes in this space. Mixes that use track gain, panning and equalisation are considered. This allows statistical methods to be used in the study of music mixing practice, such as Monte Carlo simulations or population-based optimisation methods. Two applications are described: an investigation into the robustness and accuracy of tempo-estimation algorithms and an experiment to estimate distributions of spectral centroid values within sets of mixes. The potential for further work is also described.
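
    A minimal sketch of the gain-and-pan part of such a mix space: each point is a vector of per-track gains and pan positions, rendered to stereo with a constant-power pan law (the pan law and the random stems are assumptions, not the paper's exact parameterization):

```python
import numpy as np

def render_mix(tracks, gains_db, pans):
    """Render mono tracks to stereo given per-track gain (dB) and pan in [-1, 1]."""
    gains = 10.0 ** (np.asarray(gains_db) / 20.0)
    theta = (np.asarray(pans) + 1.0) * np.pi / 4.0    # constant-power pan law
    left, right = np.cos(theta) * gains, np.sin(theta) * gains
    tracks = np.asarray(tracks)                       # shape (n_tracks, n_samples)
    return np.stack([left @ tracks, right @ tracks])  # (2, n_samples)

rng = np.random.default_rng(2)
stems = rng.standard_normal((4, 44100))               # four placeholder stems
mix = render_mix(stems, gains_db=rng.uniform(-12, 0, 4), pans=rng.uniform(-1, 1, 4))
print(mix.shape)
```

    Repeatedly sampling the gain and pan vectors is then one way to populate the space with artificial mixes for Monte Carlo style experiments.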

  6. Intelligent audio analysis

    CERN Document Server

    Schuller, Björn W

    2013-01-01

    This book provides the reader with the knowledge necessary for comprehension of the field of Intelligent Audio Analysis. It first introduces standard methods and discusses the typical Intelligent Audio Analysis chain going from audio data to audio features to audio recognition. Further, an introduction to audio source separation, enhancement and robustness is given. After the introductory parts, the book shows several applications for the three types of audio: speech, music, and general sound. Each task is briefly introduced, followed by a description of the specific data and methods applied, experiments and results, and a conclusion for this specific task. The book provides benchmark results and standardized test-beds for a broader range of audio analysis tasks. The main focus thereby lies on the parallel advancement of realism in audio analysis, as too often today's results are overly optimistic owing to idealized testing conditions, and it serves to stimulate synergies arising from transfer of ...

  7. Audio-Visual Classification of Sports Types

    DEFF Research Database (Denmark)

    Gade, Rikke; Abou-Zleikha, Mohamed; Christensen, Mads Græsbøll

    2015-01-01

    In this work we propose a method for classification of sports types from combined audio and visual features extracted from thermal video. From the audio, Mel Frequency Cepstral Coefficients (MFCC) are extracted, and PCA is applied to reduce the feature space to 10 dimensions. From the visual modality... short trajectories are constructed to represent the motion of players. From these, four motion features are extracted and combined directly with audio features for classification. A k-nearest neighbour classifier is applied for classification of 180 1-minute video sequences from three sports types...
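
    A minimal sketch of the fusion-and-classification stage described above (placeholder random features stand in for the MFCC statistics and trajectory-based motion features; the sizes and the k = 5 choice are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
mfcc_stats = rng.standard_normal((180, 40))   # per-sequence audio descriptors (placeholder)
motion = rng.standard_normal((180, 4))        # four trajectory-based motion features
labels = rng.integers(0, 3, size=180)         # three sports types

audio10 = PCA(n_components=10).fit_transform(mfcc_stats)  # reduce audio block to 10 dims
X = np.hstack([audio10, motion])                           # direct feature-level fusion
knn = KNeighborsClassifier(n_neighbors=5).fit(X, labels)
print(knn.score(X, labels))
```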

  8. Fusion of visual and audio features for person identification in real video

    Science.gov (United States)

    Li, Dongge; Wei, Gang; Sethi, Ishwar K.; Dimitrova, Nevenka

    2001-01-01

    In this research, we studied the joint use of visual and audio information for the problem of identifying persons in real video. A person identification system, which is able to identify characters in TV shows by the fusion of audio and visual information, is constructed based on two different fusion strategies. In the first strategy, speaker identification is used to verify the face recognition result. The second strategy consists of using face recognition and tracking to supplement speaker identification results. To evaluate our system's performance, an information database was generated by manually labeling the speaker and the main person's face in every I-frame of a video segment of the TV show 'Seinfeld'. By comparing the output from our system with our information database, we evaluated the performance of each of the analysis channels and their fusion. The results show that the first fusion strategy is suitable for applications where precision is much more critical than recall. The second fusion strategy, on the other hand, generates the best overall identification performance. It outperforms either of the analysis channels greatly in both precision and recall and is applicable to more general applications, such as, in our case, identifying persons in TV programs.

  9. Audio-visual Classification and Fusion of Spontaneous Affect Data in Likelihood Space

    NARCIS (Netherlands)

    Nicolaou, Mihalis A.; Gunes, Hatice; Pantic, Maja

    2010-01-01

    This paper focuses on audio-visual (using facial expression, shoulder and audio cues) classification of spontaneous affect, utilising generative models for classification (i) in terms of Maximum Likelihood Classification with the assumption that the generative model structure in the classifier is

  10. Deep Visual Attributes vs. Hand-Crafted Audio Features on Multidomain Speech Emotion Recognition

    Directory of Open Access Journals (Sweden)

    Michalis Papakostas

    2017-06-01

    Full Text Available Emotion recognition from speech may play a crucial role in many applications related to human–computer interaction or understanding the affective state of users in certain tasks, where other modalities such as video or physiological parameters are unavailable. In general, a human's emotions may be recognized using several modalities such as analyzing facial expressions, speech, or physiological parameters (e.g., electroencephalograms, electrocardiograms, etc.). However, measuring these modalities may be difficult, obtrusive or require expensive hardware. In that context, speech may be the best alternative modality in many practical applications. In this work we present an approach that uses a Convolutional Neural Network (CNN) functioning as a visual feature extractor and trained using raw speech information. In contrast to traditional machine learning approaches, CNNs identify the important features of the input themselves, thus making hand-crafted feature engineering optional in many tasks. In this paper, no features other than spectrogram representations are required; hand-crafted features were extracted only to validate our method. Moreover, it does not require any linguistic model and is not specific to any particular language. We compare the proposed approach using cross-language datasets and demonstrate that it is able to provide superior results vs. traditional ones that use hand-crafted features.
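
    A minimal PyTorch sketch of a CNN acting as a "visual" feature extractor over log-spectrogram patches, in the spirit of the approach above (the layer sizes, 64x64 patch size and four emotion classes are assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    """Tiny CNN over spectrogram patches; a stand-in for the paper's model."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_classes)

    def forward(self, x):            # x: (batch, 1, 64, 64) log-spectrogram patches
        return self.classifier(self.features(x).flatten(1))

logits = SpectrogramCNN()(torch.randn(8, 1, 64, 64))
print(logits.shape)  # (8, n_classes)
```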

  11. Categorizing Video Game Audio

    DEFF Research Database (Denmark)

    Westerberg, Andreas Rytter; Schoenau-Fog, Henrik

    2015-01-01

    This paper dives into the subject of video game audio and how it can be categorized in order to deliver a message to a player in the most precise way. A new categorization, with a new take on the diegetic spaces, can be used as a tool of inspiration for sound- and game-designers to rethink how... they can use audio in video games. The conclusion of this study is that the current models' view of the diegetic spaces, used to categorize video game audio, is not fit to categorize all sounds. This can, however, possibly be changed through a rethinking of how the player interprets audio...

  12. Detecting paralinguistic events in audio stream using context in features and probabilistic decisions.

    Science.gov (United States)

    Gupta, Rahul; Audhkhasi, Kartik; Lee, Sungbok; Narayanan, Shrikanth

    2016-03-01

    Non-verbal communication involves encoding, transmission and decoding of non-lexical cues and is realized using vocal (e.g. prosody) or visual (e.g. gaze, body language) channels during conversation. These cues perform the function of maintaining conversational flow, expressing emotions, and marking personality and interpersonal attitude. In particular, non-verbal cues in speech such as paralanguage and non-verbal vocal events (e.g. laughters, sighs, cries) are used to nuance meaning and convey emotions, mood and attitude. For instance, laughters are associated with affective expressions while fillers (e.g. um, ah, um) are used to hold the floor during a conversation. In this paper we present an automatic non-verbal vocal event detection system focusing on the detection of laughter and fillers. We extend our system presented during the Interspeech 2013 Social Signals Sub-challenge (that was the winning entry in the challenge) for frame-wise event detection and test several schemes for incorporating local context during detection. Specifically, we incorporate context at two separate levels in our system: (i) the raw frame-wise features and (ii) the output decisions. Furthermore, our system processes the output probabilities based on a few heuristic rules in order to reduce erroneous frame-based predictions. Our overall system achieves an Area Under the Receiver Operating Characteristics curve of 95.3% for detecting laughter and 90.4% for fillers on the test set drawn from the data specifications of the Interspeech 2013 Social Signals Sub-challenge. We perform further analysis to understand the interrelation between the features and the obtained results. Specifically, we conduct a feature sensitivity analysis and correlate it with each feature's stand-alone performance. The observations suggest that the trained system is more sensitive to features carrying higher discriminability, with implications towards a better system design.
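
    A minimal sketch of the two context levels mentioned above: stacking neighbouring frames onto the feature vector, and median-smoothing the frame-wise output probabilities (the window widths and the random inputs are assumptions, not the authors' settings):

```python
import numpy as np

def add_feature_context(frames, width=2):
    """Stack each frame with its +/- `width` neighbours (feature-level context)."""
    padded = np.pad(frames, ((width, width), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(frames)] for i in range(2 * width + 1)])

def smooth_decisions(probs, width=2):
    """Median-filter frame-wise posteriors (decision-level context)."""
    padded = np.pad(probs, width, mode="edge")
    return np.array([np.median(padded[i:i + 2 * width + 1]) for i in range(len(probs))])

frames = np.random.randn(100, 13)   # e.g. frame-wise acoustic features
probs = np.random.rand(100)         # e.g. per-frame laughter posteriors
print(add_feature_context(frames).shape, smooth_decisions(probs).shape)
```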

  13. Detecting paralinguistic events in audio stream using context in features and probabilistic decisions☆

    Science.gov (United States)

    Gupta, Rahul; Audhkhasi, Kartik; Lee, Sungbok; Narayanan, Shrikanth

    2017-01-01

    Non-verbal communication involves encoding, transmission and decoding of non-lexical cues and is realized using vocal (e.g. prosody) or visual (e.g. gaze, body language) channels during conversation. These cues perform the function of maintaining conversational flow, expressing emotions, and marking personality and interpersonal attitude. In particular, non-verbal cues in speech such as paralanguage and non-verbal vocal events (e.g. laughters, sighs, cries) are used to nuance meaning and convey emotions, mood and attitude. For instance, laughters are associated with affective expressions while fillers (e.g. um, ah, um) are used to hold the floor during a conversation. In this paper we present an automatic non-verbal vocal event detection system focusing on the detection of laughter and fillers. We extend our system presented during the Interspeech 2013 Social Signals Sub-challenge (that was the winning entry in the challenge) for frame-wise event detection and test several schemes for incorporating local context during detection. Specifically, we incorporate context at two separate levels in our system: (i) the raw frame-wise features and (ii) the output decisions. Furthermore, our system processes the output probabilities based on a few heuristic rules in order to reduce erroneous frame-based predictions. Our overall system achieves an Area Under the Receiver Operating Characteristics curve of 95.3% for detecting laughter and 90.4% for fillers on the test set drawn from the data specifications of the Interspeech 2013 Social Signals Sub-challenge. We perform further analysis to understand the interrelation between the features and the obtained results. Specifically, we conduct a feature sensitivity analysis and correlate it with each feature's stand-alone performance. The observations suggest that the trained system is more sensitive to features carrying higher discriminability, with implications towards a better system design. PMID:28713197

  14. Feature space analysis: effects of MRI protocols

    Science.gov (United States)

    Soltanian-Zadeh, Hamid; Scarpace, Lisa; Peck, Donald J.

    2000-06-01

    We present a method for exploring the relationship between the image segmentation results obtained by an optimal feature space method and the MRI protocols used. The steps of the work accomplished are as follows. (1) Three patients with brain tumors were imaged by a 1.5T General Electric Signa MRI System, using multiple protocols (T1- and T2-weighted spin-echo and FLAIR). T1-weighted images were acquired before and after Gadolinium (Gd) injection. (2) Image volumes were co-registered, and images of a slice through the center of the tumor were selected for processing. (3) Nine sets of images were defined by selecting certain MR images (e.g., 4T2's + 1T1, 4T2's + FLAIR, 2T2's + 1T1). (4) Using the images in each set, the optimal feature space was generated and images were segmented into normal tissues and different tumor zones. (5) Segmentation results obtained using different MRI sets were compared. We found that the locations of the clusters for the tumor zones and their corresponding regions in the image domain changed to some extent as a function of the MR images (MRI protocols) used. However, the segmentation results for the total lesion and normal tissues remained almost unchanged.

  15. Audio Restoration

    Science.gov (United States)

    Esquef, Paulo A. A.

    The first reproducible recording of human voice was made in 1877 on a tinfoil cylinder phonograph devised by Thomas A. Edison. Since then, much effort has been expended to find better ways to record and reproduce sounds. By the mid-1920s, the first electrical recordings appeared and gradually took over purely acoustic recordings. The development of electronic computers, in conjunction with the ability to record data onto magnetic or optical media, culminated in the standardization of compact disc format in 1980. Nowadays, digital technology is applied to several audio applications, not only to improve the quality of modern and old recording/reproduction techniques, but also to trade off sound quality for less storage space and less taxing transmission capacity requirements.

  16. Audio Twister

    DEFF Research Database (Denmark)

    Cermak, Daniel; Moreno Garcia, Rodrigo; Monastiridis, Stefanos

    2015-01-01

    Daniel Cermak-Sassenrath, Rodrigo Moreno Garcia, Stefanos Monastiridis. Audio Twister. Installation. P-Hack Copenhagen 2015, Copenhagen, DK, Apr 24, 2015.

  17. Introduction to audio analysis a MATLAB approach

    CERN Document Server

    Giannakopoulos, Theodoros

    2014-01-01

    Introduction to Audio Analysis serves as a standalone introduction to audio analysis, providing theoretical background to many state-of-the-art techniques. It covers the essential theory necessary to develop audio engineering applications, but also uses programming techniques, notably MATLAB®, to take a more applied approach to the topic. Basic theory and reproducible experiments are combined to demonstrate theoretical concepts from a practical point of view and provide a solid foundation in the field of audio analysis. Audio feature extraction, audio classification, audio segmentation, au

  18. Robust Metric based Anomaly Detection in Kernel Feature Space

    Directory of Open Access Journals (Sweden)

    B. Du

    2012-07-01

    Full Text Available This work analyzes an anomaly measurement metric in a high-dimensional feature space, where the Gaussian assumption underlying state-of-the-art Mahalanobis-distance algorithms is supposed to be reasonable. The detector is realized in the high-dimensional feature space via the kernel trick. In addition, the masking and swamping effects are further inhibited by an iterative approach in the feature space. The proposed robust-metric-based anomaly detection shows promising performance on hyperspectral remote sensing images: the separability between anomalies and background is enlarged, and the background statistics are more concentrated and immune to contamination by anomalies.
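
    For reference, a minimal sketch of the classical input-space robust Mahalanobis anomaly detector that such work builds on, using a minimum covariance determinant estimate to limit masking; the kernel-feature-space and iterative refinements of the abstract are not reproduced, and the synthetic data and 98th-percentile threshold are assumptions:

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(4)
background = rng.multivariate_normal(np.zeros(5), np.eye(5), size=500)    # background pixels
anomalies = rng.multivariate_normal(np.full(5, 4.0), np.eye(5), size=10)  # injected anomalies
X = np.vstack([background, anomalies])

robust = MinCovDet(random_state=0).fit(X)   # robust mean/covariance estimate
d2 = robust.mahalanobis(X)                  # squared Mahalanobis distances
flagged = np.flatnonzero(d2 > np.percentile(d2, 98))
print(flagged[-10:])                        # the injected anomalies sit at indices 500-509
```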

  19. Software for objective comparison of vocal acoustic features over weeks of audio recording: KLFromRecordingDays

    Science.gov (United States)

    Soderstrom, Ken; Alalawi, Ali

    KLFromRecordingDays allows measurement of Kullback-Leibler (KL) distances between 2D probability distributions of vocal acoustic features. Greater KL distance measures reflect increased phonological divergence across the vocalizations compared. The software has been used to compare *.wav file recordings made by Sound Analysis Recorder 2011 of songbird vocalizations pre- and post-drug and surgical manipulations. Recordings from individual animals in *.wav format are first organized into subdirectories by recording day and then segmented into individual syllables uttered and acoustic features of these syllables using Sound Analysis Pro 2011 (SAP). KLFromRecordingDays uses syllable acoustic feature data output by SAP to a MySQL table to generate and compare "template" (typically pre-treatment) and "target" (typically post-treatment) probability distributions. These distributions are a series of virtual 2D plots of the duration of each syllable (as x-axis) to each of 13 other acoustic features measured by SAP for that syllable (as y-axes). Differences between "template" and "target" probability distributions for each acoustic feature are determined by calculating KL distance, a measure of divergence of the target 2D distribution pattern from that of the template. KL distances and the mean KL distance across all acoustic features are calculated for each recording day and output to an Excel spreadsheet. Resulting data for individual subjects may then be pooled across treatment groups and graphically summarized and used for statistical comparisons. Because SAP-generated MySQL files are accessed directly, data limits associated with spreadsheet output are avoided, and the totality of vocal output over weeks may be objectively analyzed all at once. The software has been useful for measuring drug effects on songbird vocalizations and assessing recovery from damage to regions of vocal motor cortex. It may be useful in studies employing other species, and as part of speech
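
    A minimal sketch of the core computation: build 2D histograms of (syllable duration, acoustic feature) pairs for the template and target days and take the KL divergence between them (the bin count, smoothing constant and synthetic feature pairs are assumptions, and the SAP/MySQL handling is omitted):

```python
import numpy as np
from scipy.stats import entropy

def kl_2d(template, target, bins=40):
    """KL divergence between two sets of (duration, feature) pairs via 2D histograms."""
    lo = np.minimum(template.min(axis=0), target.min(axis=0))
    hi = np.maximum(template.max(axis=0), target.max(axis=0))
    rng2d = [(lo[0], hi[0]), (lo[1], hi[1])]
    p, _, _ = np.histogram2d(template[:, 0], template[:, 1], bins=bins, range=rng2d)
    q, _, _ = np.histogram2d(target[:, 0], target[:, 1], bins=bins, range=rng2d)
    p = (p + 1e-9).ravel()
    q = (q + 1e-9).ravel()
    return entropy(q / q.sum(), p / p.sum())   # divergence of target from template

rng = np.random.default_rng(5)
pre = rng.normal([0.08, 3000], [0.02, 400], size=(1000, 2))    # (duration s, pitch Hz), template
post = rng.normal([0.10, 2600], [0.03, 500], size=(1000, 2))   # post-treatment target
print(kl_2d(pre, post))
```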

  20. Software for objective comparison of vocal acoustic features over weeks of audio recording: KLFromRecordingDays

    Directory of Open Access Journals (Sweden)

    Ken Soderstrom

    2017-01-01

    Full Text Available KLFromRecordingDays allows measurement of Kullback–Leibler (KL) distances between 2D probability distributions of vocal acoustic features. Greater KL distance measures reflect increased phonological divergence across the vocalizations compared. The software has been used to compare *.wav file recordings made by Sound Analysis Recorder 2011 of songbird vocalizations pre- and post-drug and surgical manipulations. Recordings from individual animals in *.wav format are first organized into subdirectories by recording day and then segmented into individual syllables uttered and acoustic features of these syllables using Sound Analysis Pro 2011 (SAP). KLFromRecordingDays uses syllable acoustic feature data output by SAP to a MySQL table to generate and compare “template” (typically pre-treatment) and “target” (typically post-treatment) probability distributions. These distributions are a series of virtual 2D plots of the duration of each syllable (as x-axis) to each of 13 other acoustic features measured by SAP for that syllable (as y-axes). Differences between “template” and “target” probability distributions for each acoustic feature are determined by calculating KL distance, a measure of divergence of the target 2D distribution pattern from that of the template. KL distances and the mean KL distance across all acoustic features are calculated for each recording day and output to an Excel spreadsheet. Resulting data for individual subjects may then be pooled across treatment groups and graphically summarized and used for statistical comparisons. Because SAP-generated MySQL files are accessed directly, data limits associated with spreadsheet output are avoided, and the totality of vocal output over weeks may be objectively analyzed all at once. The software has been useful for measuring drug effects on songbird vocalizations and assessing recovery from damage to regions of vocal motor cortex. It may be useful in studies employing other

  1. Feature-space transformation improves supervised segmentation across scanners

    DEFF Research Database (Denmark)

    van Opbroek, Annegreet; Achterberg, Hakim C.; de Bruijne, Marleen

    2015-01-01

    Image-segmentation techniques based on supervised classification generally perform well on the condition that training and test samples have the same feature distribution. However, if training and test images are acquired with different scanners or scanning parameters, their feature distributions...... can be very different, which can hurt the performance of such techniques. We propose a feature-space-transformation method to overcome these differences in feature distributions. Our method learns a mapping of the feature values of training voxels to values observed in images from the test scanner....... This transformation is learned from unlabeled images of subjects scanned on both the training scanner and the test scanner. We evaluated our method on hippocampus segmentation on 27 images of the Harmonized Hippocampal Protocol (HarP), a heterogeneous dataset consisting of 1.5T and 3T MR images. The results showed...

  2. Efficient audio signal processing for embedded systems

    Science.gov (United States)

    Chiu, Leung Kin

    As mobile platforms continue to pack on more computational power, electronics manufacturers start to differentiate their products by enhancing the audio features. However, consumers also demand smaller devices that can operate for a longer time, hence imposing design constraints. In this research, we investigate two design strategies that would allow us to efficiently process audio signals on embedded systems such as mobile phones and portable electronics. In the first strategy, we exploit properties of the human auditory system to process audio signals. We designed a sound enhancement algorithm to make piezoelectric loudspeakers sound "richer" and "fuller." Piezoelectric speakers have a small form factor but exhibit poor response in the low-frequency region. In the algorithm, we combine psychoacoustic bass extension and dynamic range compression to improve the perceived bass coming out of the tiny speakers. We also developed an audio energy reduction algorithm for loudspeaker power management. The perceptually transparent algorithm extends the battery life of mobile devices and prevents thermal damage in speakers. This method is similar to audio compression algorithms, which encode audio signals in such a way that the compression artifacts are not easily perceivable. Instead of reducing the storage space, however, we suppress the audio content that is below the hearing threshold, thereby reducing the signal energy. In the second strategy, we use low-power analog circuits to process the signal before digitizing it. We designed an analog front-end for sound detection and implemented it on a field programmable analog array (FPAA). The system is an example of an analog-to-information converter. The sound classifier front-end can be used in a wide range of applications because programmable floating-gate transistors are employed to store classifier weights. Moreover, we incorporated a feature selection algorithm to simplify the analog front-end. A machine
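
    As a small illustration of one ingredient named above, a static (memoryless) dynamic range compressor; real compressors add an envelope follower with attack/release smoothing, which is omitted here, and the threshold and ratio values are assumptions:

```python
import numpy as np

def compress(x, threshold_db=-20.0, ratio=4.0):
    """Static dynamic range compression of a signal scaled to [-1, 1]."""
    level_db = 20 * np.log10(np.abs(x) + 1e-9)
    over = np.maximum(level_db - threshold_db, 0.0)
    gain_db = -over * (1.0 - 1.0 / ratio)       # attenuate only above the threshold
    return x * 10.0 ** (gain_db / 20.0)

t = np.linspace(0, 1, 8000)
x = 0.9 * np.sin(2 * np.pi * 220 * t) * np.linspace(0.1, 1.0, t.size)
print(round(np.abs(x).max(), 3), round(np.abs(compress(x)).max(), 3))
```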

  3. Audio Papers

    DEFF Research Database (Denmark)

    Groth, Sanne Krogh; Samson, Kristine

    2016-01-01

    ...of the written paper through its specific use of media, a sonic awareness of aesthetics and materiality, and a creative approach towards communication. The audio paper is a performative format working together with an affective and elaborate understanding of language. It is an experiment embracing intellectual... arguments and creative work, papers and performances, written scholarship and sonic aesthetics. For this special issue of Seismograf, the guidelines for authors and peer reviewers mainly focused on the format. Topic-wise we encouraged dealing with site-specificity and topics related to the island Amager...

  4. Morphological analysis of HI features - III. Metric space technique revisited

    Science.gov (United States)

    Robitaille, J.-F.; Joncas, G.; Khalil, A.

    2010-06-01

    This is the third paper on the morphological analysis of HI features. As in the first paper, we use the mathematical formalism of the metric space technique, developed by F. C. Adams and J. Wiseman, to quantify the complexity of 21-cm interstellar maps. This method compares the one-dimensional `output functions' of the maps, which characterize specific morphological and kinematical aspects of the maps. The HI feature catalogue, from our first paper, is increased from 28 to 51 features of known origin, such as star formation regions, the environment of Wolf-Rayet (WR) stars and supernova remnants. The maps come from the Canadian Galactic Plane Survey (CGPS), and have a resolution of 1 cos δ arcmin. Also, significant improvements are applied to the metric space technique. We present a new data reduction technique, new and improved `output functions' and a better characterization of the noise propagation and uncertainties in functions. We again look for correlations between the complexity of the HI features and other intrinsic aspects such as age, excitation parameter u, wind velocity, |z| and distance. Many interesting correlations are measured, for example: (i) more complex HI is associated with intense flux emission from star formation regions; (ii) the higher the wind velocity of the WR star, the more complex the HI topology; (iii) the higher the HI feature above the Galactic plane, the less complex its topology.

  5. The cross time and space features in remote sensing applications

    Science.gov (United States)

    Lu, J. X.; Song, W. L.; Qu, W.; Fu, J. E.; Pang, Z. G.

    2015-08-01

    Remote sensing is a subject of modern geomatics, with a high priority on practical applications, in which cross time and space analysis is one of its significant features. Object recognition and/or parameter retrieval is normally the first step in remote sensing applications, whereas cross time and space change analysis of those surface objects and/or parameters makes remote sensing applications more valuable. Based on a short review of the historic evolution of remote sensing and its current classification system, the cross time and space features commonly existing in remote sensing applications are discussed. The paper, aiming at improving remote sensing applications and promoting the development of the remote sensing subject from a new vision, proposes a methodology-based subject classification approach for remote sensing and then suggests establishing a theory of cross time and space remote sensing applications. The authors believe that such a new cross time and space concept meets the demand for new theories and new ideas in the remote sensing subject and is of practical help to future remote sensing applications.

  6. pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis.

    Science.gov (United States)

    Giannakopoulos, Theodoros

    2015-01-01

    Audio information plays a rather important role in the increasing digital content that is available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automations and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g. audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures including: feature extraction, classification of audio signals, supervised and unsupervised segmentation and content visualization. pyAudioAnalysis is licensed under the Apache License and is available at GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the wide range of the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has been already used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation and health applications (e.g. monitoring eating habits). The feedback provided from all these particular audio applications has led to practical enhancement of the library.
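
    A minimal usage sketch of the library's short-term feature extraction; the module and function names (audioBasicIO.read_audio_file, ShortTermFeatures.feature_extraction) follow recent releases and may differ in older versions, and "example.wav" is a placeholder path:

```python
from pyAudioAnalysis import audioBasicIO, ShortTermFeatures

fs, x = audioBasicIO.read_audio_file("example.wav")   # placeholder path, not a real file
features, names = ShortTermFeatures.feature_extraction(
    x, fs, int(0.050 * fs), int(0.025 * fs))           # 50 ms windows, 25 ms hop
print(len(names), features.shape)                      # one row of frame values per feature
```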

  7. pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis.

    Directory of Open Access Journals (Sweden)

    Theodoros Giannakopoulos

    Full Text Available Audio information plays a rather important role in the increasing digital content that is available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automations and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g. audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures including: feature extraction, classification of audio signals, supervised and unsupervised segmentation and content visualization. pyAudioAnalysis is licensed under the Apache License and is available at GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the wide range of the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has been already used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation and health applications (e.g. monitoring eating habits). The feedback provided from all these particular audio applications has led to practical enhancement of the library.

  8. Deep Learning: Generalization Requires Deep Compositional Feature Space Design

    OpenAIRE

    Haloi, Mrinal

    2017-01-01

    Generalization error defines the discriminability and the representation power of a deep model. In this work, we claim that feature space design using deep compositional function plays a significant role in generalization along with explicit and implicit regularizations. Our claims are being established with several image classification experiments. We show that the information loss due to convolution and max pooling can be marginalized with the compositional design, improving generalization ...

  9. Invariant feature matching in parameter space with application to line features

    Science.gov (United States)

    Hecker, Y. C.; Bolle, Ruud M.

    1991-09-01

    This paper examines the combination of the Hough transform with geometric hashing as a technique for object recognition. Geometric hashing is a technique for fast indexing into object-model databases by creating multiple invariant indices from model features; yet its description applies to objects that are modeled by point sets. Extracting points locally from image data is a noise-sensitive process, and the analysis of geometric hashing on point sets has shown that it is very sensitive to noise. The use of the Hough transform as a first layer for extracting features imposes constraints on the image data, and in domains in which the constraints are appropriate, there is a significant reduction in noise effects on geometric hashing. The use of arbitrary primitive features in geometric hashing schemes also has other advantages. As a concrete example, experiments are performed with objects modeled by lines. The output of the line-Hough transform on intensity images is used to directly encode invariant geometric properties of shapes. Points in Hough space that have high counts are combined to yield invariant geometric indices. Objects containing lines are modeled as a collection of points in dual space, and invariant indices in dual space are found by computing invariant dual-space transformations. The combination of the Hough transform and geometric hashing is shown by experiments to be noise resistant and suitable for cluttered environments.

  10. A Method to Detect AAC Audio Forgery

    Directory of Open Access Journals (Sweden)

    Qingzhong Liu

    2015-08-01

    Full Text Available Advanced Audio Coding (AAC), a standardized lossy compression scheme for digital audio designed to be the successor of the MP3 format, generally achieves better sound quality than MP3 at similar bit rates. AAC is also the default or standard audio format for many devices, and AAC audio files may be presented as important digital evidence, so authentication of these audio files is highly needed but relatively missing. In this paper, we propose a scheme to expose tampered AAC audio streams that are encoded at the same encoding bit rate. Specifically, we design a shift-recompression based method to retrieve the differential features between the re-encoded audio stream at each shift and the original audio stream; a learning classifier is then employed to recognize the different patterns of differential features of doctored forgery files and original (untouched) audio files. Experimental results show that our approach is very promising and effective in detecting same-bit-rate forgery of AAC audio streams. Our study also shows that shift-recompression based differential analysis is very effective for detection of MP3 forgery at the same bit rate.

  11. Distributed Learning over Massive XML Documents in ELM Feature Space

    Directory of Open Access Journals (Sweden)

    Xin Bi

    2015-01-01

    Full Text Available With the exponentially increasing volume of XML data, centralized learning solutions are unable to meet the requirements of mining applications with massive training samples. In this paper, a solution to distributed learning over massive XML documents is proposed, which provides distributed conversion of XML documents into a representation model in parallel based on MapReduce, and a distributed learning component based on the Extreme Learning Machine for mining tasks of classification or clustering. Within this framework, training samples are converted from raw XML datasets with better efficiency and information representation ability and are fed to distributed learning algorithms in the Extreme Learning Machine (ELM) feature space. Extensive experiments are conducted on massive XML document datasets to verify the effectiveness and efficiency for both classification and clustering applications.
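    For orientation, the ELM classifier at the core of the learning component can be sketched in plain NumPy as below. This is a single-machine sketch only; the MapReduce-based distributed layer described in the paper is not shown, and the hidden-layer size is an illustrative assumption.

        # Minimal Extreme Learning Machine (ELM) classifier sketch in plain NumPy.
        # The MapReduce/distributed layer described in the paper is not shown here.
        import numpy as np

        def elm_train(X, Y, n_hidden=200, rng=np.random.default_rng(0)):
            """X: (n_samples, n_features), Y: (n_samples, n_classes) one-hot targets."""
            W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (never trained)
            b = rng.normal(size=n_hidden)                 # random biases
            H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # hidden activations (ELM feature space)
            beta = np.linalg.pinv(H) @ Y                  # output weights via Moore-Penrose pseudo-inverse
            return W, b, beta

        def elm_predict(X, W, b, beta):
            H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
            return np.argmax(H @ beta, axis=1)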

  12. Digital audio watermarking fundamentals, techniques and challenges

    CERN Document Server

    Xiang, Yong; Yan, Bin

    2017-01-01

    This book offers comprehensive coverage on the most important aspects of audio watermarking, from classic techniques to the latest advances, from commonly investigated topics to emerging research subdomains, and from the research and development achievements to date, to current limitations, challenges, and future directions. It also addresses key topics such as reversible audio watermarking, audio watermarking with encryption, and imperceptibility control methods. The book sets itself apart from the existing literature in three main ways. Firstly, it not only reviews classical categories of audio watermarking techniques, but also provides detailed descriptions, analysis and experimental results of the latest work in each category. Secondly, it highlights the emerging research topic of reversible audio watermarking, including recent research trends, unique features, and the potentials of this subdomain. Lastly, the joint consideration of audio watermarking and encryption is also reviewed. With the help of this...

  13. Semantic Context Detection Using Audio Event Fusion

    Directory of Open Access Journals (Sweden)

    Cheng Wen-Huang

    2006-01-01

    Full Text Available Semantic-level content analysis is a crucial issue in achieving efficient content retrieval and management. We propose a hierarchical approach that models audio events over a time series in order to accomplish semantic context detection. Two levels of modeling, audio event and semantic context modeling, are devised to bridge the gap between physical audio features and semantic concepts. In this work, hidden Markov models (HMMs) are used to model four representative audio events, that is, gunshot, explosion, engine, and car braking, in action movies. At the semantic context level, generative (ergodic hidden Markov model) and discriminative (support vector machine, SVM) approaches are investigated to fuse the characteristics and correlations among audio events, which provide cues for detecting gunplay and car-chasing scenes. The experimental results demonstrate the effectiveness of the proposed approaches and provide a preliminary framework for information mining by using audio characteristics.
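    A minimal sketch of the event-level modeling, one Gaussian HMM per audio event with classification by maximum log-likelihood, is given below. The hmmlearn package is used purely for illustration, and the number of states and feature dimensions are assumptions rather than the paper's configuration.

        # One Gaussian HMM per audio event (gunshot, explosion, engine, car braking);
        # an observation sequence is assigned to the event whose HMM scores it highest.
        # hmmlearn is an assumed third-party package; states/feature dims are illustrative.
        import numpy as np
        from hmmlearn import hmm

        def train_event_models(training_data, n_states=3):
            """training_data: dict event_name -> list of (n_frames, n_features) arrays."""
            models = {}
            for event, sequences in training_data.items():
                X = np.vstack(sequences)
                lengths = [len(s) for s in sequences]
                m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
                m.fit(X, lengths)
                models[event] = m
            return models

        def classify_segment(models, features):
            """features: (n_frames, n_features) for one audio segment."""
            return max(models, key=lambda event: models[event].score(features))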

  14. Back to basics audio

    CERN Document Server

    Nathan, Julian

    1998-01-01

    Back to Basics Audio is a thorough, yet approachable handbook on audio electronics theory and equipment. The first part of the book discusses electrical and audio principles. Those principles form a basis for understanding the operation of equipment and systems, covered in the second section. Finally, the author addresses planning and installation of a home audio system. Julian Nathan joined the audio service and manufacturing industry in 1954 and moved into motion picture engineering and production in 1960. He installed and operated recording theaters in Sydney, Australia.

  15. Reproducibility of MRI segmentation using a feature space method

    Science.gov (United States)

    Soltanian-Zadeh, Hamid; Windham, Joe P.; Scarpace, Lisa; Murnock, Tanya

    1998-06-01

    This paper presents reproducibility studies for the segmentation results obtained by our optimal MRI feature space method. The steps of the work accomplished are as follows. (1) Eleven patients with brain tumors were imaged by a 1.5 T General Electric Signa MRI System. Four T2-weighted and two T1-weighted images (before and after Gadolinium injection) were acquired for each patient. (2) Images of a slice through the center of the tumor were selected for processing. (3) Patient information was removed from the image headers and new names (unrecognizable by the image analysts) were given to the images. These images were blindly analyzed by the image analysts. (4) Segmentation results obtained by the two image analysts at two time points were compared to assess the reproducibility of the segmentation method. For each tissue segmented in each patient study, a comparison was done by kappa statistics and a similarity measure (an approximation of kappa statistics used by other researchers), to evaluate the number of pixels that were in both of the segmentation results obtained by the two image analysts (agreement) relative to the number of pixels that were not in both (disagreement). An overall agreement comparison was done by finding means and standard deviations of kappa statistics and the similarity measure found for each tissue type in the studies. The kappa statistics for white matter was the largest (0.80) followed by those of gray matter (0.68), partial volume (0.67), total lesion (0.66), and CSF (0.44). The similarity measure showed the same trend but it was always higher than kappa statistics. It was 0.85 for white matter, 0.77 for gray matter, 0.73 for partial volume, 0.72 for total lesion, and 0.47 for CSF.
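    The reported agreement statistics can be reproduced on two binary segmentation masks roughly as in the sketch below; cohen_kappa_score comes from scikit-learn, and since the paper describes its similarity measure only as an approximation of kappa, the Dice coefficient is used here as an assumed stand-in.

        # Agreement between two binary segmentations of the same tissue class.
        # cohen_kappa_score comes from scikit-learn; the "similarity" index here is the
        # Dice coefficient, used as a stand-in for the paper's kappa approximation.
        import numpy as np
        from sklearn.metrics import cohen_kappa_score

        def agreement(mask_a, mask_b):
            a = np.asarray(mask_a, dtype=bool).ravel()
            b = np.asarray(mask_b, dtype=bool).ravel()
            kappa = cohen_kappa_score(a, b)
            dice = 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
            return kappa, dice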

  16. Detecting double compression of audio signal

    Science.gov (United States)

    Yang, Rui; Shi, Yun Q.; Huang, Jiwu

    2010-01-01

    MP3 is the most popular audio format nowadays in our daily life; for example, music downloaded from the Internet and files saved in digital recorders are often in MP3 format. However, low-bitrate MP3s are often transcoded to high bitrate, since high-bitrate files are of higher commercial value. Also, audio recordings in digital recorders can be doctored easily with pervasive audio editing software. This paper presents two methods for the detection of double MP3 compression. The methods are essential for identifying fake-quality MP3s and for audio forensics. The proposed methods use support vector machine classifiers with feature vectors formed by the distributions of the first digits of the quantized MDCT (modified discrete cosine transform) coefficients. Extensive experiments demonstrate the effectiveness of the proposed methods. To the best of our knowledge, this piece of work is the first one to detect double compression of audio signal.
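    The feature construction, first-digit statistics of quantized MDCT coefficients fed to an SVM, can be sketched as follows. Extracting the quantized MDCT coefficients from an actual MP3 bit stream requires a decoder and is assumed to be done elsewhere; here they are simply given as an array.

        # First-digit (Benford-style) distribution of quantized MDCT coefficients,
        # used as a feature vector for an SVM classifier (single vs. double compression).
        # Obtaining the quantized MDCT coefficients from MP3 files is assumed done elsewhere.
        import numpy as np
        from sklearn.svm import SVC

        def first_digit_histogram(coeffs):
            c = np.abs(np.asarray(coeffs, dtype=float))
            c = c[c >= 1]                                            # ignore zeros/sub-unit values
            first = (c / 10 ** np.floor(np.log10(c))).astype(int)    # leading digit 1..9
            hist = np.bincount(first, minlength=10)[1:10]
            return hist / hist.sum()

        # X: one 9-dimensional histogram per audio clip, y: 0 = single, 1 = double compression
        # clf = SVC(kernel="rbf").fit(X_train, y_train); clf.predict(X_test)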

  17. Frequency-Dependent Amplitude Panning for the Stereophonic Image Enhancement of Audio Recorded Using Two Closely Spaced Microphones

    Directory of Open Access Journals (Sweden)

    Chan Jun Chun

    2016-02-01

    Full Text Available In this paper, we propose a new frequency-dependent amplitude panning method for stereophonic image enhancement applied to a sound source recorded using two closely spaced omni-directional microphones. The ability to detect the direction of such a sound source is limited due to weak spatial information, such as the inter-channel time difference (ICTD) and inter-channel level difference (ICLD). Moreover, when sound sources are recorded in a convolutive or a real room environment, the detection of sources is affected by reverberation effects. Thus, the proposed method first tries to estimate the source direction depending on the frequency using azimuth-frequency analysis. Then, a frequency-dependent amplitude panning technique is proposed to enhance the stereophonic image by modifying the stereophonic law of sines. To demonstrate the effectiveness of the proposed method, we compare its performance with that of a conventional method based on the beamforming technique in terms of directivity pattern, perceived direction, and quality degradation under three different recording conditions (anechoic, convolutive, and real reverberant). The comparison shows that the proposed method gives us better stereophonic images in stereo loudspeaker reproduction than the conventional method, without any annoying effects.
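    For reference, the classical (unmodified) stereophonic law of sines that the method builds on relates the perceived source angle to the left/right gains; a plain implementation, without the paper's frequency-dependent modification, might look like the sketch below.

        # Amplitude panning gains from the classical stereophonic law of sines:
        #   sin(phi) / sin(phi0) = (gL - gR) / (gL + gR),
        # where phi0 is half the loudspeaker base angle and phi the desired source angle.
        # This is the unmodified law; the paper applies a frequency-dependent variant per band.
        import numpy as np

        def law_of_sines_gains(phi_deg, phi0_deg=30.0):
            r = np.sin(np.radians(phi_deg)) / np.sin(np.radians(phi0_deg))
            r = np.clip(r, -1.0, 1.0)
            gl, gr = 1.0 + r, 1.0 - r           # any pair with (gl - gr) / (gl + gr) = r
            norm = np.hypot(gl, gr)             # constant-power normalisation: gl^2 + gr^2 = 1
            return gl / norm, gr / norm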

  18. Towards a Hybrid Audio Coder

    OpenAIRE

    Daudet, Laurent; Molla, Stéphane; Torrésani, Bruno

    2004-01-01

    International audience; The main features of a novel approach for audio signal encoding are described. The approach combines non-linear transform coding and structured approximation techniques, together with hybrid modeling of the signal class under consideration. Essentially, several different components of the signal are estimated and transform coded using an appropriately chosen orthonormal basis. Different models and estimation procedures are discussed, and numerical results are provided.

  19. Robust AVS Audio Watermarking

    Science.gov (United States)

    Wang, Yong; Huang, Jiwu

    Part III of AVS (China Audio and Video Coding Standard) is the first standard for Hi-Fi audio proposed in China and is becoming more popular in some IT industries. For MP3 audio, some efforts have been made to solve problems such as copyright piracy and malicious modification by means of watermarking. But until now little effort has been made to solve the same problems for AVS audio. In this paper, we present a novel robust watermarking algorithm which can protect AVS audio from the above problems. The watermark is embedded into the AVS compressed bit stream. At the extracting end, the watermark bits can be extracted from the compressed bit stream directly without any computation. This algorithm achieves robustness to decoding/recoding attacks and low complexity of both embedding and extracting, while preserving the quality of the audio signals.

  20. Detecting Image Splicing Using Merged Features in Chroma Space

    Directory of Open Access Journals (Sweden)

    Bo Xu

    2014-01-01

    Full Text Available Image splicing is an image editing method in which a part of an image is copied and pasted onto another image, and it is commonly followed by postprocessing such as local/global blurring, compression, and resizing. To detect this kind of forgery, the image rich models, a feature set successfully used in steganalysis, are first evaluated on the splicing image dataset, and the dominant submodel is selected as the first kind of feature. The selected feature and the DCT Markov features are then used together to detect splicing forgery in the chroma channel, which has been shown to be effective for splicing detection. The experimental results indicate that the proposed method can detect splicing forgeries with a lower error rate compared to the previous literature.

  1. Principles of Audio Watermarking

    Directory of Open Access Journals (Sweden)

    Martin Hrncar

    2008-01-01

    Full Text Available The article contains a brief overview of modern methods for embedding additional data in audio signals. This can serve many purposes, such as access control or identification related to a particular type of audio. The secret information is not “visible” to the user. The concept exploits the imperfection of the human auditory system. Simple data hiding in an audio file has been demonstrated in MATLAB.
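    The kind of simple data hiding the article demonstrates in MATLAB can be reproduced in a few lines; the Python sketch below is an illustrative stand-in, not the article's code, and embeds a bit sequence into the least significant bits of 16-bit PCM samples.

        # Simple least-significant-bit (LSB) data hiding in 16-bit PCM audio samples.
        # Illustrative stand-in for the article's MATLAB demonstration, not its actual code.
        import numpy as np

        def embed_lsb(samples, bits):
            """samples: int16 array; bits: iterable of 0/1 with len(bits) <= len(samples)."""
            out = samples.copy().astype(np.int16)
            for i, bit in enumerate(bits):
                out[i] = (out[i] & ~1) | bit      # clear the LSB, then set it to the payload bit
            return out

        def extract_lsb(samples, n_bits):
            return [int(s) & 1 for s in samples[:n_bits]]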

  2. Active Unsupervised Texture Segmentation on a Diffusion Based Feature Space

    OpenAIRE

    Rousson, Mikaël; Brox, Thomas; Deriche, Rachid

    2003-01-01

    In this report, we propose a novel and efficient approach for active unsurpervised texture segmentation. First, we show how we can extract a small set of good features for texture segmentation based on the structure tensor and nonlinear diffusion. Then, we propose a variational framework that allows to incorporate these features in a level set based unsupervised segmentation process that adaptively takes into account their estimated statistical information inside and outside the region to seg...

  3. Web Audio/Video Streaming Tool

    Science.gov (United States)

    Guruvadoo, Eranna K.

    2003-01-01

    In order to promote NASA-wide educational outreach program to educate and inform the public of space exploration, NASA, at Kennedy Space Center, is seeking efficient ways to add more contents to the web by streaming audio/video files. This project proposes a high level overview of a framework for the creation, management, and scheduling of audio/video assets over the web. To support short-term goals, the prototype of a web-based tool is designed and demonstrated to automate the process of streaming audio/video files. The tool provides web-enabled users interfaces to manage video assets, create publishable schedules of video assets for streaming, and schedule the streaming events. These operations are performed on user-defined and system-derived metadata of audio/video assets stored in a relational database while the assets reside on separate repository. The prototype tool is designed using ColdFusion 5.0.

  4. Histology image retrieval in optimised multi-feature spaces.

    Science.gov (United States)

    Zhang, Qianni; Izquierdo, Ebroul

    2013-01-01

    Content based histology image retrieval systems have shown great potential in supporting decision making in clinical activities, teaching, and biological research. In content based image retrieval, feature combination plays a key role. It aims at enhancing the descriptive power of visual features corresponding to semantically meaningful queries. It is particularly valuable in histology image analysis where intelligent mechanisms are needed for interpreting varying tissue composition and architecture into histological concepts. This paper presents an approach to automatically combine heterogeneous visual features for histology image retrieval. The aim is to obtain the most representative fusion model for a particular keyword that is associated to multiple query images. The core of this approach is a multi-objective learning method, which aims to understand an optimal visual-semantic matching function by jointly considering the different preferences of the group of query images. The task is posed as an optimisation problem, and a multi-objective optimisation strategy is employed in order to handle potential contradictions in the query images associated to the same keyword. Experiments were performed on two different collections of histology images. The results show that it is possible to improve a system for content based histology image retrieval by using an appropriately defined multi-feature fusion model, which takes careful consideration of the structure and distribution of visual features.

  5. Roundtable Audio Discussion

    Directory of Open Access Journals (Sweden)

    Chris Bigum

    2007-01-01

    Full Text Available RoundTable on Technology, Teaching and Tools. This is a roundtable audio interview conducted by James Farmer, founder of Edublogs, with Anne Bartlett-Bragg (University of Technology Sydney) and Chris Bigum (Deakin University). Skype was used to make and record the audio conference and the resulting sound file was edited by Andrew McLauchlan.

  6. AUTOMATIC SEGMENTATION OF BROADCAST AUDIO SIGNALS USING AUTO ASSOCIATIVE NEURAL NETWORKS

    Directory of Open Access Journals (Sweden)

    P. Dhanalakshmi

    2010-12-01

    Full Text Available In this paper, we describe automatic segmentation methods for audio broadcast data. Today, digital audio applications are part of our everyday lives. Since there are more and more digital audio databases in place these days, the importance of effective management of audio databases has become prominent. Broadcast audio data is recorded from television and comprises various categories of audio signals. Efficient algorithms for segmenting the audio broadcast data into predefined categories are proposed. Audio features, namely linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), and Mel frequency cepstral coefficients (MFCC), are extracted to characterize the audio data. Auto-associative neural networks are used to segment the audio data into predefined categories using the extracted features. Experimental results indicate that the proposed algorithms can produce satisfactory results.
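    As a rough illustration of the feature extraction stage (the paper does not prescribe a toolkit; librosa is used here only for convenience, and the file name, frame sizes and LPC order are illustrative), MFCC and LPC features can be computed as follows.

        # Frame-level MFCC and LPC features for broadcast-audio segmentation.
        # librosa is used here only as a convenient tool; it is not what the paper uses.
        import librosa
        import numpy as np

        y, sr = librosa.load("broadcast.wav", sr=16000)            # placeholder file name

        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)         # (13, n_frames)

        frame_len, hop = 400, 160                                  # 25 ms / 10 ms at 16 kHz
        frames = librosa.util.frame(y, frame_length=frame_len, hop_length=hop)
        lpc = np.array([librosa.lpc(frame, order=12) for frame in frames.T])  # (n_frames, 13)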

  7. Spatial audio quality perception (part 1)

    DEFF Research Database (Denmark)

    Conetta, R.; Brookes, T.; Rumsey, F.

    2015-01-01

    Spatial audio processes (SAPs) commonly encountered in consumer audio reproduction systems are known to produce a range of impairments to spatial quality. By way of two listening tests, this paper investigated the degree of degradation of the spatial quality of six 5-channel audio recordings resulting from 48 such SAPs. The choice of the SAP can have a large influence on the degree of degradation. Perceived degradation also depends on the particular listeners, the program content, and the listening location. For example, combining an off-center listener with another SAP can reduce spatial quality significantly when compared to listening to that SAP from a central location. Taken together, these findings and the quality-annotated database can guide the development of a regression model of perceived overall spatial audio quality, incorporating previously developed spatially-relevant features.

  8. Evaluation of User Satisfaction with the Audio Books Application

    Directory of Open Access Journals (Sweden)

    Raditya Maulana Anuraga

    2017-02-01

    Full Text Available Listeno is the first audio books application in Indonesia, allowing users to obtain books in audio form and listen to them as they would music. Listeno faces several problems: a requested offline-mode feature that has not yet been released, a security issue around its mp3 files that must be considered, and an active-user target of 100,000 that has not yet been reached. This research aims to evaluate user satisfaction with the Audio Books application using Nielsen's approach. The analysis in this study uses Importance Performance Analysis (IPA) combined with a User Satisfaction Index (IKP), based on the following indicators: usefulness, utility, usability, learnability, efficiency, memorability, errors, and satisfaction. The results show that users of the Audio Books application are quite satisfied, with a calculated IKP of 69.58%.

  9. Watermarking-Based Digital Audio Data Authentication

    Directory of Open Access Journals (Sweden)

    Jana Dittmann

    2003-09-01

    Full Text Available Digital watermarking has become an accepted technology for enabling multimedia protection schemes. While most efforts concentrate on user authentication, interest in data authentication to ensure data integrity has recently been increasing. Existing concepts address mainly image data. Depending on the necessary security level and the sensitivity to detect changes in the media, we differentiate between fragile, semifragile, and content-fragile watermarking approaches for media authentication. Furthermore, invertible watermarking schemes exist, in which each bit change can be recognized by the watermark, the watermark can be extracted, and the original data can be reproduced for high-security applications. The latter approaches can be extended with cryptographic approaches like digital signatures. As we see from the literature, only few audio approaches exist, and the audio domain requires additional strategies for time-flow protection and resynchronization. To allow different security levels, we have to identify relevant audio features that can be used to determine content manipulations. Furthermore, in the field of invertible schemes, there are numerous publications for image and video data but no approaches for digital audio to ensure data authentication for high-security applications. In this paper, we introduce and evaluate two watermarking algorithms for digital audio data, addressing content integrity protection. In our first approach, we discuss possible features for a content-fragile watermarking scheme that allows several postproduction modifications. The second approach is designed for high-security applications to detect each bit change and reconstruct the original audio by introducing an invertible audio watermarking concept. Based on the invertible audio scheme, we combine digital signature schemes and digital watermarking to provide publicly verifiable data authentication and reproduction of the original, protected with a secret key.

  10. Robust speech recognition based on joint model and feature space optimization of hidden Markov models.

    Science.gov (United States)

    Moon, S; Hwang, J N

    1997-01-01

    The hidden Markov model (HMM) inversion algorithm, based on either the gradient search or the Baum-Welch reestimation of input speech features, is proposed and applied to the robust speech recognition tasks under general types of mismatch conditions. This algorithm stems from the gradient-based inversion algorithm of an artificial neural network (ANN) by viewing an HMM as a special type of ANN. Given input speech features s, the forward training of an HMM finds the model parameters lambda subject to an optimization criterion. On the other hand, the inversion of an HMM finds speech features, s, subject to an optimization criterion with given model parameters lambda. The gradient-based HMM inversion and the Baum-Welch HMM inversion algorithms can be successfully integrated with the model space optimization techniques, such as the robust MINIMAX technique, to compensate the mismatch in the joint model and feature space. The joint space mismatch compensation technique achieves better performance than the single space, i.e. either the model space or the feature space alone, mismatch compensation techniques. It is also demonstrated that approximately 10-dB signal-to-noise ratio (SNR) gain is obtained in the low SNR environments when the joint model and feature space mismatch compensation technique is used.
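    In compact form, the duality exploited here can be written as follows (notation paraphrased from the abstract, assuming a maximum-likelihood criterion): forward training fixes the features and optimizes the model, while inversion fixes the model and optimizes the features.

        \hat{\lambda} = \arg\max_{\lambda} P(\mathbf{s} \mid \lambda) \qquad \text{(training: features } \mathbf{s} \text{ fixed)}
        \hat{\mathbf{s}} = \arg\max_{\mathbf{s}} P(\mathbf{s} \mid \lambda) \qquad \text{(inversion: model } \lambda \text{ fixed)}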

  11. Temporal processing of speech in a time-feature space

    Science.gov (United States)

    Avendano, Carlos

    1997-09-01

    The performance of speech communication systems often degrades under realistic environmental conditions. Adverse environmental factors include additive noise sources, room reverberation, and transmission channel distortions. This work studies the processing of speech in the temporal-feature or modulation spectrum domain, aiming for alleviation of the effects of such disturbances. Speech reflects the geometry of the vocal organs, and the linguistically dominant component is in the shape of the vocal tract. At any given point in time, the shape of the vocal tract is reflected in the short-time spectral envelope of the speech signal. The rate of change of the vocal tract shape appears to be important for the identification of linguistic components. This rate of change, or the rate of change of the short-time spectral envelope can be described by the modulation spectrum, i.e. the spectrum of the time trajectories described by the short-time spectral envelope. For a wide range of frequency bands, the modulation spectrum of speech exhibits a maximum at about 4 Hz, the average syllabic rate. Disturbances often have modulation frequency components outside the speech range, and could in principle be attenuated without significantly affecting the range with relevant linguistic information. Early efforts for exploiting the modulation spectrum domain (temporal processing), such as the dynamic cepstrum or the RASTA processing, used ad hoc designed processing and appear to be suboptimal. As a major contribution, in this dissertation we aim for a systematic data-driven design of temporal processing. First we analytically derive and discuss some properties and merits of temporal processing for speech signals. We attempt to formalize the concept and provide a theoretical background which has been lacking in the field. In the experimental part we apply temporal processing to a number of problems including adaptive noise reduction in cellular telephone environments, reduction of
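    The modulation spectrum described here is straightforward to compute: take the magnitude envelope of each frequency band of a short-time spectrogram and Fourier-transform it along time. A minimal sketch follows (SciPy/NumPy; window and hop values are illustrative, not the dissertation's settings); for clean speech the result typically peaks near 4 Hz.

        # Modulation spectrum sketch: per-band envelope of a spectrogram, then an FFT
        # along the time axis. For speech, energy typically peaks near 4 Hz modulation.
        # Window/hop values are illustrative, not taken from the dissertation.
        import numpy as np
        from scipy.signal import stft

        def modulation_spectrum(x, fs, win=0.025, hop=0.010):
            nper, step = int(win * fs), int(hop * fs)
            _, _, Z = stft(x, fs=fs, nperseg=nper, noverlap=nper - step)
            env = np.abs(Z)                                    # (n_bands, n_frames) envelopes
            mod = np.abs(np.fft.rfft(env - env.mean(axis=1, keepdims=True), axis=1))
            mod_freqs = np.fft.rfftfreq(env.shape[1], d=hop)   # modulation frequencies in Hz
            return mod_freqs, mod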

  12. Audio-visual perception of new wind parks

    OpenAIRE

    Yu, T.; Behm, H.; Bill, R.; Kang, J.

    2017-01-01

    Previous studies have reported negative impacts of wind parks on the public. These studies considered the noise levels or visual levels separately, but not audio-visual interactive factors. This study investigated the audio-visual impact of a new wind park using virtual technology that combined audio and visual features of the environment. Participants were immersed through Google Cardboard in an actual landscape without wind parks (ante operam) and in the same landscape with wind parks (post operam).

  13. Structure Learning in Audio

    DEFF Research Database (Denmark)

    Nielsen, Andreas Brinch

    By having information about the setting a user is in, a computer is able to make decisions proactively to facilitate tasks for the user. Two approaches are taken in this thesis to achieve more information about an audio environment. One approach is that of classifying audio, and a new approach us......-Gaussian source distributions allowing a much wider use of the method. All methods use a variety of classification models and model selection algorithms, which is a common theme of the thesis.

  14. MedlinePlus FAQ: Is audio description available for videos on MedlinePlus?

    Science.gov (United States)

    Question: Is audio description available for videos on MedlinePlus? Answer: Audio description of videos helps make the content of videos accessible to ...

  15. Representing the Meanings of Object and Action Words: The Featural and Unitary Semantic Space Hypothesis

    Science.gov (United States)

    Vigliocco, Gabriella; Vinson, David P.; Lewis, William; Garrett, Merrill F.

    2004-01-01

    This paper presents the Featural and Unitary Semantic Space (FUSS) hypothesis of the meanings of object and action words. The hypothesis, implemented in a statistical model, is based on the following assumptions: First, it is assumed that the meanings of words are grounded in conceptual featural representations, some of which are organized…

  16. Feature recognition of metal salt spray corrosion based on color spaces statistics analysis

    Science.gov (United States)

    Zou, Zhi; Ma, Liqun; Fan, Qiuqin; Gan, Xiaochuan; Qiao, Lei

    2017-09-01

    The article proposes a method to quantify the corrosion characteristics of high-strength alloy steel samples using digital image processing techniques in color spaces. The distribution histograms of the corrosion images in the different channels of different color spaces are plotted and analyzed, and the color channel that best extracts the corrosion characteristics is selected from the RGB, HSV, and YCbCr spaces. Combined with the theory of corrosion formation, the color channel data are processed and the features of metal salt spray corrosion are recognized. Processing of several sample color images of alloy steel shows that the feature extracted by this procedure has better accuracy, that the corrosion degree is quantifiable, and that the precision of discriminating the corrosion is improved.
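    A sketch of the channel-histogram analysis is given below (OpenCV/NumPy; the file name and bin count are placeholders, and the corrosion-specific thresholding described in the article is not reproduced).

        # Per-channel histograms of a corrosion image in RGB, HSV and YCbCr spaces,
        # as a starting point for choosing the channel that best separates corrosion.
        # File name and bin count are placeholders; OpenCV loads images as BGR.
        import cv2
        import numpy as np

        img = cv2.imread("corrosion_sample.png")            # BGR image
        spaces = {
            "RGB": cv2.cvtColor(img, cv2.COLOR_BGR2RGB),
            "HSV": cv2.cvtColor(img, cv2.COLOR_BGR2HSV),
            "YCbCr": cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb),
        }
        histograms = {
            name: [np.histogram(space[:, :, c], bins=64, range=(0, 256))[0] for c in range(3)]
            for name, space in spaces.items()
        }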

  17. An alternative to scale-space representation for extracting local features in image recognition

    DEFF Research Database (Denmark)

    Andersen, Hans Jørgen; Nguyen, Phuong Giang

    2012-01-01

    In image recognition, the common approach for extracting local features using a scale-space representation usually has three main steps: first, interest points are extracted at different scales; next, from a patch around each interest point the rotation is calculated with corresponding orientation... and compensation; and finally, a descriptor is computed for the derived patch (i.e. the feature of the patch). To avoid the memory- and computation-intensive process of constructing the scale-space, we use a method where no scale-space is required. This is done by dividing the given image into a number of triangles...

  18. Improving audio chord transcription by exploiting harmonic and metric knowledge

    NARCIS (Netherlands)

    de Haas, W.B.; Rodrigues Magalhães, J.P.; Wiering, F.

    2012-01-01

    We present a new system for chord transcription from polyphonic musical audio that uses domain-specific knowledge about tonal harmony and metrical position to improve chord transcription performance. Low-level pulse and spectral features are extracted from an audio source using the Vamp plugin

  19. An Efficient Audio Classification Approach Based on Support Vector Machines

    OpenAIRE

    Lhoucine Bahatti; Omar Bouattane; My Elhoussine Echhibat; Mohamed Hicham Zaggaf

    2016-01-01

    In order to achieve an audio classification aimed at identifying the composer, the use of adequate and relevant features is important to improve performance, especially when the classification algorithm is based on support vector machines. As opposed to conventional approaches that often use timbral features based on a time-frequency representation of the musical signal using a constant window, this paper deals with a new audio classification method which improves the feature extraction according ...

  20. Agency Video, Audio and Imagery Library

    Science.gov (United States)

    Grubbs, Rodney

    2015-01-01

    The purpose of this presentation was to inform the ISS International Partners of the new NASA Agency Video, Audio and Imagery Library (AVAIL) website. AVAIL is a new resource for the public to search for and download NASA-related imagery, and is not intended to replace the current process by which the International Partners receive their Space Station imagery products.

  1. DAFX Digital Audio Effects

    CERN Document Server

    2011-01-01

    The rapid development in various fields of Digital Audio Effects, or DAFX, has led to new algorithms and this second edition of the popular book, DAFX: Digital Audio Effects has been updated throughout to reflect progress in the field. It maintains a unique approach to DAFX with a lecture-style introduction into the basics of effect processing. Each effect description begins with the presentation of the physical and acoustical phenomena, an explanation of the signal processing techniques to achieve the effect, followed by a discussion of musical applications and the control of effect parameter

  2. Perceptual Audio Hashing Functions

    Directory of Open Access Journals (Sweden)

    Emin Anarım

    2005-07-01

    Full Text Available Perceptual hash functions provide a tool for fast and reliable identification of content. We present new audio hash functions based on summarization of the time-frequency spectral characteristics of an audio document. The proposed hash functions are based on the periodicity series of the fundamental frequency and on singular-value description of the cepstral frequencies. They are found, on one hand, to perform very satisfactorily in identification and verification tests, and on the other hand, to be very resilient to a large variety of attacks. Moreover, we address the issue of security of hashes and propose a keying technique, and thereby a key-dependent hash function.

  3. 3D Point Correspondence by Minimum Description Length in Feature Space.

    Science.gov (United States)

    Chen, Jiun-Hung; Zheng, Ke Colin; Shapiro, Linda G

    2010-01-01

    Finding point correspondences plays an important role in automatically building statistical shape models from a training set of 3D surfaces. For the point correspondence problem, Davies et al. [1] proposed a minimum-description-length-based objective function to balance the training errors and generalization ability. A recent evaluation study [2] that compares several well-known 3D point correspondence methods for modeling purposes shows that the MDL-based approach [1] is the best method. We adapt the MDL-based objective function for a feature space that can exploit nonlinear properties in point correspondences, and propose an efficient optimization method to minimize the objective function directly in the feature space, given that the inner product of any vector pair can be computed in the feature space. We further employ a Mercer kernel [3] to define the feature space implicitly. A key aspect of our proposed framework is the generalization of the MDL-based objective function to kernel principal component analysis (KPCA) [4] spaces and the design of a gradient-descent approach to minimize such an objective function. We compare the generalized MDL objective function on KPCA spaces with the original one and evaluate their abilities in terms of reconstruction errors and specificity. From our experimental results on different sets of 3D shapes of human body organs, the proposed method performs significantly better than the original method.
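    The implicit feature space itself can be illustrated with an off-the-shelf kernel PCA, as in the sketch below; scikit-learn's KernelPCA is used as a stand-in, the kernel and sizes are illustrative, and the MDL objective and its gradient-descent minimization from the paper are not reproduced.

        # Mapping shape/point features into an implicit kernel feature space with kernel PCA,
        # the setting in which the generalized MDL objective of the paper is optimized.
        # scikit-learn's KernelPCA is a stand-in; kernel, gamma and sizes are illustrative.
        import numpy as np
        from sklearn.decomposition import KernelPCA

        X = np.random.default_rng(0).normal(size=(100, 30))   # placeholder: 100 shapes, 30 features
        kpca = KernelPCA(n_components=10, kernel="rbf", gamma=0.1, fit_inverse_transform=True)
        Z = kpca.fit_transform(X)              # coordinates in the KPCA feature space
        X_rec = kpca.inverse_transform(Z)      # pre-image approximation for reconstruction error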

  4. Large anterior temporal Virchow-Robin spaces: unique MR imaging features

    Energy Technology Data Exchange (ETDEWEB)

    Lim, Anthony T. [Monash University, Neuroradiology Service, Monash Imaging, Monash Health, Melbourne, Victoria (Australia); Chandra, Ronil V. [Monash University, Neuroradiology Service, Monash Imaging, Monash Health, Melbourne, Victoria (Australia); Monash University, Department of Surgery, Faculty of Medicine, Nursing and Health Sciences, Melbourne (Australia); Trost, Nicholas M. [St Vincent's Hospital, Neuroradiology Service, Melbourne (Australia); McKelvie, Penelope A. [St Vincent's Hospital, Anatomical Pathology, Melbourne (Australia); Stuckey, Stephen L. [Monash University, Neuroradiology Service, Monash Imaging, Monash Health, Melbourne, Victoria (Australia); Monash University, Southern Clinical School, Faculty of Medicine, Nursing and Health Sciences, Melbourne (Australia)

    2015-05-01

    Large Virchow-Robin (VR) spaces may mimic cystic tumor. The anterior temporal subcortical white matter is a recently described preferential location, with only 18 reported cases. Our aim was to identify unique MR features that could increase prospective diagnostic confidence. Thirty-nine cases were identified between November 2003 and February 2014. Demographic, clinical data and the initial radiological report were retrospectively reviewed. Two neuroradiologists reviewed all MR imaging; a neuropathologist reviewed histological data. Median age was 58 years (range 24-86 years); the majority (69 %) was female. There were no clinical symptoms that could be directly referable to the lesion. Two thirds were considered to be VR spaces on the initial radiological report. Mean maximal size was 9 mm (range 5-17 mm); majority (79 %) had perilesional T2 or fluid-attenuated inversion recovery (FLAIR) hyperintensity. The following were identified as potential unique MR features: focal cortical distortion by an adjacent branch of the middle cerebral artery (92 %), smaller adjacent VR spaces (26 %), and a contiguous cerebrospinal fluid (CSF) intensity tract (21 %). Surgery was performed in three asymptomatic patients; histopathology confirmed VR spaces. Unique MR features were retrospectively identified in all three patients. Large anterior temporal lobe VR spaces commonly demonstrate perilesional T2 or FLAIR signal and can be misdiagnosed as cystic tumor. Potential unique MR features that could increase prospective diagnostic confidence include focal cortical distortion by an adjacent branch of the middle cerebral artery, smaller adjacent VR spaces, and a contiguous CSF intensity tract. (orig.)

  5. Structuring feature space: a non-parametric method for volumetric transfer function generation.

    Science.gov (United States)

    Maciejewski, Ross; Woo, Insoo; Chen, Wei; Ebert, David S

    2009-01-01

    The use of multi-dimensional transfer functions for direct volume rendering has been shown to be an effective means of extracting materials and their boundaries for both scalar and multivariate data. The most common multi-dimensional transfer function consists of a two-dimensional (2D) histogram with axes representing a subset of the feature space (e.g., value vs. value gradient magnitude), with each entry in the 2D histogram being the number of voxels at a given feature space pair. Users then assign color and opacity to the voxel distributions within the given feature space through the use of interactive widgets (e.g., box, circular, triangular selection). Unfortunately, such tools lead users through a trial-and-error approach as they assess which data values within the feature space map to a given area of interest within the volumetric space. In this work, we propose the addition of non-parametric clustering within the transfer function feature space in order to extract patterns and guide transfer function generation. We apply a non-parametric kernel density estimation to group voxels of similar features within the 2D histogram. These groups are then binned and colored based on their estimated density, and the user may interactively grow and shrink the binned regions to explore feature boundaries and extract regions of interest. We also extend this scheme to temporal volumetric data in which time steps of 2D histograms are composited into a histogram volume. A three-dimensional (3D) density estimation is then applied, and users can explore regions within the feature space across time without adjusting the transfer function at each time step. Our work enables users to effectively explore the structures found within a feature space of the volume and provide a context in which the user can understand how these structures relate to their volumetric data. We provide tools for enhanced exploration and manipulation of the transfer function, and we show that the initial
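    The first step of the approach, non-parametric density estimation over the (value, gradient-magnitude) feature space, can be sketched as follows; SciPy's Gaussian KDE is used here as a generic estimator, and the paper's own kernel and binning choices may differ.

        # Non-parametric density estimation over the 2D (value, gradient magnitude)
        # transfer-function feature space of a scalar volume. SciPy's Gaussian KDE is
        # used as a generic estimator; the paper's exact kernel/binning may differ.
        import numpy as np
        from scipy.stats import gaussian_kde

        def feature_space_density(volume, n_samples=20000, rng=np.random.default_rng(0)):
            value = volume.ravel()
            grad_mag = np.linalg.norm(np.gradient(volume.astype(float)), axis=0).ravel()
            idx = rng.choice(value.size, size=min(n_samples, value.size), replace=False)
            pts = np.vstack([value[idx], grad_mag[idx]])      # (2, n) sample of feature pairs
            kde = gaussian_kde(pts)
            return kde                                        # call kde(points) for densities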

  6. Enhancing Navigation Skills through Audio Gaming.

    Science.gov (United States)

    Sánchez, Jaime; Sáenz, Mauricio; Pascual-Leone, Alvaro; Merabet, Lotfi

    2010-01-01

    We present the design, development and initial cognitive evaluation of an Audio-based Environment Simulator (AbES). This software allows a blind user to navigate through a virtual representation of a real space for the purposes of training orientation and mobility skills. Our findings indicate that users feel satisfied and self-confident when interacting with the audio-based interface, and the embedded sounds allow them to correctly orient themselves and navigate within the virtual world. Furthermore, users are able to transfer spatial information acquired through virtual interactions into real world navigation and problem solving tasks.

  7. Dynamic Bayesian Networks for Audio-Visual Speech Recognition

    Directory of Open Access Journals (Sweden)

    Liang Luhong

    2002-01-01

    Full Text Available The use of visual features in audio-visual speech recognition (AVSR) is justified both by the speech generation mechanism, which is essentially bimodal in audio and visual representation, and by the need for features that are invariant to acoustic noise perturbation. As a result, current AVSR systems demonstrate significant accuracy improvements in environments affected by acoustic noise. In this paper, we describe the use of two statistical models for audio-visual integration, the coupled HMM (CHMM) and the factorial HMM (FHMM), and compare the performance of these models with the existing models used in speaker-dependent audio-visual isolated word recognition. The statistical properties of both the CHMM and FHMM allow modeling of the state asynchrony of the audio and visual observation sequences while preserving their natural correlation over time. In our experiments, the CHMM performs best overall, outperforming all the existing models and the FHMM.

  8. Teledesic Global Wireless Broadband Network: Space Infrastructure Architecture, Design Features and Technologies

    Science.gov (United States)

    Stuart, James R.

    1995-01-01

    The Teledesic satellites are a new class of small satellites which demonstrate the important commercial benefits of using technologies developed for other purposes by U.S. National Laboratories. The Teledesic satellite architecture, subsystem design features, and new technologies are described. The new Teledesic satellite manufacturing, integration, and test approaches which use modern high volume production techniques and result in surprisingly low space segment costs are discussed. The constellation control and management features and attendant software architecture features are addressed. After briefly discussing the economic and technological impact on the USA commercial space industries of the space communications revolution and such large constellation projects, the paper concludes with observations on the trend toward future system architectures using networked groups of much smaller satellites.

  9. Editing Audio with Audacity

    Directory of Open Access Journals (Sweden)

    Brandon Walsh

    2016-08-01

    Full Text Available For those interested in audio, basic sound editing skills go a long way. Being able to handle and manipulate the materials can help you take control of your object of study: you can zoom in and extract particular moments to analyze, process the audio, and upload the materials to a server to compliment a blog post on the topic. On a more practical level, these skills could also allow you to record and package recordings of yourself or others for distribution. That guest lecture taking place in your department? Record it and edit it yourself! Doing so is a lightweight way to distribute resources among various institutions, and it also helps make the materials more accessible for readers and listeners with a wide variety of learning needs. In this lesson you will learn how to use Audacity to load, record, edit, mix, and export audio files. Sound editing platforms are often expensive and offer extensive capabilities that can be overwhelming to the first-time user, but Audacity is a free and open source alternative that offers powerful capabilities for sound editing with a low barrier for entry. For this lesson we will work with two audio files: a recording of Bach’s Goldberg Variations available from MusOpen and another recording of your own voice that will be made in the course of the lesson. This tutorial uses Audacity 2.1.2, released January 2016.

  10. Portable Audio Design

    DEFF Research Database (Denmark)

    Groth, Sanne Krogh

    2014-01-01

    attention to the specific genre; a grasping of the complex relationship between site and time, the actual and the virtual; and getting acquainted with the specific site’s soundscape by approaching it both intuitively and systematically. These steps will finally lead to an audio production that not only...

  11. Audio Feedback -- Better Feedback?

    Science.gov (United States)

    Voelkel, Susanne; Mello, Luciane V.

    2014-01-01

    National Student Survey (NSS) results show that many students are dissatisfied with the amount and quality of feedback they get for their work. This study reports on two case studies in which we tried to address these issues by introducing audio feedback to one undergraduate (UG) and one postgraduate (PG) class, respectively. In case study one…

  12. Circuit Bodging : Audio Multiplexer

    NARCIS (Netherlands)

    Roeling, E.; Allen, B.

    2010-01-01

    Audio amplifiers usually come with a single, glaring design flaw: Not enough auxiliary inputs. Not only that, but you’re usually required to press a button to switch between the amplifier’s limited number of inputs. This is unacceptable - we have better things to do than change input channels! In

  13. The Space-Time Fractal Feature of Deformation at Convex Corner of Deep Foundation Pit

    Directory of Open Access Journals (Sweden)

    ZHAO Shun-li

    2016-03-01

    Full Text Available The study of the space-time features of foundation pit deformation is important for ensuring the stability of foundation pit engineering. In the present study, the relationship between the space-time fractal feature of foundation pit deformation and the stability of the foundation pit is expressed with simple indexes, such as the time and position of the maximum deformation. Using a concrete engineering example, fractal theory is introduced and the correlation dimension is calculated from the measured deformation data over a period of time; the correlation dimension is then used to analyze the space-time fractal feature of deformation at the convex corner and to investigate the relationship between the correlation dimension of the foundation pit deformation and the stability of the foundation pit. The research shows that the correlation dimension can reveal the complex space-time features of foundation pit deformation. In terms of time, the correlation dimension is related to the foundation pit condition, construction disturbance, changes of the supporting structure, and so on, and declines to a certain degree over time. In terms of space, the difference in correlation dimension between stable and unstable regions is relatively large, while there is little difference within the stable region. With the correlation dimension, it is easier to identify the stable and unstable regions of the foundation pit than with the accumulated deformation alone.
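    The correlation dimension used to quantify the deformation series can be estimated with the standard Grassberger-Procaccia procedure, sketched below; the embedding dimension, delay and radii are illustrative choices, not the paper's values.

        # Grassberger-Procaccia estimate of the correlation dimension of a deformation
        # time series: delay-embed the series, compute the correlation sum C(r) for a
        # range of radii, and take the slope of log C(r) vs. log r.
        # Embedding dimension, delay and radii below are illustrative choices.
        import numpy as np

        def correlation_dimension(series, m=3, tau=1, n_radii=12):
            x = np.asarray(series, dtype=float)
            n = len(x) - (m - 1) * tau
            emb = np.column_stack([x[i * tau:i * tau + n] for i in range(m)])   # delay embedding
            d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
            d = d[np.triu_indices(n, k=1)]                                      # pairwise distances
            radii = np.logspace(np.log10(d[d > 0].min()), np.log10(d.max()), n_radii)
            C = np.array([(d < r).mean() for r in radii])                       # correlation sum
            slope, _ = np.polyfit(np.log(radii), np.log(C + 1e-12), 1)
            return slope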

  14. The method of narrow-band audio classification based on universal noise background model

    Science.gov (United States)

    Rui, Rui; Bao, Chang-chun

    2013-03-01

    Audio classification is the basis of content-based audio analysis and retrieval. Conventional classification methods mainly depend on feature extraction from the whole audio clip, which increases the time required for classification. An approach for classifying a narrow-band audio stream based on frame-level feature extraction is presented in this paper. The audio signals are divided into speech, instrumental music, song with accompaniment, and noise using the Gaussian mixture model (GMM). In order to cope with changing real environments, a universal noise background model (UNBM) for white noise, street noise, factory noise, and car interior noise is built. In addition, three feature schemes are considered to optimize feature selection. The experimental results show that the proposed algorithm achieves a high accuracy for audio classification, especially under each of the noise backgrounds used, and keeps the classification time below one second.
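    A minimal sketch of the classification stage, one Gaussian mixture per audio class plus a pooled universal noise background model, with decisions made by maximum average log-likelihood, is given below; scikit-learn is an assumed tool and the mixture size is illustrative.

        # One GMM per audio class (speech, instrumental music, song, noise), with the
        # noise class trained on pooled white/street/factory/car-interior noise as a
        # universal noise background model (UNBM). scikit-learn is an assumed tool.
        import numpy as np
        from sklearn.mixture import GaussianMixture

        def train_gmms(class_features, n_components=8):
            """class_features: dict class_name -> (n_frames, n_features) array."""
            return {name: GaussianMixture(n_components=n_components,
                                          covariance_type="diag").fit(X)
                    for name, X in class_features.items()}

        def classify_frames(models, features):
            """Average frame log-likelihood per class; return the best-scoring class."""
            scores = {name: m.score(features) for name, m in models.items()}
            return max(scores, key=scores.get)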

  15. Digital-audio/MIDI sequencers

    National Research Council Canada - National Science Library

    Christopher Breen

    1998-01-01

    .... With these upgrades, both programs now support digital-audio fade and cross-fade transitions. If you are looking for the most complete MIDI/digital-audio solution right out of the box, consider Digital Performer...

  16. Guiding exploration in conformational feature space with Lipschitz underestimation for ab-initio protein structure prediction.

    Science.gov (United States)

    Hao, Xiaohu; Zhang, Guijun; Zhou, Xiaogen

    2018-02-06

    Computing conformations, which is essential to associate structural and functional information with gene sequences, is challenging due to the high dimensionality and rugged energy surface of the protein conformational space. Consequently, the dimension of the protein conformational space should be reduced to a proper level, and an effective exploration algorithm should be proposed. In this paper, a plug-in method for guiding exploration in conformational feature space with Lipschitz underestimation (LUE) for ab-initio protein structure prediction is proposed. The conformational space is first converted into an ultrafast shape recognition (USR) feature space. Based on the USR feature space, the conformational space can be further converted into an underestimation space according to Lipschitz estimation theory for guiding exploration. As a consequence of the use of the underestimation model, the tight lower-bound estimate information can be used for exploration guidance, invalid sampling areas can be eliminated in advance, and the number of energy function evaluations can be reduced. The proposed method provides a novel technique to solve the exploration problem of protein conformational space. LUE is applied to the differential evolution (DE) algorithm and to the Metropolis Monte Carlo (MMC) algorithm, which is available in Rosetta; when LUE is applied to DE and MMC, candidate conformations are screened by the underestimation method prior to energy calculation and selection. Further, LUE is compared with DE and MMC by testing on 15 small-to-medium structurally diverse proteins. Test results show that near-native protein structures with higher accuracy can be obtained more rapidly and efficiently with the use of LUE. Copyright © 2018 Elsevier Ltd. All rights reserved.

  17. The audio expert everything you need to know about audio

    CERN Document Server

    Winer, Ethan

    2012-01-01

    The Audio Expert is a comprehensive reference that covers all aspects of audio, with many practical, as well as theoretical, explanations. Providing in-depth descriptions of how audio really works, using common sense plain-English explanations and mechanical analogies with minimal math, the book is written for people who want to understand audio at the deepest, most technical level, without needing an engineering degree. It's presented in an easy-to-read, conversational tone, and includes more than 400 figures and photos augmenting the text.The Audio Expert takes th

  18. Voice activity detection using audio-visual information

    DEFF Research Database (Denmark)

    Petsatodis, Theodore; Pnevmatikakis, Aristodemos; Boukis, Christos

    2009-01-01

    An audio-visual voice activity detector that uses sensors positioned distantly from the speaker is presented. Its constituting unimodal detectors are based on the modeling of the temporal variation of audio and visual features using Hidden Markov Models; their outcomes are fused using a post-decision scheme. The Mel-Frequency Cepstral Coefficients and the vertical mouth opening are the chosen audio and visual features respectively, both augmented with their first-order derivatives. The proposed system is assessed using far-field recordings from four different speakers and under various levels...

  19. A systematic exploration of the micro-blog feature space for teens stress detection.

    Science.gov (United States)

    Zhao, Liang; Li, Qi; Xue, Yuanyuan; Jia, Jia; Feng, Ling

    2016-01-01

    In the modern stressful society, growing teenagers experience severe stress from different aspects, from school to friends and from self-cognition to interpersonal relationships, which negatively influences their smooth and healthy development. Being timely and accurately aware of teenagers' psychological stress and providing effective measures to help immature teenagers cope with stress are highly valuable to both teenagers and society. Previous work demonstrates the feasibility of sensing teenagers' stress from their tweeting content and context on the open social media platform micro-blog. However, a tweet is still too short for teens to express their stressful status in a comprehensive way. Considering the topic continuity from the tweeting content to the follow-up comments and responses between the teenager and his/her friends, we combine the content of comments and responses under the tweet to supplement the tweet content. Also, friends' caring comments like "what happened?", "Don't worry!", "Cheer up!", etc. provide hints to the teenager's stressful status. Hence, in this paper, we propose to systematically explore the micro-blog feature space, comprised of four kinds of features [tweeting content features (FW), posting features (FP), interaction features (FI), and comment-response features (FC) between teenagers and friends], for detecting teenagers' stress categories and stress levels. We extract and analyze these feature values and their impacts on teen stress detection. We evaluate the framework through a real user study of 36 high school students aged 17. Different classifiers are employed to detect potential stress categories and corresponding stress levels. Experimental results show that all the features in the feature space positively affect stress detection, and that linguistic negative emotion, the proportion of negative sentences, friends' caring comments, and the teen's reply rate play more significant roles than the remaining features. Micro-blog platform provides

  20. A linear feature space for simultaneous learning of spatio-spectral filters in BCI

    NARCIS (Netherlands)

    Farquhar, J.D.R.

    2009-01-01

    It is shown how two of the most common types of feature mapping used for classification of single trial Electroencephalography (EEG), i.e. spatial and frequency filtering, can be equivalently performed as linear operations in the space of frequency-specific detector covariance tensors. Thus by first

  1. Creating unreal audio

    OpenAIRE

    Rudsengen, Mathias Flaten

    2014-01-01

    “Creating unreal audio” refers to the act of designing a sound effect that is intended to sound like a completely fictional object. This thesis is a practical venture into digital audio design. During the process of creating a sound effect anchored in a specific thematic framework, I will describe my work process and the challenges and problems faced, showing my personal work process and how modern digital sound effect creation can be undertaken. To provide context, I will also describe and re...

  2. Audio Spatial Representation Around the Body.

    Science.gov (United States)

    Aggius-Vella, Elena; Campus, Claudio; Finocchietti, Sara; Gori, Monica

    2017-01-01

    Studies have found that portions of the space around our body are coded differently by our brain. Numerous works have investigated visual and auditory spatial representation, focusing mostly on the spatial representation of stimuli presented at head level, especially in the frontal space. Only a few studies have investigated spatial representation around the entire body and its relationship with motor activity. Moreover, it is still not clear whether the space surrounding us is represented as a unitary dimension or whether it is split up into different portions, differently shaped by our senses and motor activity. To clarify these points, we investigated audio localization of dynamic and static sounds at different body levels. In order to understand the role of a motor action in auditory space representation, we asked subjects to localize sounds by pointing with the hand or the foot, or by giving a verbal answer. We found that audio sound localization differed depending on the body part considered. Moreover, a different pattern of response was observed when subjects were asked to make actions with respect to the verbal responses. These results suggest that the audio space around our body is split into various spatial portions, which are perceived differently: front, back, around chest, and around foot, suggesting that these four areas could be differently modulated by our senses and our actions.

  3. High-Order Sparse Linear Predictors for Audio Processing

    DEFF Research Database (Denmark)

    Giacobello, Daniele; van Waterschoot, Toon; Christensen, Mads Græsbøll

    2010-01-01

    Linear prediction has generally failed to make a breakthrough in audio processing, as it has done in speech processing. This is mostly due to its poor modeling performance, since an audio signal is usually an ensemble of different sources. Nevertheless, linear prediction comes with a whole set of interesting features that make the idea of using it in audio processing not far fetched, e.g., the strong ability of modeling the spectral peaks that play a dominant role in perception. In this paper, we provide some preliminary conjectures and experiments on the use of high-order sparse linear predictors in audio processing. These predictors, successfully implemented in modeling the short-term and long-term redundancies present in speech signals, will be used to model tonal audio signals, both monophonic and polyphonic. We will show how the sparse predictors are able to model efficiently the different...

  4. The Influences of Landscape Features on Visitation of Hospital Green Spaces-A Choice Experiment Approach.

    Science.gov (United States)

    Chang, Kaowen Grace; Chien, Hungju

    2017-07-05

    Studies have suggested that visiting and viewing landscaping at hospitals accelerates patients' recovery from surgery and helps staff recover from mental fatigue. To plan and construct such landscapes, we need to unravel the landscape features desirable to different groups so that the space can benefit a wide range of hospital users. Using discrete choice modeling, we developed experimental choice sets to investigate how landscape features influence the visitation of different users in a large regional hospital in Taiwan. The empirical survey provides quantitative estimates of the influence of each landscape feature on four user groups: patients, caregivers, staff, and neighborhood residents. Our findings suggest that different types of features promote visits from specific user groups. Landscape features facilitating physical activities effectively encourage visits across user groups, especially for caregivers and staff. Patients in this study specify a strong need for contact with nature. The nearby community favors the features designed for children's play and family activities. People across user groups value the features that provide a mitigated microclimate of comfort, such as a shelter. Study implications and limitations are also discussed. Our study provides information essential for creating a better healing environment in a hospital setting.

  5. Optimal Feature Space Selection in Detecting Epileptic Seizure based on Recurrent Quantification Analysis and Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Saleh Lashkari

    2016-06-01

    Full Text Available Selecting optimal features based on the nature of the phenomenon and on high discriminant ability is very important in data classification problems. Since Recurrent Quantification Analysis (RQA) does not require any assumption about the stationarity or the size of the signal and the noise, it may be useful for epileptic seizure detection. In this study, RQA was used to discriminate ictal EEG from normal EEG, where optimal features were selected by a combination of a genetic algorithm and a Bayesian classifier. Recurrence plots of one hundred samples in each of the two categories were obtained with five distance norms in this study: Euclidean, Maximum, Minimum, Normalized and Fixed Norm. In order to choose the optimal threshold for each norm, ten thresholds of ε were generated and then the best feature space was selected by the genetic algorithm in combination with a Bayesian classifier. The results show that the proposed method is capable of discriminating the ictal EEG from the normal EEG, where for the Minimum norm and 0.1 < ε < 1 the accuracy was 100%. In addition, the sensitivity of the proposed framework to the ε and the distance norm parameters was low. The optimal feature presented in this study is Trans, which was selected in most feature spaces with high accuracy.
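
    A minimal sketch of the recurrence-plot step underlying RQA is given below, assuming a one-dimensional EEG segment, a time-delay embedding and a selectable distance norm; only the recurrence rate is computed, while the Trans feature mentioned above requires a full RQA toolbox.

      # Sketch of the recurrence-plot step behind RQA for a 1-D EEG segment.
      # The threshold eps and the distance norm mirror the swept parameters;
      # only the recurrence rate is computed here.
      import numpy as np

      def recurrence_matrix(x, dim=3, tau=1, eps=0.5, norm=np.inf):
          n = len(x) - (dim - 1) * tau
          emb = np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])  # time-delay embedding
          d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], ord=norm, axis=2)
          return (d < eps).astype(int)

      def recurrence_rate(R):
          return R.mean()                      # fraction of recurrent points in the plot

      eeg = np.random.randn(500)               # stand-in for a real EEG segment
      R = recurrence_matrix(eeg, eps=0.8)      # norm=np.inf corresponds to the Maximum norm
      print(recurrence_rate(R))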

  6. Medical X-ray Image Hierarchical Classification Using a Merging and Splitting Scheme in Feature Space.

    Science.gov (United States)

    Fesharaki, Nooshin Jafari; Pourghassem, Hossein

    2013-07-01

    Due to the daily mass production and the widespread variation of medical X-ray images, it is necessary to classify these for searching and retrieval purposes, especially for content-based medical image retrieval systems. In this paper, a medical X-ray image hierarchical classification structure based on a novel merging and splitting scheme and using shape and texture features is proposed. In the first level of the proposed structure, to improve the classification performance, similar classes with regard to shape contents are grouped, based on merging measures and shape features, into general overlapped classes. In the next levels of this structure, the overlapped classes are split into smaller classes based on the classification performance of the combination of shape and texture features or texture features only. Ultimately, in the last levels, this procedure continues until all the classes are formed separately. Moreover, to optimize the feature vector in the proposed structure, we use an orthogonal forward selection algorithm according to a Mahalanobis class separability measure as a feature selection and reduction algorithm. In other words, according to the complexity and inter-class distance of each class, a sub-space of the feature space is selected in each level and then a supervised merging and splitting scheme is applied to form the hierarchical classification. The proposed structure is evaluated on a database consisting of 2158 medical X-ray images of 18 classes (IMAGECLEF 2005 database) and an accuracy rate of 93.6% in the last level of the hierarchical structure is obtained for an 18-class classification problem.
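
    The sketch below illustrates the general idea of forward feature selection driven by a Mahalanobis-style class-separability score. It is a plain (not orthogonal) two-class forward search on synthetic data, intended only as an illustration of the selection criterion, not as the authors' algorithm.

      # Illustrative two-class forward search scored by a Mahalanobis-style
      # class-separability measure (class mean distance scaled by pooled covariance).
      import numpy as np

      def separability(X, y):
          classes = np.unique(y)
          mu = np.array([X[y == c].mean(axis=0) for c in classes])
          pooled = sum(np.cov(X[y == c], rowvar=False) * (np.sum(y == c) - 1) for c in classes)
          pooled = np.atleast_2d(pooled / (len(y) - len(classes)))
          d = mu[0] - mu[1]                    # two-class case for brevity
          return float(d @ np.linalg.pinv(pooled) @ d)

      def forward_select(X, y, k):
          chosen, remaining = [], list(range(X.shape[1]))
          while len(chosen) < k:
              best = max(remaining, key=lambda j: separability(X[:, chosen + [j]], y))
              chosen.append(best)
              remaining.remove(best)
          return chosen

      rng = np.random.default_rng(0)           # synthetic two-class data
      X = np.vstack([rng.normal(0.0, 1, (40, 8)), rng.normal(0.8, 1, (40, 8))])
      y = np.array([0] * 40 + [1] * 40)
      print(forward_select(X, y, k=3))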

  7. System requirements and design features of Space Station Remote Manipulator System mechanisms

    Science.gov (United States)

    Kumar, Rajnish; Hayes, Robert

    1991-01-01

    The Space Station Remote Manipulator System (SSRMS) is a long robotic arm for handling large objects/payloads on the International Space Station Freedom. The mechanical components of the SSRMS include seven joints, two latching end effectors (LEEs), and two boom assemblies. The joints and LEEs are complex aerospace mechanisms. The system requirements and design features of these mechanisms are presented. All seven joints of the SSRMS have identical functional performance. The two LEEs are identical. This feature allows either end of the SSRMS to be used as tip or base. As compared to the end effector of the Shuttle Remote Manipulator System, the LEE has a latch and umbilical mechanism in addition to the snare and rigidize mechanisms. The latches increase the interface preload and allow large payloads (up to 116,000 kg) to be handled. The umbilical connectors provide power, data, and video signal transfer capability to/from the SSRMS.

  8. Halftone information hiding technology based on phase feature of space filling curves

    Science.gov (United States)

    Hu, Jianhua; Cao, Peng; Dong, Zhihong; Cao, Xiaohe

    2017-08-01

    To solve the problems of the production of interference fringes (namely moiré in printing) and improve the image quality in the printing process of halftone screening for information hiding, a halftone screening security technique based on the phase feature of space filling curves is studied in this paper. This method effectively solves the problem of moiré and optimizes the quality of the screening, so that the images presented after screening achieve a good visual effect. The pseudo-random scrambling encryption of the plaintext information and the halftone screening technique based on the phase feature of the space filling curves are carried out when screening, which not only eliminates the common moiré in the screening but also improves the image quality and the security of the information.

  9. Quality Enhancement of Compressed Audio Based on Statistical Conversion

    Directory of Open Access Journals (Sweden)

    Chris Kyriakakis

    2008-07-01

    Full Text Available Most audio compression formats are based on the idea of low bit rate transparent encoding. As these types of audio signals are starting to migrate from portable players with inexpensive headphones to higher quality home audio systems, it is becoming evident that higher bit rates may be required to maintain transparency. We propose a novel method that enhances low bit rate encoded audio segments by applying multiband audio resynthesis methods in a postprocessing stage. Our algorithm employs the highly flexible Generalized Gaussian mixture model which offers a more accurate representation of audio features than the Gaussian mixture model. A novel residual conversion technique is applied which proves to significantly improve the enhancement performance without excessive overhead. In addition, both cepstral and residual errors are dramatically decreased by a feature-alignment scheme that employs a sorting transformation. Some improvements regarding the quantization step are also described that enable us to further reduce the algorithm overhead. Signal enhancement examples are presented and the results show that the overhead size incurred by the algorithm is a fraction of the uncompressed signal size. Our results show that the resulting audio quality is comparable to that of a standard perceptual codec operating at approximately the same bit rate.

  10. Quality Enhancement of Compressed Audio Based on Statistical Conversion

    Directory of Open Access Journals (Sweden)

    Mouchtaris Athanasios

    2008-01-01

    Full Text Available Most audio compression formats are based on the idea of low bit rate transparent encoding. As these types of audio signals are starting to migrate from portable players with inexpensive headphones to higher quality home audio systems, it is becoming evident that higher bit rates may be required to maintain transparency. We propose a novel method that enhances low bit rate encoded audio segments by applying multiband audio resynthesis methods in a postprocessing stage. Our algorithm employs the highly flexible Generalized Gaussian mixture model which offers a more accurate representation of audio features than the Gaussian mixture model. A novel residual conversion technique is applied which proves to significantly improve the enhancement performance without excessive overhead. In addition, both cepstral and residual errors are dramatically decreased by a feature-alignment scheme that employs a sorting transformation. Some improvements regarding the quantization step are also described that enable us to further reduce the algorithm overhead. Signal enhancement examples are presented and the results show that the overhead size incurred by the algorithm is a fraction of the uncompressed signal size. Our results show that the resulting audio quality is comparable to that of a standard perceptual codec operating at approximately the same bit rate.

  11. Small signal audio design

    CERN Document Server

    Self, Douglas

    2014-01-01

    Learn to use inexpensive and readily available parts to obtain state-of-the-art performance in all the vital parameters of noise, distortion, crosstalk and so on. With ample coverage of preamplifiers and mixers and a new chapter on headphone amplifiers, this practical handbook provides an extensive repertoire of circuits that can be put together to make almost any type of audio system.A resource packed full of valuable information, with virtually every page revealing nuggets of specialized knowledge not found elsewhere. Essential points of theory that bear on practical performance are lucidly

  12. Feature-space clustering for fMRI meta-analysis

    DEFF Research Database (Denmark)

    Goutte, C.; Hansen, L.K.; Liptrot, Matthew George

    2001-01-01

    Clustering functional magnetic resonance imaging (fMRI) time series has emerged in recent years as a possible alternative to parametric modeling approaches. Most of the work so far has been concerned with clustering raw time series. In this contribution we investigate the applicability of a clustering method applied to features extracted from the data. This approach is extremely versatile and encompasses previously published results [Goutte et al., 1999] as special cases. A typical application is in data reduction: as the increase in temporal resolution of fMRI experiments routinely yields fMRI sequences containing several hundreds of images, it is sometimes necessary to invoke feature extraction to reduce the dimensionality of the data space. A second interesting application is in the meta-analysis of fMRI experiments, where features are obtained from a possibly large number of single...
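
    As an illustration of clustering on extracted features rather than raw time series, the sketch below derives two stand-in features per voxel (correlation with the stimulus paradigm and a crude delay estimate) and clusters them with k-means; the feature choices and data are hypothetical.

      # Clustering voxels on extracted features rather than raw time series.
      # The two stand-in features (correlation with the paradigm, crude delay
      # estimate) and the synthetic data are illustrative only.
      import numpy as np
      from sklearn.cluster import KMeans

      def voxel_features(ts, paradigm):
          corr = np.corrcoef(ts, paradigm)[0, 1]
          delay = np.argmax(np.correlate(ts - ts.mean(), paradigm - paradigm.mean(), "full"))
          return [corr, delay]

      data = np.random.randn(200, 120)                        # stand-in voxel time series
      paradigm = (np.arange(120) % 20 < 10).astype(float)     # boxcar stimulus function

      features = np.array([voxel_features(v, paradigm) for v in data])
      labels = KMeans(n_clusters=4, n_init=10).fit_predict(features)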

  13. Hiding Data in Audio Signal

    Science.gov (United States)

    Bhattacharyya, Debnath; Dutta, Poulami; Balitanas, Maricel O.; Kim, Tai-Hoon; Das, Purnendu

    This paper describes the LSB technique for secure data transfer. Secret information can be hidden inside all sorts of cover information: text, images, audio, video and more. Embedding secret messages in digital sound is usually a more difficult process. A variety of techniques for embedding information in digital audio have been established, including parity coding, phase coding, spread spectrum, echo hiding, and LSB. Least significant bit (LSB) insertion is one of the simplest approaches to embedding information in an audio file.
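
    A minimal sketch of LSB insertion into 16-bit PCM samples is shown below: one message bit replaces the least significant bit of each sample and the same bits are read back for recovery. The cover signal here is random noise standing in for real audio.

      # LSB insertion into 16-bit PCM samples: one message bit replaces the least
      # significant bit of each sample; extraction reads the same bits back.
      import numpy as np

      def embed_lsb(samples, message):
          bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
          stego = samples.copy()
          stego[:len(bits)] = (stego[:len(bits)] & ~1) | bits
          return stego

      def extract_lsb(samples, n_bytes):
          bits = (samples[:n_bytes * 8] & 1).astype(np.uint8)
          return np.packbits(bits).tobytes()

      audio = np.random.randint(-2**15, 2**15, size=44100, dtype=np.int16)  # stand-in cover audio
      stego = embed_lsb(audio, b"secret")
      assert extract_lsb(stego, 6) == b"secret"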

  14. Statistical Lip-Appearance Models Trained Automatically Using Audio Information

    Directory of Open Access Journals (Sweden)

    Daubias Philippe

    2002-01-01

    Full Text Available We aim at modeling the appearance of the lower face region to assist visual feature extraction for audio-visual speech processing applications. In this paper, we present a neural network based statistical appearance model of the lips which classifies pixels as belonging to the lips, skin, or inner mouth classes. This model requires labeled examples to be trained, and we propose to label images automatically by employing a lip-shape model and a red-hue energy function. To improve the performance of lip-tracking, we propose to use blue marked-up image sequences of the same subject uttering the identical sentences as natural nonmarked-up ones. The easily extracted lip shapes from blue images are then mapped to the natural ones using acoustic information. The lip-shape estimates obtained simplify lip-tracking on the natural images, as they reduce the parameter space dimensionality in the red-hue energy minimization, thus yielding better contour shape and location estimates. We applied the proposed method to a small audio-visual database of three subjects, achieving errors in pixel classification around 6%, compared to 3% for hand-placed contours and 20% for filtered red-hue.

  15. Detection of Coronal Mass Ejections Using Multiple Features and Space-Time Continuity

    Science.gov (United States)

    Zhang, Ling; Yin, Jian-qin; Lin, Jia-ben; Feng, Zhi-quan; Zhou, Jin

    2017-07-01

    Coronal Mass Ejections (CMEs) release tremendous amounts of energy in the solar system, which has an impact on satellites, power facilities and wireless transmission. To effectively detect a CME in Large Angle Spectrometric Coronagraph (LASCO) C2 images, we propose a novel algorithm to locate the suspected CME regions, using the Extreme Learning Machine (ELM) method and taking into account the features of the grayscale and the texture. Furthermore, space-time continuity is used in the detection algorithm to exclude false CME regions. The algorithm includes three steps: i) define the feature vector which contains textural and grayscale features of a running difference image; ii) design the detection algorithm based on the ELM method according to the feature vector; iii) improve the detection accuracy rate by using the decision rule of space-time continuity. Experimental results show the efficiency and the superiority of the proposed algorithm in the detection of CMEs compared with other traditional methods. In addition, our algorithm is insensitive to most noise.

  16. Audio visual information fusion for human activity analysis

    OpenAIRE

    Thagadur Shivappa, Shankar

    2010-01-01

    Human activity analysis in unconstrained environments using far-field sensors is a challenging task. The fusion of audio and visual cues enables us to build robust and efficient human activity analysis systems. Traditional fusion schemes including feature-level, classifier-level and decision-level fusion have been explored in task- specific contexts to provide robustness to sensor and environmental noise. However, human activity analysis involves the extraction of information from audio and v...

  17. A New Ensemble Method with Feature Space Partitioning for High-Dimensional Data Classification

    Directory of Open Access Journals (Sweden)

    Yongjun Piao

    2015-01-01

    Full Text Available Ensemble data mining methods, also known as classifier combination, are often used to improve the performance of classification. Various classifier combination methods such as bagging, boosting, and random forest have been devised and have received considerable attention in the past. However, data dimensionality increases rapidly day by day. Such a trend poses various challenges as these methods are not suitable to directly apply to high-dimensional datasets. In this paper, we propose an ensemble method for classification of high-dimensional data, with each classifier constructed from a different set of features determined by partitioning of redundant features. In our method, the redundancy of features is considered to divide the original feature space. Then, each generated feature subset is trained by a support vector machine, and the results of each classifier are combined by majority voting. The efficiency and effectiveness of our method are demonstrated through comparisons with other ensemble techniques, and the results show that our method outperforms other methods.
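
    The sketch below illustrates the ensemble idea in a simplified form: the feature space is partitioned into subsets, one SVM is trained per subset, and predictions are combined by majority voting. The partition here is plain chunking on synthetic data, whereas the paper partitions by feature redundancy.

      # Feature-space partitioning ensemble: chunk the features, train one SVM per
      # chunk, combine by majority voting. Plain chunking is used here for brevity.
      import numpy as np
      from sklearn.svm import SVC
      from sklearn.datasets import make_classification

      X, y = make_classification(n_samples=300, n_features=200, n_informative=30, random_state=0)

      subsets = np.array_split(np.arange(X.shape[1]), 10)               # 10 feature groups
      models = [SVC(kernel="linear").fit(X[:, idx], y) for idx in subsets]

      def predict(x_row):
          votes = [m.predict(x_row[idx].reshape(1, -1))[0] for m, idx in zip(models, subsets)]
          return np.bincount(votes).argmax()                            # majority vote

      preds = np.array([predict(r) for r in X])
      print("training accuracy:", (preds == y).mean())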

  18. Nanoscale Analysis of Space-Weathering Features in Soils from Itokawa

    Science.gov (United States)

    Thompson, M. S.; Christoffersen, R.; Zega, T. J.; Keller, L. P.

    2014-01-01

    Space weathering alters the spectral properties of airless body surface materials by reddening and darkening their spectra and attenuating characteristic absorption bands, making it challenging to characterize them remotely [1,2]. It also causes a discrepancy between laboratory analysis of meteorites and remotely sensed spectra from asteroids, making it difficult to associate meteorites with their parent bodies. The mechanisms driving space weathering include micrometeorite impacts and the interaction of surface materials with solar energetic ions, particularly the solar wind. These processes continuously alter the microchemical and structural characteristics of exposed grains on airless bodies. The change of these properties is caused predominantly by the vapor deposition of reduced Fe and FeS nanoparticles (npFe^0 and npFeS, respectively) onto the rims of surface grains [3]. Sample-based analysis of space weathering has traditionally been limited to lunar soils and select asteroidal and lunar regolith breccias [3-5]. With the return of samples from the Hayabusa mission to asteroid Itokawa [6], for the first time we are able to compare space-weathering features on returned surface soils from a known asteroidal body. Analysis of these samples will contribute to a more comprehensive model for how space weathering varies across the inner solar system. Here we report detailed microchemical and microstructural analysis of surface grains from Itokawa.

  19. The Lowdown on Audio Downloads

    Science.gov (United States)

    Farrell, Beth

    2010-01-01

    First offered to public libraries in 2004, downloadable audiobooks have grown by leaps and bounds. According to the Audio Publishers Association, their sales today account for 21% of the spoken-word audio market. It hasn't been easy, however. WMA. DRM. MP3. AAC. File extensions small on letters but very big on consequences for librarians,…

  20. Efficient audio power amplification - challenges

    Energy Technology Data Exchange (ETDEWEB)

    Andersen, Michael A.E.

    2005-07-01

    For more than a decade efficient audio power amplification has evolved and today switch-mode audio power amplification in various forms are the state-of-the-art. The technical steps that lead to this evolution are described and in addition many of the challenges still to be faced and where extensive research and development are needed is covered. (au)

  1. Efficient Audio Power Amplification - Challenges

    DEFF Research Database (Denmark)

    Andersen, Michael Andreas E.

    2005-01-01

    For more than a decade efficient audio power amplification has evolved and today switch-mode audio power amplification in various forms are the state-of-the-art. The technical steps that lead to this evolution are described and in addition many of the challenges still to be faced and where extensive research and development are needed is covered.

  2. Strategies for Characterizing the Sensory Environment: Objective and Subjective Evaluation Methods using the VisiSonic Real Space 64/5 Audio-Visual Panoramic Camera

    Science.gov (United States)

    2017-11-01

    The report describes procedures for operation of the VRAP hardware and software by using an illustrative use case where the VRAP was deployed in the field. The use case evaluates the impact of high-quality panoramic audiovisual captures on the representativeness of a set of sounds in a given environment. VisiSonics recommends enabling Lenovo Turbo Boost prior to running the RealSpace acquisition software.

  3. Beyond podcasting: creative approaches to designing educational audio

    Directory of Open Access Journals (Sweden)

    Andrew Middleton

    2009-12-01

    Full Text Available This paper discusses a university-wide pilot designed to encourage academics to creatively explore learner-centred applications for digital audio. Participation in the pilot was diverse in terms of technical competence, confidence and contextual requirements and there was little prior experience of working with digital audio. Many innovative approaches were taken to using audio in a blended context including student-generated vox pops, audio feedback models, audio conversations and task-setting. A podcast was central to the pilot itself, providing a common space for the 25 participants, who were also supported by materials in several other formats. An analysis of podcast interviews involving pilot participants provided the data informing this case study. This paper concludes that audio has the potential to promote academic creativity in engaging students through media intervention. However, institutional scalability is dependent upon the availability of suitable timely support mechanisms that can address the lack of technical confidence evident in many staff. If that is in place, audio can be widely adopted by anyone seeking to add a new layer of presence and connectivity through the use of voice.

  4. Between technical features and analytic capabilities: Charting a relational affordance space for digital social analytics

    Directory of Open Access Journals (Sweden)

    Anders Koed Madsen

    2015-01-01

    Full Text Available Digital social analytics is a subset of Big Data methods that is used to understand the social environment in which people and organizations have to act. This paper presents an analysis of eight projects that are experimenting with the use of these methods for various purposes. It shows that two specific technological features influence the work with such methods in all the cases. The first concerns the need to distribute choices about the structure of data to third-party actors and the second concerns the need to balance machine intelligence and human intuition when automating the analysis. These features set specific conditions for knowledge production, and the paper identifies two opposite approaches for engaging with each of these conditions. These features and approaches are finally combined into a two-dimensional affordance space that illustrates how there is flexibility in the way project leaders interact with the features of the data environment. It thereby also shows how digital social analytics come to have different affordances for different projects.

  5. On pose determination using point features. [vision system for space robots

    Science.gov (United States)

    Hwang, Vincent; Keizer, Richard; Winkert, Tom; Spidaliere, Peter

    1992-01-01

    Consideration is given to the vision subsystem of an Orbital Replacement Unit (ORU) that was placed onto its base at Goddard Space Flight Center by a PUMA 762 robot equipped with a wrist-mounted CCD camera and a wrist-mounted force sensor. It is found that a simple adaptive thresholding method works quite well for images taken under various lighting conditions. The pose computed using the quadrangle method is reasonable for real images. In the presence of image feature noise the accuracy of the computed pose can be considerably reduced. This problem can be solved by using a 3D marker and an alternative pose computation algorithm.

  6. Supervised pixel classification using a feature space derived from an artificial visual system

    Science.gov (United States)

    Baxter, Lisa C.; Coggins, James M.

    1991-01-01

    Image segmentation involves labelling pixels according to their membership in image regions. This requires the understanding of what a region is. Using supervised pixel classification, the paper investigates how groups of pixels labelled manually according to perceived image semantics map onto the feature space created by an Artificial Visual System. The multiscale structure of regions is investigated and it is shown that pixels form clusters based on their geometric roles in the image intensity function, not on image semantics. A tentative abstract definition of a 'region' is proposed based on this behavior.

  7. New features of electron phase space holes observed by the THEMIS mission.

    Science.gov (United States)

    Andersson, L; Ergun, R E; Tao, J; Roux, A; Lecontel, O; Angelopoulos, V; Bonnell, J; McFadden, J P; Larson, D E; Eriksson, S; Johansson, T; Cully, C M; Newman, D L; Newman, D N; Goldman, M V; Glassmeier, K-H; Baumjohann, W

    2009-06-05

    Observations of electron phase-space holes (EHs) in Earth's plasma sheet by the THEMIS satellites include the first detection of a magnetic perturbation (δB∥) parallel to the ambient magnetic field (B0). EHs with a detectable δB∥ have several distinguishing features including large electric field amplitudes, a magnetic perturbation perpendicular to B0, high speeds (approximately 0.3c) along B0, and sizes along B0 of tens of Debye lengths. These EHs have a significant center potential (Φ ≈ k_B T_e / e), suggesting strongly nonlinear behavior nearby such as double layers or magnetic reconnection.

  8. Advances in audio watermarking based on singular value decomposition

    CERN Document Server

    Dhar, Pranab Kumar

    2015-01-01

    This book introduces audio watermarking methods for copyright protection, which has drawn extensive attention for securing digital data from unauthorized copying. The book is divided into two parts. First, an audio watermarking method in discrete wavelet transform (DWT) and discrete cosine transform (DCT) domains using singular value decomposition (SVD) and quantization is introduced. This method is robust against various attacks and provides good imperceptible watermarked sounds. Then, an audio watermarking method in fast Fourier transform (FFT) domain using SVD and Cartesian-polar transformation (CPT) is presented. This method has high imperceptibility and high data payload and it provides good robustness against various attacks. These techniques allow media owners to protect copyright and to show authenticity and ownership of their material in a variety of applications. Features new methods of audio watermarking for copyright protection and ownership protection; outl...

  9. Analysis of Audio Fingerprinting Techniques

    Science.gov (United States)

    Siva Sankaran, Satish Kumar

    The goal of this thesis is to compare various audio fingerprinting algorithms under a common framework. An audio fingerprint is a compact content-based signature that uniquely summarizes an audio recording. In this thesis, acoustic fingerprints are based on prominent peaks extracted from the spectrogram of the audio signal in question. A spectrogram is a visual representation of the spectrum of frequencies in an audio signal as it varies with time. Some of the applications of audio fingerprinting include but are not limited to music identification, advertisement detection, channel identification in TV and radio broadcasts. Currently, there are several fingerprinting techniques that employ different fingerprinting algorithms. However, there is no study or concrete proof that suggests one algorithm is better in comparison with the other algorithms. In this thesis, some of the feasible techniques employed in audio fingerprint extraction such as Same-Band Frequency analysis, Cross-Band Frequency analysis, use of Mel Frequency Banks, and use of Mel Frequency Cepstral Coefficients (MFCC) are analyzed and compared under the same framework.
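
    A sketch of the spectrogram peak-picking step that such landmark-based fingerprinting schemes share is given below; the window sizes and thresholds are arbitrary choices for the sketch, and the subsequent hashing of peak pairs into a fingerprint is omitted.

      # Spectrogram peak picking: keep time-frequency points that are local maxima
      # within a neighbourhood and lie within 40 dB of the strongest point.
      import numpy as np
      from scipy.signal import spectrogram
      from scipy.ndimage import maximum_filter

      def spectral_peaks(x, fs, size=20, thresh_db=-40.0):
          f, t, S = spectrogram(x, fs=fs, nperseg=1024, noverlap=512)
          S_db = 10 * np.log10(S + 1e-12)
          local_max = maximum_filter(S_db, size=size) == S_db
          peaks = np.argwhere(local_max & (S_db > S_db.max() + thresh_db))
          return [(f[i], t[j]) for i, j in peaks]            # (frequency, time) landmarks

      fs = 8000
      x = np.sin(2 * np.pi * 440 * np.arange(fs * 2) / fs)   # stand-in for a recording
      print(len(spectral_peaks(x, fs)))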

  10. ENERGY STAR Certified Audio Video

    Data.gov (United States)

    U.S. Environmental Protection Agency — Certified models meet all ENERGY STAR requirements as listed in the Version 3.0 ENERGY STAR Program Requirements for Audio Video Equipment that are effective as of...

  11. A centralized audio presentation manager

    Energy Technology Data Exchange (ETDEWEB)

    Papp, A.L. III; Blattner, M.M.

    1994-05-16

    The centralized audio presentation manager addresses the problems which occur when multiple programs running simultaneously attempt to use the audio output of a computer system. Time dependence of sound means that certain auditory messages must be scheduled simultaneously, which can lead to perceptual problems due to psychoacoustic phenomena. Furthermore, the combination of speech and nonspeech audio is examined; each presents its own problems of perceptibility in an acoustic environment composed of multiple auditory streams. The centralized audio presentation manager receives abstract parameterized message requests from the currently running programs, and attempts to create and present a sonic representation in the most perceptible manner through the use of a theoretically and empirically designed rule set.

  12. WLAN Technologies for Audio Delivery

    Directory of Open Access Journals (Sweden)

    Nicolas-Alexander Tatlas

    2007-01-01

    Full Text Available Audio delivery and reproduction for home or professional applications may greatly benefit from the adoption of digital wireless local area network (WLAN) technologies. The most challenging aspect of such integration relates to the synchronized and robust real-time streaming of multiple audio channels to multipoint receivers, for example, wireless active speakers. Here, it is shown that current WLAN solutions are susceptible to transmission errors. A detailed study of the IEEE802.11e protocol (currently under ratification) is also presented and all relevant distortions are assessed via an analytical and experimental methodology. A novel synchronization scheme is also introduced, allowing optimized playback for multiple receivers. The perceptual audio performance is assessed for both stereo and 5-channel applications based on either PCM or compressed audio signals.

  13. Tourism research and audio methods

    DEFF Research Database (Denmark)

    Jensen, Martin Trandberg

    2016-01-01

    • Audio methods enrich sensuous tourism ethnographies. • The note suggests five research avenues for future auditory scholarship. • Sensuous tourism research has neglected the role of sounds in embodied tourism experiences.

  14. Audio Steganography with Embedded Text

    Science.gov (United States)

    Teck Jian, Chua; Chai Wen, Chuah; Rahman, Nurul Hidayah Binti Ab.; Hamid, Isredza Rahmi Binti A.

    2017-08-01

    Audio steganography is about hiding a secret message inside audio. It is a technique used to secure the transmission of secret information or to hide its existence. It may also provide confidentiality to the secret message if the message is encrypted. To date, most steganography software, such as Mp3Stego and DeepSound, uses block ciphers such as the Advanced Encryption Standard or the Data Encryption Standard to encrypt the secret message. This is good practice for security. However, the encrypted message may become too long to embed in the audio and cause distortion of the cover audio if the secret message is too long. Hence, there is a need to encrypt the message with a stream cipher before embedding it into the audio, because a stream cipher encrypts bit by bit whereas a block cipher encrypts fixed-length blocks, which results in a longer output than a stream cipher. Hence, an audio steganography scheme embedding text encrypted with the Rivest Cipher 4 stream cipher is designed, developed and tested in this project.
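
    As an illustration of the stream-cipher step described above, the sketch below implements RC4 and shows that encryption and decryption keep the ciphertext exactly as long as the plaintext; the resulting bytes would then be embedded in the cover audio, for example with an LSB scheme. The key and message are placeholders.

      # RC4 keystream generation and XOR: the ciphertext has exactly the length of
      # the plaintext, so the payload to be hidden in the cover audio stays small.
      def rc4(key, data):
          S = list(range(256))
          j = 0
          for i in range(256):                              # key-scheduling algorithm
              j = (j + S[i] + key[i % len(key)]) % 256
              S[i], S[j] = S[j], S[i]
          out, i, j = [], 0, 0
          for byte in data:                                 # keystream generation + XOR
              i = (i + 1) % 256
              j = (j + S[i]) % 256
              S[i], S[j] = S[j], S[i]
              out.append(byte ^ S[(S[i] + S[j]) % 256])
          return bytes(out)

      cipher = rc4(b"shared-key", b"meet at noon")          # placeholder key and message
      assert rc4(b"shared-key", cipher) == b"meet at noon"  # the cipher is symmetric
      # `cipher` would then be embedded in the cover audio, e.g. by LSB insertion.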

  15. Complete fold annotation of the human proteome using a novel structural feature space.

    Science.gov (United States)

    Middleton, Sarah A; Illuminati, Joseph; Kim, Junhyong

    2017-04-13

    Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.

  16. Complete fold annotation of the human proteome using a novel structural feature space

    Science.gov (United States)

    Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong

    2017-04-01

    Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.

  17. Modeling Audio Fingerprints : Structure, Distortion, Capacity

    NARCIS (Netherlands)

    Doets, P.J.O.

    2010-01-01

    An audio fingerprint is a compact low-level representation of a multimedia signal. An audio fingerprint can be used to identify audio files or fragments in a reliable way. The use of audio fingerprints for identification consists of two phases. In the enrollment phase known content is fingerprinted,

  18. Linear sign in cystic brain lesions ≥5 mm: A suggestive feature of perivascular space.

    Science.gov (United States)

    Sung, Jinkyeong; Jang, Jinhee; Choi, Hyun Seok; Jung, So-Lyung; Ahn, Kook-Jin; Kim, Bum-Soo

    2017-11-01

    To determine the prevalence of a linear sign within enlarged perivascular space (EPVS) and chronic lacunar infarction (CLI) ≥ 5 mm on T2-weighted imaging (T2WI) and time-of-flight (TOF) magnetic resonance angiography (MRA), and to evaluate the diagnostic value of the linear signs for EPVS over CLI. This study included 101 patients with cystic lesions ≥ 5 mm on brain MRI including TOF MRA. After classification of cystic lesions into EPVS or CLI, two readers assessed linear signs on T2WI and TOF MRA. We compared the prevalence and the diagnostic performance of linear signs. Among 46 EPVS and 51 CLI, 84 lesions (86.6%) were in basal ganglia. The prevalence of T2 and TOF linear signs was significantly higher in the EPVS than in the CLI. Both linear signs showed high sensitivity (>80%). The TOF linear sign showed significantly higher specificity (100%) and accuracy (92.8% and 90.7%) than the T2 linear sign. Linear signs were more frequently observed in EPVS than CLI. They showed high sensitivity in differentiation of them, especially for basal ganglia. The TOF sign showed higher specificity and accuracy than the T2 sign. • Linear sign is a suggestive feature of EPVS. • Time-of-flight magnetic resonance angiography can reveal the lenticulostriate artery within perivascular spaces. • Linear sign helps differentiation of EPVS and CLI, especially in basal ganglia.

  19. [The new method monitoring crop water content based on NIR-Red spectrum feature space].

    Science.gov (United States)

    Cheng, Xiao-juan; Xu, Xin-gang; Chen, Tian-en; Yang, Gui-jun; Li, Zhen-hai

    2014-06-01

    Moisture content is an important index of crop water stress condition; timely and effective monitoring of crop water content is of great significance for evaluating crop water deficit balance and guiding agricultural irrigation. The present paper attempts to build a new crop water index for winter wheat vegetation water content based on the NIR-Red spectral space. Firstly, canopy spectra of winter wheat with narrow bands were resampled according to the relative spectral response functions of HJ-CCD and ZY-3. Then, a new index (PWI) was set up to estimate the vegetation water content of winter wheat by improving PDI (perpendicular drought index) and PVI (perpendicular vegetation index) based on the NIR-Red spectral feature space. The results showed that the relationship between PWI and VWC (vegetation water content) was stable based on simulation of wide-band multispectral data HJ-CCD and ZY-3, with R2 being 0.684 and 0.683, respectively. And then VWC was estimated by using PWI with the R2 and RMSE being 0.764 and 0.764, 3.837% and 3.840%, respectively. The results indicated that PWI has certain feasibility to estimate crop water content. At the same time, it provides a new method for monitoring crop water content using remote sensing data HJ-CCD and ZY-3.
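
    For orientation, the sketch below gives the two classical perpendicular indices in the Red-NIR reflectance space that PWI is reported to build on, with M and I denoting the soil-line slope and intercept; the exact form of PWI itself is defined in the paper and is not reproduced here, and the example values are hypothetical.

      # Classical perpendicular indices in the Red-NIR space (PWI's starting point);
      # M and I are the slope and intercept of the soil line fitted from the scene.
      import numpy as np

      def pdi(red, nir, M):
          return (red + M * nir) / np.sqrt(M**2 + 1)        # perpendicular drought index

      def pvi(red, nir, M, I):
          return (nir - M * red - I) / np.sqrt(M**2 + 1)    # perpendicular vegetation index

      red, nir = 0.12, 0.35                                 # hypothetical reflectances
      print(pdi(red, nir, M=1.2), pvi(red, nir, M=1.2, I=0.02))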

  20. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Koji Iwano

    2007-03-01

    Full Text Available This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.

  1. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Iwano Koji

    2007-01-01

    Full Text Available This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.

  2. Audiovisual laughter detection based on temporal features

    NARCIS (Netherlands)

    Petridis, Stavros; Pantic, Maja

    2008-01-01

    Previous research on automatic laughter detection has mainly been focused on audio-based detection. In this study we present an audio-visual approach to distinguishing laughter from speech based on temporal features and we show that integrating the information from audio and video channels leads...

  3. Audio-visual temporal recalibration can be constrained by content cues regardless of spatial overlap

    Directory of Open Access Journals (Sweden)

    Warrick eRoseboom

    2013-04-01

    Full Text Available It has now been well established that the point of subjective synchrony for audio and visual events can be shifted following exposure to asynchronous audio-visual presentations, an effect often referred to as temporal recalibration. Recently it was further demonstrated that it is possible to concurrently maintain two such recalibrated, and opposing, estimates of audio-visual temporal synchrony. However, it remains unclear precisely what defines a given audio-visual pair such that it is possible to maintain a temporal relationship distinct from other pairs. It has been suggested that spatial separation of the different audio-visual pairs is necessary to achieve multiple distinct audio-visual synchrony estimates. Here we investigated if this was necessarily true. Specifically, we examined whether it is possible to obtain two distinct temporal recalibrations for stimuli that differed only in featural content. Using both complex (audio-visual speech; Experiment 1) and simple stimuli (high and low pitch audio matched with either vertically or horizontally oriented Gabors; Experiment 2), we found concurrent, and opposite, recalibrations despite there being no spatial difference in presentation location at any point throughout the experiment. This result supports the notion that the content of an audio-visual pair can be used to constrain distinct audio-visual synchrony estimates regardless of spatial overlap.

  4. Audio-Visual Temporal Recalibration Can be Constrained by Content Cues Regardless of Spatial Overlap.

    Science.gov (United States)

    Roseboom, Warrick; Kawabe, Takahiro; Nishida, Shin'ya

    2013-01-01

    It has now been well established that the point of subjective synchrony for audio and visual events can be shifted following exposure to asynchronous audio-visual presentations, an effect often referred to as temporal recalibration. Recently it was further demonstrated that it is possible to concurrently maintain two such recalibrated estimates of audio-visual temporal synchrony. However, it remains unclear precisely what defines a given audio-visual pair such that it is possible to maintain a temporal relationship distinct from other pairs. It has been suggested that spatial separation of the different audio-visual pairs is necessary to achieve multiple distinct audio-visual synchrony estimates. Here we investigated if this is necessarily true. Specifically, we examined whether it is possible to obtain two distinct temporal recalibrations for stimuli that differed only in featural content. Using both complex (audio visual speech; see Experiment 1) and simple stimuli (high and low pitch audio matched with either vertically or horizontally oriented Gabors; see Experiment 2) we found concurrent, and opposite, recalibrations despite there being no spatial difference in presentation location at any point throughout the experiment. This result supports the notion that the content of an audio-visual pair alone can be used to constrain distinct audio-visual synchrony estimates regardless of spatial overlap.

  5. WebGL and web audio software lightweight components for multimedia education

    Science.gov (United States)

    Chang, Xin; Yuksel, Kivanc; Skarbek, Władysław

    2017-08-01

    The paper presents the results of our recent work on the development of the contemporary computing platform DC2 for multimedia education using WebGL and Web Audio, the W3C standards. Using the literate programming paradigm, the WEBSA educational tools were developed. It offers the user (student) access to an expandable collection of WebGL shaders and Web Audio scripts. The unique feature of DC2 is the option of literate programming, offered to both the author and the reader in order to improve the interactivity of lightweight WebGL and Web Audio components. For instance, users can define: source audio nodes including synthetic sources, destination audio nodes, and nodes for audio processing such as sound wave shaping, spectral band filtering, convolution-based modification, etc. In the case of WebGL, besides classic graphics effects based on mesh and fractal definitions, novel image processing analysis by shaders is offered, such as nonlinear filtering, histogram of gradients, and Bayesian classifiers.

  6. Location audio simplified capturing your audio and your audience

    CERN Document Server

    Miles, Dean

    2014-01-01

    From the basics of using camera, handheld, lavalier, and shotgun microphones to camera calibration and mixer set-ups, Location Audio Simplified unlocks the secrets to clean and clear broadcast quality audio no matter what challenges you face. Author Dean Miles applies his twenty-plus years of experience as a professional location operator to teach the skills, techniques, tips, and secrets needed to produce high-quality production sound on location. Humorous and thoroughly practical, the book covers a wide array of topics, such as: location selection, field mixing, boo...

  7. A Joint Audio-Visual Approach to Audio Localization

    DEFF Research Database (Denmark)

    Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2015-01-01

    Localization of audio sources is an important research problem, e.g., to facilitate noise reduction. In the recent years, the problem has been tackled using distributed microphone arrays (DMA). A common approach is to apply direction-of-arrival (DOA) estimation on each array (denoted as nodes) ... time-of-flight cameras. Moreover, we propose an optimal method for weighting such DOA and range information for audio localization. Our experiments on both synthetic and real data show that there is a clear, potential advantage of using the joint audiovisual localization framework.

  8. Using Simplified Thermal Inertia to Determine the Theoretical Dry Line in Feature Space for Evapotranspiration Retrieval

    Directory of Open Access Journals (Sweden)

    Sujuan Mi

    2015-08-01

    Full Text Available With the development of quantitative remote sensing, regional evapotranspiration (ET) modeling based on the feature space has made substantial progress. Among those feature space based evapotranspiration models, accurate determination of the dry/wet lines remains a challenging task. This paper reports the development of a new model, named DDTI (Determination of Dry line by Thermal Inertia), which determines the theoretical dry line based on the relationship between the thermal inertia and the soil moisture. The Simplified Thermal Inertia value estimated in the North China Plain is consistent with the value measured in the laboratory. Three evaluation methods, which are based on the comparison of the locations of the theoretical dry line determined by two models (the DDTI model and the heat energy balance model), the comparison of ET results, and the comparison of the evaporative fraction between the estimates from the two models and the in situ measurements, were used to assess the performance of the new model DDTI. The location of the theoretical dry line determined by DDTI is more reasonable than that determined by the heat energy balance model. ET estimated from DDTI has an RMSE (Root Mean Square Error) of 56.77 W/m2 and a bias of 27.17 W/m2; while the heat energy balance model estimated ET with an RMSE of 83.36 W/m2 and a bias of −38.42 W/m2. When comparing the coefficient of determination for the two models with the observations from Yucheng, DDTI demonstrated ET with an R2 of 0.9065, while the heat energy balance model has an R2 of 0.7729. When compared with the in situ measurements of evaporative fraction (EF) at Yucheng Experimental Station, the ET model based on DDTI reproduces the pixel scale EF with an RMSE of 0.149, much lower than that based on the heat energy balance model which has an RMSE of 0.220. Also, the EF bias between the DDTI model and the in situ measurements is 0.064, lower than the EF bias of the heat energy balance model.

  9. Evaluation of Audio Compression Artifacts

    Directory of Open Access Journals (Sweden)

    M. Herrera Martinez

    2007-01-01

    Full Text Available This paper deals with the subjective evaluation of audio-coding systems. From this evaluation, it is found that, depending on the type of signal and the algorithm of the audio-coding system, different types of audible errors arise. These errors are called coding artifacts. Although three kinds of artifacts are perceivable in the auditory domain, the author proposes that in the coding domain there is only one common cause for the appearance of the artifact, inefficient tracking of transient-stochastic signals. For this purpose, state-of-the-art audio coding systems use a wide range of signal processing techniques, including application of the wavelet transform, which is described here.

  10. Audio power amplifier design handbook

    CERN Document Server

    Self, Douglas

    2013-01-01

    This book is essential for audio power amplifier designers and engineers for one simple reason...it enables you as a professional to develop reliable, high-performance circuits. The Author Douglas Self covers the major issues of distortion and linearity, power supplies, overload, DC-protection and reactive loading. He also tackles unusual forms of compensation and distortion produced by capacitors and fuses. This completely updated fifth edition includes four NEW chapters including one on The XD Principle, invented by the author, and used by Cambridge Audio. Cro

  11. Audio-visual onset differences are used to determine syllable identity for ambiguous audio-visual stimulus pairs

    NARCIS (Netherlands)

    Ten Oever, Sanne; Sack, Alexander T; Wheat, Katherine L; Bien, Nina; van Atteveldt, Nienke

    2013-01-01

    Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal

  12. Towards a universal representation for audio information retrieval and analysis

    DEFF Research Database (Denmark)

    Jensen, Bjørn Sand; Troelsgaard, Rasmus; Larsen, Jan

    2013-01-01

    A fundamental and general representation of audio and music which integrates multi-modal data sources is important for both application and basic research purposes. In this paper we address this challenge by proposing a multi-modal version of the Latent Dirichlet Allocation model which provides a joint latent representation. We evaluate this representation on the Million Song Dataset by integrating three fundamentally different modalities, namely tags, lyrics, and audio features. We show how the resulting representation is aligned with common 'cognitive' variables such as tags, and provide some...

  13. Planning Schools for Use of Audio-Visual Materials. No. 3: The Audio-Visual Materials Center.

    Science.gov (United States)

    National Education Association, Washington, DC. Dept. of Audiovisual Instruction.

    This manual discusses the role, organizational patterns, expected services, and space and housing needs of the audio-visual instructional materials center. In considering the housing of basic functions, photographs, floor layouts, diagrams, and specifications of equipment are presented. An appendix includes a 77-item bibliography, a 7-page list of…

  14. Note-accurate audio segmentation based on MPEG-7

    Science.gov (United States)

    Wellhausen, Jens

    2003-12-01

    Segmenting audio data into the smallest musical components is the basis for many further metadata extraction algorithms. For example, an automatic music transcription system needs to know where the exact boundaries of each tone are. In this paper a note-accurate audio segmentation algorithm based on MPEG-7 low level descriptors is introduced. For a reliable detection of different notes, features in both the time and the frequency domain are used. Because of this, polyphonic instrument mixes and even melodies characterized by human voices can be examined with this algorithm. For testing and verification of the note-accurate segmentation, a simple music transcription system was implemented. The dominant frequency within each segment is used to build a MIDI file representing the processed audio data.
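
    The pipeline described above (time- and frequency-domain features to find note boundaries, then a dominant-frequency estimate per segment written out as MIDI) can be sketched without the MPEG-7 descriptor set. The Python fragment below is an illustrative stand-in only, assuming a plain spectral-flux onset detector and an FFT peak instead of the paper's low-level descriptors; it is not the published algorithm.

```python
# Illustrative sketch: spectral-flux onsets as note boundaries, FFT peak -> MIDI pitch.
import numpy as np

def note_segments(x, sr, frame=2048, hop=512, threshold=1.5):
    """Split a mono signal into rough note segments and map each to a MIDI pitch."""
    n_frames = 1 + (len(x) - frame) // hop
    window = np.hanning(frame)
    spectra = np.array([np.abs(np.fft.rfft(window * x[i * hop:i * hop + frame]))
                        for i in range(n_frames)])
    # Spectral flux: summed positive spectral change between consecutive frames.
    flux = np.sum(np.maximum(np.diff(spectra, axis=0), 0.0), axis=1)
    onsets = [i + 1 for i in range(1, len(flux))
              if flux[i] > threshold * flux.mean() and flux[i] >= flux[i - 1]]
    boundaries = [0] + onsets + [n_frames]
    segments = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        if end <= start:
            continue
        mean_spec = spectra[start:end].mean(axis=0)
        freq = np.argmax(mean_spec) * sr / frame   # dominant frequency of the segment
        if freq > 0:
            midi = int(round(69 + 12 * np.log2(freq / 440.0)))
            segments.append((start * hop / sr, end * hop / sr, midi))
    return segments
```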

  15. Audio signal recognition for speech, music, and environmental sounds

    Science.gov (United States)

    Ellis, Daniel P. W.

    2003-10-01

    Human listeners are very good at all kinds of sound detection and identification tasks, from understanding heavily accented speech to noticing a ringing phone underneath music playing at full blast. Efforts to duplicate these abilities on computer have been particularly intense in the area of speech recognition, and it is instructive to review which approaches have proved most powerful, and which major problems still remain. The features and models developed for speech have found applications in other audio recognition tasks, including musical signal analysis, and the problems of analyzing the general ``ambient'' audio that might be encountered by an auditorily endowed robot. This talk will briefly review statistical pattern recognition for audio signals, giving examples in several of these domains. Particular emphasis will be given to common aspects and lessons learned.

  16. The Audio-Visual Man.

    Science.gov (United States)

    Babin, Pierre, Ed.

    A series of twelve essays discuss the use of audiovisuals in religious education. The essays are divided into three sections: one which draws on the ideas of Marshall McLuhan and other educators to explore the newest ideas about audiovisual language and faith, one that describes how to learn and use the new language of audio and visual images, and…

  17. Audio-Visual Materials Catalog.

    Science.gov (United States)

    Anderson (M.D.) Hospital and Tumor Inst., Houston, TX.

    This catalog lists 27 audiovisual programs produced by the Department of Medical Communications of the University of Texas M. D. Anderson Hospital and Tumor Institute for public distribution. Video tapes, 16 mm. motion pictures and slide/audio series are presented dealing mostly with cancer and related subjects. The programs are intended for…

  18. Digital Augmented Reality Audio Headset

    Directory of Open Access Journals (Sweden)

    Jussi Rämö

    2012-01-01

    Full Text Available Augmented reality audio (ARA combines virtual sound sources with the real sonic environment of the user. An ARA system can be realized with a headset containing binaural microphones. Ideally, the ARA headset should be acoustically transparent, that is, it should not cause audible modification to the surrounding sound. A practical implementation of an ARA mixer requires a low-latency headphone reproduction system with additional equalization to compensate for the attenuation and the modified ear canal resonances caused by the headphones. This paper proposes digital IIR filters to realize the required equalization and evaluates a real-time prototype ARA system. Measurements show that the throughput latency of the digital prototype ARA system can be less than 1.4 ms, which is sufficiently small in practice. When the direct and processed sounds are combined in the ear, a comb filtering effect is brought about and appears as notches in the frequency response. The comb filter effect in speech and music signals was studied in a listening test and it was found to be inaudible when the attenuation is 20 dB. Insert ARA headphones have a sufficient attenuation at frequencies above about 1 kHz. The proposed digital ARA system enables several immersive audio applications, such as a virtual audio tourist guide and audio teleconferencing.
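
    The comb-filter effect mentioned above is easy to reproduce numerically: when the leaked direct sound (attenuated by the earpiece) sums in the ear canal with the processed sound delayed by the system latency, the combined magnitude response ripples, with notches spaced at the inverse of the delay. The short sketch below is a generic two-path model rather than the paper's measurement; the 1.4 ms delay and 20 dB attenuation are simply the figures quoted in the abstract.

```python
# Two-path model: processed sound (unity gain) plus an attenuated, delayed direct leak.
import numpy as np

def comb_response(delay_ms, attenuation_db, sr=48000, n_fft=4096):
    """Magnitude response (dB) of 1 + leak * exp(-j*2*pi*f*delay)."""
    delay = delay_ms * 1e-3
    leak = 10 ** (-attenuation_db / 20.0)
    f = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    h = 1.0 + leak * np.exp(-2j * np.pi * f * delay)
    return f, 20 * np.log10(np.abs(h))

# With 1.4 ms latency the notches repeat roughly every 1/0.0014 ≈ 714 Hz; at 20 dB
# isolation the ripple stays within about +/-1 dB, consistent with it being inaudible.
f, mag = comb_response(delay_ms=1.4, attenuation_db=20.0)
print(round(mag.max(), 2), round(mag.min(), 2))
```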

  19. Haptic and Audio Interaction Design

    DEFF Research Database (Denmark)

    This book constitutes the refereed proceedings of the 5th International Workshop on Haptic and Audio Interaction Design, HAID 2010 held in Copenhagen, Denmark, in September 2010. The 21 revised full papers presented were carefully reviewed and selected for inclusion in the book. The papers are or...

  20. Children's Use of Audio Media.

    Science.gov (United States)

    Christenson, Peter G.; And Others

    1985-01-01

    Summarizes current research on children's use of audio equipment and argues that records, radio, and tapes play an important role in the personal and social lives of many children. Examines issues and promising approaches in the study of listening in children's lives. (PD)

  1. Four-quadrant flyback converter for direct audio power amplification

    Energy Technology Data Exchange (ETDEWEB)

    Ljusev, P.; Andersen, Michael A.E.

    2005-07-01

    This paper presents a bidirectional, four-quadrant flyback converter for use in direct audio power amplification. When compared to the standard Class-D switching-mode audio power amplifier with a separate power supply, the proposed four-quadrant flyback converter provides a simple and compact solution with high efficiency, a higher level of integration, lower component count, less board space and eventually lower cost. Both peak and average current-mode control for use with 4Q flyback power converters are described and compared. Integrated magnetics is presented which simplifies the construction of the auxiliary power supplies for control biasing and isolated gate drives. The feasibility of the approach is proven on an audio power amplifier prototype for subwoofer applications. (au)

  2. Audio watermark a comprehensive foundation using Matlab

    CERN Document Server

    Lin, Yiqing

    2015-01-01

    This book illustrates the commonly used and novel approaches to audio watermarking for copyright protection. The author provides a theoretical and practical step-by-step guide to data hiding in audio signals such as music, speech and broadcast material. New techniques developed by the author are fully explained, with MATLAB programs for audio watermarking and audio quality assessment, and methods for objectively predicting the perceptual quality of the watermarked audio signals are also discussed. The book explains the theoretical basics of the commonly used audio watermarking techniques, discusses the methods used to objectively and subjectively assess the quality of the audio signals, and provides comprehensive, well-tested MATLAB programs that can be used efficiently to watermark any audio media.
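
    As a toy illustration of data hiding in an audio signal (not one of the book's MATLAB methods), the sketch below embeds and recovers a bit string in the least significant bits of 16-bit PCM samples. Real watermarking schemes add perceptual shaping and robustness to compression, which are deliberately omitted here.

```python
# Minimal LSB embedding/extraction on 16-bit PCM samples (illustrative only).
import numpy as np

def embed_lsb(samples, bits):
    """Overwrite the least significant bit of the first len(bits) samples."""
    out = samples.astype(np.int16).copy()
    for i, bit in enumerate(bits):
        out[i] = (int(out[i]) & ~1) | int(bit)
    return out

def extract_lsb(samples, n_bits):
    return [int(s) & 1 for s in samples[:n_bits]]

audio = (np.random.randn(1000) * 1000).astype(np.int16)   # stand-in for real audio
mark = [1, 0, 1, 1, 0, 0, 1, 0]
assert extract_lsb(embed_lsb(audio, mark), len(mark)) == mark
```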

  3. Comparative study of digital audio steganography techniques

    National Research Council Canada - National Science Library

    Djebbar, Fatiha; Ayad, Beghdad; Meraim, Karim Abed; Hamam, Habib

    2012-01-01

    .... We focus in this paper on digital audio steganography, which has emerged as a prominent source of data hiding across novel telecommunication technologies such as covered voice-over-IP, audio conferencing, etc...

  4. Machinery running state identification based on discriminant semi-supervised local tangent space alignment for feature fusion and extraction

    Science.gov (United States)

    Su, Zuqiang; Xiao, Hong; Zhang, Yi; Tang, Baoping; Jiang, Yonghua

    2017-04-01

    Extraction of sensitive features is a challenging but key task in data-driven machinery running state identification. Aimed at solving this problem, a method for machinery running state identification that applies discriminant semi-supervised local tangent space alignment (DSS-LTSA) for feature fusion and extraction is proposed. Firstly, in order to extract more distinct features, the vibration signals are decomposed by wavelet packet decomposition (WPD), and a mixed-domain feature set consisting of statistical features, autoregressive (AR) model coefficients, instantaneous amplitude Shannon entropy and the WPD energy spectrum is extracted to comprehensively characterize the properties of the machinery running state(s). Then, the mixed-domain feature set is inputted into DSS-LTSA for feature fusion and extraction to eliminate redundant information and interference noise. The proposed DSS-LTSA can extract intrinsic structure information of both labeled and unlabeled state samples, and as a result the over-fitting problem of supervised manifold learning and the blindness problem of unsupervised manifold learning are overcome. Simultaneously, class discrimination information is integrated within the dimension reduction process in a semi-supervised manner to improve the sensitivity of the extracted fusion features. Lastly, the extracted fusion features are inputted into a pattern recognition algorithm to achieve the running state identification. The effectiveness of the proposed method is verified by a running state identification case in a gearbox, and the results confirm the improved accuracy of the running state identification.
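
    A rough Python equivalent of the mixed-domain feature set (statistical features, AR coefficients, Shannon entropy of the instantaneous amplitude, and a WPD energy spectrum) is sketched below. The library choices (scipy, PyWavelets), the least-squares AR estimation and all parameter values are assumptions for illustration, not the authors' implementation.

```python
# Mixed-domain feature vector for one vibration signal frame (illustrative sketch).
import numpy as np
import pywt
from scipy.signal import hilbert

def mixed_domain_features(x, ar_order=4, wpd_level=3, wavelet="db4"):
    x = np.asarray(x, dtype=float)
    feats = [x.mean(), x.std(), np.sqrt(np.mean(x ** 2)), np.abs(x).max()]  # statistics
    # AR model coefficients estimated by least squares on lagged samples.
    lags = np.column_stack([x[ar_order - k - 1:len(x) - k - 1] for k in range(ar_order)])
    ar, *_ = np.linalg.lstsq(lags, x[ar_order:], rcond=None)
    feats.extend(ar)
    # Shannon entropy of the normalized instantaneous amplitude (Hilbert envelope).
    env = np.abs(hilbert(x))
    p = env / env.sum()
    feats.append(float(-(p * np.log2(p + 1e-12)).sum()))
    # Wavelet packet energy spectrum at the chosen decomposition level.
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, maxlevel=wpd_level)
    energy = np.array([np.sum(node.data ** 2) for node in wp.get_level(wpd_level, "natural")])
    feats.extend(energy / energy.sum())
    return np.array(feats)
```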

  5. Vascular lesions of the lumbar epidural space: magnetic resonance imaging features of epidural cavernous hemangioma and epidural hematoma

    Directory of Open Access Journals (Sweden)

    Basile Júnior Roberto

    1999-01-01

    Full Text Available The authors report the magnetic resonance imaging diagnostic features in two cases, one of lumbar epidural hematoma and one of cavernous hemangioma of the lumbar epidural space. Enhanced T1-weighted MRI scans show a hyperintense signal rim surrounding the vascular lesion. Non-enhanced T2-weighted scans showed a hyperintense signal.

  6. [Audio-visual aids and tropical medicine].

    Science.gov (United States)

    Morand, J J

    1989-01-01

    The author presents a list of the audio-visual productions about Tropical Medicine, as well as of their main characteristics. He thinks that the audio-visual educational productions are often dissociated from their promotion; therefore, he invites the future creator to forward his work to the Audio-Visual Health Committee.

  7. Audio-visual Materials and Rural Libraries

    Science.gov (United States)

    Escolar-Sobrino, Hipolito

    1972-01-01

    Audio-visual materials enlarge the educational work being done in the classroom and the library. This article examines the various types of audio-visual material and equipment and suggests ways in which audio-visual media can be used economically and efficiently in rural libraries. (Author)

  8. Audio Frequency Analysis in Mobile Phones

    Science.gov (United States)

    Aguilar, Horacio Munguía

    2016-01-01

    A new experiment using mobile phones is proposed in which the phone's audio frequency response is analyzed using the audio port to input an external signal and obtain a measurable output. This experiment shows how the limited audio bandwidth used in mobile telephony is the main cause of the poor speech quality in this service. A brief discussion is…

  9. Bit rates in audio source coding

    NARCIS (Netherlands)

    Veldhuis, Raymond N.J.

    1992-01-01

    The goal is to introduce and solve the audio coding optimization problem. Psychoacoustic results such as masking and excitation pattern models are combined with results from rate distortion theory to formulate the audio coding optimization problem. The solution of the audio optimization problem is a

  10. Presence and the utility of audio spatialization

    DEFF Research Database (Denmark)

    Bormann, Karsten

    2005-01-01

    The primary concern of this paper is whether the utility of audio spatialization, as opposed to the fidelity of audio spatialization, impacts presence. An experiment is reported that investigates the presence-performance relationship by decoupling spatial audio fidelity (realism) from task...... or not, while the presence questionnaire used by Slater and coworkers (see Tromp et al., 1998) was more sensitive to whether audio was fully spatialized or not. Finally, having the sound source active positively impacts the assessment of the audio while negatively impacting subjects' assessment...

  11. REPRESENTING URBAN SPACE ACCORDING TO THE FEATURES OF THE IDEAL CITY

    Directory of Open Access Journals (Sweden)

    MARIA ELIZA DULAMĂ

    2012-01-01

    Full Text Available This study focused on how high school students represent real and ideal urban spaces on plans, and on how they represent these spaces in texts. Students worked in groups and we presented their results: the city plans created for three ideal cities. We analysed the represented geographical elements, the functions of those cities, and the difficulties that students had in perceiving and representing geographical space.

  12. Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.

    Science.gov (United States)

    Gebru, Israel; Ba, Sileye; Li, Xiaofei; Horaud, Radu

    2017-01-05

    Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semisupervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes in a principled way speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset that contains audio-visual training data as well as a number of scenarios involving several participants engaged in formal and informal dialogue is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the-art diarization algorithms.

  13. Data Hiding Through Media Audio

    OpenAIRE

    Sumi Khairani

    2017-01-01

    Audio watermarking can be used in various ways. Firstly, it has been used for proof of ownership, production information and copyright information in the form of a watermark routed directly into the recording. Specific owners have different insertion information. It can also be used for access control, where the watermark becomes a trigger to play music. Keeping track of unauthorized copies is a very important application. Personal information is inserted into the music. It is used as numbers f...

  14. Using Touch Screen Audio-CASI to Obtain Data on Sensitive Topics.

    Science.gov (United States)

    Cooley, Philip C; Rogers, Susan M; Turner, Charles F; Al-Tayyib, Alia A; Willis, Gordon; Ganapathi, Laxminarayana

    2001-05-01

    This paper describes a new interview data collection system that uses a laptop personal computer equipped with a touch-sensitive video monitor. The touch-screen-based audio computer-assisted self-interviewing system, or touch screen audio-CASI, enhances the ease of use of conventional audio CASI systems while simultaneously providing the privacy of self-administered questionnaires. We describe touch screen audio-CASI design features and operational characteristics. In addition, we present data from a recent clinic-based experiment indicating that the touch audio-CASI system is stable, robust, and suitable for administering relatively long and complex questionnaires on sensitive topics, including drug use and sexual behaviors associated with HIV and other sexually transmitted diseases.

  15. Functionality of system components: Conservation of protein function in protein feature space

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Ussery, David; Brunak, Søren

    2003-01-01

    Many protein features useful for prediction of protein function can be predicted from sequence, including posttranslational modifications, subcellular localization, and physical/chemical properties. We show here that such protein features are more conserved among orthologs than paralogs, indicating...... they are crucial for protein function and thus subject to selective pressure. This means that a function prediction method based on sequence-derived features may be able to discriminate between proteins with different function even when they have highly similar structure. Also, such a method is likely to perform...... well on organisms other than the one on which it was trained. We evaluate the performance of such a method, ProtFun, which relies on protein features as its sole input, and show that the method gives similar performance for most eukaryotes and performs much better than anticipated on archaea...

  16. AudioRegent: Exploiting SimpleADL and SoX for Digital Audio Delivery

    Directory of Open Access Journals (Sweden)

    Nitin Arora

    2010-06-01

    Full Text Available AudioRegent is a command-line Python script currently being used by the University of Alabama Libraries’ Digital Services to create web-deliverable MP3s from regions within archival audio files. In conjunction with a small-footprint XML file called SimpleADL and SoX, an open-source command-line audio editor, AudioRegent batch processes archival audio files, allowing for one or many user-defined regions, particular to each audio file, to be extracted with additional audio processing in a transparent manner that leaves the archival audio file unaltered. Doing so has alleviated many of the tensions of cumbersome workflows, complicated documentation, preservation concerns, and reliance on expensive closed-source GUI audio applications.
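
    The core operation, calling SoX to cut a user-defined region out of an archival file and write a web-deliverable MP3 while leaving the source untouched, can be sketched as below. The region field names are hypothetical and do not reflect the actual SimpleADL schema, and MP3 output requires a SoX build with LAME support.

```python
# Hypothetical sketch of region extraction with SoX (not AudioRegent's actual code).
import subprocess

def extract_region(archival_wav, out_mp3, start, duration, gain_db=0.0):
    """Write one region of an archival WAV to MP3; the source file is never modified."""
    cmd = ["sox", archival_wav, out_mp3,
           "trim", str(start), str(duration),   # cut the user-defined region
           "gain", str(gain_db)]                # optional additional processing
    subprocess.run(cmd, check=True)

# extract_region("interview_master.wav", "interview_clip.mp3", start=12.5, duration=30.0)
```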

  17. NFL Films audio, video, and film production facilities

    Science.gov (United States)

    Berger, Russ; Schrag, Richard C.; Ridings, Jason J.

    2003-04-01

    The new NFL Films 200,000 sq. ft. headquarters is home for the critically acclaimed film production that preserves the NFL's visual legacy week-to-week during the football season, and is also the technical plant that processes and archives football footage from the earliest recorded media to the current network broadcasts. No other company in the country shoots more film than NFL Films, and the inclusion of cutting-edge video and audio formats demands that their technical spaces continually integrate the latest in the ever-changing world of technology. This facility houses a staggering array of acoustically sensitive spaces where music and sound are equal partners with the visual medium. Over 90,000 sq. ft. of sound critical technical space is comprised of an array of sound stages, music scoring stages, audio control rooms, music writing rooms, recording studios, mixing theaters, video production control rooms, editing suites, and a screening theater. Every production control space in the building is designed to monitor and produce multi channel surround sound audio. An overview of the architectural and acoustical design challenges encountered for each sophisticated listening, recording, viewing, editing, and sound critical environment will be discussed.

  18. Probing features in inflaton potential and reionization history with future CMB space observations

    Science.gov (United States)

    Hazra, Dhiraj Kumar; Paoletti, Daniela; Ballardini, Mario; Finelli, Fabio; Shafieloo, Arman; Smoot, George F.; Starobinsky, Alexei A.

    2018-02-01

    We consider the prospects of probing features in the primordial power spectrum with future Cosmic Microwave Background (CMB) polarization measurements. In the scope of the inflationary scenario, such features in the spectrum can be produced by local non-smooth pieces in an inflaton potential (smooth and quasi-flat in general) which in turn may originate from fast phase transitions during inflation in other quantum fields interacting with the inflaton. They can fit some outliers in the CMB temperature power spectrum which are unaddressed within the standard inflationary ΛCDM model. We consider Wiggly Whipped Inflation (WWI) as a theoretical framework leading to improvements in the fit to the Planck 2015 temperature and polarization data in comparison with the standard inflationary models, although not at a statistically significant level. We show that some types of features in the potential within the WWI models, leading to oscillations in the primordial power spectrum that extend to intermediate and small scales, can be constrained with high confidence (at 3σ or higher confidence level) by an instrument such as the Cosmic ORigins Explorer (CORE). In order to investigate the possible confusion between inflationary features and footprints from the reionization era, we consider an extended reionization history with monotonic increase of free electrons with decrease in redshift. We discuss the present constraints on this model of extended reionization and future predictions with CORE. We also project to what extent this extended reionization can create confusion in identifying inflationary features in the data.

  19. Mining potential biomarkers associated with space flight in Caenorhabditis elegans experienced Shenzhou-8 mission with multiple feature selection techniques.

    Science.gov (United States)

    Zhao, Lei; Gao, Ying; Mi, Dong; Sun, Yeqing

    To identify the potential biomarkers associated with space flight, a combined algorithm, which integrates multiple feature selection techniques, was used to analyze the microarray datasets of Caenorhabditis elegans obtained in the Shenzhou-8 mission. Compared with the ground control treatment, a total of 86 differentially expressed (DE) genes in response to the space synthetic environment or the space radiation environment were identified by two filter methods. Then the top 30 ranked genes were selected by the random forest algorithm. Gene Ontology annotation and functional enrichment analyses showed that these genes were mainly associated with metabolic processes. Furthermore, clustering analysis showed that 17 of these genes are positive, including 9 for the space synthetic environment and 8 for the space radiation environment only. These genes could be used as biomarkers to reflect the space environment stresses. In addition, we also found that microgravity is the main stress factor changing the expression patterns of biomarkers for the short-duration spaceflight. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. Feature Space Dimensionality Reduction for Real-Time Vision-Based Food Inspection

    Directory of Open Access Journals (Sweden)

    Mai Moussa CHETIMA

    2009-03-01

    Full Text Available Machine vision solutions are becoming a standard for quality inspection in several manufacturing industries. In the processed-food industry where the appearance attributes of the product are essential to customer’s satisfaction, visual inspection can be reliably achieved with machine vision. But such systems often involve the extraction of a larger number of features than those actually needed to ensure proper quality control, making the process less efficient and difficult to tune. This work experiments with several feature selection techniques in order to reduce the number of attributes analyzed by a real-time vision-based food inspection system. Identifying and removing as much irrelevant and redundant information as possible reduces the dimensionality of the data and allows classification algorithms to operate faster. In some cases, accuracy on classification can even be improved. Filter-based and wrapper-based feature selectors are experimentally evaluated on different bakery products to identify the best performing approaches.
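
    The distinction between the filter-based and wrapper-based selectors evaluated in the study can be illustrated on synthetic data with scikit-learn; the code below is a generic sketch, not the paper's experimental setup or its bakery-product features.

```python
# Filter selection (classifier-independent scores) vs. wrapper selection (uses a model).
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=40, n_informative=8, random_state=0)

filt = SelectKBest(score_func=f_classif, k=8).fit(X, y)                          # filter
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=8).fit(X, y)  # wrapper

print("filter keeps :", sorted(filt.get_support(indices=True)))
print("wrapper keeps:", sorted(wrap.get_support(indices=True)))
```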

  1. Open spaces and urban form: interpreting features and conflict at Florianópolis (SC

    Directory of Open Access Journals (Sweden)

    Alina Gonçalves Santiago

    2014-06-01

    Full Text Available Florianópolis, according to the IBGE, is among the cities  of Santa Catarina that presents significant population growth. This, due to the migration of people from the interior regions of the state itself and as well as several parts of the country, attracted by the quality of life and employment opportunities arising from the existence of public institutions and the provision of goods and services. Fact confirmed with the increase of approximately 23% of the population in the last decade. Consequently, there has been urbanization processes that affect the morphological structure of the landscape, responsable for considerable spatial conflicts, specially on the system of open spaces. Therefore, this study aims to identify public and private spaces in the city according to the environmental laws of the Brazilian Forest Code and the Master Plan. In addition, space syntax studies were performed, so further information concerning the relations of the environment with the urban pattern and its degree of integration. Complementing the research, temporal series analysis were made identifying producer agents of private and public spaces over the past 75 years. Thus, it can be seen that urban growth, mainly due to property speculation and irregular subdivision process, are among the active agents on shaping the space and, consequently, as generators of conflict.

  2. Evolving artificial neural networks for cross-adaptive audio effects

    OpenAIRE

    Jordal, Iver

    2017-01-01

    Cross-adaptive audio effects have many applications within music technology, including for automatic mixing and live music. Commonly used methods of signal analysis capture the acoustical and mathematical features of the signal well, but struggle to capture the musical meaning. Together with the vast number of possible signal interactions, this makes manual exploration of signal interactions difficult and tedious. This project investigates Artificial Intelligence (AI) methods for finding usef...

  3. Audio-visual gender recognition

    Science.gov (United States)

    Liu, Ming; Xu, Xun; Huang, Thomas S.

    2007-11-01

    Combining different modalities for a pattern recognition task is a very promising field. Basically, humans always fuse information from different modalities to recognize objects and perform inference. Audio-visual gender recognition is one of the most common tasks in human social communication. Humans can identify gender by facial appearance, by speech and also by body gait. Indeed, human gender recognition is a multi-modal data acquisition and processing procedure. However, computational multimodal gender recognition has not been extensively investigated in the literature. In this paper, speech and facial images are fused to perform multi-modal gender recognition, exploring the improvement gained by combining different modalities.

  4. FEATURES OF PSYCHOLOGICAL SPACE SOVEREIGNTY MAINTAINED BY PEOPLE WITH DIFFERENT ATTITUDE TO SOLITUDE

    Directory of Open Access Journals (Sweden)

    Nadezhda Alekseevna Garipova

    2017-06-01

    Practical implications. The results can be useful for developing psychocorrection sessions and trainings. The data can be helpful for specialists of Family Psychological Support centers and for instructors of the “Ecological Psychology” and “Family Relations Psychology” disciplines. The study carried out is likely to be highly educational, since many respondents participating in the survey admitted that they had never considered personal boundary violation to be a reason for marital conflicts. They also lacked information concerning psychological space, how to regulate personal space boundaries and how to respond to other family members' behavior in an adequate manner.

  5. Modified BTC Algorithm for Audio Signal Coding

    Directory of Open Access Journals (Sweden)

    TOMIC, S.

    2016-11-01

    Full Text Available This paper describes a modification of a well-known image coding algorithm, named Block Truncation Coding (BTC), and its application in audio signal coding. The BTC algorithm was originally designed for black and white image coding. Since black and white images and audio signals have different statistical characteristics, the application of this image coding algorithm to audio signals presents a novelty and a challenge. Several implementation modifications are described in this paper, while the original idea of the algorithm is preserved. The main modifications are performed in the area of signal quantization, by designing more adequate quantizers for audio signal processing. The result is a novel audio coding algorithm, whose performance is presented and analyzed in this research. The performance analysis indicates that this novel algorithm can be successfully applied in audio signal coding.
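
    For reference, classic BTC preserves the block mean and standard deviation while storing one bit per sample; the sketch below applies that unmodified scheme to an audio frame. The paper's modified quantizers are not reproduced here, so this is only a baseline illustration.

```python
# Classic (unmodified) Block Truncation Coding applied to one audio frame.
import numpy as np

def btc_encode(frame):
    m, s = frame.mean(), frame.std()
    bitmap = frame > m                      # one bit per sample
    return m, s, bitmap

def btc_decode(m, s, bitmap):
    n, q = bitmap.size, int(bitmap.sum())
    if q in (0, n):                         # flat block: reconstruct with the mean
        return np.full(bitmap.size, m)
    low = m - s * np.sqrt(q / (n - q))      # level used where bits are 0
    high = m + s * np.sqrt((n - q) / q)     # level used where bits are 1
    return np.where(bitmap, high, low)

x = np.sin(2 * np.pi * 440 * np.arange(256) / 8000)
y = btc_decode(*btc_encode(x))
print("frame SNR (dB):", round(10 * np.log10(np.sum(x ** 2) / np.sum((x - y) ** 2)), 1))
```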

  6. Local Control of Audio Environment: A Review of Methods and Applications

    Directory of Open Access Journals (Sweden)

    Jussi Kuutti

    2014-02-01

    Full Text Available The concept of a local audio environment is to have sound playback locally restricted such that, ideally, adjacent regions of an indoor or outdoor space could exhibit their own individual audio content without interfering with each other. This would enable people to listen to their content of choice without disturbing others next to them, yet, without any headphones to block conversation. In practice, perfect sound containment in free air cannot be attained, but a local audio environment can still be satisfactorily approximated using directional speakers. Directional speakers may be based on regular audible frequencies or they may employ modulated ultrasound. Planar, parabolic, and array form factors are commonly used. The directivity of a speaker improves as its surface area and sound frequency increases, making these the main design factors for directional audio systems. Even directional speakers radiate some sound outside the main beam, and sound can also reflect from objects. Therefore, directional speaker systems perform best when there is enough ambient noise to mask the leaking sound. Possible areas of application for local audio include information and advertisement audio feed in commercial facilities, guiding and narration in museums and exhibitions, office space personalization, control room messaging, rehabilitation environments, and entertainment audio systems.
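
    The claim that directivity improves with surface area and frequency can be illustrated with the standard baffled circular-piston model, which is an assumption here rather than anything taken from the review: the radiated beam narrows as the product of radius and frequency grows.

```python
# Baffled circular-piston beam pattern: relative level (dB) at an off-axis angle.
import numpy as np
from scipy.special import j1

def piston_level_db(freq_hz, radius_m, theta_deg, c=343.0):
    theta = np.radians(theta_deg)
    ka_sin = 2 * np.pi * freq_hz / c * radius_m * np.sin(theta)
    safe = np.where(ka_sin == 0, 1.0, ka_sin)            # avoid 0/0 on axis
    d = np.where(ka_sin == 0, 1.0, 2 * j1(safe) / safe)
    return 20 * np.log10(np.abs(d) + 1e-12)

angles = np.array([0, 15, 30, 45, 60])
print("small panel, 1 kHz :", np.round(piston_level_db(1000, 0.05, angles), 1))
print("large panel, 8 kHz :", np.round(piston_level_db(8000, 0.20, angles), 1))
```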

  7. MODIS: an audio motif discovery software

    OpenAIRE

    Catanese, Laurence; Souviraà-Labastie, Nathan; Qu, Bingqing; Campion, Sébastien; Gravier, Guillaume; Vincent, Emmanuel; Bimbot, Frédéric

    2013-01-01

    MODIS is a free speech and audio motif discovery software developed at IRISA Rennes. Motif discovery is the task of discovering and collecting occurrences of repeating patterns in the absence of prior knowledge or training material. MODIS is based on a generic approach to mine repeating audio sequences, with tolerance to motif variability. The algorithm implementation allows large audio streams to be processed at a reasonable speed where motif discovery often requires huge...

  8. Making the Switch to Digital Audio

    Directory of Open Access Journals (Sweden)

    Shannon Gwin Mitchell

    2004-12-01

    Full Text Available In this article, the authors describe the process of converting from analog to digital audio data. They address the step-by-step decisions that they made in selecting hardware and software for recording and converting digital audio, issues of system integration, and cost considerations. The authors present a brief description of how digital audio is being used in their current research project and how it has enhanced the “quality” of their qualitative research.

  9. Non-retinotopic feature processing in the absence of retinotopic spatial layout and the construction of perceptual space from motion.

    Science.gov (United States)

    Ağaoğlu, Mehmet N; Herzog, Michael H; Oğmen, Haluk

    2012-10-15

    The spatial representation of a visual scene in the early visual system is well known. The optics of the eye map the three-dimensional environment onto two-dimensional images on the retina. These retinotopic representations are preserved in the early visual system. Retinotopic representations and processing are among the most prevalent concepts in visual neuroscience. However, it has long been known that a retinotopic representation of the stimulus is neither sufficient nor necessary for perception. Saccadic Stimulus Presentation Paradigm and the Ternus-Pikler displays have been used to investigate non-retinotopic processes with and without eye movements, respectively. However, neither of these paradigms eliminates the retinotopic representation of the spatial layout of the stimulus. Here, we investigated how stimulus features are processed in the absence of a retinotopic layout and in the presence of retinotopic conflict. We used anorthoscopic viewing (slit viewing) and pitted a retinotopic feature-processing hypothesis against a non-retinotopic feature-processing hypothesis. Our results support the predictions of the non-retinotopic feature-processing hypothesis and demonstrate the ability of the visual system to operate non-retinotopically at a fine feature processing level in the absence of a retinotopic spatial layout. Our results suggest that perceptual space is actively constructed from the perceptual dimension of motion. The implications of these findings for normal ecological viewing conditions are discussed. 2012 Elsevier Ltd. All rights reserved

  10. The effects of hearing protectors on auditory localization: evidence from audio-visual target acquisition.

    Science.gov (United States)

    Bolia, R S; McKinley, R L

    2000-01-01

    Response times (RT) in an audio-visual target acquisition task were collected from 3 participants while wearing either circumaural earmuffs, foam earplugs, or no hearing protection. Analyses revealed that participants took significantly longer to locate and identify an audio-visual target in both hearing protector conditions than they did in the unoccluded condition, suggesting a disturbance of the cues used by listeners to localize sounds in space. RTs were significantly faster in both hearing protector conditions than in a non-audio control condition, indicating that auditory localization was not completely disrupted. Results are discussed in terms of safety issues involved with wearing hearing protectors in an occupational environment.

  11. Language Transfer of Audio Word2Vec: Learning Audio Segment Representations without Target Language Data

    OpenAIRE

    Shen, Chia-Hao; Sung, Janet Y.; Lee, Hung-yi

    2017-01-01

    Audio Word2Vec offers vector representations of fixed dimensionality for variable-length audio segments using Sequence-to-sequence Autoencoder (SA). These vector representations are shown to describe the sequential phonetic structures of the audio segments to a good degree, with real world applications such as query-by-example Spoken Term Detection (STD). This paper examines the capability of language transfer of Audio Word2Vec. We train SA from one language (source language) and use it to ex...
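
    A minimal sketch of the underlying Sequence-to-sequence Autoencoder idea is given below: a recurrent encoder compresses a variable-length acoustic feature sequence into one fixed-dimensional vector, and a decoder is trained to reconstruct the sequence from it. The PyTorch layer choices, dimensions and teacher-forced decoder are assumptions for illustration, not the configuration used in the paper.

```python
# Toy sequence-to-sequence autoencoder producing fixed-length audio segment embeddings.
import torch
import torch.nn as nn

class SeqAutoencoder(nn.Module):
    def __init__(self, feat_dim=39, embed_dim=128):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, embed_dim, batch_first=True)
        self.decoder = nn.GRU(feat_dim, embed_dim, batch_first=True)
        self.out = nn.Linear(embed_dim, feat_dim)

    def forward(self, x):
        _, h = self.encoder(x)                       # h holds the fixed-length code
        # Teacher forcing: decoder sees the shifted target frames, initialized with the code.
        dec_in = torch.cat([torch.zeros_like(x[:, :1]), x[:, :-1]], dim=1)
        y, _ = self.decoder(dec_in, h)
        return self.out(y), h.squeeze(0)             # reconstruction, segment embedding

model = SeqAutoencoder()
segments = torch.randn(4, 60, 39)                    # 4 segments, 60 frames of 39-dim features
recon, embedding = model(segments)
nn.functional.mse_loss(recon, segments).backward()   # reconstruction objective
print(embedding.shape)                               # torch.Size([4, 128])
```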

  12. A study of some features of ac and dc electric power systems for a space station

    Science.gov (United States)

    Hanania, J. I.

    1983-01-01

    This study analyzes certain selected topics in rival dc and high frequency ac electric power systems for a Space Station. The interaction between the Space Station and the plasma environment is analyzed, leading to a limit on the voltage for the solar array and a potential problem with resonance coupling at high frequencies. Certain problems are pointed out in the concept of a rotary transformer, and further development work is indicated in connection with dc circuit switching, special design of a transmission conductor for the ac system, and electric motors. The question of electric shock hazards, particularly at high frequency, is also explored, and a problem with reduced skin resistance and therefore increased hazard with high frequency ac is pointed out. The study concludes with a comparison of the main advantages and disadvantages of the two rival systems, and it is suggested that the choice between the two should be made after further studies and development work are completed.

  13. Features of Virchow-Robin spaces in newly diagnosed multiple sclerosis patients

    Energy Technology Data Exchange (ETDEWEB)

    Etemadifar, Masoud [Department of Clinical and Biological Sciences, Division of Neurology, San Luigi Gonzaga School of Medicine, Orbassano (Torino), Turin (Italy); Department of Neurology, Isfahan University of Medical Sciences, Isfahan (Iran, Islamic Republic of); Isfahan Research Committee of Multiple Sclerosis (IRCOMS), Isfahan University of Medical Sciences, Isfahan (Iran, Islamic Republic of); Hekmatnia, Ali; Tayari, Nazila [Department of Radiology, Isfahan University of Medical Sciences, Isfahan (Iran, Islamic Republic of); Kazemi, Mojtaba [Department of Neurology, Isfahan University of Medical Sciences, Isfahan (Iran, Islamic Republic of); Ghazavi, Amirhossein [Department of Radiology, Isfahan University of Medical Sciences, Isfahan (Iran, Islamic Republic of); Akbari, Mojtaba [Department of Epidemiology and Statistics, Isfahan University of Medical Sciences, Isfahan (Iran, Islamic Republic of); Maghzi, Amir-Hadi, E-mail: maghzi@edc.mui.ac.ir [Isfahan Research Committee of Multiple Sclerosis (IRCOMS), Isfahan University of Medical Sciences, Isfahan (Iran, Islamic Republic of); Neuroimmunology Unit, Centre for Neuroscience and Trauma, Blizard Institute of Cell and Molecular Science, Barts and the London School of Medicine and Dentistry, London (United Kingdom); Isfahan Neurosciences Research Center, Isfahan University of Medical Sciences, Isfahan (Iran, Islamic Republic of)

    2011-11-15

    Background: Virchow-Robin spaces (VRSs) are perivascular pia-lined extensions of the subarachnoid space around the arteries and veins as they enter the brain parenchyma. These spaces are responsible for inflammatory processes within the brain. Objectives: This study was designed to shed more light on the location, size and shape of VRSs on 3 mm slice thickness, 1.5 Tesla MRI scans of newly diagnosed MS patients in Isfahan, Iran and to compare the results with healthy age- and sex-matched controls. Methods: We evaluated MRI scans of 73 MS patients obtained within 3 months of MS onset and compared them with MRI scans from 73 age- and sex-matched healthy volunteers. Three mm section proton density, T2W and FLAIR MR images were obtained for all subjects. The location, size and shape of VRSs were compared between the two groups. Results: The total number of VRSs was significantly higher in the MS group (p < 0.001). VRSs were located significantly more often in the high convexity areas in the MS group (p < 0.001), while there were no significant differences in other regions. Round-shaped VRSs were detected significantly more often on MRI scans of MS patients, and curvilinear shapes were observed significantly more frequently in healthy volunteers; however, there were no significant differences for oval-shaped VRSs between the two groups. The number of VRSs larger than 2 mm was significantly higher in the MS group compared to controls. We also observed some differences in the characteristics of VRSs between the genders in the MS group. Conclusion: The results of this study shed more light on the usefulness of VRSs as an MRI marker for the disease. In addition, according to our results VRSs might also have implications for determining the prognosis of the disease. However, larger studies with more advanced MRI techniques are required to confirm our results. (orig.)

  14. A Visual Analytics Approach Using the Exploration of Multidimensional Feature Spaces for Content-Based Medical Image Retrieval.

    Science.gov (United States)

    Kumar, Ashnil; Nette, Falk; Klein, Karsten; Fulham, Michael; Kim, Jinman

    2015-09-01

    Content-based image retrieval (CBIR) is a search technique based on the similarity of visual features and has demonstrated potential benefits for medical diagnosis, education, and research. However, clinical adoption of CBIR is partially hindered by the difference between the computed image similarity and the user's search intent, the semantic gap, with the end result that relevant images with outlier features may not be retrieved. Furthermore, most CBIR algorithms do not provide intuitive explanations as to why the retrieved images were considered similar to the query (e.g., which subset of features were similar), hence, it is difficult for users to verify if relevant images, with a small subset of outlier features, were missed. Users, therefore, resort to examining irrelevant images and there are limited opportunities to discover these "missed" images. In this paper, we propose a new approach to medical CBIR by enabling a guided visual exploration of the search space through a tool, called visual analytics for medical image retrieval (VAMIR). The visual analytics approach facilitates interactive exploration of the entire dataset using the query image as a point-of-reference. We conducted a user study and several case studies to demonstrate the capabilities of VAMIR in the retrieval of computed tomography images and multimodality positron emission tomography and computed tomography images.

  15. Audio-Tutorial Programming with Exceptional Children

    Science.gov (United States)

    Hofmeister, Alan

    1973-01-01

    The findings from the application of audio-tutorial programing in three curriculum areas with three groups of exceptional children are reported. The findings suggest that audio-tutorial programing has qualities capable of meeting some of the instructional needs of exceptional children. (Author)

  16. A listening test system for automotive audio

    DEFF Research Database (Denmark)

    Christensen, Flemming; Geoff, Martin; Minnaar, Pauli

    2005-01-01

    This paper describes a system for simulating automotive audio through headphones for the purposes of conducting listening experiments in the laboratory. The system is based on binaural technology and consists of a component for reproducing the sound of the audio system itself and a component...

  17. Audio-Visual Technician | IDRC - International Development ...

    International Development Research Centre (IDRC) Digital Library (Canada)

    Controls the inventory of portable audio-visual equipment and mobile telephones within IDRC's loans library. Delivers, installs, uninstalls and removes equipment reserved by IDRC staff through the automated booking system. Participates in the planning process for upgrade and /or acquisition of new audio-visual ...

  18. Audio-Tutorial Instruction in Medicine.

    Science.gov (United States)

    Boyle, Gloria J.; Herrick, Merlyn C.

    This progress report concerns an audio-tutorial approach used at the University of Missouri-Columbia School of Medicine. Instructional techniques such as slide-tape presentations, compressed speech audio tapes, computer-assisted instruction (CAI), motion pictures, television, microfiche, and graphic and printed materials have been implemented,…

  19. Evaluation of the Audio Bracelet for Blind Interaction for improving mobility and spatial cognition in early blind children - A pilot study.

    Science.gov (United States)

    Finocchietti, Sara; Cappagli, Giulia; Ben Porquis, Lope; Baud-Bovy, Gabriel; Cocchi, Elena; Gori, Monica

    2015-08-01

    This study was designed to assess the effectiveness of the Audio Bracelet for Blind Interaction (ABBI) system for improving mobility and spatial cognition in visually impaired children. The bracelet is worn on the wrist and its key feature is to provide audio feedback about body movements to help visually impaired children build a sense of space. Nine early blind children took part in this study. The study lasted 12 weeks. Once per week each child participated in a 45-minute ABBI rehabilitation session with trained professionals; each child also had to use the bracelet one hour per day at home, alone or with one relative. Mobility and spatial cognition abilities were measured before and after the 12-week rehabilitation program with three different tests. Results showed that the use of the Audio Bracelet for Blind Interaction allowed the early blind children to significantly improve their mobility and spatial abilities. Although an extended study including a larger number of participants is needed to confirm these data, the present results are encouraging. They suggest that ABBI could be used to rehabilitate the sense of space in visually impaired children.

  20. Visualization and clustering of sleep states in a frequency domain feature space.

    Science.gov (United States)

    Vivaldi, Ennio A; Bassi, Alejandro; Diaz, Javier; Duque, Natalia

    2010-01-01

    Sleep studies assess the recurrent manifestation of stereotyped configurations of relevant biosignals. These configurations are known as states (Wake, REM sleep and NonREM sleep) and stages (N1-N3 within NREM sleep). These two fundamental descriptive domains, time course and variable configuration, can be readily rendered available through improved visualization techniques. Time course is summarized by EEG spectrograms, instantaneous frequency analysis of cardio-respiratory signals and other sleep-dependent variables. State and stage configurations can be further evidenced as clusters in 2D or 3D spaces whose axes are sleep-relevant extracted variables. The latter technique also allows for visualization of the transition process as pathways from one cluster to another.
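
    A compact way to reproduce this kind of frequency-domain feature space is to compute band powers per epoch, cluster them, and project the result to two dimensions for display. The band definitions, cluster count and synthetic data in the sketch below are assumptions for illustration only.

```python
# Band-power features per epoch, clustered and projected to 2D for visualization.
import numpy as np
from scipy.signal import welch
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def band_powers(epoch, sr, bands=((0.5, 4), (4, 8), (8, 12), (12, 30))):
    f, pxx = welch(epoch, fs=sr, nperseg=min(len(epoch), 2 * sr))
    return [np.trapz(pxx[(f >= lo) & (f < hi)], f[(f >= lo) & (f < hi)]) for lo, hi in bands]

sr = 100
epochs = np.random.randn(200, 30 * sr)                 # 200 thirty-second EEG-like epochs
features = np.log(np.array([band_powers(e, sr) for e in epochs]) + 1e-12)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
coords = PCA(n_components=2).fit_transform(features)   # 2D map in which clusters can be plotted
print(coords.shape, np.bincount(labels))
```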

  1. “Wrapping” X3DOM around Web Audio API

    Directory of Open Access Journals (Sweden)

    Andreas Stamoulias

    2015-12-01

    Full Text Available Spatial sound has a conceptual role in Web3D environments, due to the highly realistic scenes it can provide. Lately, efforts have concentrated on extending X3D/X3DOM with spatial sound attributes. This paper presents a novel method for the introduction of spatial sound components into the X3DOM framework, based on the X3D specification and the Web Audio API. The proposed method incorporates enhanced sound nodes for X3DOM which are derived from the implementation of the X3D standard components, enriched with additional features of the Web Audio API. Moreover, several example scenarios were developed for the evaluation of our approach. The implemented examples demonstrated the feasibility of the newly registered nodes in X3DOM for spatial sound characteristics in Web3D virtual worlds.

  2. Digital signal processor for silicon audio playback devices; Silicon audio saisei kikiyo digital signal processor

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2000-03-01

    The digital audio signal processor (DSP) TC9446F series has been developed for silicon audio playback devices with a memory medium of, e.g., flash memory, for DVD players, and for AV devices such as TV sets. It supports AAC (advanced audio coding) (2ch) and MP3 (MPEG1 Layer3), the audio compression techniques used for transmitting music over the internet. It also supports compression formats such as Dolby Digital, DTS (digital theater system) and MPEG2 audio, which are adopted for, e.g., DVDs. It can carry a built-in audio signal processing program, e.g., Dolby ProLogic, equalizer, sound field control, and 3D sound. The TC9446XB has been newly added to the lineup; it adopts an FBGA (fine pitch ball grid array) package for portable audio devices. (translated by NEDO)

  3. Linear sign in cystic brain lesions ≥5 mm. A suggestive feature of perivascular space

    Energy Technology Data Exchange (ETDEWEB)

    Sung, Jinkyeong [The Catholic University of Korea, Department of Radiology, Seoul St. Mary' s Hospital, College of Medicine, Seoul (Korea, Republic of); The Catholic University of Korea, Department of Radiology, St. Vincent' s Hospital, College of Medicine, Seoul (Korea, Republic of); Jang, Jinhee; Choi, Hyun Seok; Jung, So-Lyung; Ahn, Kook-Jin; Kim, Bum-soo [The Catholic University of Korea, Department of Radiology, Seoul St. Mary' s Hospital, College of Medicine, Seoul (Korea, Republic of)

    2017-11-15

    To determine the prevalence of a linear sign within enlarged perivascular space (EPVS) and chronic lacunar infarction (CLI) ≥ 5 mm on T2-weighted imaging (T2WI) and time-of-flight (TOF) magnetic resonance angiography (MRA), and to evaluate the diagnostic value of the linear signs for EPVS over CLI. This study included 101 patients with cystic lesions ≥ 5 mm on brain MRI including TOF MRA. After classification of cystic lesions into EPVS or CLI, two readers assessed linear signs on T2WI and TOF MRA. We compared the prevalence and the diagnostic performance of linear signs. Among 46 EPVS and 51 CLI, 84 lesions (86.6%) were in basal ganglia. The prevalence of T2 and TOF linear signs was significantly higher in the EPVS than in the CLI (P <.001). For the diagnosis of EPVS, T2 and TOF linear signs showed high sensitivity (> 80%). TOF linear sign showed significantly higher specificity (100%) and accuracy (92.8% and 90.7%) than T2 linear sign (P <.001). T2 and TOF linear signs were more frequently observed in EPVS than CLI. They showed high sensitivity in differentiation of them, especially for basal ganglia. TOF sign showed higher specificity and accuracy than T2 sign. (orig.)

  4. iACP-GAEnsC: Evolutionary genetic algorithm based ensemble classification of anticancer peptides by utilizing hybrid feature space.

    Science.gov (United States)

    Akbar, Shahid; Hayat, Maqsood; Iqbal, Muhammad; Jan, Mian Ahmad

    2017-06-01

    Cancer is a fatal disease, responsible for one-quarter of all deaths in developed countries. Traditional anticancer therapies such as chemotherapy and radiation are highly expensive, susceptible to errors and often ineffective, and these conventional techniques induce severe side-effects on human cells. Due to the perilous impact of cancer, the development of an accurate and highly efficient intelligent computational model is desirable for the identification of anticancer peptides. In this paper, an evolutionary intelligent genetic algorithm-based ensemble model, 'iACP-GAEnsC', is proposed for the identification of anticancer peptides. In this model, the protein sequences are formulated using three different discrete feature representation methods, i.e., amphiphilic pseudo amino acid composition, g-Gap dipeptide composition, and reduced amino acid alphabet composition. The performance of the extracted feature spaces is investigated separately, and the spaces are then merged to exhibit the significance of hybridization. In addition, the predicted results of the individual classifiers are combined using an optimized genetic algorithm and a simple majority technique in order to enhance the true classification rate. It is observed that genetic algorithm-based ensemble classification outperforms individual classifiers as well as the simple majority-voting ensemble. The performance of genetic algorithm-based ensemble classification is highest on the hybrid feature space, with an accuracy of 96.45%. In comparison to existing techniques, the 'iACP-GAEnsC' model has achieved remarkable improvement in terms of various performance metrics. Based on the simulation results, it is observed that the 'iACP-GAEnsC' model might be a leading tool in the field of drug design and proteomics for researchers. Copyright © 2017 Elsevier B.V. All rights reserved.
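
    The two ingredients highlighted above, concatenating several peptide feature representations into one hybrid space and combining classifier decisions, can be mimicked with scikit-learn as below. The feature matrices are random stand-ins and the genetic-algorithm weighting of the actual model is replaced by plain majority voting, so this is only an illustration of the ensemble idea.

```python
# Hybrid feature space + simple majority-voting ensemble (illustrative stand-in).
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
pseaac = rng.normal(size=(n, 40))        # stand-in for amphiphilic PseAAC features
g_gap = rng.normal(size=(n, 400))        # stand-in for g-Gap dipeptide composition
raaac = rng.normal(size=(n, 25))         # stand-in for reduced amino acid alphabet features
X = np.hstack([pseaac, g_gap, raaac])    # hybrid feature space
y = rng.integers(0, 2, size=n)           # synthetic ACP / non-ACP labels

ensemble = VotingClassifier(estimators=[("svm", SVC()),
                                        ("knn", KNeighborsClassifier()),
                                        ("rf", RandomForestClassifier(random_state=0))],
                            voting="hard")
print(cross_val_score(ensemble, X, y, cv=5).mean())
```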

  5. Estimating Species Distributions Across Space Through Time and with Features of the Environment

    Energy Technology Data Exchange (ETDEWEB)

    Kelling, S. [Cornell Lab of Ornithology; Fink, D. [Cornell Lab of Ornithology; Hochachka, W. [Cornell Lab of Ornithology; Rosenberg, K. [Cornell Lab of Ornithology; Cook, R. [Oak Ridge National Laboratory (ORNL); Damoulas, C. [Department of Computer Science, Cornell University; Silva, C. [Department of Computer Science, Polytechnic Institute of New York; Michener, W. [DataONE, University of New Mexico

    2013-01-01

    Complete guidance for mastering the tools and techniques of the digital revolution With the digital revolution opening up tremendous opportunities in many fields, there is a growing need for skilled professionals who can develop data-intensive systems and extract information and knowledge from them. This book frames for the first time a new systematic approach for tackling the challenges of data-intensive computing, providing decision makers and technical experts alike with practical tools for dealing with our exploding data collections. Emphasizing data-intensive thinking and interdisciplinary collaboration, The Data Bonanza: Improving Knowledge Discovery in Science, Engineering, and Business examines the essential components of knowledge discovery, surveys many of the current research efforts worldwide, and points to new areas for innovation. Complete with a wealth of examples and DISPEL-based methods demonstrating how to gain more from data in real-world systems, the book: Outlines the concepts and rationale for implementing data-intensive computing in organizations Covers from the ground up problem-solving strategies for data analysis in a data-rich world Introduces techniques for data-intensive engineering using the Data-Intensive Systems Process Engineering Language DISPEL Features in-depth case studies in customer relations, environmental hazards, seismology, and more Showcases successful applications in areas ranging from astronomy and the humanities to transport engineering Includes sample program snippets throughout the text as well as additional materials on a companion website The Data Bonanza is a must-have guide for information strategists, data analysts, and engineers in business, research, and government, and for anyone wishing to be on the cutting edge of data mining, machine learning, databases, distributed systems, or large-scale computing.

  6. Robot Command Interface Using an Audio-Visual Speech Recognition System

    Science.gov (United States)

    Ceballos, Alexánder; Gómez, Juan; Prieto, Flavio; Redarce, Tanneguy

    In recent years audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This document presents an automatic command recognition system using audio-visual information. The system is expected to control the laparoscopic robot da Vinci. The audio signal is treated using the Mel Frequency Cepstral Coefficients parametrization method. In addition, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used in order to extract the visual speech information.
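
    The audio front end named above, MFCC parametrization of the spoken command, can be sketched in a few lines; librosa is an assumed library choice here and the file name is hypothetical.

```python
# MFCC parametrization of a spoken command (illustrative front end only).
import librosa

def command_mfcc(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=16000, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # one vector per analysis frame

# mfcc = command_mfcc("grasp_command.wav")   # shape: (n_mfcc, n_frames)
```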

  7. Audio-video decision support for patients: the documentary genré as a basis for decision aids.

    Science.gov (United States)

    Volandes, Angelo E; Barry, Michael J; Wood, Fiona; Elwyn, Glyn

    2013-09-01

    Decision support tools are increasingly using audio-visual materials. However, disagreement exists about the use of audio-visual materials as they may be subjective and biased. This is a literature review of the major texts for documentary film studies to extrapolate issues of objectivity and bias from film to decision support tools. The key features of documentary films are that they attempt to portray real events and that the attempted reality is always filtered through the lens of the filmmaker. The same key features can be said of decision support tools that use audio-visual materials. Three concerns arising from documentary film studies as they apply to the use of audio-visual materials in decision support tools include whose perspective matters (stakeholder bias), how to choose among audio-visual materials (selection bias) and how to ensure objectivity (editorial bias). Decision science needs to start a debate about how audio-visual materials are to be used in decision support tools. Simply because audio-visual materials may be subjective and open to bias does not mean that we should not use them. Methods need to be found to ensure consensus around balance and editorial control, such that audio-visual materials can be used. © 2011 John Wiley & Sons Ltd.

  8. Inconspicuous portable audio/visual recording: transforming an IV pole into a mobile video capture stand.

    Science.gov (United States)

    Pettineo, Christopher M; Vozenilek, John A; Kharasch, Morris; Wang, Ernest; Aitchison, Pam; Arreguin, Andrew

    2008-01-01

    Although a traditional simulation laboratory may have excellent installed audio/visual capabilities, often large classes overwhelm the limited space in the laboratory. With minimal monetary investment, it is possible to create a portable audio/visual stand from an old IV pole. An IV pole was transformed into an audio/visual stand to overcome the burden of transporting individual electronic components during a patient safety research project conducted in an empty patient room with a standardized patient. The materials and methods for making the modified IV pole are outlined in this article. The limiting factor of production is access to an old IV pole; otherwise a few purchases from an electronics store complete the audio/visual IV pole. The modified IV pole is a cost-effective and portable solution to limited space or the need for audio/visual capabilities outside of a simulation laboratory. The familiarity of an IV pole in a clinical setting reduces the visual disturbance of relocated audio/visual equipment in a room previously void of such instrumentation.

  9. Implementing Audio-CASI on Windows' Platforms.

    Science.gov (United States)

    Cooley, Philip C; Turner, Charles F

    1998-01-01

    Audio computer-assisted self-interviewing (Audio-CASI) technologies have recently been shown to provide important and sometimes dramatic improvements in the quality of survey measurements. This is particularly true for measurements requiring respondents to divulge highly sensitive information such as their sexual, drug use, or other sensitive behaviors. However, DOS-based Audio-CASI systems that were designed and adopted in the early 1990s have important limitations. Most salient is the poor control they provide for manipulating the video presentation of survey questions. This article reports our experiences adapting Audio-CASI to Microsoft Windows 3.1 and Windows 95 platforms. Overall, our Windows-based system provided the desired control over video presentation and afforded other advantages including compatibility with a much wider array of audio devices than our DOS-based Audio-CASI technologies. These advantages came at the cost of increased system requirements, including the need for both more RAM and larger hard disks. While these costs will be an issue for organizations converting large inventories of PCs to Windows Audio-CASI today, this will not be a serious constraint for organizations and individuals with small inventories of machines to upgrade or those purchasing new machines today.

  10. High-Fidelity Piezoelectric Audio Device

    Science.gov (United States)

    Woodward, Stanley E.; Fox, Robert L.; Bryant, Robert G.

    2003-01-01

    ModalMax is a very innovative means of harnessing the vibration of a piezoelectric actuator to produce an energy-efficient, low-profile device with high-bandwidth, high-fidelity audio response. The piezoelectric audio device outperforms many commercially available speakers made using speaker cones. The piezoelectric device weighs substantially less (4 g) than the speaker cones which use magnets (10 g). ModalMax devices have extreme fabrication simplicity. The entire audio device is fabricated by lamination. The simplicity of the design lends itself to lower cost. The piezoelectric audio device can be used without its acoustic chambers, resulting in a very low thickness of 0.023 in. (0.58 mm). The piezoelectric audio device can be completely encapsulated, which makes it very attractive for use in wet environments. Encapsulation does not significantly alter the audio response. Its small size (see Figure 1) is applicable to many consumer electronic products, such as pagers, portable radios, headphones, laptop computers, computer monitors, toys, and electronic games. The audio device can also be used in automobile or aircraft sound systems.

  11. Implementing Audio-CASI on Windows’ Platforms

    Science.gov (United States)

    Cooley, Philip C.; Turner, Charles F.

    2011-01-01

    Audio computer-assisted self-interviewing (Audio-CASI) technologies have recently been shown to provide important and sometimes dramatic improvements in the quality of survey measurements. This is particularly true for measurements requiring respondents to divulge highly sensitive information such as their sexual, drug use, or other sensitive behaviors. However, DOS-based Audio-CASI systems that were designed and adopted in the early 1990s have important limitations. Most salient is the poor control they provide for manipulating the video presentation of survey questions. This article reports our experiences adapting Audio-CASI to Microsoft Windows 3.1 and Windows 95 platforms. Overall, our Windows-based system provided the desired control over video presentation and afforded other advantages including compatibility with a much wider array of audio devices than our DOS-based Audio-CASI technologies. These advantages came at the cost of increased system requirements, including the need for both more RAM and larger hard disks. While these costs will be an issue for organizations converting large inventories of PCs to Windows Audio-CASI today, this will not be a serious constraint for organizations and individuals with small inventories of machines to upgrade or those purchasing new machines today. PMID:22081743

  12. Turkish Music Genre Classification using Audio and Lyrics Features

    National Research Council Canada - National Science Library

    Önder ÇOBAN

    2017-01-01

    .... In this context, researchers have developed music information systems to find solutions for such major problems as automatic playlist creation, hit song detection, and music genre or mood classification...

  13. Key features of human episodic recollection in the cross-episode retrieval of rat hippocampus representations of space.

    Directory of Open Access Journals (Sweden)

    Eduard Kelemen

    2013-07-01

    Full Text Available Neurophysiological studies focus on memory retrieval as a reproduction of what was experienced and have established that neural discharge is replayed to express memory. However, cognitive psychology has established that recollection is not a verbatim replay of stored information. Recollection is constructive, the product of memory retrieval cues, the information stored in memory, and the subject's state of mind. We discovered key features of constructive recollection embedded in the rat CA1 ensemble discharge during an active avoidance task. Rats learned two task variants, one with the arena stable, the other with it rotating; each variant defined a distinct behavioral episode. During the rotating episode, the ensemble discharge of CA1 principal neurons was dynamically organized to concurrently represent space in two distinct codes. The code for spatial reference frame switched rapidly between representing the rat's current location in either the stationary spatial frame of the room or the rotating frame of the arena. The code for task variant switched less frequently between a representation of the current rotating episode and the stable episode from the rat's past. The characteristics and interplay of these two hippocampal codes revealed three key properties of constructive recollection. (1) Although the ensemble representations of the stable and rotating episodes were distinct, ensemble discharge during rotation occasionally resembled the stable condition, demonstrating cross-episode retrieval of the representation of the remote, stable episode. (2) This cross-episode retrieval at the level of the code for task variant was more likely when the rotating arena was about to match its orientation in the stable episode. (3) The likelihood of cross-episode retrieval was influenced by preretrieval information that was signaled at the level of the code for spatial reference frame. Thus key features of episodic recollection manifest in rat hippocampal

  14. Design of an audio advertisement dataset

    Science.gov (United States)

    Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting

    2015-12-01

    Since more and more advertisements swarm into radio broadcasts, it is necessary to establish an audio advertisement dataset that can be used to analyze and classify advertisements. A method for establishing a complete audio advertisement dataset is presented in this paper. The dataset is divided into four different kinds of advertisements. Each advertisement sample is given in *.wav format and annotated with a .txt file that contains its file name, sampling frequency, channel number, broadcast time and class. The soundness of the class structure of this dataset is demonstrated by clustering the different advertisements using Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for related audio advertisement studies.
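    As a rough sketch of the PCA-based clustering check described above (the feature matrix and the number of principal components are assumptions for illustration, not details taken from the paper):

```python
# Sketch: project per-advertisement feature vectors onto principal components
# and cluster them into the four advertisement classes (illustrative only).
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def cluster_ads(feature_matrix, n_classes=4):
    # feature_matrix: rows = advertisement samples, columns = audio features
    projected = PCA(n_components=2).fit_transform(feature_matrix)
    labels = KMeans(n_clusters=n_classes, n_init=10).fit_predict(projected)
    return projected, labels   # inspect whether clusters match the annotated classes
```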

  15. CERN automatic audio-conference service

    CERN Document Server

    Sierra Moral, R

    2010-01-01

    Scientists from all over the world need to collaborate with CERN on a daily basis. They must be able to communicate effectively on their joint projects at any time; as a result telephone conferences have become indispensable and widely used. Managed by 6 operators, CERN already has more than 20000 hours and 5700 audio-conferences per year. However, the traditional telephone based audio-conference system needed to be modernized in three ways. Firstly, to provide the participants with more autonomy in the organization of their conferences; secondly, to eliminate the constraints of manual intervention by operators; and thirdly, to integrate the audio-conferences into a collaborative working framework. The large number, and hence cost, of the conferences prohibited externalization and so the CERN telecommunications team drew up a specification to implement a new system. It was decided to use a new commercial collaborative audio-conference solution based on the SIP protocol. The system was tested as the first Euro...

  16. Definición de audio

    OpenAIRE

    Montañez, Luis A.; Cabrera, Juan G.

    2015-01-01

    A description of the meaning of Audio as an object of study according to different authors, and of how it differs from the meaning of Sound. Audio is thus defined as an electrical signal whose waveform is similar to that of a sound signal, bearing in mind that the sound signal corresponds to pressure in a physical medium, whereas the Audio signal is a voltage, defined as an analog signal. In this sense, Audio is conceived as a signal...

  17. Spatial audio reproduction with primary ambient extraction

    CERN Document Server

    He, JianJun

    2017-01-01

    This book first introduces the background of spatial audio reproduction, with different types of audio content and for different types of playback systems. A literature study on the classical and emerging Primary Ambient Extraction (PAE) techniques is presented. The emerging techniques aim to improve the extraction performance and also enhance the robustness of PAE approaches in dealing with more complex signals encountered in practice. The in-depth theoretical study helps readers to understand the rationales behind these approaches. Extensive objective and subjective experiments validate the feasibility of applying PAE in spatial audio reproduction systems. These experimental results, together with some representative audio examples and MATLAB codes of the key algorithms, illustrate clearly the differences among various approaches and also help readers gain insights on selecting different approaches for different applications.

  18. Virtual Microphones for Multichannel Audio Resynthesis

    Directory of Open Access Journals (Sweden)

    Athanasios Mouchtaris

    2003-09-01

    Full Text Available Multichannel audio offers significant advantages for music reproduction, including the ability to provide better localization and envelopment, as well as reduced imaging distortion. On the other hand, multichannel audio is a demanding media type in terms of transmission requirements. Often, bandwidth limitations prohibit transmission of multiple audio channels. In such cases, an alternative is to transmit only one or two reference channels and recreate the rest of the channels at the receiving end. Here, we propose a system capable of synthesizing the required signals from a smaller set of signals recorded in a particular venue. These synthesized “virtual” microphone signals can be used to produce multichannel recordings that accurately capture the acoustics of that venue. Applications of the proposed system include transmission of multichannel audio over the current Internet infrastructure and, as an extension of the methods proposed here, remastering existing monophonic and stereophonic recordings for multichannel rendering.

  19. Using audio visuals to illustrate concepts

    OpenAIRE

    Hodgson, Tom

    2005-01-01

    This short pedagogic paper investigates the use of audio visual presentation techniques to enhance teaching and learning in the classroom. It looks at the current 'MTV' generation of students who find it difficult to concentrate for long periods of time.

  20. Augmenting Environmental Interaction in Audio Feedback Systems

    Directory of Open Access Journals (Sweden)

    Seunghun Kim

    2016-04-01

    Full Text Available Audio feedback is defined as a positive feedback of acoustic signals where an audio input and output form a loop, and may be utilized artistically. This article presents new context-based controls over audio feedback, leading to the generation of desired sonic behaviors by enriching the influence of existing acoustic information such as room response and ambient noise. This ecological approach to audio feedback emphasizes mutual sonic interaction between signal processing and the acoustic environment. Mappings from analyses of the received signal to signal-processing parameters are designed to emphasize this specificity as an aesthetic goal. Our feedback system presents four types of mappings: approximate analyses of room reverberation to tempo-scale characteristics, ambient noise to amplitude and two different approximations of resonances to timbre. These mappings are validated computationally and evaluated experimentally in different acoustic conditions.

  1. CERN automatic audio-conference service

    CERN Multimedia

    Sierra Moral, R

    2009-01-01

    Scientists from all over the world need to collaborate with CERN on a daily basis. They must be able to communicate effectively on their joint projects at any time; as a result telephone conferences have become indispensable and widely used. Managed by 6 operators, CERN already has more than 20000 hours and 5700 audio-conferences per year. However, the traditional telephone based audio-conference system needed to be modernized in three ways. Firstly, to provide the participants with more autonomy in the organization of their conferences; secondly, to eliminate the constraints of manual intervention by operators; and thirdly, to integrate the audio-conferences into a collaborative working framework. The large number, and hence cost, of the conferences prohibited externalization and so the CERN telecommunications team drew up a specification to implement a new system. It was decided to use a new commercial collaborative audio-conference solution based on the SIP protocol. The system was tested as the first Euro...

  2. Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

    Directory of Open Access Journals (Sweden)

    Yue Zhao

    2012-12-01

    Full Text Available Audio-visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although multi-stream Dynamic Bayesian Networks and coupled HMMs are widely used for audio-visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) to perform unsupervised extraction of spatial-temporal multimodal features from Tibetan audio-visual speech data and build an accurate audio-visual speech recognition model without assuming frame independence. The experimental results on Tibetan speech data from real-world environments showed that the proposed DDBN outperforms state-of-the-art methods in word recognition accuracy.

  3. K-OPLS package: kernel-based orthogonal projections to latent structures for prediction and interpretation in feature space.

    Science.gov (United States)

    Bylesjö, Max; Rantalainen, Mattias; Nicholson, Jeremy K; Holmes, Elaine; Trygg, Johan

    2008-02-19

    Kernel-based classification and regression methods have been successfully applied to modelling a wide variety of biological data. The Kernel-based Orthogonal Projections to Latent Structures (K-OPLS) method offers unique properties facilitating separate modelling of predictive variation and structured noise in the feature space. While providing prediction results similar to other kernel-based methods, K-OPLS features enhanced interpretational capabilities; allowing detection of unanticipated systematic variation in the data such as instrumental drift, batch variability or unexpected biological variation. We demonstrate an implementation of the K-OPLS algorithm for MATLAB and R, licensed under the GNU GPL and available at http://www.sourceforge.net/projects/kopls/. The package includes essential functionality and documentation for model evaluation (using cross-validation), training and prediction of future samples. Incorporated is also a set of diagnostic tools and plot functions to simplify the visualisation of data, e.g. for detecting trends or for identification of outlying samples. The utility of the software package is demonstrated by means of a metabolic profiling data set from a biological study of hybrid aspen. The properties of the K-OPLS method are well suited for analysis of biological data, which in conjunction with the availability of the outlined open-source package provides a comprehensive solution for kernel-based analysis in bioinformatics applications.

  4. Lip-reading aids word recognition most in moderate noise: a Bayesian explanation using high-dimensional feature space.

    Directory of Open Access Journals (Sweden)

    Wei Ji Ma

    Full Text Available Watching a speaker's facial movements can dramatically enhance our ability to comprehend words, especially in noisy environments. From a general doctrine of combining information from different sensory modalities (the principle of inverse effectiveness), one would expect that the visual signals would be most effective at the highest levels of auditory noise. In contrast, we find, in accord with a recent paper, that visual information improves performance more at intermediate levels of auditory noise than at the highest levels, and we show that a novel visual stimulus containing only temporal information does the same. We present a Bayesian model of optimal cue integration that can explain these conflicts. In this model, words are regarded as points in a multidimensional space and word recognition is a probabilistic inference process. When the dimensionality of the feature space is low, the Bayesian model predicts inverse effectiveness; when the dimensionality is high, the enhancement is maximal at intermediate auditory noise levels. When the auditory and visual stimuli differ slightly in high noise, the model makes a counterintuitive prediction: as sound quality increases, the proportion of reported words corresponding to the visual stimulus should first increase and then decrease. We confirm this prediction in a behavioral experiment. We conclude that auditory-visual speech perception obeys the same notion of optimality previously observed only for simple multisensory stimuli.
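    A toy numerical sketch of the kind of Bayesian cue combination described above is given below; the candidate words and likelihood values are invented for illustration, and the actual model operates in a much higher-dimensional word feature space.

```python
# Toy sketch of Bayesian audio-visual word recognition: combine auditory and
# visual likelihoods over candidate words (all numbers are invented).
import numpy as np

words = ["bat", "mat", "pat"]
prior = np.array([1/3, 1/3, 1/3])        # uniform prior over candidate words
p_audio = np.array([0.20, 0.50, 0.30])   # likelihood of the noisy audio given each word
p_visual = np.array([0.45, 0.45, 0.10])  # likelihood of the lip movements given each word

posterior = prior * p_audio * p_visual   # assumes conditionally independent cues
posterior /= posterior.sum()
print(dict(zip(words, posterior.round(3))))   # recognized word = argmax of the posterior
```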

  5. Design and implementation of an audio indicator

    Science.gov (United States)

    Zheng, Shiyong; Li, Zhao; Li, Biqing

    2017-04-01

    This paper proposes an audio indicator built around a C9014 transistor amplifier stage, an operational-amplifier LED level indicator, and a CD4017 decade counter/distributor. The circuit can drive neon and holiday lighting in step with an audio signal. The input audio signal is first amplified by the C9014 power-amplifier stage; a potentiometer sets the level of the amplified signal fed to the CD4017, which counts and drives the LEDs so that the display reflects the running state of the circuit. Using only a single counter IC (U1), the indicator produces a two-color LED chase that follows the audio signal, so the general behaviour of the signal, its level and its frequency variation, can be read directly from the LED display. The display offers four lighting modes, from abrupt jumps to gradual changes and steady illumination, and the circuit can be used in homes, hotels, discos, theaters, advertising and many other settings.

  6. PENGGUNAAN MEDIA AUDIO DALAM PEMBELAJARAN STENOGRAFI

    Directory of Open Access Journals (Sweden)

    S Martono

    2011-06-01

    Full Text Available The objective of this study is to determine the effectiveness of using audio media in learning stenografi typing. The population of this research was 30 students, divided into two groups: an experimental group and a control group of 15 students each. Based on their initial scores in the stenografi subject, the two groups had the same ability, but they were given different treatments: the experimental group was taught using audio media, whereas the control group was not. The data were collected using documentation and experimental techniques, and the instrument was a stenografi speed-typing test. The final results showed that the use of audio media was more effective and improved learning outcomes more than in the control group. These results are expected to encourage stenografi teachers to apply audio media in their teaching, and to remind students that stenografi is not a memorization subject but a skill that must be trained by taking part in the lessons; in this way, stenografi typing can be used to record any spoken conversation. Keywords: Learning, Audio Media, Stenografi

  7. PENGGUNAAN MEDIA AUDIO DALAM PEMBELAJARAN STENOGRAFI

    Directory of Open Access Journals (Sweden)

    S Martono

    2007-06-01

    Full Text Available The objective of this study is to determine the effectiveness of using audio media in learning stenografi typing. The population of this research was 30 students, divided into two groups: an experimental group and a control group of 15 students each. Based on their initial scores in the stenografi subject, the two groups had the same ability, but they were given different treatments: the experimental group was taught using audio media, whereas the control group was not. The data were collected using documentation and experimental techniques, and the instrument was a stenografi speed-typing test. The final results showed that the use of audio media was more effective and improved learning outcomes more than in the control group. These results are expected to encourage stenografi teachers to apply audio media in their teaching, and to remind students that stenografi is not a memorization subject but a skill that must be trained by taking part in the lessons; in this way, stenografi typing can be used to record any spoken conversation. Keywords: Learning, Audio Media, Stenografi

  8. Audio Mining with emphasis on Music Genre Classification

    DEFF Research Database (Denmark)

    Meng, Anders

    2004-01-01

    in searching / retrieving audio effectively is needed. Currently, search engines such as Google, AltaVista, etc. do not search inside audio files, but use either the textual information attached to the audio file or the textual information around the audio. Also in the hearing aid industries around...

  9. Navigation for the Blind through Audio-Based Virtual Environments.

    Science.gov (United States)

    Sánchez, Jaime; Sáenz, Mauricio; Pascual-Leone, Alvaro; Merabet, Lotfi

    2010-01-01

    We present the design, development and an initial study of the changes and adaptations related to navigation that take place in the brain, by incorporating an Audio-Based Environments Simulator (AbES) within a neuroimaging environment. This virtual environment enables a blind user to navigate through a virtual representation of a real space in order to train his/her orientation and mobility skills. Our initial results suggest that this kind of virtual environment could be highly efficient as a testing, training and rehabilitation platform for learning and navigation.

  10. Haptic and Visual feedback in 3D Audio Mixing Interfaces

    DEFF Research Database (Denmark)

    Gelineck, Steven; Overholt, Daniel

    2015-01-01

    This paper describes the implementation and informal evaluation of a user interface that explores haptic feedback for 3D audio mixing. The implementation compares different approaches using either the LEAP Motion for mid-air hand gesture control, or the Novint Falcon for active haptic feedback, in order to augment the perception of the 3D space. We compare different interaction paradigms implemented using these interfaces, aiming to increase speed and accuracy and reduce the need for constant visual feedback. While the LEAP Motion relies upon visual perception and proprioception, users can forego...

  11. Using multiple visual tandem streams in audio-visual speech recognition

    OpenAIRE

    Topkaya, İbrahim Saygın; Topkaya, Ibrahim Saygin; Erdoğan, Hakan; Erdogan, Hakan

    2011-01-01

    The method which is called the "tandem approach" in speech recognition has been shown to increase performance by using classifier posterior probabilities as observations in a hidden Markov model. We study the effect of using visual tandem features in audio-visual speech recognition using a novel setup which uses multiple classifiers to obtain multiple visual tandem features. We adopt the approach of multi-stream hidden Markov models where visual tandem features from two different classifiers ...

  12. The research on image encryption method based on parasitic audio watermark

    Science.gov (United States)

    Gao, Pei-pei; Zhu, Yao-ting; Zhang, Shi-tao

    2010-11-01

    In order to improve image encryption strength, an image encryption method based on a parasitic audio watermark is proposed in this paper, which relies on information from both the image domain and the speech domain to protect the image. The method uses a Chinese phonetic synthesis algorithm to synthesize audio from embedded text; the sentence is then segmented into prosodic phrases, yielding a complete set of initial-consonant and compound-vowel elements that reflects the audio features of the sentence. By sampling and scrambling these elements, combining them with an image watermark, and embedding the result into the image to be encrypted in the frequency domain, the processed image carries the image watermark and, parasitically, the audio feature information. After watermark extraction, the audio is re-synthesized with the same phonetic synthesis algorithm and compared with the original. Experiments show that a decryption attack in either the image domain or the speech domain alone cannot break the protection, and the image gains a higher encryption strength and security level from the double encryption.
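    A highly simplified sketch of frequency-domain watermark embedding is shown below; it uses a generic DCT-based scheme with assumed coefficient positions and strength, and is not the algorithm of the paper.

```python
# Generic sketch: write watermark bits into mid-frequency DCT coefficients
# of an image (illustrative scheme, not the paper's parasitic-watermark method).
import numpy as np
from scipy.fft import dctn, idctn

def embed_bits(image, bits, strength=8.0):
    coeffs = dctn(image.astype(float), norm="ortho")
    for k, bit in enumerate(bits):
        # Nudge one diagonal mid-frequency coefficient per watermark bit.
        coeffs[10 + k, 10 + k] += strength if bit else -strength
    return idctn(coeffs, norm="ortho")
```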

  13. Audio stream classification for multimedia database search

    Science.gov (United States)

    Artese, M.; Bianco, S.; Gagliardi, I.; Gasparini, F.

    2013-03-01

    Search and retrieval of huge archives of multimedia data is a challenging task. A classification step is often used to reduce the number of entries on which to perform the subsequent search. In particular, when new entries of the database are continuously added, a fast classification based on simple threshold evaluation is desirable. In this work we present a CART-based (Classification And Regression Tree [1]) classification framework for audio streams belonging to multimedia databases. The database considered is the Archive of Ethnography and Social History (AESS) [2], which is mainly composed of popular songs and other audio records describing the popular traditions handed down generation by generation, such as traditional fairs and customs. The peculiarities of this database are that it is continuously updated, that the audio recordings are acquired in unconstrained environments, and that it is difficult for non-expert human users to create the ground-truth labels. In our experiments, half of all the available audio files have been randomly extracted and used as the training set. The remaining ones have been used as the test set. The classifier has been trained to distinguish among three different classes: speech, music, and song. All the audio files in the dataset have been previously manually labeled into the three classes defined above by domain experts.
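    As an illustration of the kind of threshold-based CART classification described above, here is a minimal sketch; the feature vectors and tree depth are placeholders, not the descriptors used for the AESS archive.

```python
# Sketch: a shallow CART tree separating speech / music / song from
# per-file audio feature vectors (placeholder features, illustrative depth).
from sklearn.tree import DecisionTreeClassifier

def train_cart(train_features, train_labels):
    # A shallow tree keeps classification down to a few threshold tests,
    # which suits fast labelling of newly added database entries.
    tree = DecisionTreeClassifier(max_depth=4, random_state=0)
    tree.fit(train_features, train_labels)   # labels in {"speech", "music", "song"}
    return tree
```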

  14. Determination of over current protection thresholds for class D audio amplifiers

    DEFF Research Database (Denmark)

    Nyboe, Flemming; Risbo, L; Andreani, Pietro

    2005-01-01

    Monolithic class-D audio amplifiers typically feature built-in over current protection circuitry that shuts down the amplifier in case of a short circuit on the output speaker terminals. To minimize cost, the threshold at which the device shuts down must be set just above the maximum current...

  15. Near-field Localization of Audio

    DEFF Research Database (Denmark)

    Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2014-01-01

    Localization of audio sources using microphone arrays has been an important research problem for more than two decades. Many traditional methods for solving the problem are based on a two-stage procedure: first, information about the audio source, such as time differences-of-arrival (TDOAs) and gain ratios-of-arrival (GROAs) between microphones, is estimated, and, second, this knowledge is used to localize the audio source. These methods often have a low computational complexity, but this comes at the cost of a limited estimation accuracy. Therefore, we propose a new localization approach, where the desired signal is modeled using TDOAs and GROAs, which are determined by the source location. This facilitates the derivation of one-stage, maximum likelihood methods under a white Gaussian noise assumption that is applicable in both near- and far-field scenarios. Simulations show...

  16. Audio Description as a Pedagogical Tool

    Directory of Open Access Journals (Sweden)

    Georgina Kleege

    2015-05-01

    Full Text Available Audio description is the process of translating visual information into words for people who are blind or have low vision. Typically such description has focused on films, museum exhibitions, images and video on the internet, and live theater. Because it allows people with visual impairments to experience a variety of cultural and educational texts that would otherwise be inaccessible, audio description is a mandated aspect of disability inclusion, although it remains markedly underdeveloped and underutilized in our classrooms and in society in general. Along with increasing awareness of disability, audio description pushes students to practice close reading of visual material, deepen their analysis, and engage in critical discussions around the methodology, standards and values, language, and role of interpretation in a variety of academic disciplines. We outline a few pedagogical interventions that can be customized to different contexts to develop students' writing and critical thinking skills through guided description of visual material.

  17. Spatial audio quality perception (part 2)

    DEFF Research Database (Denmark)

    Conetta, R.; Brookes, T.; Rumsey, F.

    2015-01-01

    The QESTRAL (Quality Evaluation of Spatial Transmission and Reproduction using an Artificial Listener) system is intended to be an artificial-listener-based evaluation system capable of predicting the perceived spatial quality degradations resulting from SAPs (Spatial Audio Processes) commonly encountered in consumer audio reproduction. A generalizable model employing just five metrics and two principal components performs well in its prediction of the quality over a range of program types. Commonly-encountered SAPs can have a large deleterious effect on several spatial attributes including source location, envelopment, coverage angle, ensemble width, and spaciousness. They can also impact timbre, and changes to timbre can then influence spatial perception. Previously obtained data was used to build a regression model of perceived spatial audio quality in terms of spatial and timbral metrics.

  18. Evaluation of Perceived Spatial Audio Quality

    Directory of Open Access Journals (Sweden)

    Jan Berg

    2006-04-01

    Full Text Available The increased use of audio applications capable of conveying enhanced spatial quality puts focus on how such quality should be evaluated. Different approaches to the evaluation of perceived quality are briefly discussed and a new technique is introduced. In a series of experiments, attributes were elicited from subjects, tested and subsequently used for the derivation of evaluation scales that were feasible for subjective evaluation of the spatial quality of certain multichannel stimuli. The findings of these experiments led to the development of a novel method for the evaluation of spatial audio in surround sound systems. Parts of the method were subsequently implemented in the OPAQUE software prototype designed to facilitate the elicitation process. The prototype was successfully tested in a pilot experiment. The experiments show that attribute scales derived from subjects' personal constructs are functional for the evaluation of perceived spatial audio quality. Finally, conclusions on the importance of spatial quality evaluation of new applications are made.

  19. Audio Technology and Mobile Human Computer Interaction

    DEFF Research Database (Denmark)

    Chamberlain, Alan; Bødker, Mads; Hazzard, Adrian

    2017-01-01

    Audio-based mobile technology is opening up a range of new interactive possibilities. This paper brings some of those possibilities to light by offering a range of perspectives based in this area. It is not only the technical systems that are developing: novel approaches to the design and understanding of audio-based mobile systems are evolving to offer new perspectives on interaction and design, and to support such systems being applied in areas such as the humanities.

  20. Feature-space assessment of electrical impedance tomography coregistered with computed tomography in detecting multiple contrast targets.

    Science.gov (United States)

    Krishnan, Kalpagam; Liu, Jeff; Kohli, Kirpal

    2014-06-01

    Fusion of electrical impedance tomography (EIT) with computed tomography (CT) can be useful as a clinical tool for providing additional physiological information about tissues, but requires suitable fusion algorithms and validation procedures. This work explores the feasibility of fusing EIT and CT images using an algorithm for coregistration. The imaging performance is validated through feature space assessment on phantom contrast targets. EIT data were acquired by scanning a phantom using a circuit, configured for injecting current through 16 electrodes, placed around the phantom. A conductivity image of the phantom was obtained from the data using electrical impedance and diffuse optical tomography reconstruction software (EIDORS). A CT image of the phantom was also acquired. The EIT and CT images were fused using a region of interest (ROI) coregistration fusion algorithm. Phantom imaging experiments were carried out on objects of different contrasts, sizes, and positions. The conductive medium of the phantoms was made of a tissue-mimicking bolus material that is routinely used in clinical radiation therapy settings. To validate the imaging performance in detecting different contrasts, the ROI of the phantom was filled with distilled water and normal saline. Spatially separated cylindrical objects of different sizes were used for validating the imaging performance in multiple target detection. Analyses of the CT, EIT and the EIT/CT phantom images were carried out based on the variations of contrast, correlation, energy, and homogeneity, using a gray level co-occurrence matrix (GLCM). A reference image of the phantom was simulated using EIDORS, and the performances of the CT and EIT imaging systems were evaluated and compared against the performance of the EIT/CT system using various feature metrics, detectability, and structural similarity index measures. In detecting distilled and normal saline water in bolus medium, EIT as a stand-alone imaging system showed
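    A minimal sketch of the GLCM feature computation (contrast, correlation, energy, homogeneity) mentioned above is shown below; the use of scikit-image and the distance/angle settings are assumptions for illustration, not the study's implementation.

```python
# Sketch: gray level co-occurrence matrix (GLCM) features from an 8-bit image
# slice, e.g. a coregistered EIT/CT image rescaled to 0-255 (illustrative only).
from skimage.feature import graycomatrix, graycoprops  # spelled 'greycomatrix' in older scikit-image

def glcm_features(image_8bit):
    glcm = graycomatrix(image_8bit, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    return {prop: float(graycoprops(glcm, prop)[0, 0])
            for prop in ("contrast", "correlation", "energy", "homogeneity")}
```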

  1. The perceptual influence of the cabin acoustics on the reproduced sound of a car audio system

    DEFF Research Database (Denmark)

    Kaplanis, Neofytos; Bech, Søren; Sakari, Tervo

    2015-01-01

    A significant element of audio evaluation experiments is the availability of verbal descriptors that can accurately characterize the perceived auditory events. In terms of room acoustics, understanding the perceptual effects of the physical properties of the space would enable a better understand...

  2. Synchronization and comparison of Lifelog audio recordings

    DEFF Research Database (Denmark)

    Nielsen, Andreas Brinch; Hansen, Lars Kai

    2008-01-01

    We investigate concurrent ‘Lifelog’ audio recordings to locate segments from the same environment. We compare two techniques earlier proposed for pattern recognition in extended audio recordings, namely cross-correlation and a fingerprinting technique. If successful, such alignment can be used as a preprocessing step to select and synchronize recordings before further processing. The two methods perform similarly in classification, but fingerprinting scales better with the number of recordings, while cross-correlation can offer sample resolution synchronization. We propose and investigate the benefits of combining the two. In particular we show that the combination allows sample resolution synchronization and scalability.
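    A minimal sketch of the cross-correlation alignment idea mentioned above is given below; the function name and the use of SciPy are assumptions for illustration, not the authors' implementation.

```python
# Sketch: estimate the sample offset between two concurrent recordings by
# locating the peak of their cross-correlation (illustrative only).
import numpy as np
from scipy.signal import correlate

def estimate_offset(rec_a, rec_b):
    xcorr = correlate(rec_a, rec_b, mode="full")
    lag = int(np.argmax(np.abs(xcorr))) - (len(rec_b) - 1)
    return lag   # positive: rec_a is delayed by this many samples relative to rec_b
```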

  3. Frequency Hopping Method for Audio Watermarking

    Directory of Open Access Journals (Sweden)

    A. Anastasijević

    2012-11-01

    Full Text Available This paper evaluates the degradation of audio content caused by a perceptible, removable watermark. Two different approaches to embedding the watermark in the spectral domain were investigated. The frequencies for watermark embedding are chosen according to a pseudorandom sequence, making the methods robust. Consequently, the lower-quality audio can be used for promotional purposes, and, for a fee, the watermark can be removed with a secret watermarking key. Objective and subjective testing was conducted in order to measure the degradation level of the watermarked music samples and to examine the residual distortion for different parameters of the watermarking algorithm and different music genres.
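    The sketch below illustrates the general idea of keyed, pseudorandom selection of embedding frequencies; the frame length, bin range and attenuation factor are assumptions for illustration and do not reproduce the paper's algorithm.

```python
# Illustrative sketch: a secret key seeds the pseudorandom choice of FFT bins
# that are attenuated in each frame to form a perceptible, removable watermark.
import numpy as np

def mark_frame(frame, key=1234, n_bins=8, attenuation=0.5):
    spectrum = np.fft.rfft(frame)
    rng = np.random.default_rng(key)   # the watermarking key determines the hop pattern
    bins = rng.choice(np.arange(20, len(spectrum) - 20), size=n_bins, replace=False)
    spectrum[bins] *= attenuation      # audible degradation, reversible if the key is known
    return np.fft.irfft(spectrum, n=len(frame))
```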

  4. Nonlinear dynamic macromodeling techniques for audio systems

    Science.gov (United States)

    Ogrodzki, Jan; Bieńkowski, Piotr

    2015-09-01

    This paper develops a modelling method and a model identification technique for nonlinear dynamic audio systems. Identification is performed by means of a behavioral approach based on a polynomial approximation. This approach makes use of the Discrete Fourier Transform and the Harmonic Balance Method. A model of an audio system is first created and identified, and then it is simulated in real time using an algorithm of low computational complexity. The algorithm consists of real-time emulation of the system response rather than simulation of the system itself. The proposed software is written in the Python language using object-oriented programming techniques. The code is optimized for a multithreaded environment.

  5. Music information retrieval in compressed audio files: a survey

    Science.gov (United States)

    Zampoglou, Markos; Malamos, Athanasios G.

    2014-07-01

    In this paper, we present an organized survey of the existing literature on music information retrieval systems in which descriptor features are extracted directly from the compressed audio files, without prior decompression to pulse-code modulation format. Avoiding the decompression step and utilizing the readily available compressed-domain information can significantly lighten the computational cost of a music information retrieval system, allowing application to large-scale music databases. We identify a number of systems relying on compressed-domain information and form a systematic classification of the features they extract, the retrieval tasks they tackle and the degree to which they achieve an actual increase in overall speed, as well as any resulting loss in accuracy. Finally, we discuss recent developments in the field, and the potential research directions they open toward ultra-fast, scalable systems.

  6. Audio Streaming with Silence Detection Using 802.15.4 Radios

    OpenAIRE

    A. W. Rohankar; Shantanu Pathak; Mrinal K. Naskar; Amitava Mukherjee

    2012-01-01

    Short-range radios with low data rates are gaining popularity due to their abundant commercial availability, and high-speed multimedia is an attractive application field for these radios. Audio over 802.15.4-compliant radios is a challenging task to achieve. This paper describes a real-time implementation of audio communication using 802.15.4 radios. Silence detection and soft ADPCM are the main features of our work. Our results show that silence detection improves ban...

  7. A High-Voltage Class D Audio Amplifier for Dielectric Elastomer Transducers

    DEFF Research Database (Denmark)

    Nielsen, Dennis; Knott, Arnold; Andersen, Michael A. E.

    2014-01-01

    Dielectric Elastomer (DE) transducers have emerged as a very interesting alternative to the traditional electrodynamic transducer. Lightweight, small size and high maneuverability are some of the key features of the DE transducer. An amplifier for the DE transducer suitable for audio applications...... is proposed and analyzed. The amplifier addresses the issue of a high impedance load, ensuring a linear response over the midrange region of the audio bandwidth (100 Hz – 3.5 kHz). THD+N below 0.1% are reported for the ± 300 V prototype amplifier producing a maximum of 125 Var at a peak efficiency of 95 %....

  8. A High-Voltage Class D Audio Amplifier for Dielectric Elastomer Transducers

    OpenAIRE

    Nielsen, Dennis; Knott, Arnold; Andersen, Michael A. E.

    2014-01-01

    Dielectric Elastomer (DE) transducers have emerged as a very interesting alternative to the traditional electrodynamic transducer. Lightweight, small size and high maneuverability are some of the key features of the DE transducer. An amplifier for the DE transducer suitable for audio applications is proposed and analyzed. The amplifier addresses the issue of a high impedance load, ensuring a linear response over the midrange region of the audio bandwidth (100 Hz – 3.5 kHz). THD+N below 0.1% a...

  9. Audio wiring guide how to wire the most popular audio and video connectors

    CERN Document Server

    Hechtman, John

    2012-01-01

    Whether you're a pro or an amateur, a musician or into multimedia, you can't afford to guess about audio wiring. The Audio Wiring Guide is a comprehensive, easy-to-use guide that explains exactly what you need to know. No matter the size of your wiring project or installation, this handy tool provides you with the essential information you need and the techniques to use it. Using The Audio Wiring Guide is like having an expert at your side. By following the clear, step-by-step directions, you can do professional-level work at a fraction of the cost.

  10. Audio Haptic Videogaming for Developing Wayfinding Skills in Learners Who are Blind.

    Science.gov (United States)

    Sánchez, Jaime; de Borba Campos, Marcia; Espinoza, Matías; Merabet, Lotfi B

    2014-01-01

    Interactive digital technologies are currently being developed as a novel tool for education and skill development. Audiopolis is an audio and haptic based videogame designed for developing orientation and mobility (O&M) skills in people who are blind. We have evaluated the cognitive impact of videogame play on O&M skills by assessing performance on a series of behavioral tasks carried out in both indoor and outdoor virtual spaces. Our results demonstrate that the use of Audiopolis had a positive impact on the development and use of O&M skills in school-aged learners who are blind. The impact of audio and haptic information on learning is also discussed.

  11. Frequency Compensation of an Audio Power Amplifier

    NARCIS (Netherlands)

    van der Zee, Ronan A.R.; van Heeswijk, R.

    2006-01-01

    A car audio power amplifier is presented that uses a frequency compensation scheme which avoids large compensation capacitors around the MOS power transistors, while retaining the bandwidth and stable load range of nested miller compensation. THD is 0.005%@(1kHz, 10W), SNR is 108dB, and the

  12. An ESL Audio-Script Writing Workshop

    Science.gov (United States)

    Miller, Carla

    2012-01-01

    The roles of dialogue, collaborative writing, and authentic communication have been explored as effective strategies in second language writing classrooms. In this article, the stages of an innovative, multi-skill writing method, which embeds students' personal voices into the writing process, are explored. A 10-step ESL Audio Script Writing Model…

  13. Progressive Audio-Lingual Drills in English.

    Science.gov (United States)

    Stieglitz, Francine

    This manual comprises the transcript of the recordings for "Progressive Audio-Lingual Drills in English." These drills are a grammar practice supplement for any basic course in English as a second language. Although intended for use by the instructor, the manual may be used by the student in individual study situations. Work with the recordings…

  14. Consuming audio: an introduction to Tweak Theory

    NARCIS (Netherlands)

    Perlman, Marc

    2014-01-01

    Audio technology is a medium for music, and when we pay attention to it we tend to speculate about its effects on the music it transmits. By now there are well-established traditions of commentary (many of them critical) about the impact of musical reproduction on musical production.

  15. Audible Aliasing Distortion in Digital Audio Synthesis

    Directory of Open Access Journals (Sweden)

    J. Schimmel

    2012-04-01

    Full Text Available This paper deals with aliasing distortion in digital audio signal synthesis of classic periodic waveforms with infinite Fourier series, for electronic musical instruments. When these waveforms are generated in the digital domain, aliasing appears due to their unlimited bandwidth. There are several techniques for the synthesis of these signals that have been designed to avoid or reduce the aliasing distortion. However, these techniques have high computing demands. One can say that today's computers have enough computing power to use these methods. However, we have to realize that today's computer-aided music production requires tens of multi-timbre voices generated simultaneously by software synthesizers, and most of the computing power must be reserved for the hard-disc recording subsystem and real-time audio processing of many audio channels with a lot of audio effects. Trivially generated classic analog synthesizer waveforms are therefore still effective for sound synthesis. We cannot avoid the aliasing distortion, but the spectral components produced by the aliasing can be masked by harmonic components and thus made inaudible if a sufficient oversampling ratio is used. This paper deals with the assessment of audible aliasing distortion with the help of a psychoacoustic model of simultaneous masking and compares the computing demands of trivial generation using oversampling with those of other methods.
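    As an illustration of the oversampling approach discussed above, the sketch below trivially generates a sawtooth at a higher rate and then decimates it to the target rate; the oversampling factor and the use of SciPy's decimation filter are assumptions for illustration only.

```python
# Sketch: naive (alias-prone) sawtooth generated at an oversampled rate,
# then low-pass filtered and downsampled to the target audio rate.
import numpy as np
from scipy.signal import decimate

def trivial_sawtooth(freq=440.0, fs=48000, oversample=8, seconds=1.0):
    fs_over = fs * oversample
    t = np.arange(int(seconds * fs_over)) / fs_over
    naive = 2.0 * ((freq * t) % 1.0) - 1.0       # trivially generated sawtooth
    return decimate(naive, oversample)            # anti-alias filter + downsample to fs
```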

  16. All About Audio Equalization: Solutions and Frontiers

    Directory of Open Access Journals (Sweden)

    Vesa Välimäki

    2016-05-01

    Full Text Available Audio equalization is a vast and active research area. The extent of research means that one often cannot identify the preferred technique for a particular problem. This review paper bridges those gaps, systematically providing a deep understanding of the problems and approaches in audio equalization, their relative merits and applications. Digital signal processing techniques for modifying the spectral balance in audio signals and applications of these techniques are reviewed, ranging from classic equalizers to emerging designs based on new advances in signal processing and machine learning. Emphasis is placed on putting the range of approaches within a common mathematical and conceptual framework. The application areas discussed herein are diverse, and include well-defined, solvable problems of filter design subject to constraints, as well as newly emerging challenges that touch on problems in semantics, perception and human computer interaction. Case studies are given in order to illustrate key concepts and how they are applied in practice. We also recommend preferred signal processing approaches for important audio equalization problems. Finally, we discuss current challenges and the uncharted frontiers in this field. The source code for methods discussed in this paper is made available at https://code.soundsoftware.ac.uk/projects/allaboutaudioeq.

  17. Structuring Broadcast Audio for Information Access

    Science.gov (United States)

    Gauvain, Jean-Luc; Lamel, Lori

    2003-12-01

    One rapidly expanding application area for state-of-the-art speech recognition technology is the automatic processing of broadcast audiovisual data for information access. Since much of the linguistic information is found in the audio channel, speech recognition is a key enabling technology which, when combined with information retrieval techniques, can be used for searching large audiovisual document collections. Audio indexing must take into account the specificities of audio data such as needing to deal with the continuous data stream and an imperfect word transcription. Other important considerations are dealing with language specificities and facilitating language portability. At Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), broadcast news transcription systems have been developed for seven languages: English, French, German, Mandarin, Portuguese, Spanish, and Arabic. The transcription systems have been integrated into prototype demonstrators for several application areas such as audio data mining, structuring audiovisual archives, selective dissemination of information, and topic tracking for media monitoring. As examples, this paper addresses the spoken document retrieval and topic tracking tasks.

  18. Building Digital Audio Preservation Infrastructure and Workflows

    Science.gov (United States)

    Young, Anjanette; Olivieri, Blynne; Eckler, Karl; Gerontakos, Theodore

    2010-01-01

    In 2009 the University of Washington (UW) Libraries special collections received funding for the digital preservation of its audio indigenous language holdings. The university libraries, where the authors work in various capacities, had begun digitizing image and text collections in 1997. Because of this, at the onset of the project, workflows (a…

  19. Audio Technology and Mobile Human Computer Interaction

    DEFF Research Database (Denmark)

    Chamberlain, Alan; Bødker, Mads; Hazzard, Adrian

    2017-01-01

    Audio-based mobile technology is opening up a range of new interactive possibilities. This paper brings some of those possibilities to light by offering a range of perspectives based in this area. It is not only the technical systems that are developing, but novel approaches to the design and und...

  20. Transparency benchmarking on audio watermarks and steganography

    Science.gov (United States)

    Kraetzer, Christian; Dittmann, Jana; Lang, Andreas

    2006-02-01

    The evaluation of transparency plays an important role in the context of watermarking and steganography algorithms. This paper introduces a general definition of the term transparency in the context of steganography, digital watermarking and attack-based evaluation of digital watermarking algorithms. For this purpose the term transparency is first considered individually for each of the three application fields (steganography, digital watermarking and watermarking algorithm evaluation). From the three results a general definition for the overall context is derived in a second step. The relevance and applicability of the definition given is evaluated in practice using existing audio watermarking and steganography algorithms (which work in time, frequency and wavelet domain) as well as an attack-based evaluation suite for audio watermarking benchmarking - StirMark for Audio (SMBA). For this purpose selected attacks from the SMBA suite are modified by adding transparency enhancing measures using a psychoacoustic model. The transparency and robustness of the evaluated audio watermarking algorithms using the original and modified attacks are compared. The results of this paper show that transparency benchmarking will lead to new information regarding the algorithms under observation and their usage. This information can result in concrete recommendations for modification, like the ones resulting from the tests performed here.

  1. Audio segmentation of broadcast news in the Albayzin-2010 evaluation: overview, results, and discussion

    Directory of Open Access Journals (Sweden)

    Butko Taras

    2011-01-01

    Full Text Available Recently, audio segmentation has attracted research interest because of its usefulness in several applications like audio indexing and retrieval, subtitling, monitoring of acoustic scenes, etc. Moreover, a previous audio segmentation stage may be useful to improve the robustness of speech technologies like automatic speech recognition and speaker diarization. In this article, we present the evaluation of broadcast news audio segmentation systems carried out in the context of the Albayzín-2010 evaluation campaign. That evaluation consisted of segmenting audio from the 3/24 Catalan TV channel into five acoustic classes: music, speech, speech over music, speech over noise, and other. The evaluation results displayed the difficulty of this segmentation task. In this article, after presenting the database and metric, as well as the feature extraction methods and segmentation techniques used by the submitted systems, the experimental results are analyzed and compared, with the aim of gaining an insight into the proposed solutions, and looking for directions which are promising.

  2. Integration of top-down and bottom-up information for audio organization and retrieval

    DEFF Research Database (Denmark)

    Jensen, Bjørn Sand

    The increasing availability of digital audio and music calls for methods and systems to analyse and organize these digital objects. This thesis investigates three elements related to such systems, focusing on the ability to represent and elicit the user's view on the multimedia object and the system output. The aim is to provide organization and processing which aligns with the understanding and needs of the users. Audio and music are often characterized by a large amount of heterogeneous information. The first aspect investigated is the integration of such multi-variate and multi-modal information, which is applied in the field of music emotion modelling and the optimization of a parametric audio system with high-dimensional input spaces. The final aspect considered in the thesis concerns the general context of users, such as location and social context. This is important in understanding user behavior...

  3. Extracting meaning from audio signals - a machine learning approach

    DEFF Research Database (Denmark)

    Larsen, Jan

    2007-01-01

    * Machine learning framework for sound search * Genre classification * Music and audio separation * Wind noise suppression

  4. AUDIO CRYPTANALYSIS- AN APPLICATION OF SYMMETRIC KEY CRYPTOGRAPHY AND AUDIO STEGANOGRAPHY

    National Research Council Canada - National Science Library

    Smita Paira; Sourabh Chandra

    2016-01-01

    .... Steganography is the art that meets one of the basic limitations of Cryptography. In this paper, a new algorithm has been proposed based on both Symmetric Key Cryptography and Audio Steganography...

  5. Predicting the overall spatial quality of automotive audio systems.

    OpenAIRE

    Koya, Daisuke

    2017-01-01

    The spatial quality of automotive audio systems is often compromised by their non-ideal listening environments. Automotive audio systems need to be developed quickly due to industry demands. A suitable perceptual model could evaluate the spatial quality of automotive audio systems with similar reliability to formal listening tests but take less time. Such a model is developed in this research project by adapting an existing model of spatial quality for automotive audio use. The requirem...

  6. A high performance switching audio amplifier using sliding mode control

    OpenAIRE

    Pillonnet, Gael; Cellier, Rémy; Abouchi, Nacer; Chiollaz, Monique

    2008-01-01

    International audience; Switching audio amplifiers are widely used in various portable and consumer electronics due to their high efficiency, but suffer from limited audio performance due to inherent nonlinearity. This paper presents an integrated class D audio amplifier with low consumption and high audio performance. It includes a power stage and an efficient control based on the sliding mode technique. This monolithic class D amplifier is capable of delivering up to 1W into an 8Ω load at less ...

  7. An informed synchronization scheme for audio data hiding

    Science.gov (United States)

    LoboGuerrero, Alejandro; Bas, Patrick; Lienard, Joel

    2004-06-01

    This paper deals with the problem of synchronization in the particular case of audio data hiding. In this kind of application the goal is to increase the information of an audio data set by inserting an imperceptible message. An innovative synchronization scheme that uses informed coding theory is proposed. The goal is to combine two different techniques in a complementary way in order to obtain an enhanced synchronization system. To that end, the classical spread spectrum synchronization is analyzed and this classical scheme is improved by the use of side information. Informed coding theory is presented and revisited taking into account the problem of synchronization, enabling the selection of signal realizations called Feature Time Points (FTP) which are correlated with a code. Such considerations lead to the definition of informed synchronization. The proposed scheme and the definition of FTP are then presented, taking the robustness criterion into account. Finally, results and a comparison with classical spread spectrum synchronization schemes are presented.
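    The record above does not reproduce the informed-coding construction itself, but the underlying idea of picking Feature Time Points as the positions where the local signal is most correlated with a known synchronization code can be sketched roughly as follows (a minimal illustration, not the paper's method; the window length, number of points and minimum gap are invented parameters):

```python
import numpy as np

def feature_time_points(x, code, top_k=5, min_gap=4096):
    """Pick candidate synchronization points: sample positions where the local
    signal window is most correlated with a known code sequence."""
    n = len(code)
    code_n = (code - code.mean()) / (code.std() + 1e-12)
    corr = np.empty(len(x) - n)
    for i in range(len(corr)):
        w = x[i:i + n]
        corr[i] = np.dot((w - w.mean()) / (w.std() + 1e-12), code_n) / n
    picks = []
    for i in np.argsort(-np.abs(corr)):          # strongest correlation first
        if all(abs(i - p) >= min_gap for p in picks):
            picks.append(int(i))
        if len(picks) == top_k:
            break
    return sorted(picks)
```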

  8. Anthropomorphic Coding of Speech and Audio: A Model Inversion Approach

    Directory of Open Access Journals (Sweden)

    W. Bastiaan Kleijn

    2005-06-01

    Full Text Available Auditory modeling is a well-established methodology that provides insight into human perception and that facilitates the extraction of signal features that are most relevant to the listener. The aim of this paper is to provide a tutorial on perceptual speech and audio coding using an invertible auditory model. In this approach, the audio signal is converted into an auditory representation using an invertible auditory model. The auditory representation is quantized and coded. Upon decoding, it is then transformed back into the acoustic domain. This transformation converts a complex distortion criterion into a simple one, thus facilitating quantization with low complexity. We briefly review past work on auditory models and describe in more detail the components of our invertible model and its inversion procedure, that is, the method to reconstruct the signal from the output of the auditory model. We summarize attempts to use the auditory representation for low-bit-rate coding. Our approach also allows the exploitation of the inherent redundancy of the human auditory system for the purpose of multiple description (joint source-channel) coding.

  9. Audio Books in the Nigerian Higher Educational System: To be ...

    African Journals Online (AJOL)

    This study discusses audio books from the point of view of an innovation. It discusses the advantages and disadvantages of audio books. It examines students' familiarity with audio books and their perceptions of their introduction into the school system. It was found that Nigerian students are already familiar ...

  10. Enhancement of LSB based Steganography for Hiding Image in Audio

    OpenAIRE

    Pradeep Kumar Singh; R.K.Aggrawal

    2010-01-01

    In this paper we will take an in-depth look at steganography by proposing a new method of Audio Steganography. Emphasis will be on the proposed scheme of image hiding in audio and its comparison with the simple Least Significant Bit insertion method for data hiding in audio.
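    The abstract gives no implementation details, but the baseline it compares against, simple LSB insertion, is easy to illustrate. A minimal sketch, assuming 16-bit PCM samples held in a NumPy array and an image supplied as raw bytes (the carrier and payload here are placeholders):

```python
import numpy as np

def lsb_embed(samples: np.ndarray, payload: bytes) -> np.ndarray:
    """Embed payload bits into the least significant bit of 16-bit PCM samples."""
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    if bits.size > samples.size:
        raise ValueError("payload too large for this carrier")
    stego = samples.copy()
    # Clear the LSB of the first len(bits) samples, then OR in the payload bits.
    stego[:bits.size] = (stego[:bits.size] & ~1) | bits
    return stego

def lsb_extract(stego: np.ndarray, n_bytes: int) -> bytes:
    """Recover n_bytes of payload from the sample LSBs."""
    bits = (stego[:n_bytes * 8] & 1).astype(np.uint8)
    return np.packbits(bits).tobytes()

# Example: hide a tiny placeholder "image" (raw bytes) in a random 16-bit carrier.
carrier = np.random.randint(-2**15, 2**15, size=44100, dtype=np.int16)
secret = b"\x89PNG..."  # placeholder bytes, not a real image
stego = lsb_embed(carrier, secret)
assert lsb_extract(stego, len(secret)) == secret
```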

  11. Features of motivation of the crewmembers in an enclosed space at atmospheric pressure changes during breathing inert gases.

    Science.gov (United States)

    Komarevcev, Sergey

    Since the 1960s, our psychologists have been experimenting with small groups in isolation. This work began with spaceflight and the need to study human behavior in conditions different from the natural human habitat. Those who study human behavior in isolation know that behavior in isolation differs markedly from behavior in natural situations; this is associated with the development of new, more adaptive behaviors (1). What are the differences? First of all, isolation is achieved by placing the group in a closed space. As experiments show, the crew members' basic personality traits, such as motivation, change. Statement of the problem and methods. In our experiment we were interested in changes in the features of human motivation (strength, stability and direction of motivation) in a closed group under modified atmospheric pressure while breathing inert gases. We were also interested in the external and internal motivation of the individual under these circumstances. For the experiment we used the experimental barocomplex GVK-250, in which a group of six men was placed. The task was to spend fifteen days in isolation in the barocomplex while breathing an oxygen-xenon mixture, fifteen days in isolation in the same complex while breathing an oxygen-helium mixture, and fifteen days of isolation in the same complex while breathing normal air. All this time the subjects were isolated under conditions of atmospheric pressure changes close to those that divers normally deal with. We assumed that breathing inert mixtures can change the strength and stability, and with them the direction and stability, of motivation. To check our results, we planned to use a battery of psychological techniques: 1. the Schwartz technique, which measures personal values and behavior in society; the DORS procedure (measurement of fatigue, monotony, satiety and stress); and additional scales given to the subjects once a week. Our assumption is

  12. Stream/Bounce Event Perception Reveals a Temporal Limit of Motion Correspondence Based on Surface Feature over Space and Time

    Directory of Open Access Journals (Sweden)

    Yousuke Kawachi

    2011-06-01

    Full Text Available We examined how stream/bounce event perception is affected by motion correspondence based on the surface features of moving objects passing behind an occlusion. In the stream/bounce display, two identical objects moving across each other in a two-dimensional display can be perceived as either streaming through or bouncing off each other at coincidence. Here, surface features such as colour (Experiments 1 and 2) or luminance (Experiment 3) were switched between the two objects at coincidence. The moment of coincidence was invisible to observers due to an occluder. Additionally, the duration of the presentation of the moving objects after the feature switch at coincidence was manipulated. The results revealed that a postcoincidence duration of approximately 200 ms was required for the visual system to stabilize judgments of stream/bounce events by determining motion correspondence between the objects across the occlusion on the basis of the surface feature. The critical duration was similar across motion speeds of objects and types of surface features. Moreover, controls (Experiments 4a–4c) showed that cognitive bias based on feature (colour/luminance) congruency across the occlusion could not fully account for the effects of surface features on the stream/bounce judgments. We discuss the roles of motion correspondence, visual feature processing, and attentive tracking in the stream/bounce judgments.

  13. Predicting the perception of performed dynamics in music audio with ensemble learning.

    Science.gov (United States)

    Elowsson, Anders; Friberg, Anders

    2017-03-01

    By varying the dynamics in a musical performance, the musician can convey structure and different expressions. Spectral properties of most musical instruments change in a complex way with the performed dynamics, but dedicated audio features for modeling the parameter are lacking. In this study, feature extraction methods were developed to capture relevant attributes related to spectral characteristics and spectral fluctuations, the latter through a sectional spectral flux. Previously, ground truth ratings of performed dynamics had been collected by asking listeners to rate how soft/loud the musicians played in a set of audio files. The ratings, averaged over subjects, were used to train three different machine learning models, using the audio features developed for the study as input. The highest result was produced from an ensemble of multilayer perceptrons with an R2 of 0.84. This result seems to be close to the upper bound, given the estimated uncertainty of the ground truth data. The result is well above that of individual human listeners in the previous listening experiment, and on par with the performance achieved from the average rating of six listeners. Features were analyzed with a factorial design, which highlighted the importance of source separation in the feature extraction.
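    The exact feature set used in the study is not reproduced in the record; the sketch below illustrates one plausible reading of a "sectional spectral flux", i.e. frame-to-frame spectral change averaged over longer sections, assuming a mono signal in a NumPy array (frame sizes and section length are invented defaults):

```python
import numpy as np

def sectional_spectral_flux(x: np.ndarray, sr: int, n_fft: int = 2048,
                            hop: int = 512, section_s: float = 1.0) -> np.ndarray:
    """Mean positive spectral flux per section of `section_s` seconds.

    A rough stand-in for the 'sectional spectral flux' idea: frame-to-frame
    spectral change, averaged over longer sections of the recording.
    """
    # Magnitude STFT via a plain framed FFT (Hann window).
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i*hop:i*hop+n_fft] * win for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # Positive flux between consecutive frames.
    flux = np.maximum(np.diff(mag, axis=0), 0.0).sum(axis=1)
    # Average flux over fixed-length sections.
    frames_per_section = max(1, int(section_s * sr / hop))
    n_sections = len(flux) // frames_per_section
    return flux[:n_sections*frames_per_section].reshape(n_sections, -1).mean(axis=1)
```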

  14. ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION

    Directory of Open Access Journals (Sweden)

    D.V. Ivanko

    2016-05-01

    Full Text Available The paper presents an analytical review covering the latest achievements in the field of audio-visual (AV) fusion (integration) of multimodal information. We discuss the main challenges and report on approaches to address them. One of the most important tasks of AV integration is to understand how the modalities interact and influence each other. The paper addresses this problem in the context of AV speech processing and speech recognition. In the first part of the review we set out the basic principles of AV speech recognition and give the classification of audio and visual features of speech. Special attention is paid to the systematization of the existing techniques and the AV data fusion methods. In the second part we provide a consolidated list of tasks and applications that use AV fusion, based on the analysis of the research area we carried out. We also indicate the methods, techniques, and audio and video features used. We propose a classification of AV integration, and discuss the advantages and disadvantages of different approaches. We draw conclusions and offer our assessment of the future of the field of AV fusion. In further research we plan to implement a system of audio-visual Russian continuous speech recognition using advanced methods of multimodal fusion.

  15. Unsupervised topic modelling on South African parliament audio data

    CSIR Research Space (South Africa)

    Kleynhans, N

    2014-11-01

    Full Text Available is as follows: • The audio is extracted from the video recordings of Parliament and sent through the Audio Diariser, which extracts spoken audio and marks the audio with meta-information such as gender and spoken language. • The processed audio... was then repeated by considering the next adjacent segment. The process stopped after all segments were compared. • The combined segments were further classified based on gender (male or female) and spoken language...

  16. Calibration of an audio frequency noise generator

    DEFF Research Database (Denmark)

    Diamond, Joseph M.

    1966-01-01

    it is used for measurement purposes. The spectral density of a noise source may be found by measuring its rms output over a known noise bandwidth. Such a bandwidth may be provided by a passive filter using accurately known elements. For example, the parallel resonant circuit with purely parallel damping has...... a noise bandwidth Bn = π/2 × (3dB bandwidth). To apply this method to low audio frequencies, the noise bandwidth of the low Q parallel resonant circuit has been found, including the effects of both series and parallel damping. The method has been used to calibrate a General Radio 1390-B noise generator......A noise generator of known output is very convenient in noise measurement. At low audio frequencies, however, all devices, including noise sources, may be affected by excess noise (1/f noise). It is therefore very desirable to be able to check the spectral density of a noise source before...
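    As a worked example of how such a calibration uses the relation above, the sketch below computes the equivalent noise bandwidth Bn = (π/2)·(3 dB bandwidth) and the spectral density implied by an rms reading taken over that bandwidth (the numbers are illustrative, not taken from the paper):

```python
import math

def noise_bandwidth_parallel_rlc(b_3db_hz: float) -> float:
    """Equivalent noise bandwidth of a parallel resonant circuit with purely
    parallel damping: Bn = (pi/2) * (3 dB bandwidth)."""
    return math.pi / 2 * b_3db_hz

def spectral_density_from_rms(v_rms: float, b_noise_hz: float) -> float:
    """One-sided spectral density S (V^2/Hz) from the rms voltage measured
    over a known noise bandwidth: S = Vrms^2 / Bn."""
    return v_rms**2 / b_noise_hz

# Illustrative example: a filter with a 100 Hz 3 dB bandwidth and a 1 mV rms reading.
bn = noise_bandwidth_parallel_rlc(100.0)        # ~157.1 Hz
print(spectral_density_from_rms(1e-3, bn))      # ~6.4e-9 V^2/Hz
```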

  17. Availability of feature-oriented scanning probe microscopy for remote-controlled measurements on board a space laboratory or planet exploration Rover.

    Science.gov (United States)

    Lapshin, Rostislav V

    2009-06-01

    Prospects for a feature-oriented scanning (FOS) approach to investigations of sample surfaces, at the micrometer and nanometer scales, with the use of scanning probe microscopy under space laboratory or planet exploration rover conditions, are examined. The problems discussed include decreasing sensitivity of the onboard scanning probe microscope (SPM) to temperature variations, providing autonomous operation, implementing the capabilities for remote control, self-checking, self-adjustment, and self-calibration. A number of topical problems of SPM measurements in outer space or on board a planet exploration rover may be solved via the application of recently proposed FOS methods.

  18. Person tracking using audio and depth cues

    OpenAIRE

    Liu, Q; deCampos, T; Wang, W.; Jackson, P.; Hilton, H.

    2015-01-01

    In this paper, a novel probabilistic Bayesian tracking scheme is proposed and applied to bimodal measurements consisting of tracking results from the depth sensor and audio recordings collected using binaural microphones. We use random finite sets to cope with varying number of tracking targets. A measurement-driven birth process is integrated to quickly localize any emerging person. A new bimodal fusion method that prioritizes the most confident modality is employed. The approach was tested ...

  19. Personalized Audio Systems - a Bayesian Approach

    DEFF Research Database (Denmark)

    Nielsen, Jens Brehm; Jensen, Bjørn Sand; Hansen, Toke Jansen

    2013-01-01

    , the present paper presents a general interactive framework for personalization of such audio systems. The framework builds on Bayesian Gaussian process regression in which a model of the user's objective function is updated sequentially. The parameter setting to be evaluated in a given trial is selected...... are optimized using the proposed framework. Twelve test subjects obtain a personalized setting with the framework, and these settings are significantly preferred to those obtained with random experimentation....
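    The record only outlines the framework; the sketch below shows the general shape of such sequential Bayesian personalization, a Gaussian-process surrogate of the user's ratings with an expected-improvement rule for picking the next setting. It assumes scikit-learn and SciPy, and the 1-D "equalizer" parameter and hidden preference are invented for the example, so this is not the thesis implementation:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def next_setting(X, y, candidates):
    """Pick the candidate parameter setting with the highest expected improvement
    under a GP model of the user's (noisy) ratings collected so far."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    return candidates[np.argmax(ei)]

# Toy loop: one audio parameter in [0, 1], hidden user preference peaked at 0.3.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (3, 1))
y = -np.abs(X[:, 0] - 0.3) + rng.normal(0, 0.05, 3)
cand = np.linspace(0, 1, 101).reshape(-1, 1)
for _ in range(10):
    x_new = next_setting(X, y, cand)                      # setting to try next
    y_new = -abs(x_new[0] - 0.3) + rng.normal(0, 0.05)    # simulated user rating
    X = np.vstack([X, x_new])
    y = np.append(y, y_new)
print("preferred setting ~", X[np.argmax(y)][0])
```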

  20. Utilization of Nonlinear Converters for Audio Amplification

    OpenAIRE

    Iversen, Niels; Birch, Thomas; Knott, Arnold

    2012-01-01

    Class D amplifiers fit the automotive demands quite well. The traditional buck-based amplifier has reduced both the cost and size of amplifiers. However the buck topology is not without its limitations. The maximum peak AC output voltage produced by the power stage is only equal to the supply voltage. The introduction of non-linear converters for audio amplification defeats this limitation. A Cuk converter, designed to deliver an AC peak output voltage twice the supply voltage, is presented in this paper....

  1. Digitisation of the CERN Audio Archives

    CERN Multimedia

    Maximilien Brice

    2006-01-01

    From the creation of CERN in 1954 until the mid-1980s, the audiovisual service recorded hundreds of hours of moments of life at CERN on audio tapes. These moments range from inaugurations of new facilities to VIP speeches and general-interest cultural seminars. The preservation process started in June 2005. In these pictures, we see Waltraud Hug working on an open-reel tape.

  2. A Smart Audio on Demand Application on Android Systems

    Directory of Open Access Journals (Sweden)

    Ing-Jr Ding

    2015-05-01

    Full Text Available This paper describes a study of the realization of intelligent Audio on Demand (AOD) processing in the embedded system environment. This study describes the development of innovative Android software that will enhance the user experience of the growing number of smart mobile devices now available on the market. The application we developed can accumulate records of the songs that are played and automatically analyze the favorite song types of a user. The application also provides sound-controlled playback functions to make operation more convenient. A large number of different types of music genre were collected to create a sound database and build an intelligent AOD processing mechanism. Formant analysis was used to extract voice features, and the K-means clustering method and the acoustic modeling technology of the Gaussian mixture model (GMM) were used to develop the application mechanism.
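    The record names formant analysis, K-means and GMM acoustic modelling without further detail. As a rough illustration of the GMM part only, the sketch below trains one model per genre on frame-level feature vectors and scores a new clip; scikit-learn is assumed, and random vectors stand in for real formant/MFCC features:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_genre_models(features_by_genre, n_components=8):
    """Fit one GMM per genre on its frame-level feature vectors (n_frames x n_dims)."""
    return {genre: GaussianMixture(n_components=n_components, covariance_type="diag",
                                   random_state=0).fit(feats)
            for genre, feats in features_by_genre.items()}

def classify_clip(models, clip_features):
    """Label a clip with the genre whose GMM gives the highest average log-likelihood."""
    scores = {g: m.score(clip_features) for g, m in models.items()}
    return max(scores, key=scores.get)

# Toy example with random "feature" frames standing in for real acoustic features.
rng = np.random.default_rng(1)
train = {"pop": rng.normal(0, 1, (500, 13)), "classical": rng.normal(2, 1, (500, 13))}
models = train_genre_models(train)
print(classify_clip(models, rng.normal(2, 1, (100, 13))))  # -> "classical"
```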

  3. Lost in semantic space: a multi-modal, non-verbal assessment of feature knowledge in semantic dementia

    National Research Council Canada - National Science Library

    Garrard, Peter; Carroll, Erin

    2006-01-01

    A novel, non-verbal test of semantic feature knowledge is introduced, enabling subordinate knowledge of four important concept attributes--colour, sound, environmental context and motion--to be individually probed...

  4. Using Audio-Derived Affective Offset to Enhance TV Recommendation

    DEFF Research Database (Denmark)

    Shepstone, Sven Ewan; Tan, Zheng-Hua; Jensen, Søren Holdt

    2014-01-01

    , by navigating the emotion space to request an alternative match. The final match is then compared to the initial match, in terms of the difference in the items' affective parameterization. This offset is then utilized in future recommendation sessions. The system was evaluated by eliciting three different......This paper introduces the concept of affective offset, which is the difference between a user's perceived affective state and the affective annotation of the content they wish to see. We show how this affective offset can be used within a framework for providing recommendations for TV programs....... First a user's mood profile is determined using 12-class audio-based emotion classifications. An initial TV content item is then displayed to the user based on the extracted mood profile. The user has the option to either accept the recommendation, or to critique the item once or several times...

  5. Exploring the Implementation of Steganography Protocols on Quantum Audio Signals

    Science.gov (United States)

    Chen, Kehan; Yan, Fei; Iliyasu, Abdullah M.; Zhao, Jianping

    2018-02-01

    Two quantum audio steganography (QAS) protocols are proposed, each of which manipulates or modifies the least significant qubit (LSQb) of the host quantum audio signal that is encoded as an FRQA (flexible representation of quantum audio) audio content. The first protocol (i.e. the conventional LSQb QAS protocol or simply the cLSQ stego protocol) is built on the exchanges between qubits encoding the quantum audio message and the LSQb of the amplitude information in the host quantum audio samples. In the second protocol, the embedding procedure implants information from a quantum audio message deep into the constraint-imposed most significant qubit (MSQb) of the host quantum audio samples; we refer to it as the pseudo MSQb QAS protocol or simply the pMSQ stego protocol. The cLSQ stego protocol is designed to guarantee high imperceptibility between the host quantum audio and its stego version, whereas the pMSQ stego protocol ensures that the resulting stego quantum audio signal is better immune to illicit tampering and copyright violations (a.k.a. robustness). Built on the circuit model of quantum computation, the circuit networks to execute the embedding and extraction algorithms of both QAS protocols are determined, and simulation-based experiments are conducted to demonstrate their implementation. The outcomes attest that both protocols offer promising trade-offs in terms of imperceptibility and robustness.

  6. Le registrazioni audio dell’archivio Luigi Nono di Venezia

    Directory of Open Access Journals (Sweden)

    Luca Cossettini

    2009-11-01

    Full Text Available The audio recordings of the Luigi Nono Archive in Venice: guidelines for preservation and critical edition of audio documents. Studying audio recordings brings us back to old source-verification problems that are too often thought to be overcome by the technical reproduction of sound. The audio signal is "fixed" on a specific carrier (tape, disc, etc.) with a specific audio format (speed, number of tracks, etc.); the choice of support and format during the first "memorizing" process and the following copying processes is a subjective and, in the case of copying, an interpretative operation conducted within a continuously evolving audio technology. What we listen to today is the result of a transmission process that unavoidably transforms the original acoustic event and the documents that memorize it. Audio recording is in no way a timeless and immutable fixing process. It is therefore necessary to study the transmission processes and to reconstruct the audio document tradition. The re-recording of the tapes of the Archivio Luigi Nono, conducted by the Audio Labs of the DAMS Musica of the University of Udine, offers clear examples of the technical and musicological interpretative problems one can find when working with audio recordings.

  7. Exploring the Implementation of Steganography Protocols on Quantum Audio Signals

    Science.gov (United States)

    Chen, Kehan; Yan, Fei; Iliyasu, Abdullah M.; Zhao, Jianping

    2017-10-01

    Two quantum audio steganography (QAS) protocols are proposed, each of which manipulates or modifies the least significant qubit (LSQb) of the host quantum audio signal that is encoded as an FRQA (flexible representation of quantum audio) audio content. The first protocol (i.e. the conventional LSQb QAS protocol or simply the cLSQ stego protocol) is built on the exchanges between qubits encoding the quantum audio message and the LSQb of the amplitude information in the host quantum audio samples. In the second protocol, the embedding procedure implants information from a quantum audio message deep into the constraint-imposed most significant qubit (MSQb) of the host quantum audio samples; we refer to it as the pseudo MSQb QAS protocol or simply the pMSQ stego protocol. The cLSQ stego protocol is designed to guarantee high imperceptibility between the host quantum audio and its stego version, whereas the pMSQ stego protocol ensures that the resulting stego quantum audio signal is better immune to illicit tampering and copyright violations (a.k.a. robustness). Built on the circuit model of quantum computation, the circuit networks to execute the embedding and extraction algorithms of both QAS protocols are determined, and simulation-based experiments are conducted to demonstrate their implementation. The outcomes attest that both protocols offer promising trade-offs in terms of imperceptibility and robustness.

  8. Elicitation of attributes for the evaluation of audio-on audio-interference

    DEFF Research Database (Denmark)

    Francombe, Jon; Mason, R.; Dewhirst, M.

    2014-01-01

    An experiment to determine the perceptual attributes of the experience of listening to a target audio program in the presence of an audio interferer was performed. The first stage was a free elicitation task in which a total of 572 phrases were produced. In the second stage, a consensus vocabulary...... procedure was used to reduce these phrases into a comprehensive set of attributes. Groups of experienced and inexperienced listeners determined nine and eight attributes, respectively. These attribute sets were combined by the listeners to produce a final set of 12 attributes: masking, calming, distraction...

  9. Tactile Earth and Space Science Materials for Students with Visual Impairments: Contours, Craters, Asteroids, and Features of Mars

    Science.gov (United States)

    Rule, Audrey C.

    2011-01-01

    New tactile curriculum materials for teaching Earth and planetary science lessons on rotation=revolution, silhouettes of objects from different views, contour maps, impact craters, asteroids, and topographic features of Mars to 11 elementary and middle school students with sight impairments at a week-long residential summer camp are presented…

  10. On Uncertainties of the Priestley-Taylor/LST-Fc Feature Space Method to Estimate Evapotranspiration: Case Study in an Arid/Semiarid Region in Northwest China

    Directory of Open Access Journals (Sweden)

    Zhansheng Li

    2014-12-01

    Full Text Available Accurate evapotranspiration (ET) estimation is crucial for water resource management, particularly in arid and semi-arid regions. The remote sensing-based Priestley-Taylor method (RS-PT method) can estimate ET at the regional scale, using the feature space of remotely sensed land surface temperature (LST) and vegetation index (VI). This study evaluates the RS-PT feature space method over an arid and semi-arid region in northwest China using satellite data from the moderate-resolution space-borne sensor Advanced Along-Track Scanning Radiometer (AATSR), observations from the high-resolution airborne sensor Wide-angle Infrared Dual-mode line/area Array Scanner (WiDAS), and ground measurements of heat fluxes collected in summer 2008. The results show that the mean difference for latent heat flux (LE) estimates resulting from different domain sizes is 69.5 W/m2. When using high-resolution images from airborne measurements, the dry boundary is strongly affected by the pixels of impervious surfaces, which leads to a mean difference of 15.36 W/m2 for LE estimates. In addition, the physically based Surface Energy Balance Index (SEBI) model is used to analyze the accuracy of the dry/wet boundaries in the RS-PT method. Compared with the SEBI-estimated relative evaporative fraction (Λr), the RS-PT method underestimated Λr by ~0.11. For the RS-PT method, the uncertainty in the determination of the dry/wet boundaries has a significant impact on the accuracy of the ET estimate, depending not only on the size of the area used to build the feature space, but also on the land covers.

  11. Eye tracking analysis of minor details in films for audio description

    OpenAIRE

    Orero, Pilar; Vilaró, Anna

    2012-01-01

    This article focuses on the many instances when minute details found in feature films may have direct implications for the development of both the visual and plot narratives. The main question we ask is whether very subtle details which may easily go unnoticed by the viewer should be audio described. To assess the visual consciousness of such minute details, a perception experiment was conducted using eye-tracking technology and questionnaires. Though the result is not co...

  12. Non Audio-Video gesture recognition system

    DEFF Research Database (Denmark)

    Craciunescu, Razvan; Mihovska, Albena Dimitrova; Kyriazakos, Sofoklis

    2016-01-01

    Gesture recognition is a topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms. Gestures can originate from any bodily motion or state but commonly originate from the face or hand. Current research focus includes emotion...... recognition from the face and hand gesture recognition. Gesture recognition enables humans to communicate with the machine and interact naturally without any mechanical devices. This paper investigates the possibility to use non-audio/video sensors in order to design a low-cost gesture recognition device...

  13. Mixing audio concepts, practices and tools

    CERN Document Server

    Izhaki, Roey

    2013-01-01

    Your mix can make or break a record, and mixing is an essential catalyst for a record deal. Professional engineers with exceptional mixing skills can earn vast amounts of money and find that they are in demand by the biggest acts. To develop such skills, you need to master both the art and science of mixing. The new edition of this bestselling book offers all you need to know and put into practice in order to improve your mixes. Covering the entire process --from fundamental concepts to advanced techniques -- and offering a multitude of audio samples, tips and tricks, this boo

  14. Audio marketing v ČR

    OpenAIRE

    Timanov, Vladimir

    2015-01-01

    The aim of the work is the preparation and evaluation of an investment project. The project involves establishing a firm in the Czech Republic. The field of entrepreneurship is sensory marketing, or audio-visual marketing. The essence of this field of marketing is the encouragement of sales by influencing the emotional side of the client. The components of the work are market research, an analysis of the competitors in this sphere, and a financial plan. As a result, the work will be stru...

  15. Predistortion of a Bidirectional Cuk Audio Amplifier

    DEFF Research Database (Denmark)

    Birch, Thomas Hagen; Nielsen, Dennis; Knott, Arnold

    2014-01-01

    using predistortion. This paper suggests linearizing a nonlinear bidirectional Cuk audio amplifier using an analog predistortion approach. A prototype power stage was built and results show that a voltage gain of up to 9 dB and a reduction in THD from 6% down to 3% were obtainable using this approach.......Some non-linear amplifier topologies are capable of providing a voltage gain larger than one from a DC source, which could make them suitable for various applications. However, the non-linearities introduce a significant amount of harmonic distortion (THD). Some of this distortion could be reduced...

  16. Lost in semantic space: a multi-modal, non-verbal assessment of feature knowledge in semantic dementia

    OpenAIRE

    Garrard, P; Carroll, E

    2006-01-01

    A novel, non-verbal test of semantic feature knowledge is introduced, enabling subordinate knowledge of four important concept attributes--colour, sound, environmental context and motion--to be individually probed. This methodology provides more specific information than existing non-verbal semantic tests about the status of attribute knowledge relating to individual concept representations. Performance on this test of a group of 12 patients with semantic dementia (10 male, mean age: 64.4 yea...

  17. Key features of human episodic recollection in the cross-episode retrieval of rat hippocampus representations of space.

    OpenAIRE

    Eduard Kelemen; André A Fenton

    2013-01-01

    Neurophysiological studies focus on memory retrieval as a reproduction of what was experienced and have established that neural discharge is replayed to express memory. However, cognitive psychology has established that recollection is not a verbatim replay of stored information. Recollection is constructive, the product of memory retrieval cues, the information stored in memory, and the subject's state of mind. We discovered key features of constructive recollection embedded in the rat CA1 e...

  18. High-performance combination method of electric network frequency and phase for audio forgery detection in battery-powered devices.

    Science.gov (United States)

    Savari, Maryam; Abdul Wahab, Ainuddin Wahid; Anuar, Nor Badrul

    2016-09-01

    Audio forgery is any act of tampering with, illegally copying, or faking the quality of audio in a criminal way. In the last decade, there has been increasing attention to audio forgery detection due to a significant increase in the number of forgeries in different types of audio. There are a number of methods for forgery detection, of which the electric network frequency (ENF) is one of the most powerful in terms of accuracy. In spite of the suitable accuracy of ENF in a majority of plug-in powered devices, the weak accuracy of ENF in audio forgery detection for battery-powered devices, especially laptops and mobile phones, can be considered one of the main obstacles of the ENF. To solve the ENF accuracy problem in battery-powered devices, a combination method of ENF and a phase feature is proposed. In the experiments conducted, ENF alone gives 50% and 60% accuracy for forgery detection on a mobile phone and a laptop, respectively, while the proposed method shows 88% and 92% accuracy, respectively, for forgery detection in battery-powered devices. The results lead to higher accuracy for forgery detection with the combination of ENF and the phase feature. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
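    How the paper combines ENF with the phase feature is not detailed in the abstract; the ENF side alone, however, is commonly sketched as band-pass filtering around the nominal mains frequency and tracking the dominant spectral peak per frame, roughly as below (SciPy is assumed, 50 Hz is an assumed nominal frequency, and the bandwidth and frame length are illustrative):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, stft

def enf_track(x, sr, nominal=50.0, bw=1.0, frame_s=2.0):
    """Estimate the electric network frequency over time.

    Band-pass the recording around the nominal mains frequency, then take the
    peak STFT bin in each frame as the instantaneous ENF estimate.
    """
    sos = butter(4, [nominal - bw, nominal + bw], btype="bandpass", fs=sr, output="sos")
    y = sosfiltfilt(sos, x)
    nper = int(frame_s * sr)
    f, t, Z = stft(y, fs=sr, nperseg=nper, noverlap=nper // 2)
    band = (f >= nominal - bw) & (f <= nominal + bw)
    return t, f[band][np.abs(Z[band, :]).argmax(axis=0)]

# A sudden jump or discontinuity in the returned ENF track is one indicator
# that a segment may have been spliced in from a different recording.
```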

  19. Taenia crassiceps injection into the subarachnoid space of rats simulates radiological and morphological features of racemose neurocysticercosis.

    Science.gov (United States)

    Hamamoto Filho, Pedro Tadao; Fabro, Alexandre Todorovic; Rodrigues, Marianna Vaz; Bazan, Rodrigo; Vulcano, Luiz Carlos; Biondi, Germano Francisco; Zanini, Marco Antônio

    2017-01-01

    Neurocysticercosis is a major public health concern. Although its eradication appears feasible, the disease remains endemic in developing countries and has emerged again in Europe and in the USA. Basic studies on neurocysticercosis are needed to better understand the pathophysiologic mechanisms and, consequently, to improve treatment perspectives. Much has been published on experimental parenchymal neurocysticercosis, but there are no experimental models of racemose neurocysticercosis. Cysts of Taenia crassiceps were injected into the subarachnoid space of 11 rats. After 4 months, magnetic resonance imaging (MRI) was performed to verify the occurrence of ventricular dilatation and the distribution of cysts in the cerebrospinal fluid compartments. The histologic assessment was done focusing on changes in the ependyma, choroid plexus, and brain parenchyma. MRI and histologic assessment confirmed the findings similar to those seen in human racemose neurocysticercosis including enlargement of the basal cisterns, hydrocephalus, and inflammatory infiltration through the ependyma and choroid plexus into cerebrospinal fluid spaces. We developed a simple model of racemose neurocysticercosis by injecting cysts of T. crassiceps into the subarachnoid space of rats. This model can help understand the pathophysiologic mechanisms of the disease.

  20. Real-Time Audio Translation Module Between Iax And Rsw

    OpenAIRE

    Hadeel Saleh Haj Aliwi; Putra. Sumari

    2014-01-01

    In the last few years, multimedia communication has been developed and improved rapidly in order to enable users to communicate with each other over the internet. Generally, multimedia communication consists of audio and video communication. However, this research concentrates on audio conferencing only. The audio translation between protocols is a very critical issue, because it solves the communication problems between any two protocols. So, it enables people around the world to talk wit...

  1. Use of Effective Audio in E-learning Courseware

    OpenAIRE

    Ray, Kisor

    2015-01-01

    E-learning uses electronic media and information and communication technologies to provide education to the masses. E-learning delivers hypertext, text, audio, images, animation and videos using desktop standalone computers, local area network based intranets and internet based content. While producing e-learning content or courseware, a major decision-making factor is whether to use audio for the benefit of the end users. Generally, three types of audio can be used in e-learning: narration, mus...

  2. Performance Improvement of Threshold based Audio Steganography using Parallel Computation

    OpenAIRE

    Muhammad Shoaib; Zakir Khan; Danish Shehzad; Tamer Dag; Arif Iqbal Umar; Noor Ul Amin

    2016-01-01

    Audio steganography is used to hide secret information inside an audio signal for the secure and reliable transfer of information. Various steganography techniques have been proposed and implemented to ensure an adequate security level. The existing techniques focus on either the payload or security, but none of them ensures both security and payload at the same time. Data dependency in the existing solutions restricted the steganography mechanism to serial execution. The audio data and secret ...

  3. Instructional Audio Guidelines: Four Design Principles to Consider for Every Instructional Audio Design Effort

    Science.gov (United States)

    Carter, Curtis W.

    2012-01-01

    This article contends that instructional designers and developers should attend to four particular design principles when creating instructional audio. Support for this view is presented by referencing the limited research that has been done in this area, and by indicating how and why each of the four principles is important to the design process.…

  4. A Model of Distraction in an Audio-on-Audio Interference Situation with Music Program Material

    DEFF Research Database (Denmark)

    Francombe, J.; Mason, R.; Dewhirst, M.

    2015-01-01

    There are many situations in which multiple audio programs are replayed over loudspeakers in the same acoustic environment, allowing listeners to focus on their desired target program. Where this situation is deliberately created and the different program items are centrally controlled, each list...

  5. Quantization and psychoacoustic model in audio coding in advanced audio coding

    Science.gov (United States)

    Brzuchalski, Grzegorz

    2011-10-01

    This paper presents a complete optimized architecture for Advanced Audio Coding quantization with Huffman coding. After that, psychoacoustic model theory is presented and a few algorithms are described: the standard Two-Loop Search, its modifications, Genetic, Just Noticeable Level Difference, Trellis-Based, and its modification, the Cascaded Trellis-Based Algorithm.

  6. Objective child behavior measurement with naturalistic daylong audio recording and its application to autism identification.

    Science.gov (United States)

    Xu, Dongxin; Gilkerson, Jill; Richards, Jeffrey A

    2012-01-01

    Child behavior in the natural environment is a subject that is relevant for many areas of social science and bio-behavioral research. However, its measurement is currently based mainly on subjective approaches such as parent questionnaires or clinical observation. This study demonstrates an objective and unobtrusive child vocal behavior measurement and monitoring approach using daylong audio recordings of children in the natural home environment. Our previous research has shown significant performance in childhood autism identification. However, there remains the question of why it works. In the previous study, the focus was more on the overall performance and data-driven modeling without regard to the meaning of underlying features. Even if a high risk of autism is predicted, specific information about child behavior that could contribute to the automated categorization was not further explored. This study attempts to clarify this issue by exploring the details of underlying features and uncovering additional behavioral information buried within the audio streams. It was found that much child vocal behavior can be measured automatically by applying signal processing and pattern recognition technologies to daylong audio recordings. By combining many such features, the model achieves an overall autism identification accuracy of 94% (N=226). Similar to many emerging non-invasive and telemonitoring technologies in health care, this approach is believed to have great potential in child development research, clinical practice and parenting.

  7. Newnes audio and Hi-Fi engineer's pocket book

    CERN Document Server

    Capel, Vivian

    2013-01-01

    Newnes Audio and Hi-Fi Engineer's Pocket Book, Second Edition provides concise discussion of several audio topics. The book is comprised of 10 chapters that cover different audio equipment. The coverage of the text includes microphones, gramophones, compact discs, and tape recorders. The book also covers high-quality radio, amplifiers, and loudspeakers. The book then reviews the concepts of sound and acoustics, and presents some facts and formulas relevant to audio. The text will be useful to sound engineers and other professionals whose work involves sound systems.

  8. Audio scene segmentation for video with generic content

    Science.gov (United States)

    Niu, Feng; Goela, Naveen; Divakaran, Ajay; Abdel-Mottaleb, Mohamed

    2008-01-01

    In this paper, we present a content-adaptive audio texture based method to segment video into audio scenes. The audio scene is modeled as a semantically consistent chunk of audio data. Our algorithm is based on "semantic audio texture analysis." At first, we train GMM models for basic audio classes such as speech, music, etc. Then we define the semantic audio texture based on those classes. We study and present two types of scene changes, those corresponding to an overall audio texture change and those corresponding to a special "transition marker" used by the content creator, such as a short stretch of music in a sitcom or silence in dramatic content. Unlike prior work using genre specific heuristics, such as some methods presented for detecting commercials, we adaptively find out if such special transition markers are being used and if so, which of the base classes are being used as markers without any prior knowledge about the content. Our experimental results show that our proposed audio scene segmentation works well across a wide variety of broadcast content genres.
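    The paper's exact "semantic audio texture" features are not given in the record; a rough sketch of the two underlying steps, classifying frames with pre-trained class GMMs and flagging points where the class histogram shifts, might look like the following (the GMMs are assumed to be scikit-learn GaussianMixture models trained elsewhere, and the window size and threshold are invented):

```python
import numpy as np

def frame_classes(frame_feats, class_gmms):
    """Assign each feature frame to the basic audio class (speech, music, ...)
    whose GMM gives it the highest log-likelihood."""
    names = list(class_gmms)
    scores = np.column_stack([class_gmms[n].score_samples(frame_feats) for n in names])
    return np.array(names)[scores.argmax(axis=1)]

def texture_changes(labels, window=100, threshold=0.5):
    """Mark frame indices where the class histograms of the trailing and leading
    windows differ by more than `threshold` (a crude 'audio texture' change)."""
    names = np.unique(labels)
    hist = lambda seg: np.array([(seg == n).mean() for n in names])
    changes = []
    for i in range(window, len(labels) - window):
        d = 0.5 * np.abs(hist(labels[i - window:i]) - hist(labels[i:i + window])).sum()
        if d > threshold:
            changes.append(i)
    return changes
```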

  9. Semantic Labeling of Nonspeech Audio Clips

    Directory of Open Access Journals (Sweden)

    Xiaojuan Ma

    2010-01-01

    Full Text Available Human communication about entities and events is primarily linguistic in nature. While visual representations of information are shown to be highly effective as well, relatively little is known about the communicative power of auditory nonlinguistic representations. We created a collection of short nonlinguistic auditory clips encoding familiar human activities, objects, animals, natural phenomena, machinery, and social scenes. We presented these sounds to a broad spectrum of anonymous human workers using Amazon Mechanical Turk and collected verbal sound labels. We analyzed the human labels in terms of their lexical and semantic properties to ascertain that the audio clips do evoke the information suggested by their pre-defined captions. We then measured the agreement with the semantically compatible labels for each sound clip. Finally, we examined which kinds of entities and events, when captured by nonlinguistic acoustic clips, appear to be well-suited to elicit information for communication, and which ones are less discriminable. Our work is set against the broader goal of creating resources that facilitate communication for people with some types of language loss. Furthermore, our data should prove useful for future research in machine analysis/synthesis of audio, such as computational auditory scene analysis, and annotating/querying large collections of sound effects.

  10. Performance Characterization of Loctite (Registered Trademark) 242 and 271 Liquid Locking Compounds (LLCs) as a Secondary Locking Feature for International Space Station (ISS) Fasteners

    Science.gov (United States)

    Dube, Michael J.; Gamwell, Wayne R.

    2011-01-01

    Several International Space Station (ISS) hardware components use Loctite (and other polymer based liquid locking compounds (LLCs)) as a means of meeting the secondary (redundant) locking feature requirement for fasteners. The primary locking method is the fastener preload, with the application of the Loctite compound which when cured is intended to resist preload reduction. The reliability of these compounds has been questioned due to a number of failures during ground testing. The ISS Program Manager requested the NASA Engineering and Safety Center (NESC) to characterize and quantify sensitivities of Loctite being used as a secondary locking feature. The findings and recommendations provided in this investigation apply to the anaerobic LLCs Loctite 242 and 271. No other anaerobic LLCs were evaluated for this investigation. This document contains the findings and recommendations of the NESC investigation

  11. Person identification for mobile robot using audio-visual modality

    Science.gov (United States)

    Kim, Young-Ouk; Chin, Sehoon; Lee, Jihoon; Paik, Joonki

    2005-10-01

    Recently, we have experienced significant advancement in intelligent service robots. The remarkable features of an intelligent robot include tracking and identification of a person using biometric features. Human-robot interaction is very important because it is one of the final goals of an intelligent service robot. Much research is concentrated in two fields. One is self-navigation of a mobile robot and the other is human-robot interaction in natural environments. In this paper we present an effective person identification method for HRI (Human Robot Interaction) using two different types of expert systems. However, most mobile robots run in uncontrolled and complicated environments. This means that face and speech information cannot be guaranteed under varying conditions, such as lighting, noisy sound, and the orientation of the robot. According to the illumination and signal-to-noise ratio around the mobile robot, our proposed fuzzy rules produce a reasonable person identification result. Two embedded HMMs (Hidden Markov Models) are used for the visual and audio modalities to identify a person. The performance of our proposed system and experimental results are compared with single-modality identification and a simple mixture of the two modalities.

  12. Spaces

    Directory of Open Access Journals (Sweden)

    Maziar Nekovee

    2010-01-01

    Full Text Available Cognitive radio is being intensively researched as the enabling technology for license-exempt access to the so-called TV White Spaces (TVWS, large portions of spectrum in the UHF/VHF bands which become available on a geographical basis after digital switchover. Both in the US, and more recently, in the UK the regulators have given conditional endorsement to this new mode of access. This paper reviews the state-of-the-art in technology, regulation, and standardisation of cognitive access to TVWS. It examines the spectrum opportunity and commercial use cases associated with this form of secondary access.

  13. Let Their Voices Be Heard! Building a Multicultural Audio Collection.

    Science.gov (United States)

    Tucker, Judith Cook

    1992-01-01

    Discusses building a multicultural audio collection for a library. Gives some guidelines about selecting materials that really represent different cultures. Audio materials that are considered fall roughly into the categories of children's stories, didactic materials, oral histories, poetry and folktales, and music. The goal is an authentic…

  14. DOA Estimation of Audio Sources in Reverberant Environments

    NARCIS (Netherlands)

    Jensen, Jesper Rindom; Nielsen, J.K.; Heusdens, R.; Christensen, M.G.; Dong, Min; Zheng, Thomas Fang

    Reverberation is well-known to have a detrimental impact on many localization methods for audio sources. We address this problem by imposing a model for the early reflections as well as a model for the audio source itself. Using these models, we propose two iterative localization methods that

  15. Multilevel inverter based class D audio amplifier for capacitive transducers

    DEFF Research Database (Denmark)

    Nielsen, Dennis; Knott, Arnold; Andersen, Michael A. E.

    2014-01-01

    The reduced semiconductor voltage stress makes the multilevel inverters especially interesting, when driving capacitive transducers for audio applications. A ± 300 V flying capacitor class D audio amplifier driving a 100 nF load in the midrange region of 0.1-3.5 kHz with Total Harmonic Distortion...

  16. Recent Audio-Visual Materials on the Soviet Union.

    Science.gov (United States)

    Clarke, Edith Campbell

    1981-01-01

    Identifies and describes audio-visual materials (films, filmstrips, and audio cassette tapes) about the Soviet Union which have been produced since 1977. For each entry, information is presented on title, time required, date of release, cost (purchase and rental), and an abstract. (DB)

  17. Selected Audio-Visual Materials for Consumer Education. [New Version.

    Science.gov (United States)

    Johnston, William L.

    Ninety-two films, filmstrips, multi-media kits, slides, and audio cassettes, produced between 1964 and 1974, are listed in this selective annotated bibliography on consumer education. The major portion of the bibliography is devoted to films and filmstrips. The main topics of the audio-visual materials include purchasing, advertising, money…

  18. Content Discovery from Composite Audio : An unsupervised approach

    NARCIS (Netherlands)

    Lu, L.

    2009-01-01

    In this thesis, we developed and assessed a novel robust and unsupervised framework for semantic inference from composite audio signals. We focused on the problem of detecting audio scenes and grouping them into meaningful clusters. Our approach addressed all major steps in a general process of

  19. Using Audio Books to Improve Reading and Academic Performance

    Science.gov (United States)

    Montgomery, Joel R.

    2009-01-01

    This article highlights significant research about what below grade-level reading means in middle school classrooms and suggests a tested approach to improve reading comprehension levels significantly by using audio books. The use of these audio books can improve reading and academic performance for both English language learners (ELLs) and for…

  20. Can audio recording of outpatient consultations improve patient outcome?

    DEFF Research Database (Denmark)

    Wolderslund, Maiken; Kofoed, Poul-Erik; Axboe, Mette

    different departments: Orthopedics, Urology, Internal Medicine and Pediatrics. A total of 5,460 patients will be included from the outpatient clinics. All patients randomized to an intervention group are offered audio recording of their consultation. An Interactive Voice Response platform enables an audio...

  1. Beyond Podcasting: Creative Approaches to Designing Educational Audio

    Science.gov (United States)

    Middleton, Andrew

    2009-01-01

    This paper discusses a university-wide pilot designed to encourage academics to creatively explore learner-centred applications for digital audio. Participation in the pilot was diverse in terms of technical competence, confidence and contextual requirements and there was little prior experience of working with digital audio. Many innovative…

  2. Automated Speech and Audio Analysis for Semantic Access to Multimedia

    NARCIS (Netherlands)

    Jong, F.M.G. de; Ordelman, R.; Huijbregts, M.

    2006-01-01

    The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content, and as a consequence, improve the effectiveness of conceptual access tools. This paper overviews the various ways in which automatic speech and audio analysis can contribute to

  3. Introducing VAST: a Video-Audio Streaming Tester

    Directory of Open Access Journals (Sweden)

    Adrian Sterca

    2010-09-01

    Full Text Available We present a testing package aimed at video and audio streaming across best-effort networks like the Internet. VAST is intended to be a testing framework for protocols transporting audio-video streams across IP networks. It offers the simplicity and predictability of deterministic simulators like ns-2 combined with the testing power of real-world experiments.

  4. Dynamically-Loaded Hardware Libraries (HLL) Technology for Audio Applications

    DEFF Research Database (Denmark)

    Esposito, A.; Lomuscio, A.; Nunzio, L. Di

    2016-01-01

    , we load on-the-fly the specific processor in the FPGA, and we transfer the execution from the CPU to the FPGA-based accelerator. The proposed architecture provides excellent flexibility with respect to the different audio applications implemented, high quality audio, and an energy efficient solution....

  5. Automated speech and audio analysis for semantic access to multimedia

    NARCIS (Netherlands)

    de Jong, Franciska M.G.; Ordelman, Roeland J.F.; Huijbregts, M.A.H.; Avrithis, Y.; Kompatsiaris, Y.; Staab, S.; O' Connor, N.E.

    2006-01-01

    The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content, and as a consequence, improve the effectiveness of conceptual access tools. This paper overviews the various ways in which automatic speech and audio analysis can contribute to

  6. Parametric Audio Based Decoder and Music Synthesizer for Mobile Applications

    NARCIS (Netherlands)

    Oomen, A.W.J.; Szczerba, M.Z.; Therssen, D.

    2011-01-01

    This paper reviews parametric audio coders and discusses novel technologies introduced in a low-complexity, low-power-consumption audio decoder and music synthesizer platform developed by the authors. The decoder uses a parametric coding scheme based on the MPEG-4 Parametric Audio standard. In order to

  7. Decision-Level Fusion for Audio-Visual Laughter Detection

    NARCIS (Netherlands)

    Reuderink, B.; Poel, Mannes; Truong, Khiet Phuong; Poppe, Ronald Walter; Pantic, Maja; Popescu-Belis, Andrei; Stiefelhagen, Rainer

    Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is

  8. Decision-level fusion for audio-visual laughter detection

    NARCIS (Netherlands)

    Reuderink, B.; Poel, M.; Truong, K.; Poppe, R.; Pantic, M.

    2008-01-01

    Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is

  9. Effect of Audio vs. Video on Aural Discrimination of Vowels

    Science.gov (United States)

    McCrocklin, Shannon

    2012-01-01

    Despite the growing use of media in the classroom, the effect of using audio versus video in pronunciation teaching has been largely ignored. To analyze the impact of the use of audio or video training on aural discrimination of vowels, 61 participants (all students at a large American university) took a pre-test followed by two training…

  10. Four-quadrant flyback converter for direct audio power amplification

    DEFF Research Database (Denmark)

    Ljusev, Petar; Andersen, Michael Andreas E.

    2005-01-01

    This paper presents a bidirectional, four-quadrant flyback converter for use in direct audio power amplification. When compared to the standard Class-D switching audio power amplifier with a separate power supply, the proposed four-quadrant flyback converter provides simple solution with better...

  11. Metode Parity Coding Versus Metode Spread Spectrum Pada Audio Steganography

    OpenAIRE

    Saragih, Riko Arlando

    2006-01-01

Steganography is the science of hiding secret information inside a message. Audio steganography is a development of steganography. Audio steganography is more difficult than image or video steganography because human hearing is more sensitive than human sight, so the data-embedding process must be done as carefully as possible so that the audio carrying the embedded data sounds the same as the audio before the data were embedded...

  12. An Audio Stream Redirector for the Ethernet Speaker

    Science.gov (United States)

    Mandrekar, Ishan; Prevelakis, Vassilis; Turner, David Michael

    2004-01-01

    The authors have developed the "Ethernet Speaker" (ES), a network-enabled single board computer embedded into a conventional audio speaker. Audio streams are transmitted in the local area network using multicast packets, and the ES can select any one of them and play it back. A key requirement for the ES is that it must be capable of playing any…

  13. TVAR modeling of EEG to detect audio distraction during simulated driving

    Science.gov (United States)

Dahal, Nabaraj; Nandagopal, D. (Nanda); Cocks, Bernadine; Vijayalakshmi, Ramasamy; Dasari, Naga; Gaertner, Paul

    2014-06-01

    Objective. The objective of our current study was to look for the EEG correlates that can reveal the engaged state of the brain while undertaking cognitive tasks. Specifically, we aimed to identify EEG features that could detect audio distraction during simulated driving. Approach. Time varying autoregressive (TVAR) analysis using Kalman smoother was carried out on short time epochs of EEG data collected from participants as they undertook two simulated driving tasks. TVAR coefficients were then used to construct all pole model enabling the identification of EEG features that could differentiate normal driving from audio distracted driving. Main results. Pole analysis of the TVAR model led to the visualization of event related synchronization/desynchronization (ERS/ERD) patterns in the form of pole displacements in pole plots of the temporal EEG channels in the z plane enabling the differentiation of the two driving conditions. ERS in the EEG data has been demonstrated during audio distraction as an associated phenomenon. Significance. Visualizing the ERD/ERS phenomenon in terms of pole displacement is a novel approach. Although ERS/ERD has previously been demonstrated as reliable when applied to motor related tasks, it is believed to be the first time that it has been applied to investigate human cognitive phenomena such as attention and distraction. Results confirmed that distracted/non-distracted driving states can be identified using this approach supporting its applicability to cognition research.
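
    The pole-based analysis above can be illustrated with a much simplified sketch: instead of the Kalman-smoother TVAR estimation used in the study, the fragment below fits an ordinary least-squares AR model to each short EEG epoch and returns the poles of the resulting all-pole model, so that pole positions (and hence ERS/ERD-like shifts) can be compared between normal and audio-distracted driving epochs. The function name, model order and epoch length are illustrative assumptions, not values from the paper.

      import numpy as np

      def ar_poles(epoch, order=6):
          # Fit an AR(order) model to one EEG epoch by least squares and
          # return the poles of the resulting all-pole model (roots of A(z)).
          x = np.asarray(epoch, dtype=float)
          x = x - x.mean()
          # Regressors are x[n-1] ... x[n-order] for n = order .. N-1.
          X = np.column_stack([x[order - k:len(x) - k] for k in range(1, order + 1)])
          a, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
          # A(z) = 1 - a1*z^-1 - ... - ap*z^-p; its roots are the model poles.
          return np.roots(np.concatenate(([1.0], -a)))

      # Comparing pole angles (frequencies) and radii between epochs from the two
      # driving conditions approximates the pole-displacement plots described above.
      normal_poles = ar_poles(np.random.randn(512))
      distracted_poles = ar_poles(np.random.randn(512))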

  14. Study of growth and development features of ten ground cover plants in Kish Island green space in warm season

    Directory of Open Access Journals (Sweden)

    S. Shooshtarian

    2016-05-01

Full Text Available Having special ecological conditions, Kish Island has a restricted range of native ornamental plant species. Expansion of urban green space on this island is of great importance due to its outstanding touristic position in the south of Iran. The purpose of this study was to investigate the growth and development of groundcover plants planted in four different regions of Kish Island and to recommend the most suitable and adaptable species for each region. The ten groundcover species were Festuca ovina L., Glaucium flavum Crantz., Frankenia thymifolia Desf., Sedum spurium Bieb., Sedum acre L., Potentilla verna L., Carpobrotus acinaciformis (L.) L. Bolus., Achillea millefolium L., Alternanthera dentata Moench. and Lampranthus spectabilis Haw. Growth and development were evaluated by measuring morphological characteristics such as height, covering area, leaf number and area, dry and fresh total weights, and visual scoring. Physiological traits, namely proline and chlorophyll contents, were also evaluated. The study was designed as a factorial layout based on a completely randomized block design with six replicates. Results showed that, in terms of indices such as covering area, visual quality, height, total weight, and chlorophyll content, the plants in the Pavioon and Sadaf regions had the best and the worst performance, respectively, compared with the plants of the other regions. Based on the evaluated characteristics, C. acinaciformis, L. spectabilis and F. thymifolia showed the greatest expansion and growth in all four regions and are recommended for planting on Kish Island and in similar climates.

  15. AUDIO CRYPTANALYSIS- AN APPLICATION OF SYMMETRIC KEY CRYPTOGRAPHY AND AUDIO STEGANOGRAPHY

    Directory of Open Access Journals (Sweden)

    Smita Paira

    2016-09-01

    Full Text Available In the recent trend of network and technology, “Cryptography” and “Steganography” have emerged out as the essential elements of providing network security. Although Cryptography plays a major role in the fabrication and modification of the secret message into an encrypted version yet it has certain drawbacks. Steganography is the art that meets one of the basic limitations of Cryptography. In this paper, a new algorithm has been proposed based on both Symmetric Key Cryptography and Audio Steganography. The combination of a randomly generated Symmetric Key along with LSB technique of Audio Steganography sends a secret message unrecognizable through an insecure medium. The Stego File generated is almost lossless giving a 100 percent recovery of the original message. This paper also presents a detailed experimental analysis of the algorithm with a brief comparison with other existing algorithms and a future scope. The experimental verification and security issues are promising.
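
    A minimal sketch of the embedding idea, assuming the cover audio is 16-bit PCM held in a NumPy array: the message is first combined with a key-derived stream (a stand-in for the paper's randomly generated symmetric key) and the resulting bits are written into the least significant bits of the samples. The key-derivation step and all names are illustrative, not the paper's actual algorithm.

      import hashlib
      import numpy as np

      def embed_lsb(samples, message, key):
          # samples: int16 PCM cover audio; message and key: bytes.
          seed = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
          keystream = np.random.default_rng(seed).integers(
              0, 256, size=len(message), dtype=np.uint8)
          cipher = np.frombuffer(message, dtype=np.uint8) ^ keystream
          bits = np.unpackbits(cipher).astype(samples.dtype)
          if bits.size > samples.size:
              raise ValueError("cover audio too short for this message")
          stego = samples.copy()
          # Overwrite only the least significant bit of the first len(bits) samples.
          stego[:bits.size] = (stego[:bits.size] & ~1) | bits
          return stego

    Recovery reverses the steps: read the LSBs, pack them back into bytes and XOR with the same key-derived stream, which is what keeps the hidden message recoverable without loss.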

  16. Class D audio amplifiers for high voltage capacitive transducers

    DEFF Research Database (Denmark)

    Nielsen, Dennis

Audio reproduction systems contain two key components, the amplifier and the loudspeaker. In the last 20–30 years the technology of audio amplifiers has undergone a fundamental paradigm shift. Class D audio amplifiers have replaced linear amplifiers, which suffer from the well-known issues...... of high volume, weight, and cost. Highly efficient class D amplifiers are now widely available, offering power densities that their linear counterparts cannot match. Unlike the technology of audio amplifiers, the loudspeaker is still based on the traditional electrodynamic transducer invented by C.W. Rice......-of-the-art for class D audio amplifiers driving the electrodynamic transducer is presented. Chapter 3 gives an introduction to the DEAP transducer as a load in loudspeaker systems. The main purpose is to establish the frequency response of the DEAP input impedance, but also to investigate the large signal...

  17. Audio Haptic Videogaming for Developing Wayfinding Skills in Learners Who are Blind

    Science.gov (United States)

    Sánchez, Jaime; de Borba Campos, Marcia; Espinoza, Matías; Merabet, Lotfi B.

    2014-01-01

    Interactive digital technologies are currently being developed as a novel tool for education and skill development. Audiopolis is an audio and haptic based videogame designed for developing orientation and mobility (O&M) skills in people who are blind. We have evaluated the cognitive impact of videogame play on O&M skills by assessing performance on a series of behavioral tasks carried out in both indoor and outdoor virtual spaces. Our results demonstrate that the use of Audiopolis had a positive impact on the development and use of O&M skills in school-aged learners who are blind. The impact of audio and haptic information on learning is also discussed. PMID:25485312

  18. Subjective and Objective Assessment of Perceived Audio Quality of Current Digital Audio Broadcasting Systems and Web-Casting Applications

    NARCIS (Netherlands)

    Pocta, P.; Beerends, J.G.

    2015-01-01

    This paper investigates the impact of different audio codecs typically deployed in current digital audio broadcasting (DAB) systems and web-casting applications, which represent a main source of quality impairment in these systems and applications, on the quality perceived by the end user. Both

  19. Horatio Audio-Describes Shakespeare's "Hamlet": Blind and Low-Vision Theatre-Goers Evaluate an Unconventional Audio Description Strategy

    Science.gov (United States)

    Udo, J. P.; Acevedo, B.; Fels, D. I.

    2010-01-01

    Audio description (AD) has been introduced as one solution for providing people who are blind or have low vision with access to live theatre, film and television content. However, there is little research to inform the process, user preferences and presentation style. We present a study of a single live audio-described performance of Hart House…

  20. Opening the eyes about dictatorship: audio description as a resource of brazilian memory´s maintenance

    Directory of Open Access Journals (Sweden)

    Lucinea Marcelino Villela

    2016-05-01

Full Text Available In 2014, Brazilian society had the opportunity to debate the harsh period of its dictatorship, 50 years after the 1964 Military Coup. The main goal of this paper is the presentation of an Audio Description project with remarkable photos and videos from the Brazilian dictatorship and its reflections on Brazilian society. The project was elaborated by the research group “Accessible Media and Audiovisual Translation”, whose main focus is to provide accessibility (audio description and subtitles) for different audiovisual products. We produced a photo documentary with an overview of some important images and with a script focusing on important information about Brazilian politics from 1964 up to 1989. Many steps were followed during the whole process: selection of photos, historical contextualization, script, narration, and final editing of the video. In order to produce the audio description scripts of the photos and images selected, we have followed some assumptions about audio description. According to Matamala (2006:330), various competences are required of audio describers, such as: “the ability to undertake intersemiotic translations (turning images into words), the ability to summarise information in order to adapt the text to the limited space available, keeping the original meaning, by means of rewording and by using synonyms; the ability to critically select the most relevant information”.

  1. Opening the eyes about dictatorship: audio description as a resource of brazilian memory´s maintenance

    Directory of Open Access Journals (Sweden)

    Lucinea Marcelino Villela

    2016-08-01

Full Text Available In 2014, Brazilian society had the opportunity to debate the harsh period of its dictatorship, 50 years after the 1964 Military Coup. The main goal of this paper is the presentation of an Audio Description project with remarkable photos and videos from the Brazilian dictatorship and its reflections on Brazilian society. The project was elaborated by the research group “Accessible Media and Audiovisual Translation”, whose main focus is to provide accessibility (audio description and subtitles) for different audiovisual products. We produced a photo documentary with an overview of some important images and with a script focusing on important information about Brazilian politics from 1964 up to 1989. Many steps were followed during the whole process: selection of photos, historical contextualization, script, narration, and final editing of the video. In order to produce the audio description scripts of the photos and images selected, we have followed some assumptions about audio description. According to Matamala (2006:330), various competences are required of audio describers, such as: “the ability to undertake intersemiotic translations (turning images into words), the ability to summarise information in order to adapt the text to the limited space available, keeping the original meaning, by means of rewording and by using synonyms; the ability to critically select the most relevant information”.

  2. GK Per (Nova Persei 1901): HUBBLE SPACE TELESCOPE IMAGERY AND SPECTROSCOPY OF THE EJECTA, AND FIRST SPECTRUM OF THE JET-LIKE FEATURE

    Energy Technology Data Exchange (ETDEWEB)

    Shara, Michael M.; Zurek, David; Mizusawa, Trisha [Department of Astrophysics, American Museum of Natural History, Central Park West and 79th street, New York, NY 10024-5192 (United States); De Marco, Orsola [Department of Physics, Macquarie University, Sydney (Australia); Williams, Robert; Livio, Mario [Space Telescope Science Institute, 3700 San Martin Drive, Baltimore, MD 21218 (United States)

    2012-06-15

We have imaged the ejecta of GK Persei (Nova Persei 1901 A.D.) with the Hubble Space Telescope (HST), whose 0.1 arcsec resolution reveals hundreds of cometary-like structures with long axes aligned toward GK Per. One or both ends of the structures often show a brightness enhancement relative to the structures' middle sections, but there is no simple regularity to their morphologies (in contrast with, for example, the Helix nebula). Some of the structures' morphologies suggest the presence of slow-moving or stationary material with which the ejecta is colliding, while others suggest shaping from a wind emanating from GK Per itself. The most detailed expansion map of any classical nova's ejecta was created by comparing HST images taken in successive years. Wide Field and Planetary Camera 2 narrowband images and Space Telescope Imaging Spectrograph spectra demonstrate that the physical conditions in this nova's ejecta vary strongly on spatial scales much smaller than those of the ejecta. Directly measuring accurate densities and compositions, and hence masses of this and other nova shells, will demand data at least as resolved spatially as those presented here. The filling factor of the ejecta is 1% or less, and the nova ejecta mass must be less than 10^-4 M_Sun. A modest fraction of the emission nebulosities vary in brightness by up to a factor of two on timescales of one year. Finally, we present the deepest images yet obtained of a jet-like feature outside the main body of GK Per nebulosity, and the first spectrum of that feature. Dominated by strong, narrow emission lines of [N II], [O II], [O III], and [S II], this feature is probably a shock due to ejected material running into stationary interstellar matter, slowly moving ejecta from a previous nova episode, or circumbinary matter present before 1901. An upper limit to the mass of the jet is of order a few times 10^-6 M_Sun. If the jet mass is close to this limit then the

  3. How we give personalised audio feedback after summative OSCEs.

    Science.gov (United States)

    Harrison, Christopher J; Molyneux, Adrian J; Blackwell, Sara; Wass, Valerie J

    2015-04-01

    Students often receive little feedback after summative objective structured clinical examinations (OSCEs) to enable them to improve their performance. Electronic audio feedback has shown promise in other educational areas. We investigated the feasibility of electronic audio feedback in OSCEs. An electronic OSCE system was designed, comprising (1) an application for iPads allowing examiners to mark in the key consultation skill domains, provide "tick-box" feedback identifying strengths and difficulties, and record voice feedback; (2) a feedback website giving students the opportunity to view/listen in multiple ways to the feedback. Acceptability of the audio feedback was investigated, using focus groups with students and questionnaires with both examiners and students. 87 (95%) students accessed the examiners' audio comments; 83 (90%) found the comments useful and 63 (68%) reported changing the way they perform a skill as a result of the audio feedback. They valued its highly personalised, relevant nature and found it much more useful than written feedback. Eighty-nine per cent of examiners gave audio feedback to all students on their stations. Although many found the method easy, lack of time was a factor. Electronic audio feedback provides timely, personalised feedback to students after a summative OSCE provided enough time is allocated to the process.

  4. Optimized Audio Classification and Segmentation Algorithm by Using Ensemble Methods

    Directory of Open Access Journals (Sweden)

    Saadia Zahid

    2015-01-01

Full Text Available Audio segmentation is a basis for multimedia content analysis, one of the most important and widely used applications today. An optimized audio classification and segmentation algorithm is presented in this paper that segments a superimposed audio stream, on the basis of its content, into four main audio types: pure speech, music, environment sound, and silence. The proposed algorithm preserves important audio content and reduces the misclassification rate without using a large amount of training data; it handles noise and is suitable for real-time applications. Noise in an audio stream is segmented out as environment sound. A hybrid classification approach is used: bagged support vector machines (SVMs) with artificial neural networks (ANNs). The audio stream is classified, first, into speech and non-speech segments using bagged SVMs; the non-speech segment is further classified into music and environment sound using ANNs; and lastly, the speech segment is classified into silence and pure-speech segments by a rule-based classifier. Minimal data are used for training the classifiers; ensemble methods are used to minimize the misclassification rate, and approximately 98% accurate segments are obtained. A fast and efficient algorithm is designed that can be used with real-time multimedia applications.
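
    The hierarchical scheme can be sketched as follows, assuming frame-level feature vectors are already available; the random arrays, the energy-threshold rule for silence and all names below are placeholders, not the paper's actual features or parameters.

      import numpy as np
      from sklearn.ensemble import BaggingClassifier
      from sklearn.neural_network import MLPClassifier
      from sklearn.svm import SVC

      rng = np.random.default_rng(0)
      X = rng.normal(size=(400, 13))            # stand-in frame features
      y_speech = rng.integers(0, 2, size=400)   # 1 = speech, 0 = non-speech
      y_music = rng.integers(0, 2, size=400)    # 1 = music, 0 = environment sound

      # Stage 1: speech vs. non-speech with bagged SVMs.
      stage1 = BaggingClassifier(SVC(kernel="rbf"), n_estimators=10).fit(X, y_speech)

      # Stage 2: music vs. environment sound with a small neural network,
      # trained on the non-speech frames only.
      ns = y_speech == 0
      stage2 = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X[ns], y_music[ns])

      def classify_frame(frame, energy, silence_thresh=1e-3):
          # Stage 3 is rule-based: speech frames below an energy threshold
          # are treated as silence, the rest as pure speech.
          if stage1.predict(frame[None])[0] == 1:
              return "silence" if energy < silence_thresh else "pure-speech"
          return "music" if stage2.predict(frame[None])[0] == 1 else "environment sound"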

  5. The effect of reverberation on personal audio devices.

    Science.gov (United States)

    Simón-Gálvez, Marcos F; Elliott, Stephen J; Cheer, Jordan

    2014-05-01

    Personal audio refers to the creation of a listening zone within which a person, or a group of people, hears a given sound program, without being annoyed by other sound programs being reproduced in the same space. Generally, these different sound zones are created by arrays of loudspeakers. Although these devices have the capacity to achieve different sound zones in an anechoic environment, they are ultimately used in normal rooms, which are reverberant environments. At high frequencies, reflections from the room surfaces create a diffuse pressure component which is uniform throughout the room volume and thus decreases the directional characteristics of the device. This paper shows how the reverberant performance of an array can be modeled, knowing the anechoic performance of the radiator and the acoustic characteristics of the room. A formulation is presented whose results are compared to practical measurements in reverberant environments. Due to reflections from the room surfaces, pressure variations are introduced in the transfer responses of the array. This aspect is assessed by means of simulations where random noise is added to create uncertainties, and by performing measurements in a real environment. These results show how the robustness of an array is increased when it is designed for use in a reverberant environment.

  6. Audio Effects Based on Biorthogonal Time-Varying Frequency Warping

    Directory of Open Access Journals (Sweden)

    Sergio Cavaliere

    2001-03-01

    Full Text Available We illustrate the mathematical background and musical use of a class of audio effects based on frequency warping. These effects alter the frequency content of a signal via spectral mapping. They can be implemented in dispersive tapped delay lines based on a chain of all-pass filters. In a homogeneous line with first-order all-pass sections, the signal formed by the output samples at a given time is related to the input via the Laguerre transform. However, most musical signals require a time-varying frequency modification in order to be properly processed. Vibrato in musical instruments or voice intonation in the case of vocal sounds may be modeled as small and slow pitch variations. Simulation of these effects requires techniques for time-varying pitch and/or brightness modification that are very useful for sound processing. The basis for time-varying frequency warping is a time-varying version of the Laguerre transformation. The corresponding implementation structure is obtained as a dispersive tapped delay line, where each of the frequency dependent delay element has its own phase response. Thus, time-varying warping results in a space-varying, inhomogeneous, propagation structure. We show that time-varying frequency warping is associated to an expansion over biorthogonal sets generalizing the discrete Laguerre basis. Slow time-varying characteristics lead to slowly varying parameter sequences. The corresponding sound transformation does not suffer from discontinuities typical of delay lines based on unit delays.

  7. Current-Driven Switch-Mode Audio Power Amplifiers

    DEFF Research Database (Denmark)

    Knott, Arnold; Buhl, Niels Christian; Andersen, Michael A. E.

    2012-01-01

    The conversion of electrical energy into sound waves by electromechanical transducers is proportional to the current through the coil of the transducer. However virtually all audio power amplifiers provide a controlled voltage through the interface to the transducer. This paper is presenting...... a switch-mode audio power amplifier not only providing controlled current but also being supplied by current. This results in an output filter size reduction by a factor of 6. The implemented prototype shows decent audio performance with THD + N below 0.1 %....

  8. Debugging of Class-D Audio Power Amplifiers

    DEFF Research Database (Denmark)

    Crone, Lasse; Pedersen, Jeppe Arnsdorf; Mønster, Jakob Døllner

    2012-01-01

Determining and optimizing the performance of a Class-D audio power amplifier can be very difficult without knowledge of the use of audio performance measuring equipment and of how the various noise and distortion sources influence the audio performance. This paper gives an introduction on how to measure...... the performance of the amplifier and how to find the noise and distortion sources and suggests ways to remove them. Throughout the paper measurements of a test amplifier are presented along with the relevant theory....

  9. Switching-mode Audio Power Amplifiers with Direct Energy Conversion

    DEFF Research Database (Denmark)

    Ljusev, Petar; Andersen, Michael Andreas E.

    2005-01-01

    This paper presents a new class of switching-mode audio power amplifiers, which are capable of direct energy conversion from the AC mains to the audio output. They represent an ultimate integration of a switching-mode power supply and a Class D audio power amplifier, where the intermediate DC bus...... has been replaced with a high frequency AC link. When compared to the conventional Class D amplifiers with a separate DC power supply, the proposed single conversion stage amplifier provides simple and compact solution with better efficiency and higher level of integration, leading to reduced...

  10. A review of lossless audio compression standards and algorithms

    Science.gov (United States)

    Muin, Fathiah Abdul; Gunawan, Teddy Surya; Kartiwi, Mira; Elsheikh, Elsheikh M. A.

    2017-09-01

Over the years, lossless audio compression has gained popularity as researchers and businesses have become more aware of the need for better quality and higher storage demand. This paper analyses various lossless audio coding algorithms and standards that are used and available in the market, focusing on Linear Predictive Coding (LPC) specifically due to its popularity and robustness in audio compression; nevertheless, other prediction methods are compared to verify this. Advanced representations of LPC such as LSP decomposition techniques are also discussed within this paper.
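
    The prediction step at the heart of such coders can be illustrated with the simplest fixed polynomial predictors (used alongside adaptive LPC in several lossless formats): the residual of an order-th difference is what would then be entropy-coded. This is a conceptual sketch only; real coders add framing, integer arithmetic and Rice or arithmetic coding.

      import numpy as np

      def fixed_predictor_residual(samples, order=2):
          # Residual of a fixed polynomial predictor: the order-th difference
          # of the integer sample stream. Smaller residuals compress better.
          res = np.asarray(samples, dtype=np.int64)
          for _ in range(order):
              res = np.diff(res)
          return res

      # A lossless coder stores the first `order` samples plus the entropy-coded
      # residual, from which the original samples can be reconstructed exactly.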

  11. Musical examination to bridge audio data and sheet music

    Science.gov (United States)

    Pan, Xunyu; Cross, Timothy J.; Xiao, Liangliang; Hei, Xiali

    2015-03-01

    The digitalization of audio is commonly implemented for the purpose of convenient storage and transmission of music and songs in today's digital age. Analyzing digital audio for an insightful look at a specific musical characteristic, however, can be quite challenging for various types of applications. Many existing musical analysis techniques can examine a particular piece of audio data. For example, the frequency of digital sound can be easily read and identified at a specific section in an audio file. Based on this information, we could determine the musical note being played at that instant, but what if you want to see a list of all the notes played in a song? While most existing methods help to provide information about a single piece of the audio data at a time, few of them can analyze the available audio file on a larger scale. The research conducted in this work considers how to further utilize the examination of audio data by storing more information from the original audio file. In practice, we develop a novel musical analysis system Musicians Aid to process musical representation and examination of audio data. Musicians Aid solves the previous problem by storing and analyzing the audio information as it reads it rather than tossing it aside. The system can provide professional musicians with an insightful look at the music they created and advance their understanding of their work. Amateur musicians could also benefit from using it solely for the purpose of obtaining feedback about a song they were attempting to play. By comparing our system's interpretation of traditional sheet music with their own playing, a musician could ensure what they played was correct. More specifically, the system could show them exactly where they went wrong and how to adjust their mistakes. In addition, the application could be extended over the Internet to allow users to play music with one another and then review the audio data they produced. This would be particularly
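
    The note-identification step described above can be approximated, under strongly simplifying assumptions (monophonic audio with one dominant spectral peak per frame), by mapping the strongest STFT bin of each frame to an equal-tempered note name; the function names, frame size and threshold are illustrative.

      import numpy as np

      NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

      def freq_to_note(freq, a4=440.0):
          # Nearest equal-tempered note, using the MIDI convention (A4 = 69).
          midi = int(round(69 + 12 * np.log2(freq / a4)))
          return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

      def notes_in_signal(x, sr, frame=4096, hop=2048):
          # Crude note list: strongest FFT bin per windowed frame.
          notes = []
          window = np.hanning(frame)
          for start in range(0, len(x) - frame, hop):
              spectrum = np.abs(np.fft.rfft(x[start:start + frame] * window))
              peak = int(spectrum.argmax())
              if peak > 0 and spectrum[peak] > 1e-3:   # skip near-silent frames
                  notes.append(freq_to_note(peak * sr / frame))
          return notes

    Merging consecutive duplicate entries then gives a rough note-event list that could be compared against the notes expected from the sheet music.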

  12. Aurally Aided Visual Search Performance Comparing Virtual Audio Systems

    DEFF Research Database (Denmark)

    Larsen, Camilla Horne; Lauritsen, David Skødt; Larsen, Jacob Junker

    2014-01-01

    Due to increased computational power, reproducing binaural hearing in real-time applications, through usage of head-related transfer functions (HRTFs), is now possible. This paper addresses the differences in aurally-aided visual search performance between a HRTF enhanced audio system (3D...... with white dots. The results indicate that 3D audio yields faster search latencies than panning audio, especially with larger amounts of distractors. The applications of this research could fit virtual environments such as video games or virtual simulations....

  13. Aurally Aided Visual Search Performance Comparing Virtual Audio Systems

    DEFF Research Database (Denmark)

    Larsen, Camilla Horne; Lauritsen, David Skødt; Larsen, Jacob Junker

    2014-01-01

    Due to increased computational power reproducing binaural hearing in real-time applications, through usage of head-related transfer functions (HRTFs), is now possible. This paper addresses the differences in aurally-aided visual search performance between an HRTF enhanced audio system (3D...... with white dots. The results indicate that 3D audio yields faster search latencies than panning audio, especially with larger amounts of distractors. The applications of this research could fit virtual environments such as video games or virtual simulations....

  14. Online feature selection with streaming features.

    Science.gov (United States)

    Wu, Xindong; Yu, Kui; Ding, Wei; Wang, Hao; Zhu, Xingquan

    2013-05-01

    We propose a new online feature selection framework for applications with streaming features where the knowledge of the full feature space is unknown in advance. We define streaming features as features that flow in one by one over time whereas the number of training examples remains fixed. This is in contrast with traditional online learning methods that only deal with sequentially added observations, with little attention being paid to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In the paper, we present a novel Online Streaming Feature Selection method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also with a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.
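
    A toy sketch in the spirit of the relevance/redundancy filtering described above, with plain correlation standing in for the statistical tests used by OSFS and Fast-OSFS; the thresholds and the function name are assumptions for illustration only.

      import numpy as np

      def streaming_feature_selection(feature_stream, y, rel_thresh=0.1, red_thresh=0.95):
          # feature_stream yields (index, values) pairs one feature at a time;
          # y is the fixed target over the same training examples.
          selected = []                              # (index, standardized values)
          y = (y - y.mean()) / (y.std() + 1e-12)
          for idx, f in feature_stream:
              f = (f - f.mean()) / (f.std() + 1e-12)
              if abs(np.corrcoef(f, y)[0, 1]) < rel_thresh:
                  continue                           # weakly relevant: discard
              if any(abs(np.corrcoef(f, g)[0, 1]) > red_thresh for _, g in selected):
                  continue                           # nearly redundant: discard
              selected.append((idx, f))
          return [idx for idx, _ in selected]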

  15. Mixed-Signal Architectures for High-Efficiency and Low-Distortion Digital Audio Processing and Power Amplification

    Directory of Open Access Journals (Sweden)

    Pierangelo Terreni

    2010-01-01

    Full Text Available The paper addresses the algorithmic and architectural design of digital input power audio amplifiers. A modelling platform, based on a meet-in-the-middle approach between top-down and bottom-up design strategies, allows a fast but still accurate exploration of the mixed-signal design space. Different amplifier architectures are configured and compared to find optimal trade-offs among different cost-functions: low distortion, high efficiency, low circuit complexity and low sensitivity to parameter changes. A novel amplifier architecture is derived; its prototype implements digital processing IP macrocells (oversampler, interpolating filter, PWM cross-point deriver, noise shaper, multilevel PWM modulator, dead time compensator on a single low-complexity FPGA while off-chip components are used only for the power output stage (LC filter and power MOS bridge; no heatsink is required. The resulting digital input amplifier features a power efficiency higher than 90% and a total harmonic distortion down to 0.13% at power levels of tens of Watts. Discussions towards the full-silicon integration of the mixed-signal amplifier in embedded devices, using BCD technology and targeting power levels of few Watts, are also reported.

  16. Analysis of musical expression in audio signals

    Science.gov (United States)

    Dixon, Simon

    2003-01-01

In western art music, composers communicate their work to performers via a standard notation which specifies the musical pitches and relative timings of notes. This notation may also include some higher-level information such as variations in the dynamics, tempo and timing. Famous performers are characterised by their expressive interpretation, the ability to convey structural and emotive information within the given framework. The majority of work on audio content analysis focusses on retrieving score-level information; this paper reports on the extraction of parameters describing the performance, a task which requires a much higher degree of accuracy. Two systems are presented: BeatRoot, an off-line beat tracking system which finds the times of musical beats and tracks changes in tempo throughout a performance, and the Performance Worm, a system which provides a real-time visualisation of the two most important expressive dimensions, tempo and dynamics. Both of these systems are being used to process data for a large-scale study of musical expression in classical and romantic piano performance, which uses artificial intelligence (machine learning) techniques to discover fundamental patterns or principles governing expressive performance.
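
    This is not BeatRoot's agent-based approach, but the basic idea of estimating tempo from an onset envelope can be sketched as follows: half-wave rectified frame-energy differences are autocorrelated and the strongest lag within a plausible BPM range is taken as the beat period. All parameter values are illustrative, and at least a few seconds of audio are assumed.

      import numpy as np

      def estimate_tempo(x, sr, frame=1024, hop=512, bpm_range=(60, 200)):
          # Crude global tempo estimate from an energy-based onset envelope.
          energy = np.array([np.sum(x[i:i + frame] ** 2)
                             for i in range(0, len(x) - frame, hop)])
          onset_env = np.maximum(np.diff(energy), 0.0)
          ac = np.correlate(onset_env, onset_env, mode="full")[len(onset_env) - 1:]
          fps = sr / hop                             # envelope frames per second
          lo = int(fps * 60 / bpm_range[1])          # shortest plausible beat period
          hi = int(fps * 60 / bpm_range[0])          # longest plausible beat period
          best_lag = lo + int(np.argmax(ac[lo:hi]))
          return 60.0 * fps / best_lag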

  17. Minimally radiating sources for personal audio.

    Science.gov (United States)

    Elliott, Stephen J; Cheer, Jordan; Murfet, Harry; Holland, Keith R

    2010-10-01

    In order to reduce annoyance from the audio output of personal devices, it is necessary to maintain the sound level at the user position while minimizing the levels elsewhere. If the dark zone, within which the sound is to be minimized, extends over the whole far field of the source, the problem reduces to that of minimizing the radiated sound power while maintaining the pressure level at the user position. It is shown analytically that the optimum two-source array then has a hypercardioid directivity and gives about 7 dB reduction in radiated sound power, compared with a monopole producing the same on-axis pressure. The performance of other linear arrays is studied using monopole simulations for the motivating example of a mobile phone. The trade-off is investigated between the performance in reducing radiated noise, and the electrical power required to drive the array for different numbers of elements. It is shown for both simulations and experiments conducted on a small array of loudspeakers under anechoic conditions, that both two and three element arrays provide a reasonable compromise between these competing requirements. The implementation of the two-source array in a coupled enclosure is also shown to reduce the electrical power requirements.

  18. Towards Structural Analysis of Audio Recordings in the Presence of Musical Variations

    Directory of Open Access Journals (Sweden)

    Müller Meinard

    2007-01-01

    Full Text Available One major goal of structural analysis of an audio recording is to automatically extract the repetitive structure or, more generally, the musical form of the underlying piece of music. Recent approaches to this problem work well for music, where the repetitions largely agree with respect to instrumentation and tempo, as is typically the case for popular music. For other classes of music such as Western classical music, however, musically similar audio segments may exhibit significant variations in parameters such as dynamics, timbre, execution of note groups, modulation, articulation, and tempo progression. In this paper, we propose a robust and efficient algorithm for audio structure analysis, which allows to identify musically similar segments even in the presence of large variations in these parameters. To account for such variations, our main idea is to incorporate invariance at various levels simultaneously: we design a new type of statistical features to absorb microvariations, introduce an enhanced local distance measure to account for local variations, and describe a new strategy for structure extraction that can cope with the global variations. Our experimental results with classical and popular music show that our algorithm performs successfully even in the presence of significant musical variations.

  19. Audio Arduino - an ALSA (Advanced Linux Sound Architecture) audio driver for FTDI-based Arduinos

    DEFF Research Database (Denmark)

    Dimitrov, Smilen; Serafin, Stefania

    2011-01-01

    be considered to be a system, that encompasses design decisions on both hardware and software levels - that also demand a certain understanding of the architecture of the target PC operating system. This project outlines how an Arduino Duemillanove board (containing a USB interface chip, manufactured by Future...... Technology Devices International Ltd [FTDI] company) can be demonstrated to behave as a full-duplex, mono, 8-bit 44.1 kHz soundcard, through an implementation of: a PC audio driver for ALSA (Advanced Linux Sound Architecture); a matching program for the Arduino's ATmega microcontroller - and nothing more...

  20. A Perceptual Model for Sinusoidal Audio Coding Based on Spectral Integration

    Directory of Open Access Journals (Sweden)

    Jensen Søren Holdt

    2005-01-01

    Full Text Available Psychoacoustical models have been used extensively within audio coding applications over the past decades. Recently, parametric coding techniques have been applied to general audio and this has created the need for a psychoacoustical model that is specifically suited for sinusoidal modelling of audio signals. In this paper, we present a new perceptual model that predicts masked thresholds for sinusoidal distortions. The model relies on signal detection theory and incorporates more recent insights about spectral and temporal integration in auditory masking. As a consequence, the model is able to predict the distortion detectability. In fact, the distortion detectability defines a (perceptually relevant norm on the underlying signal space which is beneficial for optimisation algorithms such as rate-distortion optimisation or linear predictive coding. We evaluate the merits of the model by combining it with a sinusoidal extraction method and compare the results with those obtained with the ISO MPEG-1 Layer I-II recommended model. Listening tests show a clear preference for the new model. More specifically, the model presented here leads to a reduction of more than 20% in terms of number of sinusoids needed to represent signals at a given quality level.

  1. Can audio recording improve patients' recall of outpatient consultations?

    DEFF Research Database (Denmark)

    Wolderslund, Maiken; Kofoed, Poul-Erik; Axboe, Mette

    Introduction In order to give patients possibility to listen to their consultation again, we have designed a system which gives the patients access to digital audio recordings of their consultations. An Interactive Voice Response platform enables the audio recording and gives the patients access...... to replay their consultation. The intervention is evaluated in a randomised controlled trial with 5.460 patients in order to determine whether providing patients with digital audio recording of the consultation affects the patients overall perception of their consultation. In addition to this primary...... objective we want to investigate if replay of the consultations improves the patients’ recall of the information given. Methods Interviews are carried out with 40 patients whose consultations have been audio recorded. Patients are divided into two groups, those who have listened to their consultation...

  2. Perancangan Sistem Audio Mobil Berbasiskan Sistem Pakar dan Web

    Directory of Open Access Journals (Sweden)

    Djunaidi Santoso

    2011-12-01

Full Text Available Designing car audio that fits the user's needs is a fun activity. However, the design is often time-consuming and costly since experts must be consulted several times. For easy access to information in designing a car audio system, as well as error prevention, a car audio system based on an expert system and the web is designed for those who do not have sufficient time and money to consult experts directly. The system consists of tutorial modules designed using the HyperText Preprocessor (PHP) with MySQL as the database. The car audio system design is evaluated using the black-box testing method, which focuses on the functional needs of the application. Tests are performed by providing inputs and checking that the outputs correspond to the function of each module. The test results show the correspondence between input and output, which means that the program meets the initial goals of the design.

  3. Audio-visual detection benefits in the rat

    National Research Council Canada - National Science Library

    Gleiss, Stephanie; Kayser, Christoph

    2012-01-01

    ... multisensory protocols. We here demonstrate the feasibility of an audio-visual stimulus detection task for rats, in which the animals detect lateralized uni- and multi-sensory stimuli in a two-response forced choice paradigm...

  4. El Digital Audio Tape Recorder. Contra autores y creadores

    Directory of Open Access Journals (Sweden)

    Jun Ono

    2015-01-01

Full Text Available The so-called "DAT" (short for "digital audio tape recorder") has long received coverage in the mass media of Japan and other countries as a new and controversial electronic audio product of the Japanese consumer electronics industry. What has become of the object of this controversy?

  5. IELTS speaking instruction through audio/voice conferencing

    Directory of Open Access Journals (Sweden)

    Hamed Ghaemi

    2012-02-01

Full Text Available The current study aims at investigating the impact of Audio/Voice conferencing, as a new approach to teaching speaking, on the speaking performance and/or speaking band score of IELTS candidates. Experimental group subjects participated in an audio conferencing class, while those of the control group attended a traditional IELTS Speaking class. At the end of the study, all subjects participated in an IELTS examination held on November fourth in Tehran, Iran. To compare the group means for the study, an independent t-test analysis was employed. The difference between the experimental and control groups was considered to be statistically significant (P<0.01). That is, the candidates in the experimental group outperformed the ones in the control group in IELTS Speaking test scores.

  6. Towards Bridging the Gap between Sheet Music and Audio

    OpenAIRE

    Fremerey, Christian; Mueller, Meinard; Clausen, Michael

    2009-01-01

    Sheet music and audio recordings represent and describe music on different semantic levels. Sheet music describes abstract high-level parameters such as notes, keys, measures, or repeats in a visual form. Because of its explicitness and compactness, most musicologists discuss and analyze the meaning of music on the basis of sheet music. On the contrary, most people enjoy music by listening to audio recordings, which represent music in an acoustic form. In particular, the nua...

  7. Ferrite bead effect on Class-D amplifier audio quality

    OpenAIRE

    Haddad, Kevin El; Mrad, Roberto; Morel, Florent; Pillonnet, Gael; Vollaire, Christian; Nagari, Angelo

    2014-01-01

International audience; This paper studies the effect of ferrite beads on the audio quality of Class-D audio amplifiers. The latter is a switching circuit which creates high-frequency harmonics. Generally, a filter is used at the amplifier output for the sake of electromagnetic compatibility (EMC). Often, in integrated solutions, this filter contains ferrite beads, which are magnetic components and present nonlinear behavior. Time domain measurements and their equivalence in frequency do...

  8. Automated processing of massive audio/video content using FFmpeg

    Directory of Open Access Journals (Sweden)

    Kia Siang Hock

    2014-01-01

Full Text Available Audio and video content forms an integral, important and expanding part of the digital collections in libraries and archives worldwide. While these memory institutions are familiar and well-versed in the management of more conventional materials such as books, periodicals, ephemera and images, the handling of audio (e.g., oral history recordings) and video content (e.g., audio-visual recordings, broadcast content) requires additional toolkits. In particular, a robust and comprehensive tool that provides a programmable interface is indispensable when dealing with tens of thousands of hours of audio and video content. FFmpeg is a comprehensive and well-established open source software package that is capable of the full range of audio/video processing tasks (such as encode, decode, transcode, mux, demux, stream and filter). It is also capable of handling a wide range of audio and video formats, a unique challenge in memory institutions. It comes with a command-line interface, as well as a set of developer libraries that can be incorporated into applications.
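
    As an illustration of driving FFmpeg programmatically for bulk work, the sketch below batch-transcodes a folder of WAV files to AAC by invoking the ffmpeg command-line tool (assumed to be installed and on the PATH); the paths, codec and bitrate are arbitrary examples rather than recommendations from the article.

      import pathlib
      import subprocess

      def transcode_all(src_dir, dst_dir, bitrate="128k"):
          # Convert every .wav in src_dir to an .m4a (AAC) file in dst_dir.
          dst = pathlib.Path(dst_dir)
          dst.mkdir(parents=True, exist_ok=True)
          for wav in sorted(pathlib.Path(src_dir).glob("*.wav")):
              out = dst / (wav.stem + ".m4a")
              subprocess.run(
                  ["ffmpeg", "-y", "-i", str(wav),   # -y: overwrite existing output
                   "-vn",                            # drop any video stream
                   "-c:a", "aac", "-b:a", bitrate,   # audio codec and bitrate
                   str(out)],
                  check=True)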

  9. Visualizing Music in its Entirety using Acoustic Features: Music Flowgram

    OpenAIRE

    Dasaem Jeong; Juhan Nam

    2016-01-01

In this paper, we present an automatic method for visualizing a music audio file from its beginning to end, especially for classical music. Our goal is to develop an easy-to-use visualization method that is helpful for listeners and can be used for various kinds of classical music, even complex orchestral music. To represent musical characteristics, the method uses audio features like volume, onset density, and auditory roughness, which describe loudness, tempo, and dissonance, respectivel...
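
    Two of the descriptors mentioned above (a loudness proxy and an onset-based proxy related to tempo) can be computed frame-wise with the librosa library, as in the sketch below; auditory roughness is omitted, and the parameter values are illustrative rather than those used by the authors.

      import numpy as np
      import librosa

      def flowgram_features(path, hop=512):
          # Frame-wise RMS energy (loudness proxy) and onset strength
          # (onset-density/tempo proxy) over the whole recording.
          y, sr = librosa.load(path, sr=None, mono=True)
          rms = librosa.feature.rms(y=y, hop_length=hop)[0]
          onset_env = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop)
          times = librosa.frames_to_time(np.arange(len(rms)), sr=sr, hop_length=hop)
          return times, rms, onset_env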

  10. TNO at TRECVID 2008, Combining Audio and Video Fingerprinting for Robust Copy Detection

    NARCIS (Netherlands)

    Doets, P.J.; Eendebak, P.T.; Ranguelova, E.; Kraaij, W.

    2009-01-01

    TNO has evaluated a baseline audio and a video fingerprinting system based on robust hashing for the TRECVID 2008 copy detection task. We participated in the audio, the video and the combined audio-video copy detection task. The audio fingerprinting implementation clearly outperformed the video

  11. Communicative Competence in Audio Classrooms: A Position Paper for the CADE 1991 Conference.

    Science.gov (United States)

    Burge, Liz

    Classroom practitioners need to move their attention away from the technological and logistical competencies required for audio conferencing (AC) to the required communicative competencies in order to advance their skills in handling the psychodynamics of audio virtual classrooms which include audio alone and audio with graphics. While the…

  12. 37 CFR 201.28 - Statements of Account for digital audio recording devices or media.

    Science.gov (United States)

    2010-07-01

    ... digital audio recording devices or media. 201.28 Section 201.28 Patents, Trademarks, and Copyrights... of Account for digital audio recording devices or media. (a) General. This section prescribes rules... United States any digital audio recording device or digital audio recording medium. (b) Definitions. For...

  13. 37 CFR 201.27 - Initial notice of distribution of digital audio recording devices or media.

    Science.gov (United States)

    2010-07-01

    ... distribution of digital audio recording devices or media. 201.27 Section 201.27 Patents, Trademarks, and... Initial notice of distribution of digital audio recording devices or media. (a) General. This section..., any digital audio recording device or digital audio recording medium in the United States. (b...

  14. Adaptive Modulation Approach for Robust MPEG-4 AAC Encoded Audio Transmission

    Science.gov (United States)

    2011-11-01

    translates to perceptual audio quality in the range of slightly annoying to annoying as per the ITU-R BS.1387-1 standard [8], [9] shown in Table 1. But in...of Audiovisual Objects,” Part 3: Audio, Subpart 4: General Audio (GA) Coding: AAC/TwinVQ, 2000. [2] Y. Zhang, “Robust audio Coding over Wireless

  15. Recursive nearest neighbor search in a sparse and multiscale domain for comparing audio signals

    DEFF Research Database (Denmark)

    Sturm, Bob L.; Daudet, Laurent

    2011-01-01

    of time-frequency atoms. Theoretically, error bounds on these approximations provide efficient means for quickly reducing the search space to the nearest neighborhood of a given data; but we demonstrate here that the best bound defined thus far involving a probabilistic assumption does not provide......We investigate recursive nearest neighbor search in a sparse domain at the scale of audio signals. Essentially, to approximate the cosine distance between the signals we make pairwise comparisons between the elements of localized sparse models built from large and redundant multiscale dictionaries...

  16. Weighted Feature Distance

    DEFF Research Database (Denmark)

    Ortiz-Arroyo, Daniel; Yazdani, Hossein

    2017-01-01

The accuracy of machine learning methods for clustering depends on the optimal selection of similarity functions. Conventional distance functions for the vector space might cause an algorithm to be affected by some dominant features that may skew its final results. This paper introduces a flexible...... environment for mining algorithms that uses the most suitable similarity functions to cover the diversity of both vector and feature spaces. The paper describes some well-known conventional distance functions and introduces Weighted Feature Distance (WFD) and Prioritized Weighted Feature Distance (PWFD......). These novel functions attempt to balance the impact of the dominant features by covering both feature and vector spaces, while additionally allowing us to optionally increase or decrease the impact of some features. We evaluate and compare the accuracy of our proposed WFD(s) on conventional fuzzy
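
    The core idea of scaling each feature's contribution so that dominant features do not swamp the rest can be sketched as below; this is only an illustration of the concept, not the WFD or PWFD definitions from the paper, and the weight normalization is an assumption.

      import numpy as np

      def weighted_feature_distance(x, y, w, p=2):
          # Weighted Minkowski-style distance: each per-feature difference is
          # scaled by its weight before aggregation, so dominant features can
          # be down-weighted (or prioritized features up-weighted).
          x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
          return (np.sum(w * np.abs(x - y) ** p) / np.sum(w)) ** (1.0 / p)

      # Example: strongly down-weight the third (dominant) feature.
      d = weighted_feature_distance([1.0, 2.0, 10.0], [1.5, 2.5, 0.0], w=[1.0, 1.0, 0.1])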

  17. ARC Code TI: SLAB Spatial Audio Renderer

    Data.gov (United States)

    National Aeronautics and Space Administration — SLAB is a software-based, real-time virtual acoustic environment rendering system being developed as a tool for the study of spatial hearing. SLAB is designed to...

  18. Adaptive Multi-Class Audio Classification in Noisy In-Vehicle Environment

    OpenAIRE

    Won, Myounggyu; Alsaadan, Haitham; Eun, Yongsoon

    2017-01-01

    With ever-increasing number of car-mounted electric devices and their complexity, audio classification is increasingly important for the automotive industry as a fundamental tool for human-device interactions. Existing approaches for audio classification, however, fall short as the unique and dynamic audio characteristics of in-vehicle environments are not appropriately taken into account. In this paper, we develop an audio classification system that classifies an audio stream into music, spe...

  19. Digital Audio Radio Broadcast Systems Laboratory Testing Nearly Complete

    Science.gov (United States)

    2005-01-01

    Radio history continues to be made at the NASA Lewis Research Center with the completion of phase one of the digital audio radio (DAR) testing conducted by the Consumer Electronics Group of the Electronic Industries Association. This satellite, satellite/terrestrial, and terrestrial digital technology will open up new audio broadcasting opportunities both domestically and worldwide. It will significantly improve the current quality of amplitude-modulated/frequency-modulated (AM/FM) radio with a new digitally modulated radio signal and will introduce true compact-disc-quality (CD-quality) sound for the first time. Lewis is hosting the laboratory testing of seven proposed digital audio radio systems and modes. Two of the proposed systems operate in two modes each, making a total of nine systems being tested. The nine systems are divided into the following types of transmission: in-band on-channel (IBOC), in-band adjacent-channel (IBAC), and new bands. The laboratory testing was conducted by the Consumer Electronics Group of the Electronic Industries Association. Subjective assessments of the audio recordings for each of the nine systems was conducted by the Communications Research Center in Ottawa, Canada, under contract to the Electronic Industries Association. The Communications Research Center has the only CCIR-qualified (Consultative Committee for International Radio) audio testing facility in North America. The main goals of the U.S. testing process are to (1) provide technical data to the Federal Communication Commission (FCC) so that it can establish a standard for digital audio receivers and transmitters and (2) provide the receiver and transmitter industries with the proper standards upon which to build their equipment. In addition, the data will be forwarded to the International Telecommunications Union to help in the establishment of international standards for digital audio receivers and transmitters, thus allowing U.S. manufacturers to compete in the

  20. Acoustic Heritage and Audio Creativity: the Creative Application of Sound in the Representation, Understanding and Experience of Past Environments

    Directory of Open Access Journals (Sweden)

    Damian Murphy

    2017-06-01

    Full Text Available Acoustic Heritage is one aspect of archaeoacoustics, and refers more specifically to the quantifiable acoustic properties of buildings, sites and landscapes from our architectural and archaeological past, forming an important aspect of our intangible cultural heritage. Auralisation, the audio equivalent of 3D visualisation, enables these acoustic properties, captured via the process of measurement and survey, or computer-based modelling, to form the basis of an audio reconstruction and presentation of the studied space. This article examines the application of auralisation and audio creativity as a means to explore our acoustic heritage, thereby diversifying and enhancing the toolset available to the digital heritage or humanities researcher. The Open Acoustic Impulse Response (OpenAIR library is an online repository for acoustic impulse response and auralisation data, with a significant part having been gathered from a broad range of heritage sites. The methodology used to gather this acoustic data is discussed, together with the processes used in generating and calibrating a comparable computer model, and how the data generated might be analysed and presented. The creative use of this acoustic data is also considered, in the context of music production, mixed media artwork and audio for gaming. More relevant to digital heritage is how these data can be used to create new experiences of past environments, as information, interpretation, guide or artwork and ultimately help to articulate new research questions and explorations of our acoustic heritage.
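
    The auralisation step described above amounts, in its simplest form, to convolving an anechoic recording with a measured room impulse response such as those in the OpenAIR library; the sketch below shows this with SciPy, with the normalization step and all names being illustrative assumptions.

      import numpy as np
      from scipy.signal import fftconvolve

      def auralise(dry, impulse_response, wet_gain=1.0):
          # Convolve a "dry" (anechoic) signal with a room impulse response to
          # simulate hearing the source in the measured space.
          wet = fftconvolve(dry, impulse_response, mode="full")
          peak = np.max(np.abs(wet))
          return wet_gain * wet / peak if peak > 0 else wet   # simple peak normalization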

  1. Talker variability in audio-visual speech perception.

    Science.gov (United States)

    Heald, Shannon L M; Nusbaum, Howard C

    2014-01-01

    A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts have shown, however, that when listeners are able to see a talker's face, speech recognition is improved under adverse listening (e.g., noise or distortion) conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to audio-only condition. These results suggest that seeing a talker's face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener a change in talker has occurred.

  2. The Fungible Audio-Visual Mapping and its Experience

    Directory of Open Access Journals (Sweden)

    Adriana Sa

    2014-12-01

    Full Text Available This article draws a perceptual approach to audio-visual mapping. Clearly perceivable cause and effect relationships can be problematic if one desires the audience to experience the music. Indeed perception would bias those sonic qualities that fit previous concepts of causation, subordinating other sonic qualities, which may form the relations between the sounds themselves. The question is, how can an audio-visual mapping produce a sense of causation, and simultaneously confound the actual cause-effect relationships. We call this a fungible audio-visual mapping. Our aim here is to glean its constitution and aspect. We will report a study, which draws upon methods from experimental psychology to inform audio-visual instrument design and composition. The participants are shown several audio-visual mapping prototypes, after which we pose quantitative and qualitative questions regarding their sense of causation, and their sense of understanding the cause-effect relationships. The study shows that a fungible mapping requires both synchronized and seemingly non-related components – sufficient complexity to be confusing. As the specific cause-effect concepts remain inconclusive, the sense of causation embraces the whole. 

  3. Deep Complementary Bottleneck Features for Visual Speech Recognition

    NARCIS (Netherlands)

    Petridis, Stavros; Pantic, Maja

    Deep bottleneck features (DBNFs) have been used successfully in the past for acoustic speech recognition from audio. However, research on extracting DBNFs for visual speech recognition is very limited. In this work, we present an approach to extract deep bottleneck visual features based on deep

  4. Technical Evaluation Report 31: Internet Audio Products (3/ 3

    Directory of Open Access Journals (Sweden)

    Jim Rudolph

    2004-08-01

    Full Text Available Two contrasting additions to the online audio market are reviewed: iVocalize, a browser-based audio-conferencing software, and Skype, a PC-to-PC Internet telephone tool. These products are selected for review on the basis of their success in gaining rapid popular attention and usage during 2003-04. The iVocalize review emphasizes the product’s role in the development of a series of successful online audio communities – notably several serving visually impaired users. The Skype review stresses the ease with which the product may be used for simultaneous PC-to-PC communication among up to five users. Editor’s Note: This paper serves as an introduction to reports about online community building, and reviews of online products for disabled persons, in the next ten reports in this series. JPB, Series Ed.

  5. Multi Carrier Modulation Audio Power Amplifier with Programmable Logic

    DEFF Research Database (Denmark)

    Christiansen, Theis; Andersen, Toke Meyer; Knott, Arnold

    2009-01-01

    While switch-mode audio power amplifiers allow compact implementations and high output power levels due to their high power efficiency, they are very well known for creating electromagnetic interference (EMI) with other electronic equipment. To lower the EMI of switch-mode (class D) audio power amplifiers while keeping the performance measures to excellent levels is therefore of high interest. In this paper a class D audio amplifier utilising Multi Carrier Modulation (MCM) will be analysed, and a prototype Master-Slave Multi Carrier Modulated (MS MCM) amplifier has been constructed and measured. Analytical expressions, simulations and measurements show reduced switching frequency amplitudes when using MCM techniques. It is also shown that the Total Harmonic Distortion (THD) tends to be compromised compared to conventional class D amplifiers due to intermodulation products of the switching...

  6. Audio-visual active speaker tracking in cluttered indoors environments.

    Science.gov (United States)

    Talantzis, Fotios; Pnevmatikakis, Aristodemos; Constantinides, Anthony G

    2009-02-01

    We propose a system for detecting the active speaker in cluttered and reverberant environments where more than one person speaks and moves. Rather than using only audio information, the system utilizes audiovisual information from multiple acoustic and video sensors that feed separate audio and video tracking modules. The audio module operates using a particle filter (PF) and an information-theoretic framework to provide accurate acoustic source location under reverberant conditions. The video subsystem combines in 3-D a number of 2-D trackers based on a variation of Stauffer's adaptive background algorithm with spatiotemporal adaptation of the learning parameters and a Kalman tracker in a feedback configuration. Extensive experiments show that gains are to be expected when fusion of the separate modalities is performed to detect the active speaker.

  7. Highlight summarization in golf videos using audio signals

    Science.gov (United States)

    Kim, Hyoung-Gook; Kim, Jin Young

    2008-01-01

    In this paper, we present an automatic summarization of highlights in golf videos based on audio information alone, without video information. The proposed highlight summarization system is based on semantic audio segmentation and the detection of action units from audio signals. Studio speech, field speech, music, and applause are segmented by means of sound classification. Swings are detected by impulse onset detection. Sounds like swing and applause form a complete action unit, while studio speech and music parts are used to anchor the program structure. With the advantage of highly precise detection of applause, highlights are extracted effectively. Our experimental results show high classification precision on 18 golf games, demonstrating that the proposed system is effective and computationally efficient enough to apply the technology to embedded consumer electronic devices.
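
    Impulse onset detection of the kind described above can be approximated with a simple short-term-energy detector; the sketch below is a generic illustration (frame length, threshold ratio and the synthetic click are invented for the example), not the authors' detector.

        # Crude energy-jump onset detector: flag frames whose short-term energy
        # rises well above a slowly adapting background level.
        import numpy as np

        def impulse_onsets(x, fs, frame_ms=10, ratio=4.0):
            n = int(fs * frame_ms / 1000)
            frames = x[: len(x) // n * n].reshape(-1, n)
            energy = (frames ** 2).mean(axis=1)
            onsets, avg = [], energy[0] + 1e-12
            for i, e in enumerate(energy):
                if e > ratio * avg:
                    onsets.append(i)
                avg = 0.95 * avg + 0.05 * e      # slowly adapting background level
            return onsets

        # Example with a synthetic click embedded in low-level noise
        fs = 16000
        x = 0.01 * np.random.randn(fs)
        x[8000:8080] += 0.8                       # impulsive event at ~0.5 s
        print(impulse_onsets(x, fs))              # expected: frame index around 50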

  8. Say What? The Role of Audio in Multimedia Video

    Science.gov (United States)

    Linder, C. A.; Holmes, R. M.

    2011-12-01

    Audio, including interviews, ambient sounds, and music, is a critical, yet often overlooked, part of an effective multimedia video. In February 2010, Linder joined scientists working on the Global Rivers Observatory Project for two weeks of intensive fieldwork in the Congo River watershed. The team's goal was to learn more about how climate change and deforestation are impacting the river system and coastal ocean. Using stills and video shot with a lightweight digital SLR outfit and audio recorded with a pocket-sized sound recorder, Linder documented the trials and triumphs of working in the heart of Africa. Using excerpts from the six-minute Congo multimedia video, this presentation will illustrate how to record and edit an engaging audio track. Topics include interview technique, collecting ambient sounds, choosing and using music, and editing it all together to educate and entertain the viewer.

  9. Sistema de adquisición y procesamiento de audio

    OpenAIRE

    Pérez Segurado, Rubén

    2015-01-01

    The aim of this project is the design and implementation of a platform for an audio processing system. The system receives an analogue audio signal from an audio source, allows digital processing of that signal, and generates a processed signal that is sent to external loudspeakers. The processing system will be implemented using: - A Lattice FPGA device, model MachX02-7000-HE, which will host all the...

  10. Multi Carrier Modulator for Switch-Mode Audio Power Amplifiers

    DEFF Research Database (Denmark)

    Knott, Arnold; Pfaffinger, Gerhard; Andersen, Michael Andreas E.

    2008-01-01

    While switch-mode audio power amplifiers allow compact implementations and high output power levels due to their high power efficiency, they are very well known for creating electromagnetic interference (EMI) with other electronic equipment, in particular radio receivers. Lowering the EMI of switch-mode audio power amplifiers while keeping the performance measures to excellent levels is therefore of high general interest. A modulator utilizing multiple carrier signals to generate a two level pulse train will be shown in this paper. The performance of the modulator will be compared in simulation to existing modulation topologies. The lower EMI as well as the preserved audio performance will be shown in simulation as well as in measurement results on a prototype.

  11. Audio engineering 101 a beginner's guide to music production

    CERN Document Server

    Dittmar, Tim

    2013-01-01

    Audio Engineering 101 is a real world guide for starting out in the recording industry. If you have the dream, the ideas, the music and the creativity but don't know where to start, then this book is for you! Filled with practical advice on how to navigate the recording world, from an author with first-hand, real-life experience, Audio Engineering 101 will help you succeed in the exciting, but tough and confusing, music industry. Covering all you need to know about the recording process, from the characteristics of sound to a guide to microphones to analog versus digital

  12. Video-assisted segmentation of speech and audio track

    Science.gov (United States)

    Pandit, Medha; Yusoff, Yusseri; Kittler, Josef; Christmas, William J.; Chilton, E. H. S.

    1999-08-01

    Video database research is commonly concerned with the storage and retrieval of visual information involving sequence segmentation, shot representation and video clip retrieval. In multimedia applications, video sequences are usually accompanied by a sound track. The sound track contains potential cues to aid shot segmentation such as different speakers, background music, singing and distinctive sounds. These different acoustic categories can be modeled to allow for an effective database retrieval. In this paper, we address the problem of automatic segmentation of the audio track of multimedia material. This audio-based segmentation can be combined with video scene shot detection in order to achieve partitioning of the multimedia material into semantically significant segments.

  13. The Single- and Multichannel Audio Recordings Database (SMARD)

    DEFF Research Database (Denmark)

    Nielsen, Jesper Kjær; Jensen, Jesper Rindom; Jensen, Søren Holdt

    2014-01-01

    A new single- and multichannel audio recordings database (SMARD) is presented in this paper. The database contains recordings from a box-shaped listening room for various loudspeaker and array types. The recordings were made for 48 different configurations of three different loudspeakers and four different microphone arrays. In each configuration, 20 different audio segments were played and recorded, ranging from simple artificial sounds to polyphonic music. SMARD can be used for testing algorithms developed for numerous applications, and we give examples of source localisation results.

  14. Efficiency Optimization in Class-D Audio Amplifiers

    DEFF Research Database (Denmark)

    Yamauchi, Akira; Knott, Arnold; Jørgensen, Ivan Harald Holger

    2015-01-01

    This paper presents a new power efficiency optimization routine for designing Class-D audio amplifiers. The proposed optimization procedure finds design parameters for the power stage and the output filter, and the optimum switching frequency such that the weighted power losses are minimized under the given constraints. The optimization routine is applied to minimize the power losses in a 130 W class-D audio amplifier based on consumer behavior investigations, where the amplifier operates at idle and low power levels most of the time. Experimental results demonstrate that the optimization method can...

  15. DOA Estimation of Audio Sources in Reverberant Environments

    DEFF Research Database (Denmark)

    Jensen, Jesper Rindom; Nielsen, Jesper Kjær; Heusdens, Richard

    2016-01-01

    Reverberation is well-known to have a detrimental impact on many localization methods for audio sources. We address this problem by imposing a model for the early reflections as well as a model for the audio source itself. Using these models, we propose two iterative localization methods that est... bias. Our simulation results show that we can estimate the DOA of the desired signal more accurately with this procedure compared to state-of-the-art estimators in both synthetic and real data experiments with reverberation.

  16. Design of a WAV audio player based on K20

    Directory of Open Access Journals (Sweden)

    Xu Yu

    2016-01-01

    Full Text Available The designed player uses the Freescale Company’s MK20DX128VLH7 as the core control chip, and its hardware platform is equipped with a VS1003 audio decoder, an OLED display interface, a USB interface and an SD card slot. The player uses the open source embedded real-time operating system μC/OS-II, Freescale USB Stack V4.1.1 and FATFS, and a graphical user interface is developed based on CGUI to improve the user experience. In general, the designed WAV audio player has strong applicability and good practical value.

  17. Cambridge English First 2 audio CDs : authentic examination papers

    CERN Document Server

    2016-01-01

    Four authentic Cambridge English Language Assessment examination papers for the Cambridge English: First (FCE) exam. These examination papers for the Cambridge English: First (FCE) exam provide the most authentic exam preparation available, allowing candidates to familiarise themselves with the content and format of the exam and to practise useful exam techniques. The Audio CDs contain the recorded material to allow thorough preparation for the Listening paper and are designed to be used with the Student's Book. A Student's Book with or without answers and a Student's Book with answers and downloadable Audio are available separately. These tests are also available as Cambridge English: First Tests 5-8 on Testbank.org.uk

  18. Minimizing Crosstalk in Self Oscillating Switch Mode Audio Power Amplifiers

    DEFF Research Database (Denmark)

    Knott, Arnold; Ploug, Rasmus Overgaard

    2012-01-01

    The varying switching frequencies of self oscillating switch mode audio amplifiers have been known to cause interchannel intermodulation disturbances in multi channel configurations. This crosstalk phenomenon has a negative impact on the audio performance. The goal of this paper is to present a method to minimize this phenomenon by improving the integrity of the various power distribution systems of the amplifier. The method is then applied to an amplifier built for this investigation. The results show that the crosstalk is suppressed by 30 dB, but is not entirely eliminated...

  19. Optimizing dictionary learning parameters for solving Audio Inpainting problem

    Directory of Open Access Journals (Sweden)

    Václav Mach

    2013-01-01

    Full Text Available Recovering missing or distorted audio signal samples has recently been improved by solving an Audio Inpainting problem. This paper aims to connect this problem with K-SVD dictionary learning to improve the reconstruction error for the missing-sample insertion problem. Our aim is to adapt an initial dictionary to the reliable signal so as to be more accurate in missing-sample estimation. This approach is based on sparse signal reconstruction posed as an optimization problem. In the paper two staple algorithms, the connection between them, and the emerging problems are described. We tried to find optimal parameters for efficient dictionary learning.
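
    A toy version of the reconstruction step can be written with off-the-shelf tools: learn a dictionary on intact frames, estimate a damaged frame's sparse code from its reliable samples only (here via orthogonal matching pursuit rather than the exact K-SVD pipeline of the paper), and resynthesise the gap. All sizes and signals below are illustrative.

        # Toy audio-inpainting sketch: dictionary adapted to the reliable signal,
        # sparse code estimated from observed samples, missing samples filled in.
        import numpy as np
        from sklearn.decomposition import MiniBatchDictionaryLearning
        from sklearn.linear_model import orthogonal_mp

        fs, N = 8000, 256                                  # sample rate, frame length
        t = np.arange(2 * fs) / fs
        clean = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)
        frames = clean[: len(clean) // N * N].reshape(-1, N)

        dico = MiniBatchDictionaryLearning(n_components=32, alpha=1.0, random_state=0)
        dico.fit(frames)                                   # adapt dictionary to the reliable signal
        D = dico.components_.T                             # shape (N, n_atoms)

        frame = frames[10].copy()
        missing = np.random.rand(N) < 0.4                  # drop 40% of the samples
        frame[missing] = 0.0                               # these values are treated as unknown

        # Sparse code from the observed samples only (OMP), then fill the gap
        code = orthogonal_mp(D[~missing], frame[~missing], n_nonzero_coefs=8)
        frame[missing] = (D @ code)[missing]
        print("gap RMSE:", np.sqrt(np.mean((frame[missing] - frames[10][missing]) ** 2)))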

  20. Audio-Visual Peripheral Localization Disparity

    Directory of Open Access Journals (Sweden)

    Ryota Miyauchi

    2011-10-01

    Full Text Available In localizing simultaneous auditory and visual events, the brain should map the audiovisual events onto a unified perceptual space in a subsequent spatial process for integrating and/or comparing multisensory information. However, there is little qualitative and quantitative psychological data for estimating multisensory localization in peripheral visual fields. We measured the relative perceptual direction of a sound to a flash when they were simultaneously presented in peripheral visual fields. The results demonstrated that the sound and flash were perceptually located at the same position when the sound was presented 5 degrees peripheral to the flash. This phenomenon occurred even when trials in which the participants' eyes moved were excluded. The measurement of the location of each sound and flash in a pointing task showed that the perceptual location of the sound shifted toward the frontal direction and conversely the perceptual location of the flash shifted toward the periphery. Our findings suggest that unisensory perceptual spaces of audition and vision have deviations in peripheral visual fields and, when the brain remaps unisensory locations of auditory and visual events into a unified perceptual space, the unisensory spatial information of the events can be suitably maintained.

  1. Cover signal specific steganalysis: the impact of training on the example of two selected audio steganalysis approaches

    Science.gov (United States)

    Kraetzer, Christian; Dittmann, Jana

    2008-02-01

    The main goals of this paper are to show the impact of the basic assumptions for the cover channel characteristics, as well as the impact of different training/testing set generation strategies, on the statistical detectability of exemplarily chosen audio hiding approaches known from steganography and watermarking. Here we have selected five exemplary steganography algorithms and four watermarking algorithms. The channel characteristics for two chosen audio cover channels (an application-specific exemplary scenario of VoIP steganography and universal audio steganography) are formalised, and their impact on decisions in the steganalysis process, especially on the strategies applied for training/testing set generation, is shown. Following the assumptions on the cover channel characteristics, either cover-dependent or cover-independent training and testing can be performed, using either correlated or non-correlated training and test sets. In comparison to previous work, additional frequency domain features are introduced for steganalysis and the performance (in terms of classification accuracy) of Bayesian classifiers and multinomial logistic regression models is compared with the results of SVM classification. We show that the newly implemented frequency domain features increase the classification accuracy achieved in SVM classification. Furthermore, it is shown on the example of VoIP steganalysis that channel-character-specific evaluation performs better than tests without focus on a specific channel (i.e. universal steganalysis). A comparison of test results for cover-dependent and cover-independent training and testing shows that the latter performs better for all nine algorithms evaluated here with the SVM-based classifier used.

  2. Multi-modal gesture recognition using integrated model of motion, audio and video

    Science.gov (United States)

    Goutsu, Yusuke; Kobayashi, Takaki; Obara, Junya; Kusajima, Ikuo; Takeichi, Kazunari; Takano, Wataru; Nakamura, Yoshihiko

    2015-07-01

    Gesture recognition is used in many practical applications such as human-robot interaction, medical rehabilitation and sign language. With increasing motion sensor development, multiple data sources have become available, which leads to the rise of multi-modal gesture recognition. Since our previous approach to gesture recognition depends on a unimodal system, it is difficult to classify similar motion patterns. In order to solve this problem, a novel approach which integrates motion, audio and video models is proposed, using a dataset captured by Kinect. The proposed system can recognize observed gestures by using three models. The recognition results of the three models are integrated by the proposed framework, and the output becomes the final result. The motion and audio models are learned using Hidden Markov Models. A Random Forest classifier is used to learn the video model. In the experiments to test the performance of the proposed system, the motion and audio models most suitable for gesture recognition are chosen by varying feature vectors and learning methods. Additionally, the unimodal and multi-modal models are compared with respect to recognition accuracy. All the experiments are conducted on the dataset provided by the organizer of MMGRC, a workshop for the Multi-Modal Gesture Recognition Challenge. The comparison results show that the multi-modal model composed of three models scores the highest recognition rate. This improvement in recognition accuracy means that the complementary relationship among the three models improves the accuracy of gesture recognition. The proposed system provides the application technology to understand human actions of daily life more precisely.

  3. Real-Time Audio-Visual Analysis for Multiperson Videoconferencing

    Directory of Open Access Journals (Sweden)

    Petr Motlicek

    2013-01-01

    Full Text Available We describe the design of a system consisting of several state-of-the-art real-time audio and video processing components enabling multimodal stream manipulation (e.g., automatic online editing) for multiparty videoconferencing applications in open, unconstrained environments. The underlying algorithms are designed to allow multiple people to enter, interact, and leave the observable scene with no constraints. They comprise continuous localisation of audio objects and its application for spatial audio object coding, detection, and tracking of faces, estimation of head poses and visual focus of attention, detection and localisation of verbal and paralinguistic events, and the association and fusion of these different events. Combined together, they represent multimodal streams with audio objects and semantic video objects and provide semantic information for stream manipulation systems (like a virtual director). Various experiments have been performed to evaluate the performance of the system. The obtained results demonstrate the effectiveness of the proposed design, the various algorithms, and the benefit of fusing different modalities in this scenario.

  4. Market potential for interactive audio-visual media

    NARCIS (Netherlands)

    Leurdijk, A.; Limonard, S.

    2005-01-01

    NM2 (New Media for a New Millennium) develops tools for interactive, personalised and non-linear audio-visual content that will be tested in seven pilot productions. This paper looks at the market potential for these productions from a technological, a business and a users' perspective. It shows

  5. Voice Quality Improvement with Error Concealment in Audio Sensor Networks

    NARCIS (Netherlands)

    Türkes, Okan; Baydere, Sebnem

    2012-01-01

    Multi-dimensional properties of audio data and resource-poor nodes make voice processing and transmission a challenging task for Wireless Sensor Networks (WSN). This study analyzes voice quality distortions caused by packet losses occurring over a multi-hop WSN testbed: A comprehensive analysis of

  6. Directions for Change in an Audio-Lingual Approach.

    Science.gov (United States)

    Knop, Constance K.

    2000-01-01

    This article, originally published in 1981 (Canadian Modern Language Review; v37 n4), examines recent research and thinking in the field of second language teaching to find directions and suggestions for change that can be interwoven into an audio-lingual approach. (Author/VWL)

  7. Audio-Described Educational Materials: Ugandan Teachers' Experiences

    Science.gov (United States)

    Wormnaes, Siri; Sellaeg, Nina

    2013-01-01

    This article describes and discusses a qualitative, descriptive, and exploratory study of how 12 visually impaired teachers in Uganda experienced audio-described educational video material for teachers and student teachers. The study is based upon interviews with these teachers and observations while they were using the material either…

  8. Possible technical solutions to reduce energy consumption in audio products

    Energy Technology Data Exchange (ETDEWEB)

    Nielsen, K.; Andersen, M.A.E.

    1999-07-01

    In common audio products nearly all the supplied power is dissipated as heat. The major consumers are, with almost no exception, the power supply and the audio amplifier. This paper is divided into two parts, concentrating on typical efficiency measures for the concepts of today and the possible technical solutions by which the overall efficiency can be considerably improved in the future. Traditional power supplies are made using a transformer operating on the mains frequency followed by a linear regulator. These are bulky and the efficiency is only around 40%. Using high-frequency switch-mode power supplies, the size of the power supply can be reduced and the efficiency increased to 80-90%. Construction of amplifiers that are optimal with regard to total energy consumption over their lifetime can only be accomplished by considering both the general volume-control distribution and the general spectral amplitude distribution of audio signals. The traditional efficiency measure, specified at the maximum efficiency level, says very little about the real energy consumption of the audio amplifier. As an example, the theoretical efficiency for a traditional class B amplifier is 78%. Using a new efficiency measure defined on the basis of the approximate volume-control distribution, a 50 W amplifier example shows an overall efficiency of only 1%. In the paper possible solutions and guidelines to increase the real amplifier efficiency are given. (au)
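
    The point about volume-weighted efficiency can be made concrete with a back-of-the-envelope calculation: an ideal class B stage driven with a sine wave has an efficiency of roughly (pi/4)*sqrt(Pout/Pmax), so averaging it over a realistic listening-level distribution (most listening happens far below full power) gives a much lower figure than the headline maximum. The distribution below is purely illustrative, not the one used in the paper.

        # Back-of-the-envelope: average class B efficiency weighted by how often
        # each output level is actually used.  The level distribution is made up.
        import numpy as np

        P_max = 50.0                                   # rated output power [W]
        levels_dB = np.array([-40, -30, -20, -10, 0])  # output level re. full power
        probability = np.array([0.35, 0.35, 0.20, 0.08, 0.02])

        P_out = P_max * 10 ** (levels_dB / 10)
        eta_classB = (np.pi / 4) * np.sqrt(P_out / P_max)   # ideal class B efficiency

        weighted_eta = np.sum(probability * eta_classB)
        print(f"peak efficiency: {eta_classB[-1]:.0%}, usage-weighted: {weighted_eta:.1%}")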

  9. Audio-visual perception system for a humanoid robotic head.

    Science.gov (United States)

    Viciana-Abad, Raquel; Marfil, Rebeca; Perez-Lorenzo, Jose M; Bandera, Juan P; Romero-Garces, Adrian; Reche-Lopez, Pedro

    2014-05-28

    One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can be equipped. Besides, within the scope of interactive autonomous robots, there is a lack of evaluation of the benefits of audio-visual attention mechanisms, compared to audio-only or visual-only approaches, in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with a Bayes inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared by considering the technical limitations of unimodal systems. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework.

  10. Making Audio-Visual Teaching Materials for Elementary Science

    OpenAIRE

    永田, 四郎

    1980-01-01

    For elementary science, some audio-visual teaching materials were made by the author and our students. These materials are slides for a projector, transparencies and materials for OHP, 8 mm sound films and video tapes. We hope this kind of study will continue.

  11. Effect of Audio-Visual Intervention Program on Cognitive ...

    African Journals Online (AJOL)

    Thus the purpose of the study was to examine the effectiveness of the audio-visual intervention program on the cognitive development of preschool children in relation to their socio-economic status. The researcher employed an experimental method to conduct the study. The sample consisted of 100 students from preschool of ...

  12. Audio-Visual Perception System for a Humanoid Robotic Head

    Directory of Open Access Journals (Sweden)

    Raquel Viciana-Abad

    2014-05-01

    Full Text Available One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can be equipped. Besides, within the scope of interactive autonomous robots, there is a lack of evaluation of the benefits of audio-visual attention mechanisms, compared to audio-only or visual-only approaches, in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with a Bayes inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared by considering the technical limitations of unimodal systems. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework.

  13. Audio-Visual Communications, A Tool for the Professional

    Science.gov (United States)

    Journal of Environmental Health, 1976

    1976-01-01

    The manner in which the Cuyahoga County, Ohio Department of Environmental Health utilizes audio-visual presentations for communication with business and industry, professional public health agencies and the general public is presented. Subjects including food sanitation, radiation protection and safety are described. (BT)

  14. The Timbre Toolbox: extracting audio descriptors from musical signals.

    Science.gov (United States)

    Peeters, Geoffroy; Giordano, Bruno L; Susini, Patrick; Misdariis, Nicolas; McAdams, Stephen

    2011-11-01

    The analysis of musical signals to extract audio descriptors that can potentially characterize their timbre has been disparate and often too focused on a particular small set of sounds. The Timbre Toolbox provides a comprehensive set of descriptors that can be useful in perceptual research, as well as in music information retrieval and machine-learning approaches to content-based retrieval in large sound databases. Sound events are first analyzed in terms of various input representations (short-term Fourier transform, harmonic sinusoidal components, an auditory model based on the equivalent rectangular bandwidth concept, the energy envelope). A large number of audio descriptors are then derived from each of these representations to capture temporal, spectral, spectrotemporal, and energetic properties of the sound events. Some descriptors are global, providing a single value for the whole sound event, whereas others are time-varying. Robust descriptive statistics are used to characterize the time-varying descriptors. To examine the information redundancy across audio descriptors, correlational analysis followed by hierarchical clustering is performed. This analysis suggests ten classes of relatively independent audio descriptors, showing that the Timbre Toolbox is a multidimensional instrument for the measurement of the acoustical structure of complex sound signals.
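
    The Toolbox itself is distributed as MATLAB code; as a generic illustration of this family of descriptors (not the Toolbox implementation), the sketch below computes two of the simplest time-varying ones, spectral centroid and spectral spread, from an STFT and summarises them with robust statistics. The test signal and parameters are invented for the example.

        # Generic illustration of two spectral descriptors computed per STFT frame.
        import numpy as np
        from scipy.signal import stft

        fs = 22050
        t = np.arange(fs) / fs
        x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 2200 * t)

        f, _, Z = stft(x, fs=fs, nperseg=1024)
        mag = np.abs(Z) + 1e-12                        # (n_freqs, n_frames)

        # Amplitude-weighted mean frequency and its standard deviation per frame
        centroid = (f[:, None] * mag).sum(axis=0) / mag.sum(axis=0)
        spread = np.sqrt(((f[:, None] - centroid) ** 2 * mag).sum(axis=0) / mag.sum(axis=0))

        # Robust summary statistics over time, in the spirit of the Toolbox
        print("median centroid: %.0f Hz, IQR: %.0f Hz" %
              (np.median(centroid), np.subtract(*np.percentile(centroid, [75, 25]))))
        print("median spread:   %.0f Hz" % np.median(spread))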

  15. Audio-Visual Aid in Teaching "Fatty Liver"

    Science.gov (United States)

    Dash, Sambit; Kamath, Ullas; Rao, Guruprasad; Prakash, Jay; Mishra, Snigdha

    2016-01-01

    Use of audio visual tools to aid in medical education is ever on a rise. Our study intends to find the efficacy of a video prepared on "fatty liver," a topic that is often a challenge for pre-clinical teachers, in enhancing cognitive processing and ultimately learning. We prepared a video presentation of 11:36 min, incorporating various…

  16. Current Events and Technology: Video and Audio on the Internet.

    Science.gov (United States)

    Laposata, Matthew M.; Howick, Tom; Dias, Michael J.

    2002-01-01

    Explains the effectiveness of visual aids compared to written materials in teaching and recommends using television segments for teaching purposes. Introduces digitized clips provided by major television news organizations through the Internet and describes the technology requirements for successful viewing of streaming videos and audios. (YDS)

  17. A Comparison of Vocabulary Acquisition in Audio and Video Contexts.

    Science.gov (United States)

    Duquette, Lise; Painchaud, Gisele

    1996-01-01

    Examines the effects of different kinds of rich contexts for vocabulary learning based on second language oral input. The article compares the number and kinds of words learned through exposure to a dialogue or video, or by first listening to an oral account of the dialogue situation and then hearing the audio soundtrack without visual support.…

  18. Influence of Audio-Visual Presentations on Learning Abstract Concepts.

    Science.gov (United States)

    Lai, Shu-Ling

    2000-01-01

    Describes a study of college students that investigated whether various types of visual illustrations influenced abstract concept learning when combined with audio instruction. Discusses results of analysis of variance and pretest posttest scores in relation to learning performance, attitudes toward the computer-based program, and differences in…

  19. The Audio-Visual Marketing Handbook for Independent Schools.

    Science.gov (United States)

    Griffith, Tom

    This how-to booklet offers specific advice on producing video or slide/tape programs for marketing independent schools. Five chapters present guidelines for various stages in the process: (1) Audio-Visual Marketing in Context (aesthetics and economics of audiovisual marketing); (2) A Question of Identity (identifying the audience and deciding on…

  20. A Power Efficient Audio Amplifier Combining Switching and Linear Techniques

    NARCIS (Netherlands)

    van der Zee, Ronan A.R.; van Tuijl, Adrianus Johannes Maria

    1998-01-01

    Integrated Class D audio amplifiers are very power efficient, but require an external filter which prevents further integration. Also due to this filter, large feedback factors are hard to realise, so that the load influences the distortion- and transfer characteristics. The amplifier presented in

  1. The Role of Audio Media in the Lives of Children.

    Science.gov (United States)

    Christenson, Peter G.; Lindlof, Thomas R.

    Mass communication researchers have largely ignored the role of audio media and popular music in the lives of children, yet the available evidence shows that children do listen. Extant studies yield a consistent developmental portrait of children's listening frequency, but there is a notable lack of programmatic research over the past decade, one…

  2. Deep learning, audio adversaries, and music content analysis

    DEFF Research Database (Denmark)

    Kereliuk, Corey Mose; Sturm, Bob L.; Larsen, Jan

    2015-01-01

    We present the concept of adversarial audio in the context of deep neural networks (DNNs) for music content analysis. An adversary is an algorithm that makes minor perturbations to an input that cause major repercussions to the system response. In particular, we design an adversary for a DNN...

  3. The relationship between basic audio quality and overall listening experience.

    Science.gov (United States)

    Schoeffler, Michael; Herre, Jürgen

    2016-09-01

    Basic audio quality (BAQ) is a well-known perceptual attribute, which is rated in various listening test methods to measure the performance of audio systems. Unfortunately, when it comes to purchasing audio systems, BAQ might not have a significant influence on the customers' buying decisions since other factors, like brand loyalty, might be more important. In contrast to BAQ, overall listening experience (OLE) is an affective attribute which incorporates all aspects that are important to an individual assessor, including his or her preference for music genre and audio quality. In this work, the relationship between BAQ and OLE is investigated in more detail. To this end, an experiment was carried out, in which participants rated the BAQ and the OLE of music excerpts with different timbral and spatial degradations. In a between-group-design procedure, participants were assigned into two groups, in each of which a different set of stimuli was rated. The results indicate that rating of both attributes, BAQ and OLE, leads to similar rankings, even if a different set of stimuli is rated. In contrast to the BAQ ratings, which were more influenced by timbral than spatial degradations, the OLE ratings were almost equally influenced by timbral and spatial degradations.

  4. Audio-visual materials usage preference among agricultural ...

    African Journals Online (AJOL)

    It was found that respondents preferred radio, television, poster, advert, photographs, specimen, bulletin, magazine, cinema, videotape, chalkboard, and bulletin board as audio-visual materials for extension work. These are the materials that can easily be manipulated and utilized for extension work. Nigerian Journal of ...

  5. Utilizing Domain Knowledge in End-to-End Audio Processing

    DEFF Research Database (Denmark)

    Tax, Tycho; Antich, Jose Luis Diez; Purwins, Hendrik

    2017-01-01

    End-to-end neural network based approaches to audio modelling are generally outperformed by models trained on high-level data representations. In this paper we present preliminary work that shows the feasibility of training the first layers of a deep convolutional neural network (CNN) model...

  6. A listening test system for automotive audio - listeners

    DEFF Research Database (Denmark)

    Choisel, Sylvain; Hegarty, Patrick; Christensen, Flemming

    2007-01-01

    A series of experiments was conducted in order to validate an experimental procedure to perform listening tests on car audio systems in a simulation of the car environment in a laboratory, using binaural synthesis with head-tracking. Seven experts and 40 non-expert listeners rated a range...

  7. Subband coding of digital audio signals without loss of quality

    NARCIS (Netherlands)

    Veldhuis, Raymond N.J.; Breeuwer, Marcel; van de Waal, Robbert

    1989-01-01

    A subband coding system for high quality digital audio signals is described. To achieve low bit rates at a high quality level, it exploits the simultaneous masking effect of the human ear. It is shown how this effect can be used in an adaptive bit-allocation scheme. The proposed approach has been

  8. Mediatheque - digitization and preservation of audio content in RTV Slovenia

    Directory of Open Access Journals (Sweden)

    Martin Žvelc

    2011-01-01

    Full Text Available RTV Slovenia’s archives contain large amounts of audio and video materials, various documents and music scores, and most of them are still in the analogue format. Widespread digitization has revolutionized the processes and ways of creating content in the digital format, recorded on different media. Such records also require new ways of preservation. In the article the development and structure of the Mediatheque department at RTV Slovenia is presented, together with an overview of the preservation model for audio content. Due to rapid technological changes the audio content was the most critical and the first to be digitized. The intensive work in the Mediatheque began in 2008, and after two years Radio Slovenia had developed a modern system for the permanent storage of audio content. Radio Slovenia’s Digital Archive meets all the standards and regulations applicable to modern archival systems. The article also presents the Mediarc software application, as it could be used for digitizing and permanently storing TV Slovenia’s video archives.

  9. Overview of the 2015 Workshop on Speech, Language and Audio in Multimedia

    NARCIS (Netherlands)

    Gravier, Guillaume; Jones, Gareth J.F.; Larson, Martha; Ordelman, Roeland J.F.

    2015-01-01

    The Workshop on Speech, Language and Audio in Multimedia (SLAM) positions itself at the crossroads of multiple scientific fields - music and audio processing, speech processing, natural language processing and multimedia - to discuss and stimulate research results, projects, datasets and

  10. Voice over: Audio-visual congruency and content recall in the gallery setting

    National Research Council Canada - National Science Library

    Merle T Fairhurst; Minnie Scott; Ophelia Deroy

    2017-01-01

    ...? In the present study, we focused on an everyday situation of audio-visual learning and manipulated the relationship between audio guide tracks and viewed portraits in the galleries of the Tate Britain...

  11. 372 Audio Books in the Nigerian Higher Educational System: To be ...

    African Journals Online (AJOL)

    Nekky Umera

    . It discusses the advantages and disadvantages of audio books. It examined students' familiarity with audio books and their perceptions of introducing them into the school system. It was found that Nigerian students are already ...

  12. Audio Signal Processing Using Time-Frequency Approaches: Coding, Classification, Fingerprinting, and Watermarking

    Science.gov (United States)

    Umapathy, K.; Ghoraani, B.; Krishnan, S.

    2010-12-01

    Audio signals are information-rich nonstationary signals that play an important role in our day-to-day communication, perception of environment, and entertainment. Due to their non-stationary nature, time- or frequency-only approaches are inadequate in analyzing these signals. A joint time-frequency (TF) approach would be a better choice to efficiently process these signals. In this digital era, compression, intelligent indexing for content-based retrieval, classification, and protection of digital audio content are a few of the areas that encapsulate a majority of the audio signal processing applications. In this paper, we present a comprehensive array of TF methodologies that successfully address applications in all of the above mentioned areas. A TF-based audio coding scheme with a novel psychoacoustics model, music classification, audio classification of environmental sounds, audio fingerprinting, and audio watermarking will be presented to demonstrate the advantages of using time-frequency approaches in analyzing and extracting information from audio signals.

  13. Audio Signal Processing Using Time-Frequency Approaches: Coding, Classification, Fingerprinting, and Watermarking

    Directory of Open Access Journals (Sweden)

    K. Umapathy

    2010-01-01

    Full Text Available Audio signals are information-rich nonstationary signals that play an important role in our day-to-day communication, perception of environment, and entertainment. Due to their non-stationary nature, time- or frequency-only approaches are inadequate in analyzing these signals. A joint time-frequency (TF) approach would be a better choice to efficiently process these signals. In this digital era, compression, intelligent indexing for content-based retrieval, classification, and protection of digital audio content are a few of the areas that encapsulate a majority of the audio signal processing applications. In this paper, we present a comprehensive array of TF methodologies that successfully address applications in all of the above mentioned areas. A TF-based audio coding scheme with a novel psychoacoustics model, music classification, audio classification of environmental sounds, audio fingerprinting, and audio watermarking will be presented to demonstrate the advantages of using time-frequency approaches in analyzing and extracting information from audio signals.
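
    As one concrete example from this list, a landmark-style audio fingerprint can be built directly from a time-frequency representation: pick prominent spectrogram peaks and hash pairs of them, so that short noisy excerpts can be matched against a reference. This is a generic sketch in the spirit of landmark fingerprinting, not the specific TF scheme of the paper; the chirp test signal and all parameters are illustrative.

        # Landmark-style fingerprinting sketch: hash (f1, f2, dt) peak pairs.
        import numpy as np
        from scipy.signal import stft, chirp

        def landmarks(x, fs, fan_out=3):
            f, t, Z = stft(x, fs=fs, nperseg=2048, noverlap=1024)
            mag = np.abs(Z)
            peaks = [(j, int(np.argmax(mag[:, j]))) for j in range(mag.shape[1])
                     if mag[:, j].max() > 5 * mag[:, j].mean()]
            hashes = set()
            for i, (t1, f1) in enumerate(peaks):
                for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
                    hashes.add((f1, f2, t2 - t1))        # bin indices, not Hz
            return hashes

        fs = 8000
        t = np.arange(2 * fs) / fs
        reference = chirp(t, f0=300, t1=2, f1=1500)      # sweeping test tone
        # Noisy excerpt starting a whole number of hops into the reference
        query = reference[4 * 1024:] + 0.05 * np.random.randn(len(reference) - 4 * 1024)

        print("shared landmarks:", len(landmarks(reference, fs) & landmarks(query, fs)))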

  14. Real-Time Perceptual Model for Distraction in Interfering Audio-on-Audio Scenarios

    DEFF Research Database (Denmark)

    Rämö, Jussi; Bech, Søren; Jensen, Søren Holdt

    2017-01-01

    was to utilize similar features as the previous model, but to use faster underlying algorithms to calculate these features. The results show that the proposed model has a root mean squared error of 11.9%, compared to the previous model's 11.0%, while only taking 0.04% of the computational time of the previous...

  15. Interactions audio-tactiles et perception de la parole : Comparaisons entre sujets aveugles et voyants

    OpenAIRE

    Cavé, Christian; Sato, Marc; Ménard, Lucie; Brasseur, Annie

    2010-01-01

    The present study investigated whether manual tactile information from a speaker's face modulates the decoding of speech when audio-tactile perception is compared with audio-only perception. Two groups of congenitally blind and sighted adults were compared. Participants performed a syllable decision task across three conditions: audio-only and congruent/incongruent audio-tactile conditions. For the auditory modality, the syllables were presented in background white noise...

  16. Penyisipan Teks Dengan Metode Low Bit Coding Pada Media Audio Menggunakan MATLAB 7.7.0

    OpenAIRE

    Wijaya, Hartana; - Universitas Budi Luhur, Karti Wilianti

    2013-01-01

    Steganography is a technique for disguising or hiding a message in a carrier medium. The strength of steganography lies in its inconspicuous nature, which does not attract the attention or suspicion of others. One medium that can be used as a carrier is an audio file. Steganographic techniques for audio files exploit the limitations of human hearing, since the sound quality of the original audio file and of the audio file with an embedded secret message differs very little. One...
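
    Low bit coding replaces the least significant bit of each audio sample with one bit of the message; because the LSB of a 16-bit sample lies roughly 96 dB below full scale, the change is ordinarily inaudible. The original work uses MATLAB 7.7.0; the sketch below is just a language-neutral illustration in Python operating on an in-memory array of 16-bit samples (the carrier here is synthetic noise, not a real recording).

        # Low bit coding (LSB) sketch: hide a text message in the least significant
        # bits of 16-bit audio samples, then read it back.
        import numpy as np

        def embed(samples, message):
            bits = np.unpackbits(np.frombuffer(message.encode("utf-8"), dtype=np.uint8))
            if len(bits) > len(samples):
                raise ValueError("carrier too short for this message")
            stego = samples.astype(np.int16)
            stego[: len(bits)] = (stego[: len(bits)] & ~1) | bits   # overwrite the LSBs
            return stego

        def extract(stego, n_chars):
            bits = (stego[: n_chars * 8] & 1).astype(np.uint8)
            return np.packbits(bits).tobytes().decode("utf-8")

        carrier = (np.random.randn(4000) * 3000).astype(np.int16)   # stand-in audio
        stego = embed(carrier, "hidden message")
        print(extract(stego, len("hidden message")))                # -> hidden message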

  17. Effects of reproduction equipment on interaction with a spatial audio interface

    OpenAIRE

    Marentakis, G.; Brewster, S.A.

    2005-01-01

    Spatial audio displays have been criticized because the use of headphones may isolate users from their real world audio environment. In this paper we study the effects of three types of audio reproduction equipment (standard headphones, bone-conductance headphones and monaural presentation using a single earphone) on time and accuracy during interaction with a deictic spatial audio display. Participants selected a target sound emitting from one of four different locations in the presence of d...

  18. Deutsch Durch Audio-Visuelle Methode: An Audio-Lingual-Oral Approach to the Teaching of German.

    Science.gov (United States)

    Dickinson Public Schools, ND. Instructional Media Center.

    This teaching guide, designed to accompany Chilton's "Deutsch Durch Audio-Visuelle Methode" for German 1 and 2 in a three-year secondary school program, focuses major attention on the operational plan of the program and a student orientation unit. A section on teaching a unit discusses four phases: (1) presentation, (2) explanation, (3)…

  19. A new audio processor for combined electric and acoustic stimulation for the treatment of partial deafness.

    Science.gov (United States)

    Lorens, Artur; Zgoda, Malgorzata; Skarzynski, Henryk

    2012-07-01

    The results of this study demonstrate that a conversion from the DUET to the DUET 2 audio processor greatly improved patient satisfaction and subjective benefits. The aims of this study were to compare the DUET 2 audio processor to the DUET speech processor and to evaluate DUET 2 user satisfaction subjectively. Ten experienced electric acoustic stimulation (EAS) users following partial deafness treatment who upgraded from the MED-EL DUET to the DUET 2 were tested with the adaptive auditory speech test, the Pruszewicz monosyllabic word test, visual analog scales, and a DUET 2 user questionnaire. Tests were performed post-upgrade and compared to the DUET at three test intervals over 3 months. Objective analyses showed that all subjects performed as well with the DUET 2 as with the DUET. There was a tendency toward better results with the DUET 2. Subjective testing indicated a DUET 2 user preference for speech and musical stimuli. DUET 2 subject satisfaction was high for wearing comfort, sound quality, and for the FineTuner and Private Alert features.

  20. Unsupervised decoding of long-term, naturalistic human neural recordings with automated video and audio annotations

    Directory of Open Access Journals (Sweden)

    Nancy X.R. Wang

    2016-04-01

    Full Text Available Fully automated decoding of human activities and intentions from direct neural recordings is a tantalizing challenge in brain-computer interfacing. Implementing Brain Computer Interfaces (BCIs) outside carefully controlled experiments in laboratory settings requires adaptive and scalable strategies with minimal supervision. Here we describe an unsupervised approach to decoding neural states from naturalistic human brain recordings. We analyzed continuous, long-term electrocorticography (ECoG) data recorded over many days from the brain of subjects in a hospital room, with simultaneous audio and video recordings. We discovered coherent clusters in high-dimensional ECoG recordings using hierarchical clustering and automatically annotated them using speech and movement labels extracted from audio and video. To our knowledge, this represents the first time techniques from computer vision and speech processing have been used for natural ECoG decoding. Interpretable behaviors were decoded from ECoG data, including moving, speaking and resting; the results were assessed by comparison with manual annotation. Discovered clusters were projected back onto the brain revealing features consistent with known functional areas, opening the door to automated functional brain mapping in natural settings.
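
    The clustering-then-annotation idea can be pictured with a small sketch: cluster high-dimensional feature vectors without labels, then name each cluster by the majority audio/video-derived label of its members. The data below are synthetic stand-ins, not ECoG, and the three "behavioural states" are invented for the example.

        # Unsupervised clustering followed by automatic annotation of the clusters.
        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster
        from collections import Counter

        rng = np.random.default_rng(0)
        centers = rng.normal(size=(3, 20)) * 3                 # three hidden states in 20-D
        features = np.vstack([c + rng.normal(size=(100, 20)) for c in centers])
        annotations = np.repeat(["speaking", "moving", "resting"], 100)  # from audio/video

        Z = linkage(features, method="ward")
        clusters = fcluster(Z, t=3, criterion="maxclust")

        for c in np.unique(clusters):
            members = annotations[clusters == c]
            label, count = Counter(members).most_common(1)[0]
            print(f"cluster {c}: {len(members)} epochs, labelled '{label}' "
                  f"({count / len(members):.0%} agreement)")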

  1. Space space space

    CERN Document Server

    Trembach, Vera

    2014-01-01

    Space is an introduction to the mysteries of the Universe. Included are Task Cards for independent learning, Journal Word Cards for creative writing, and Hands-On Activities for reinforcing skills in Math and Language Arts. Space is a perfect introduction to further research of the Solar System.

  2. Interactive 3D audio: Enhancing awareness of details in immersive soundscapes?

    DEFF Research Database (Denmark)

    Schmidt, Mikkel Nørgaard; Schwartz, Stephen; Larsen, Jan

    2012-01-01

    Spatial audio and the possibility of interacting with the audio environment is thought to increase listeners' attention to details in a soundscape. This work examines if interactive 3D audio enhances listeners' ability to recall details in a soundscape. Nine different soundscapes were constructed...

  3. 77 FR 16890 - Second Meeting: RTCA Special Committee 226, Audio Systems and Equipment

    Science.gov (United States)

    2012-03-22

    ... Federal Aviation Administration Second Meeting: RTCA Special Committee 226, Audio Systems and Equipment... meeting RTCA Special Committee 226, Audio Systems and Equipment. SUMMARY: The FAA is issuing this notice to advise the public of the second meeting of RTCA Special Committee 226, Audio Systems and Equipment...

  4. 76 FR 79755 - First Meeting: RTCA Special Committee 226 Audio Systems and Equipment

    Science.gov (United States)

    2011-12-22

    ... Federal Aviation Administration First Meeting: RTCA Special Committee 226 Audio Systems and Equipment... RTCA Special Committee 226, Audio Systems and Equipment. SUMMARY: The FAA is issuing this notice to advise the public of a meeting of RTCA Special Committee 226, Audio Systems and Equipment, for the first...

  5. Real-Time Audio Processing on the T-CREST Multicore Platform

    DEFF Research Database (Denmark)

    Ausin, Daniel Sanz; Pezzarossa, Luca; Schoeberl, Martin

    2017-01-01

    of the audio signal. This paper presents a real-time multicore audio processing system based on the T-CREST platform. T-CREST is a time-predictable multicore processor for real-time embedded systems. Multiple audio effect tasks have been implemented, which can be connected together in different configurations...

  6. Responding Effectively to Composition Students: Comparing Student Perceptions of Written and Audio Feedback

    Science.gov (United States)

    Bilbro, J.; Iluzada, C.; Clark, D. E.

    2013-01-01

    The authors compared student perceptions of audio and written feedback in order to assess what types of students may benefit from receiving audio feedback on their essays rather than written feedback. Many instructors previously have reported the advantages they see in audio feedback, but little quantitative research has been done on how the…

  7. EEG correlates of postural audio-biofeedback.

    Science.gov (United States)

    Pirini, Marco; Mancini, Martina; Farella, Elisabetta; Chiari, Lorenzo

    2011-04-01

    The control of postural sway depends on the dynamic integration of multi-sensory information in the central nervous system. Augmentation of sensory information, such as during auditory biofeedback (ABF) of the trunk acceleration, has been shown to improve postural control. By means of quantitative electroencephalography (EEG), we examined the basic processes in the brain that are involved in the perception and cognition of auditory signals used for ABF. ABF and Fake ABF (FAKE) auditory stimulations were delivered to 10 healthy naive participants during quiet standing postural tasks, with eyes-open and closed. Trunk acceleration and 19-channels EEG were recorded at the same time. Advanced, state-of-the-art EEG analysis and modeling methods were employed to assess the possibly differential, functional activation, and localization of EEG spectral features (power in α, β, and γ bands) between the FAKE and the ABF conditions, for both the eyes-open and the eyes-closed tasks. Participants gained advantage by ABF in reducing their postural sway, as measured by a reduction of the root mean square of trunk acceleration during the ABF compared to the FAKE condition. Population-wise localization analysis performed on the comparison FAKE - ABF revealed: (i) a significant decrease of α power in the right inferior parietal cortex for the eyes-open task; (ii) a significant increase of γ power in left temporo-parietal areas for the eyes-closed task; (iii) a significant increase of γ power in the left temporo-occipital areas in the eyes-open task. EEG outcomes supported the idea that ABF for postural control heavily modulates (increases) the cortical activation in healthy participants. The sites showing the higher ABF-related modulation are among the known cortical areas associated with multi-sensory, perceptual integration, and sensorimotor integration, showing a differential activation between the eyes-open and eyes-closed conditions.

  8. An introduction to audio content analysis applications in signal processing and music informatics

    CERN Document Server

    Lerch, Alexander

    2012-01-01

    "With the proliferation of digital audio distribution over digital media, audio content analysis is fast becoming a requirement for designers of intelligent signal-adaptive audio processing systems. Written by a well-known expert in the field, this book provides quick access to different analysis algorithms and allows comparison between different approaches to the same task, making it useful for newcomers to audio signal processing and industry experts alike. A review of relevant fundamentals in audio signal processing, psychoacoustics, and music theory, as well as downloadable MATLAB files are also included"--

  9. Audio-Vestibular Findings in Increased Intracranial Hypertension Syndrome.

    Science.gov (United States)

    Çoban, Kübra; Aydın, Erdinç; Özlüoğlu, Levent Naci

    2017-04-01

    Idiopathic intracranial hypertension (IIH) can be manifested by audiological and vestibular complaints. The aim of the present study is to determine the audio-vestibular pathologies and their pathophysiologies in this syndrome by performing current audio-vestibular tests. The study was performed prospectively on 40 individuals (20 IIH patients, 20 healthy volunteers). Pure tone audiometry, tympanometry, vestibular evoked myogenic potentials, and electronystagmography tests were performed in both groups and the results were compared. The mean age of both groups was found to be 30.2±18.7. There were 11 females and 9 males in each group. The study group patients had significantly worse hearing levels. Pure tone averages were significantly higher in both ears of the study group (p …). Increased intracranial pressure may affect the inner ear through mechanisms similar to those in hydrops.

  10. Audio teleconferencing: creative use of a forgotten innovation.

    Science.gov (United States)

    Mather, Carey; Marlow, Annette

    2012-06-01

    As part of a regional School of Nursing and Midwifery's commitment to addressing recruitment and retention issues, approximately 90% of second year undergraduate student nurses undertake clinical placements at: multipurpose centres; regional or district hospitals; aged care; or community centres based in rural and remote regions within the State. The remaining 10% undertake professional experience placement in urban areas only. This placement of a large cohort of students, in low numbers in a variety of clinical settings, initiated the need to provide consistent support to both students and staff at these facilities. Subsequently the development of an audio teleconferencing model of clinical facilitation to guide student teaching and learning and to provide support to registered nurse preceptors in clinical practice was developed. This paper draws on Weimer's 'Personal Accounts of Change' approach to describe, discuss and evaluate the modifications that have occurred since the inception of this audio teleconferencing model (Weimer, 2006).

  11. Underdetermined Blind Audio Source Separation Using Modal Decomposition

    Directory of Open Access Journals (Sweden)

    Aïssa-El-Bey Abdeldjalil

    2007-01-01

    Full Text Available This paper introduces new algorithms for the blind separation of audio sources using modal decomposition. Indeed, audio signals and, in particular, musical signals can be well approximated by a sum of damped sinusoidal (modal) components. Based on this representation, we propose a two-step approach consisting of a signal analysis (extraction of the modal components) followed by a signal synthesis (grouping of the components belonging to the same source) using vector clustering. For the signal analysis, two existing algorithms are considered and compared: namely the EMD (empirical mode decomposition) algorithm and a parametric estimation algorithm using the ESPRIT technique. A major advantage of the proposed method resides in its validity for both instantaneous and convolutive mixtures and its ability to separate more sources than sensors. Simulation results are given to compare and assess the performance of the proposed algorithms.
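
    A minimal sketch of the two-step idea described above (analysis into modal components, then grouping by vector clustering), under simplifying assumptions: the mixture is stereo and the modal frequencies are taken as already known, so the EMD/ESPRIT extraction itself is not reproduced. Each component's normalized inter-channel amplitude vector is clustered to assign it to a source.

      import numpy as np
      from sklearn.cluster import KMeans

      fs = 8000
      t = np.arange(fs) / fs
      # Two synthetic sources, each a pair of damped sinusoids (modal components).
      s1 = np.exp(-3 * t) * (np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t))
      s2 = np.exp(-2 * t) * (np.sin(2 * np.pi * 620 * t) + 0.5 * np.sin(2 * np.pi * 1240 * t))
      mix = np.stack([0.9 * s1 + 0.2 * s2,      # left channel
                      0.3 * s1 + 0.8 * s2])     # right channel

      # Analysis stage assumed already done: the modal frequencies are known (Hz).
      modal_freqs = [440.0, 880.0, 620.0, 1240.0]

      # Spatial signature of each component: its magnitude in every channel.
      spec = np.fft.rfft(mix, axis=1)
      freq_axis = np.fft.rfftfreq(mix.shape[1], 1 / fs)
      bins = [np.argmin(np.abs(freq_axis - f)) for f in modal_freqs]
      signatures = np.abs(spec[:, bins]).T                    # (n_components, n_channels)
      signatures /= np.linalg.norm(signatures, axis=1, keepdims=True)

      # Synthesis stage: vector clustering groups components that belong to the same source.
      labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(signatures)
      for f, lab in zip(modal_freqs, labels):
          print(f"{f:7.1f} Hz -> source {lab}")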

  12. Underdetermined Blind Audio Source Separation Using Modal Decomposition

    Directory of Open Access Journals (Sweden)

    Abdeldjalil Aïssa-El-Bey

    2007-03-01

    Full Text Available This paper introduces new algorithms for the blind separation of audio sources using modal decomposition. Indeed, audio signals and, in particular, musical signals can be well approximated by a sum of damped sinusoidal (modal) components. Based on this representation, we propose a two-step approach consisting of a signal analysis (extraction of the modal components) followed by a signal synthesis (grouping of the components belonging to the same source) using vector clustering. For the signal analysis, two existing algorithms are considered and compared: namely the EMD (empirical mode decomposition) algorithm and a parametric estimation algorithm using the ESPRIT technique. A major advantage of the proposed method resides in its validity for both instantaneous and convolutive mixtures and its ability to separate more sources than sensors. Simulation results are given to compare and assess the performance of the proposed algorithms.

  13. Random Numbers Generated from Audio and Video Sources

    Directory of Open Access Journals (Sweden)

    I-Te Chen

    2013-01-01

    Full Text Available Random numbers are very useful in simulation, chaos theory, game theory, information theory, pattern recognition, probability theory, quantum mechanics, statistics, and statistical mechanics. Random numbers are especially helpful in cryptography. In this work, the proposed random number generators draw on white noise from audio and video (A/V) sources extracted from high-resolution IPCAM, WEBCAM, and MPEG-1 video files. The proposed generator acts as a true random number generator when applied to A/V sources from an IPCAM or a WEBCAM with microphone, and as a pseudorandom number generator when applied to MPEG-1 video files. In addition, when the 15 statistical tests of NIST SP 800-22 Rev. 1a are applied to the numbers produced by the proposed generator, around 98% of them pass. Furthermore, audio and video sources are easy to obtain; hence, the proposed generator is a qualified, convenient, and efficient random number generator.
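
    A small sketch of the underlying principle, assuming the noise arrives as signed 16-bit audio samples: keep only the least-significant bit of each sample, apply von Neumann debiasing, and run a crude monobit check. This is an illustration only, not the authors' generator or the full NIST SP 800-22 test suite.

      import numpy as np

      rng = np.random.default_rng(1)
      # Stand-in for white noise captured from a microphone, as signed 16-bit samples.
      samples = rng.integers(-32768, 32768, size=100_000, dtype=np.int16)

      # 1) Keep the least-significant bit of each sample as the raw bit stream.
      raw_bits = (samples & 1).astype(np.uint8)

      # 2) Von Neumann debiasing: from each non-overlapping pair keep the first bit
      #    only when the pair is 01 or 10; discard 00 and 11.
      pairs = raw_bits[: len(raw_bits) // 2 * 2].reshape(-1, 2)
      bits = pairs[pairs[:, 0] != pairs[:, 1], 0]

      # 3) Crude monobit sanity check (the real NIST suite runs 15 such tests).
      print(f"{bits.size} debiased bits, fraction of ones = {bits.mean():.4f}")

      # Pack the bits into bytes for downstream use as random numbers.
      random_bytes = np.packbits(bits)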

  14. Using content models to build audio-video summaries

    Science.gov (United States)

    Saarela, Janne; Merialdo, Bernard

    1998-12-01

    The amount of digitized video in archives is becoming so huge that easier access and content browsing tools are desperately needed. Also, video is no longer one big piece of data, but a collection of useful smaller building blocks, which can be accessed and used independently of the original context of presentation. In this paper, we demonstrate a content model for audio-video sequences, with the purpose of enabling the automatic generation of video summaries. The model is based on descriptors, which indicate various properties and relations of audio and video segments. In practice, these descriptors could either be generated automatically by methods of analysis, or produced manually (or computer-assisted) by the content provider. We analyze the requirements and characteristics of the different data segments with respect to the problem of summarization, and we define our model as a set of constraints which allow good-quality summaries to be produced.

  15. Audio Visual Media Components in Educational Game for Elementary Students

    Directory of Open Access Journals (Sweden)

    Meilani Hartono

    2016-12-01

    Full Text Available The purpose of this research was to review and implement interactive audio-visual media used in an educational game to improve elementary students' interest in learning mathematics. The game was developed for the desktop platform. The art of the game was set as 2D cartoon art with animation and audio in order to make students more interested. Four mini games were developed based on research on mathematics learning. The development method used was the Multimedia Development Life Cycle (MDLC), which consists of requirement, design, development, testing, and implementation phases. The data collection methods used were questionnaires, literature study, and interviews. The conclusion is that elementary students are interested in an educational game that is fun and active (moving objects), with a fast tempo of music and a carefree color like blue. It is hoped that this educational game can serve as an alternative teaching tool, combined with conventional teaching methods.

  16. Audio Quality Assurance : An Application of Cross Correlation

    DEFF Research Database (Denmark)

    Jurik, Bolette Ammitzbøll; Nielsen, Jesper Asbjørn Sindahl

    2012-01-01

    We describe algorithms for automated quality assurance on the content of audio files in the context of preservation actions and access. The algorithms use cross correlation to compare the sound waves. They are used for overlap analysis in an access scenario, where preserved radio broadcasts are used in research and annotated, and they have been applied in a migration scenario, where radio broadcasts are to be migrated for long-term preservation.
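
    The overlap analysis can be pictured as follows: cross-correlate the tail of one recording with the head of the next and read the overlap length off the lag of the correlation peak. The sketch below uses synthetic signals and placeholder durations, and assumes the second file starts somewhere inside the analysed tail of the first; it is not the authors' implementation.

      import numpy as np
      from scipy.signal import correlate, correlation_lags

      fs = 8000
      rng = np.random.default_rng(2)
      programme = rng.standard_normal(10 * fs)          # stand-in broadcast audio

      # Two consecutive "files" that overlap by exactly 1.5 seconds.
      overlap = int(1.5 * fs)
      file_a = programme[: 6 * fs]
      file_b = programme[6 * fs - overlap:]

      # Cross-correlate the tail of file A with the head of file B.
      tail, head = file_a[-3 * fs:], file_b[: 3 * fs]
      corr = correlate(head, tail, mode="full", method="fft")
      lags = correlation_lags(len(head), len(tail), mode="full")
      best_lag = lags[np.argmax(corr)]

      # head[n] lines up with tail[n - best_lag]; with file B starting inside the
      # analysed tail of file A, the overlap length is len(tail) + best_lag.
      estimated_overlap = len(tail) + best_lag
      print(f"estimated overlap: {estimated_overlap / fs:.2f} s (true: {overlap / fs:.2f} s)")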

  17. Investigating Electrophysiology for Measuring Emotions Triggered by Audio Stimuli

    OpenAIRE

    Mazza, Filippo; Perreira Da Silva, Matthieu; Le Callet, Patrick

    2013-01-01

    International audience; Multimedia quality evaluation has recently started to take into account the analysis of emotional responses to audio-visual stimuli. This is especially true for quality of experience evaluation. Self-assessed affective reports are commonly used for this purpose. Nevertheless, measuring emotions via physiological measurement might also be considered, as it could limit the effects of cognitive bias due to self-report, following the rule that your body cannot lie. In this pape...

  18. Building real-time audio applications with component technology

    OpenAIRE

    Mork, Eivind

    2005-01-01

    The thesis looks at how QoS-aware applications (QSAs), with an Internet phone as a case, can be built from components (with Enterprise Java Beans (EJB) as the chosen component model). It also identifies limitations of the EJB standard that cause problems implementing real-time audio applications. The requirements to the EJB platform are identified by analyzing a generic design which is made from common components. These components are found by inspecting existing applicat...

  19. Active Learning for Automatic Audio Processing of Unwritten Languages (ALAPUL)

    Science.gov (United States)

    2016-07-01

    affixes or stem morphemes; however, this kind of knowledge is hard to operationalize if nothing more is known about the target language utterance...procedure. A reasonable expectation is that audio of this kind will lend itself to more accurate pattern matching. Furthermore, the fact that one...learning the vocabulary of the language and is using lightly supervised annotations on the data to iteratively augment and confirm the discovered

  20. Representation of sound fields for audio recording and reproduction

    OpenAIRE

    Fazi, Filippo; Noisternig, Markus; Warusfel, Olivier

    2012-01-01

    International audience; Spherical and circular microphone arrays and loudspeaker arrays are often used for the recording and reproduction of a given sound field. A number of approaches and formats are available for the representation of the recorded field, some of which are more popular among the audio engineering community (B-format, virtual microphones, etc.) whilst other representations are widely used in the literature on mathematics (generalized Fourier series, single layer potential, He...
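
    As a concrete example of one such representation, the sketch below encodes a mono signal into traditional first-order B-format (W, X, Y, Z) for a given source direction; the signal, angles, and sample rate are arbitrary placeholders rather than anything taken from the paper.

      import numpy as np

      def encode_bformat(signal: np.ndarray, azimuth: float, elevation: float) -> np.ndarray:
          """Encode a mono signal into first-order B-format (W, X, Y, Z channels).

          azimuth: anticlockwise from straight ahead, radians; elevation: upwards, radians.
          """
          w = signal / np.sqrt(2.0)                         # omnidirectional, -3 dB convention
          x = signal * np.cos(azimuth) * np.cos(elevation)  # front-back figure of eight
          y = signal * np.sin(azimuth) * np.cos(elevation)  # left-right figure of eight
          z = signal * np.sin(elevation)                    # up-down figure of eight
          return np.stack([w, x, y, z])

      # Example: a 440 Hz tone placed 45 degrees to the left, on the horizontal plane.
      fs = 48000
      t = np.arange(fs) / fs
      tone = 0.5 * np.sin(2 * np.pi * 440 * t)
      bformat = encode_bformat(tone, azimuth=np.deg2rad(45), elevation=0.0)
      print(bformat.shape)   # (4, 48000)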

  1. Microphone array processing for parametric spatial audio techniques

    OpenAIRE

    Politis, Archontis

    2016-01-01

    Reproduction of spatial properties of recorded sound scenes is increasingly recognised as a crucial element of all emerging immersive applications, with domestic or cinema-oriented audiovisual reproduction for entertainment, telepresence and immersive teleconferencing, and augmented and virtual reality being key examples. Such applications benefit from a general spatial audio processing framework, being able to exploit spatial information from a variety of recording formats in order to reprod...

  2. Modular Sensor Environment : Audio Visual Industry Monitoring Applications

    OpenAIRE

    Guillot, Calvin

    2017-01-01

    This work was made for Electro Waves Oy. The company specializes in Audio-visual services and interactive systems. The purpose of this work is to design and implement a modular sensor environment for the company, which will be used for developing automated systems. This thesis begins with an introduction to sensor systems and their different topologies. It is followed by an introduction to the technologies used in this project. The system is divided in three parts. The client, tha...

  3. Unsupervised Learning of Structural Representation of Percussive Audio Using a Hierarchical Dirichlet Process Hidden Markov Model

    DEFF Research Database (Denmark)

    Antich, Jose Luis Diez; Paterna, Mattia; Marxer, Richard

    2016-01-01

    A method is proposed that extracts a structural representation of percussive audio in an unsupervised manner. It consists of two parts: 1) The input signal is segmented into blocks of approximately even duration, aligned to a metrical grid, using onset and timbre feature extraction, agglomerative single-linkage clustering, metrical regularity calculation and beat detection. 2) The approximately equal-length blocks are clustered into k clusters and the resulting cluster sequence is modelled by transition probabilities between clusters. The Hierarchical Dirichlet Process Hidden Markov Model is employed to jointly estimate the optimal number of sound clusters, to cluster the blocks, and to estimate the transition probabilities between clusters. The result is a segmentation of the input into a sequence of symbols (typically corresponding to hits of hi-hat, snare, bass, cymbal, etc.) that can be evaluated...
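
    A toy sketch of the first part of the pipeline, under stated simplifications: spectral-flux onsets segment a synthetic percussive signal into blocks, a single crude timbre feature (mean spectral centroid) is computed per block, and the blocks are grouped by agglomerative single-linkage clustering. The metrical-grid alignment and the Hierarchical Dirichlet Process Hidden Markov Model stage are not reproduced, and all signal and detector parameters are invented for illustration.

      import numpy as np
      from sklearn.cluster import AgglomerativeClustering

      fs, hop, nfft = 22050, 512, 1024
      rng = np.random.default_rng(3)

      # Synthetic "drum loop": alternating low "kick" and noisy "hat" bursts on a regular grid.
      spacing, burst_len, n_bursts = 11 * hop, 2048, 16
      signal = np.zeros((n_bursts + 1) * spacing + nfft)
      env = np.exp(-np.arange(burst_len) / 300.0)
      for i in range(n_bursts):
          if i % 2 == 0:
              burst = 3.0 * env * np.sin(2 * np.pi * 80 * np.arange(burst_len) / fs)  # kick-like
          else:
              burst = env * rng.standard_normal(burst_len)                             # hat-like
          start = (i + 1) * spacing
          signal[start:start + burst_len] += burst

      # Short-time magnitude spectrogram and spectral-flux onset detection.
      frames = np.lib.stride_tricks.sliding_window_view(signal, nfft)[::hop]
      mags = np.abs(np.fft.rfft(frames * np.hanning(nfft), axis=1))
      flux = np.maximum(np.diff(mags, axis=0, prepend=mags[:1]), 0.0).sum(axis=1)
      onset_frames = np.where(flux > 0.05 * flux.max())[0]
      onset_frames = onset_frames[np.insert(np.diff(onset_frames) > 2, 0, True)]   # debounce

      # One crude timbre feature per inter-onset block: mean spectral centroid.
      freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
      blocks = np.split(np.arange(mags.shape[0]), onset_frames)[1:]
      centroid = np.array([[float(np.mean((mags[b] @ freqs) / (mags[b].sum(axis=1) + 1e-9)))]
                           for b in blocks])

      # Agglomerative single-linkage clustering groups the blocks into sound classes.
      labels = AgglomerativeClustering(n_clusters=2, linkage="single").fit_predict(centroid)
      print("block labels:", labels)    # expected to alternate between the two burst types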

  4. Classification of Overlapped Audio Events Based on AT, PLSA, and the Combination of Them

    Directory of Open Access Journals (Sweden)

    Y. Leng

    2015-06-01

    Full Text Available Audio event classification, as an important part of Computational Auditory Scene Analysis, has attracted much attention. Current classification technology is mature enough to classify isolated audio events accurately, but it performs much worse for overlapped audio events. In real life, however, most audio documents contain a certain percentage of overlaps, so overlap classification is an important part of audio classification. Work on overlapped audio event classification is still scarce, and most existing overlap classification systems can only recognize one audio event per overlap. In this paper, in order to deal with overlaps, we introduce the author-topic (AT) model, first proposed for text analysis, into audio classification, and combine it with PLSA (Probabilistic Latent Semantic Analysis). We propose four systems, namely AT, PLSA, AT-PLSA and PLSA-AT, to classify overlaps. The four proposed systems are able to recognize two or more audio events for an overlap. The experimental results show that the four systems perform well in classifying overlapped audio events, whether the overlaps occur in the training set or outside it. They also perform well in classifying isolated audio events.

  5. Sounding better: fast audio cues increase walk speed in treadmill-mediated virtual rehabilitation environments.

    Science.gov (United States)

    Powell, Wendy; Stevens, Brett; Hand, Steve; Simmonds, Maureen

    2010-01-01

    Music or sound effects are often used to enhance Virtual Environments, but it is not known how this audio may influence gait speed. This study investigated the influence of audio cue tempo on treadmill walking with and without visual flow. The walking speeds of 11 individuals were recorded during exposure to a range of audio cue rates. There was a significant effect of audio tempo without visual flow, with a 16% increase in walk speed with faster audio cue tempos. Audio with visual flow resulted in a smaller but still significant increase in walking speed (8%). The results suggest that the inclusion of faster rate audio cues may be of benefit in improving walk speed in virtual rehabilitation.

  6. A method to synchronise video cameras using the audio band.

    Science.gov (United States)

    Leite de Barros, Ricardo Machado; Guedes Russomanno, Tiago; Brenzikofer, René; Jovino Figueroa, Pascual

    2006-01-01

    This paper proposes and evaluates a novel method for the synchronisation of video cameras using the audio band. The method consists in generating and transmitting an audio signal through radio frequency to receivers connected to the microphone input of the cameras, inserting the signal in the audio band. In a software environment, the phase differences among the video signals are calculated and used to interpolate synchronous 2D projections of the trajectories. The validation of the method was based on: (1) analysis of the phase difference changes of two video signals as a function of time; (2) comparison between the values measured with an oscilloscope and by the proposed method; (3) estimation of the improvement in accuracy of the measurement of the distance between two markers mounted on a rigid body during movement when applying the method. The results showed that the phase difference changes slowly (0.150 ms/min) and linearly over time, even when the same model of camera is used. The values measured by the proposed method and by the oscilloscope were equivalent (R2=0.998); the root mean square of the difference between the measurements was 0.10 ms and the maximum difference found was 0.31 ms. Applying the new method, the accuracy of the 3D reconstruction showed a statistically significant improvement. The accuracy, simplicity and wide applicability of the proposed method constitute the main contributions of this work.
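
    The interpolation step can be illustrated as follows: given the phase difference measured via the audio band, the 2D projections recorded by one camera are resampled onto the other camera's frame times. Frame rate, phase difference, and trajectory below are placeholder values, not data from the study.

      import numpy as np

      frame_rate = 60.0                  # assumed camera frame rate (Hz)
      phase_diff = 0.00031               # assumed measured phase difference between cameras (s)
      n_frames = 300

      # Frame timestamps of the two cameras: camera B lags camera A by phase_diff.
      t_a = np.arange(n_frames) / frame_rate
      t_b = t_a + phase_diff

      # 2D projection (x, y) of a marker as seen by camera B, sampled at camera B's times.
      x_b = 0.3 * np.sin(2 * np.pi * 1.2 * t_b)
      y_b = 0.1 * np.cos(2 * np.pi * 0.8 * t_b)

      # Interpolate camera B's projections onto camera A's timestamps -> synchronous 2D data.
      x_b_sync = np.interp(t_a, t_b, x_b)
      y_b_sync = np.interp(t_a, t_b, y_b)
      print(np.column_stack([t_a[:3], x_b_sync[:3], y_b_sync[:3]]))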

  7. Ears on the hand: reaching 3D audio targets

    Directory of Open Access Journals (Sweden)

    Hanneton Sylvain

    2011-12-01

    Full Text Available We studied the ability of right-handed participants to reach 3D audio targets with their right hand. Our immersive audio environment was based on the OpenAL library and Fastrak magnetic sensors for motion capture. Participants listen to the target through a “virtual” listener linked to a sensor fixed either on the head or on the hand. We compare three experimental conditions in which the virtual listener is on the head, on the left hand, and on the right hand (which reaches the target). We show that (1) participants are able to learn the task but (2) with a low success rate and long durations, (3) the individual levels of performance are very variable, and (4) the best performances are achieved when the listener is on the right hand. Consequently, we conclude that our participants were able to learn to locate 3D audio sources even when their ears were transposed onto their hand, but we found behavioral differences between the three experimental conditions.

  8. Comparison of Linear Prediction Models for Audio Signals

    Directory of Open Access Journals (Sweden)

    2009-03-01

    Full Text Available While linear prediction (LP) has become immensely popular in speech modeling, it does not seem to provide a good approach for modeling audio signals. This is somewhat surprising, since a tonal signal consisting of a number of sinusoids can be perfectly predicted based on an (all-pole) LP model with a model order that is twice the number of sinusoids. We provide an explanation why this result cannot simply be extrapolated to LP of audio signals. If noise is taken into account in the tonal signal model, a low-order all-pole model appears to be only appropriate when the tonal components are uniformly distributed in the Nyquist interval. Based on this observation, different alternatives to the conventional LP model can be suggested. Either the model should be changed to a pole-zero, a high-order all-pole, or a pitch prediction model, or the conventional LP model should be preceded by an appropriate frequency transform, such as a frequency warping or downsampling. By comparing these alternative LP models to the conventional LP model in terms of frequency estimation accuracy, residual spectral flatness, and perceptual frequency resolution, we obtain several new and promising approaches to LP-based audio modeling.
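
    The claim that a sum of sinusoids is perfectly predicted by an all-pole model of twice that order can be checked numerically. The sketch below fits an order-4 predictor to two noiseless sinusoids via the autocorrelation (Yule-Walker) equations, and repeats the fit after adding noise, where the low-order all-pole model is no longer adequate; all signal parameters are illustrative.

      import numpy as np
      from scipy.linalg import solve_toeplitz

      def lpc(x: np.ndarray, order: int) -> np.ndarray:
          """Forward linear-prediction coefficients from the autocorrelation method."""
          r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
          return solve_toeplitz(r[:order], r[1:order + 1])     # Yule-Walker solution

      def residual_energy(x: np.ndarray, a: np.ndarray) -> float:
          """Energy of the prediction error x[n] - sum_k a[k] x[n-k-1], relative to x."""
          pred = sum(a[k] * x[len(a) - 1 - k:len(x) - 1 - k] for k in range(len(a)))
          err = x[len(a):] - pred
          return float(np.sum(err ** 2) / np.sum(x[len(a):] ** 2))

      n = 4096
      t = np.arange(n)
      tonal = np.sin(0.05 * t) + 0.7 * np.sin(0.23 * t)        # two sinusoids
      noisy = tonal + 0.1 * np.random.default_rng(0).standard_normal(n)

      a = lpc(tonal, order=4)
      print("noiseless, order 4:", residual_energy(tonal, a))  # very small: near-perfect prediction
      a = lpc(noisy, order=4)
      print("noisy,     order 4:", residual_energy(noisy, a))  # clearly non-zero residual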

  9. The Impact of Audio Book on the Elderly Mental Health.

    Science.gov (United States)

    Ameri, Fereshteh; Vazifeshenas, Naser; Haghparast, Abbas

    2017-01-01

    The growing elderly population calls for mental health professionals to take measures concerning the treatment of the elderly's mental disorders. Today, in developed countries, bibliotherapy is used for the treatment of the most prevalent psychiatric disorders. Therefore, this study aimed to investigate the effects of audio books on the mental health of elderly members of the Retirement Center of Shahid Beheshti University of Medical Sciences. This experimental study was conducted on 60 elderly people who participated in 8 audio book presentation sessions, and their mental health was evaluated through a mental health questionnaire (SCL-90-R). Data were analyzed using SPSS 24. Data analysis revealed that the mean difference between pretest and posttest in the control group was less than 5.0, so no significant difference was observed in their mental health, whereas the difference was significant in the experimental group (more than 5.0). Therefore, a significant improvement in mental health and its dimensions was observed in the elderly people who participated in the audio book sessions. This therapeutic intervention was effective on the mental health dimensions of paranoid ideation, psychosis, phobia, aggression, depression, interpersonal sensitivity, anxiety, obsessive-compulsive symptoms and somatic complaints. Considering the fact that our population is moving toward aging, the obtained results could be useful for policy makers and health and social planners to improve the health status of the elderly.

  10. PENGEMBANGAN MEDIA AJAR PERAWATAN DAN PERBAIKAN SISTEM AUDIO PADA MATAKULIAH WORKSHOP AUDIO VIDEO UNTUK MAHASISWA PROGRAM STUDI PENDIDIKAN TEKNIK ELEKTRO UNIVERSITAS NEGERI MALANG

    Directory of Open Access Journals (Sweden)

    Suwasono Suwasono

    2017-01-01

    Full Text Available This research addresses the development of jobsheet and trainer learning media for the maintenance and repair of audio systems in the Audio Video Workshop course (PTEL665) for Electrical Engineering Education students at Universitas Negeri Malang. The development followed the model proposed by Sugiyono. The results were a jobsheet and a trainer for the maintenance and repair of audio systems, equipped with test points for measuring and identifying the input and output signal forms in each block of the audio system diagram, and with fault switches for locating faults and determining the repair steps. The product was then tried out with Electrical Engineering Education students of Universitas Negeri Malang who had completed the Audio Video Workshop course in 2012/2013 and 2013/2014. The product trial scored 88.30% and was categorized as appropriate. Validity examination was also conducted, with results of 92.80% from the material expert and 91.60% from the media expert. Hence, the jobsheet and trainer learning media for the maintenance and repair of audio systems in the Audio Video Workshop course (PTEL665) are appropriate for use in the Electrical Engineering Education Department of Universitas Negeri Malang.

  11. On the Acoustics of Emotion in Audio: What Speech, Music and Sound have in Common

    Directory of Open Access Journals (Sweden)

    Felix eWeninger

    2013-05-01

    Full Text Available Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning either of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow's pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of 'the sound that something makes', in order to evaluate the system's auditory environment and its own audio output. This article aims at a first step towards a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selection of appropriate descriptors, cross-domain arousal and valence regression is feasible, achieving significant correlations with the observer annotations of up to .78 for arousal (training on sound and testing on enacted speech) and .60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects.

  12. On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common.

    Science.gov (United States)

    Weninger, Felix; Eyben, Florian; Schuller, Björn W; Mortillaro, Marcello; Scherer, Klaus R

    2013-01-01

    Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concerning either of these phenomena, but a holistic computational model of affect in sound is still lacking. In turn, for tomorrow's pervasive technical systems, including affective companions and robots, it is expected to be highly beneficial to understand the affective dimensions of "the sound that something makes," in order to evaluate the system's auditory environment and its own audio output. This article aims at a first step toward a holistic computational model: starting from standard acoustic feature extraction schemes in the domains of speech, music, and sound analysis, we interpret the worth of individual features across these three domains, considering four audio databases with observer annotations in the arousal and valence dimensions. In the results, we find that by selection of appropriate descriptors, cross-domain arousal and valence regression is feasible, achieving significant correlations with the observer annotations of up to 0.78 for arousal (training on sound and testing on enacted speech) and 0.60 for valence (training on enacted speech and testing on music). The high degree of cross-domain consistency in encoding the two main dimensions of affect may be attributable to the co-evolution of speech and music from multimodal affect bursts, including the integration of nature sounds for expressive effects.

  13. Digital Spaces, Material Traces : Investigating the Performance of Gender, Sexuality, and Embodiment on Internet Platforms that feature User-Generated Content

    NARCIS (Netherlands)

    van Doorn, N.A.J.M.

    2010-01-01

    As ever more social interaction and cultural production takes place in the networked digital spaces of the internet, it is crucial to develop an understanding of the ways in which gender and sexuality are articulated in these online practices. Through four comparative case studies, this dissertation

  14. Understanding Legacy Features with Featureous

    DEFF Research Database (Denmark)

    Olszak, Andrzej; Jørgensen, Bo Nørregaard

    2011-01-01

    Feature-centric comprehension of source code is essential during software evolution. However, such comprehension is oftentimes difficult to achieve due to the discrepancies between the structural and functional units of object-oriented programs. We present a tool for feature-centric analysis of legacy Java programs called Featureous that addresses this issue. Featureous allows a programmer to easily establish feature-code traceability links and to analyze their characteristics using a number of visualizations. Featureous is an extension to the NetBeans IDE, and can itself be extended by third...

  15. MAIN SURFACE FEATURES

    Directory of Open Access Journals (Sweden)

    MINTIANSCHI Andrei V.

    2010-07-01

    Full Text Available Surface characterization means splitting the surface geometry into basic components based usually on some functional requirement. These components can have different shapes, scales of size, distribution in space and can be constrained by multiple boundaries in height and position. The measurement can influence the importance of a parameter or feature. This paper presents the main features that need to be considered when a surface is analyzed, especially the roughness and the waviness.

  16. Improving Music Genre Classification by Short Time Feature Integration

    DEFF Research Database (Denmark)

    Meng, Anders; Ahrendt, Peter; Larsen, Jan

    Many different short-time features (derived from 10-30 ms of audio) have been proposed for music segmentation, retrieval and genre classification. Often the available time frame of the music to make a decision (the decision time horizon) is in the range of seconds instead of milliseconds. The problem of making new features on the larger time scale from the short-time features (feature integration) has only received little attention. This paper investigates different methods for feature integration (early information fusion) and late information fusion (assembling of probabilistic outputs)...
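
    One common feature-integration choice is to summarize the short-time features over a longer texture window by their mean and variance. The sketch below illustrates that idea with two simple stand-in frame features (RMS energy and spectral centroid) rather than the exact features studied in the paper; frame and window sizes are placeholder values.

      import numpy as np

      def short_time_features(x, fs, frame=512, hop=256):
          """Per-frame RMS energy and spectral centroid (stand-ins for richer features)."""
          frames = np.lib.stride_tricks.sliding_window_view(x, frame)[::hop]
          mags = np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1))
          freqs = np.fft.rfftfreq(frame, 1.0 / fs)
          rms = np.sqrt(np.mean(frames ** 2, axis=1))
          centroid = (mags @ freqs) / (mags.sum(axis=1) + 1e-12)
          return np.column_stack([rms, centroid])           # shape (n_frames, 2)

      def integrate(features, frames_per_texture):
          """Mean-variance feature integration over non-overlapping texture windows."""
          n = len(features) // frames_per_texture * frames_per_texture
          blocks = features[:n].reshape(-1, frames_per_texture, features.shape[1])
          return np.concatenate([blocks.mean(axis=1), blocks.var(axis=1)], axis=1)

      fs = 22050
      clip = np.random.default_rng(4).standard_normal(10 * fs)   # stand-in audio clip
      short = short_time_features(clip, fs)                      # ~10-30 ms scale features
      integrated = integrate(short, frames_per_texture=86)       # ~1 s decision-horizon features
      print(short.shape, integrated.shape)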

  17. Music and audio - oh how they can stress your network

    Science.gov (United States)

    Fletcher, R.

    Nearly ten years ago a paper written by the Audio Engineering Society (AES) [1] made a number of interesting statements: (1) the current Internet is inadequate for transmitting music and professional audio; (2) performance and collaboration across a distance stress the quality of service beyond acceptable bounds; (3) audio and music provide test cases in which the bounds of the network are quickly reached and through which the defects in a network are readily perceived. Given these key points, where are we now? Have we started to solve any of the problems from the musician's point of view? What is it that a musician would like to do that can cause the network so many problems? To understand this we need to appreciate that a trained musician's ears are extremely sensitive to very subtle shifts in temporal material and localisation information; a shift of a few milliseconds can cause difficulties. So, can modern networks provide the temporal accuracy demanded at this level? The sample and bit rates needed to represent music in the digital domain are still contentious, but a general consensus in the professional world is 96 kHz and IEEE 64-bit floating point. If this were run between two points on the network across 24 channels in near real time, to allow for collaborative composition, production and performance, with QoS settings allowing as near to zero latency and jitter as possible, it can be seen that the network indeed has to perform very well.

  18. Audio Logo Recognition, Reduced Articulation and Coding Orientation

    DEFF Research Database (Denmark)

    Bonde, Anders; Hansen, Allan Grutt

    2013-01-01

    connected to notions of brand recognisability and brand identification, thus resulting in the concept of ‘Reduced Articulation Form’ (RAF). The concept has been tested empirically through a survey of 137 upper secondary school students. On the basis of a conditioning experiment, manipulating five existing...... audio logos in terms of tempo, rhythm, pitch and timbre, the students filled out a structured questionnaire and assessed at which condition they were able to recognise the logos and the corresponding brands. The results indicated that pitch is a much more recognisable trait than rhythm. Also, while...

  19. Digital video and audio broadcasting technology a practical engineering guide

    CERN Document Server

    Fischer, Walter

    2010-01-01

    Digital Video and Audio Broadcasting Technology - A Practical Engineering Guide' deals with all the most important digital television, sound radio and multimedia standards such as MPEG, DVB, DVD, DAB, ATSC, T-DMB, DMB-T, DRM and ISDB-T. The book provides an in-depth look at these subjects in terms of practical experience. In addition it contains chapters on the basics of technologies such as analog television, digital modulation, COFDM or mathematical transformations between time and frequency domains. The attention in the respective field under discussion is focussed on aspects of measuring t

  20. Audio-Visual Perception System for a Humanoid Robotic Head

    OpenAIRE

    Raquel Viciana-Abad; Rebeca Marfil; Perez-Lorenzo, Jose M.; Juan P. Bandera; Adrian Romero-Garces; Pedro Reche-Lopez

    2014-01-01

    One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can...

  1. The complete guide to high-end audio

    CERN Document Server

    Harley, Robert

    2015-01-01

    An updated edition of what many consider the "bible of high-end audio"   In this newly revised and updated fifth edition, Robert Harley, editor in chief of the Absolute Sound magazine, tells you everything you need to know about buying and enjoying high-quality hi-fi. With this book, discover how to get the best sound for your money, how to identify the weak links in your system and upgrade where it will do the most good, how to set up and tweak your system for maximum performance, and how to become a more perceptive and appreciative listener. Just a few of the secrets you will learn cover hi

  2. Tools for signal compression applications to speech and audio coding

    CERN Document Server

    Moreau, Nicolas

    2013-01-01

    This book presents tools and algorithms required to compress/uncompress signals such as speech and music. These algorithms are largely used in mobile phones, DVD players, HDTV sets, etc. In a first rather theoretical part, this book presents the standard tools used in compression systems: scalar and vector quantization, predictive quantization, transform quantization, entropy coding. In particular we show the consistency between these different tools. The second part explains how these tools are used in the latest speech and audio coders. The third part gives Matlab programs simulating t
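
    As a concrete instance of the first of those standard tools, the sketch below applies a uniform scalar quantizer to a test tone and reports the resulting signal-to-quantization-noise ratio, showing the familiar roughly 6 dB-per-bit behaviour; all parameters are illustrative and the code is not taken from the book's Matlab programs.

      import numpy as np

      def uniform_quantize(x: np.ndarray, n_bits: int, x_max: float = 1.0) -> np.ndarray:
          """Uniform mid-rise scalar quantizer covering [-x_max, x_max]."""
          n_levels = 2 ** n_bits
          step = 2 * x_max / n_levels
          idx = np.clip(np.floor(x / step), -n_levels // 2, n_levels // 2 - 1)
          return (idx + 0.5) * step

      fs = 16000
      t = np.arange(fs) / fs
      x = 0.9 * np.sin(2 * np.pi * 440 * t)                   # test signal

      for n_bits in (4, 8, 12):
          q = uniform_quantize(x, n_bits)
          snr = 10 * np.log10(np.sum(x ** 2) / np.sum((x - q) ** 2))
          print(f"{n_bits:2d} bits -> SQNR {snr:5.1f} dB")    # roughly +6 dB per extra bit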

  3. An assessment of individualized technical ear training for audio production.

    Science.gov (United States)

    Kim, Sungyoung

    2015-07-01

    An individualized technical ear training method is compared to a non-individualized method. The efficacy of the individualized method is assessed using a standardized test conducted before and after the training period. Participants who received individualized training improved more than the control group on the test. Results indicate the importance of individualized training for the acquisition of spectrum-identification and spectrum-matching skills. Individualized training, therefore, should be implemented by default into technical ear training programs used in the audio production industry and in education.

  4. Utilization of non-linear converters for audio amplification

    DEFF Research Database (Denmark)

    Iversen, Niels Elkjær; Birch, Thomas; Knott, Arnold

    2012-01-01

    Class D amplifiers fit the automotive demands quite well. The traditional buck-based amplifier has reduced both the cost and size of amplifiers. However, the buck topology is not without its limitations: the maximum peak AC output voltage produced by the power stage is only equal to the supply voltage. The introduction of non-linear converters for audio amplification defeats this limitation. A Cuk converter, designed to deliver an AC peak output voltage twice the supply voltage, is presented in this paper. A 3 V prototype has been developed to prove the concept. The prototype shows that it is possible to achieve...

  5. Efficient Query-by-Content Audio Retrieval by Locality Sensitive Hashing and Partial Sequence Comparison

    Science.gov (United States)

    Yu, Yi; Joe, Kazuki; Downie, J. Stephen

    This paper investigates suitable indexing techniques to enable efficient content-based audio retrieval in large acoustic databases. To make an index-based retrieval mechanism applicable to audio content, we investigate the design of Locality Sensitive Hashing (LSH) and partial sequence comparison. We propose a fast and efficient query-by-content audio retrieval framework and develop an audio retrieval system. Based on this framework, four different audio retrieval schemes, LSH-Dynamic Programming (DP), LSH-Sparse DP (SDP), Exact Euclidean LSH (E2LSH)-DP, and E2LSH-SDP, are introduced and evaluated in order to better understand the performance of audio retrieval algorithms. The experimental results indicate that, compared with traditional DP and the other three competitive schemes, E2LSH-SDP exhibits the best tradeoff in terms of response time, retrieval accuracy and computation cost.
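
    A bare-bones sketch of the LSH indexing idea under stated simplifications: each audio clip is represented by a single fixed-length feature vector, hashed with random sign projections into buckets, and a query is compared only against clips sharing its bucket. The DP/SDP partial sequence comparison stage of the framework is not reproduced, and all sizes are placeholder values.

      import numpy as np

      rng = np.random.default_rng(5)
      dim, n_clips, n_planes = 32, 5000, 12

      # Stand-in database: one feature vector (e.g. averaged spectral features) per clip.
      database = rng.standard_normal((n_clips, dim))

      # Random hyperplanes; a clip's hash key is the sign pattern of its projections.
      planes = rng.standard_normal((n_planes, dim))

      def lsh_key(vec):
          bits = (planes @ vec > 0).astype(np.uint32)
          return int(bits @ (1 << np.arange(n_planes, dtype=np.uint32)))

      buckets = {}                       # hash key -> list of clip indices
      for i, vec in enumerate(database):
          buckets.setdefault(lsh_key(vec), []).append(i)

      # Query: a slightly perturbed copy of clip 123 should land in (or near) its bucket.
      query = database[123] + 0.05 * rng.standard_normal(dim)
      candidates = buckets.get(lsh_key(query), [])
      if candidates:
          dists = np.linalg.norm(database[candidates] - query, axis=1)
          print("best candidate:", candidates[int(np.argmin(dists))])
      else:
          print("empty bucket - in practice several hash tables are used to avoid misses")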

  6. Comparison of audio vs. written feedback on clinical assignments of nursing students.

    Science.gov (United States)

    Bourgault, Annette M; Mundy, Cynthia; Joshua, Thomas

    2013-01-01

    This pilot study explored using audio recordings as a method of feedback for weekly clinical assignments of nursing students. Feedback that provides students with insight into their performance is an essential component of nursing education. Audio methods have been used to communicate feedback on written assignments in other disciplines, but this method has not been reported in the nursing literature. A survey and the VARK questionnaire were completed by eight nursing students. Each student had randomly received written and audio feedback during an eight-week period. There were no differences between the written and audio methods. Students perceived audio as the most personal, easy to understand, and positive method. Only one student expressed a preference for written feedback. There was no difference in instructor time. Audio feedback is an innovative method of feedback for clinical assignments of 'Net Generation' nursing students.

  7. Voice radio communication, pedestrian localization, and the tactical use of 3D audio

    OpenAIRE

    Nilsson, John-Olof; Schüldt, Christian; Händel, Peter

    2013-01-01

    The relation between voice radio communication and pedestrian localization is studied. 3D audio is identified as a linking technology which brings strong mutual benefits. Voice communication rendered with 3D audio provides a potential low secondary task interference user interface to the localization information. Vice versa, location information in the 3D audio provides spatial cues in the voice communication, improving speech intelligibility. An experimental setup with voice radio communicat...

  8. Audio editing in the time-frequency domain using the Gabor Wavelet Transform

    OpenAIRE

    Hammarqvist, Ulf

    2011-01-01

    Visualization, processing and editing of audio, directly on a time-frequency surface, is the scope of this thesis. More precisely, the scalogram produced by a Gabor wavelet transform is used, which is a powerful alternative to traditional techniques where the waveform is the main visual aid and editing is performed by parametric filters. Reconstruction properties, scalogram design and enhancements, as well as audio manipulation algorithms, are investigated for this audio representation. The scalog...

  9. Audio vs. chat: The effects of group size on media choice

    OpenAIRE

    Löber, Andreas; Schwabe, Gerhard; Grimm, Sibylle

    2007-01-01

    The increasing usage of audio and chat communication in private and commercial cooperative settings requires new insight into choosing the appropriate media for collaborative tasks. The paper presents the results of two series of experiments comparing audio and chat communication with varying group sizes. The experimental data indicates that chat scales up better to an increase in group size than audio. We propose that the media richness theory appropriately predicts the productivity of small...

  10. Generative Audio Systems: Musical Applications of Time-Varying Feedback Networks and Computational Aesthetics

    OpenAIRE

    Surges, Gregory

    2015-01-01

    This dissertation is focused on the development of generative audio systems - a term used to describe generative music systems that generate both formal structure and synthesized audio content from the same audio-rate computational process; in other words, a system wherein the synthesis and organizational processes are inseparable and operate at the sample level. First, a series of generative software systems are described. These systems each employ a different method to create generativity and, ...

  11. Multidimensional Attributes of the Sense of Presence in Audio-Visual Content

    Directory of Open Access Journals (Sweden)

    Kazutomo Fukue

    2011-10-01

    Full Text Available The sense of presence is crucial for evaluating audio-visual equipment and content. To clarify the multidimensional attributes of the sense, we conducted three experiments on audio, visual, and audio-visual content items. Initially 345 adjectives, which express the sense of presence, were collected and the number of adjectives was reduced to 40 pairs based on the KJ method. Forty scenes were recorded with a high-definition video camera while their sounds were recorded using a dummy head. Each content item was reproduced with a 65-inch display and headphones in three conditions of audio-only, visual-only and audio-visual. Twenty-one subjects evaluated them using the 40 pairs of adjectives by the Semantic Differential method with seven-point scales. The sense of presence in each content item was also evaluated using a Likert scale. The experimental data was analyzed by the factor analysis and four, five and five factors were extracted for audio, visual, and audio-visual conditions, respectively. The multiple regression analysis revealed that audio and audio-visual presences were explained by the extracted factors, although further consideration is required for the visual presence. These results indicated that the factors of psychological loading and activity are relevant for the sense of presence.

  12. On the relative importance of audio and video in the presence of packet losses

    DEFF Research Database (Denmark)

    Korhonen, Jari; Reiter, Ulrich; Myakotnykh, Eugene

    2010-01-01

    In streaming applications, unequal protection of audio and video tracks may be necessary to maintain the optimal perceived overall quality. For this purpose, the application should be aware of the relative importance of audio and video in an audiovisual sequence. In this paper, we propose a subjective test arrangement for finding the optimal tradeoff between subjective audio and video qualities in situations when it is not possible to have perfect quality for both modalities concurrently. Our results show that content poses a significant impact on the preferred compromise between audio and video quality, but also that the currently used classification criteria for content are not sufficient to predict the users' preference...

  13. Paper-Based Textbooks with Audio Support for Print-Disabled Students.

    Science.gov (United States)

    Fujiyoshi, Akio; Ohsawa, Akiko; Takaira, Takuya; Tani, Yoshiaki; Fujiyoshi, Mamoru; Ota, Yuko

    2015-01-01

    Utilizing invisible 2-dimensional codes and digital audio players with a 2-dimensional code scanner, we developed paper-based textbooks with audio support for students with print disabilities, called "multimodal textbooks." Multimodal textbooks can be read with the combination of the two modes: "reading printed text" and "listening to the speech of the text from a digital audio player with a 2-dimensional code scanner." Since multimodal textbooks look the same as regular textbooks and the price of a digital audio player is reasonable (about 30 euro), we think multimodal textbooks are suitable for students with print disabilities in ordinary classrooms.

  14. Audio-Tactile Integration and the Influence of Musical Training

    Science.gov (United States)

    Kuchenbuch, Anja; Paraskevopoulos, Evangelos; Herholz, Sibylle C.; Pantev, Christo

    2014-01-01

    Perception of our environment is a multisensory experience; information from different sensory systems like the auditory, visual and tactile is constantly integrated. Complex tasks that require high temporal and spatial precision of multisensory integration put strong demands on the underlying networks but it is largely unknown how task experience shapes multisensory processing. Long-term musical training is an excellent model for brain plasticity because it shapes the human brain at functional and structural levels, affecting a network of brain areas. In the present study we used magnetoencephalography (MEG) to investigate how audio-tactile perception is integrated in the human brain and if musicians show enhancement of the corresponding activation compared to non-musicians. Using a paradigm that allowed the investigation of combined and separate auditory and tactile processing, we found a multisensory incongruency response, generated in frontal, cingulate and cerebellar regions, an auditory mismatch response generated mainly in the auditory cortex and a tactile mismatch response generated in frontal and cerebellar regions. The influence of musical training was seen in the audio-tactile as well as in the auditory condition, indicating enhanced higher-order processing in musicians, while the sources of the tactile MMN were not influenced by long-term musical training. Consistent with the predictive coding model, more basic, bottom-up sensory processing was relatively stable and less affected by expertise, whereas areas for top-down models of multisensory expectancies were modulated by training. PMID:24465675

  15. Separate mechanisms for audio-tactile pitch and loudness interactions

    Directory of Open Access Journals (Sweden)

    Jeffrey M Yau

    2010-10-01

    Full Text Available A major goal in perceptual neuroscience is to understand how signals from different sensory modalities are combined to produce stable and coherent representations. We previously investigated interactions between audition and touch, motivated by the fact that both modalities are sensitive to environmental oscillations. In our earlier study, we characterized the effect of auditory distractors on tactile frequency and intensity perception. Here, we describe the converse experiments examining the effect of tactile distractors on auditory processing. Because the two studies employ the same psychophysical paradigm, we combined their results for a comprehensive view of how auditory and tactile signals interact and how these interactions depend on the perceptual task. Together, our results show that temporal frequency representations are perceptually linked regardless of the attended modality. In contrast, audio-tactile loudness interactions depend on the attended modality: Tactile distractors influence judgments of auditory intensity, but judgments of tactile intensity are impervious to auditory distraction. Lastly, we show that audio-tactile loudness interactions depend critically on stimulus timing, while pitch interactions do not. These results reveal that auditory and tactile inputs are combined differently depending on the perceptual task. That distinct rules govern the integration of auditory and tactile signals in pitch and loudness perception implies that the two are mediated by separate neural mechanisms. These findings underscore the complexity and specificity of multisensory interactions.

  16. Automatic processing of CERN video, audio and photo archives

    Science.gov (United States)

    Kwiatek, M.

    2008-07-01

    The digitalization of CERN audio-visual archives, a major task currently in progress, will generate over 40 TB of video, audio and photo files. Storing these files is one issue, but a far more important challenge is to provide long-time coherence of the archive and to make these files available on-line with minimum manpower investment. An infrastructure, based on standard CERN services, has been implemented, whereby master files, stored in the CERN Distributed File System (DFS), are discovered and scheduled for encoding into lightweight web formats based on predefined profiles. Changes in master files, conversion profiles or in the metadata database (read from CDS, the CERN Document Server) are automatically detected and the media re-encoded whenever necessary. The encoding processes are run on virtual servers provided on-demand by the CERN Server Self Service Centre, so that new servers can be easily configured to adapt to higher load. Finally, the generated files are made available from the CERN standard web servers with streaming implemented using Windows Media Services.

  17. Speech and audio processing for coding, enhancement and recognition

    CERN Document Server

    Togneri, Roberto; Narasimha, Madihally

    2015-01-01

    This book describes the basic principles underlying the generation, coding, transmission and enhancement of speech and audio signals, including advanced statistical and machine learning techniques for speech and speaker recognition, with an overview of the key innovations in these areas. Key research undertaken in speech coding, speech enhancement, speech recognition, emotion recognition and speaker diarization is also presented, along with recent advances and new paradigms in these areas. · Offers readers a single-source reference on the significant applications of speech and audio processing to speech coding, speech enhancement and speech/speaker recognition. · Enables readers involved in algorithm development and implementation issues for speech coding to understand the historical development and future challenges in speech coding research. · Discusses speech coding methods yielding bit-streams that are multi-rate and scalable for Voice-over-IP (VoIP) networks...

  18. Information-Driven Active Audio-Visual Source Localization.

    Directory of Open Access Journals (Sweden)

    Niclas Schult

    Full Text Available We present a system for sensorimotor audio-visual source localization on a mobile robot. We utilize a particle filter for the combination of audio-visual information and for the temporal integration of consecutive measurements. Although the system only measures the current direction of the source, the position of the source can be estimated because the robot is able to move and can therefore obtain measurements from different directions. These actions by the robot successively reduce uncertainty about the source's position. An information gain mechanism is used for selecting the most informative actions in order to minimize the number of actions required to achieve accurate and precise position estimates in azimuth and distance. We show that this mechanism is an efficient solution to the action selection problem for source localization, and that it is able to produce precise position estimates despite simplified unisensory preprocessing. Because of the robot's mobility, this approach is suitable for use in complex and cluttered environments. We present qualitative and quantitative results of the system's performance and discuss possible areas of application.

  19. Audio-tactile integration and the influence of musical training.

    Directory of Open Access Journals (Sweden)

    Anja Kuchenbuch

    Full Text Available Perception of our environment is a multisensory experience; information from different sensory systems like the auditory, visual and tactile is constantly integrated. Complex tasks that require high temporal and spatial precision of multisensory integration put strong demands on the underlying networks but it is largely unknown how task experience shapes multisensory processing. Long-term musical training is an excellent model for brain plasticity because it shapes the human brain at functional and structural levels, affecting a network of brain areas. In the present study we used magnetoencephalography (MEG) to investigate how audio-tactile perception is integrated in the human brain and if musicians show enhancement of the corresponding activation compared to non-musicians. Using a paradigm that allowed the investigation of combined and separate auditory and tactile processing, we found a multisensory incongruency response, generated in frontal, cingulate and cerebellar regions, an auditory mismatch response generated mainly in the auditory cortex and a tactile mismatch response generated in frontal and cerebellar regions. The influence of musical training was seen in the audio-tactile as well as in the auditory condition, indicating enhanced higher-order processing in musicians, while the sources of the tactile MMN were not influenced by long-term musical training. Consistent with the predictive coding model, more basic, bottom-up sensory processing was relatively stable and less affected by expertise, whereas areas for top-down models of multisensory expectancies were modulated by training.

  20. Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition

    Directory of Open Access Journals (Sweden)

    Berthommier Frédéric

    2002-01-01

    Full Text Available It has been shown that integration of acoustic and visual information especially in noisy conditions yields improved speech recognition results. This raises the question of how to weight the two modalities in different noise conditions. Throughout this paper we develop a weighting process adaptive to various background noise situations. In the presented recognition system, audio and video data are combined following a Separate Integration (SI architecture. A hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM system is used for the experiments. The neural networks were in all cases trained on clean data. Firstly, we evaluate the performance of different weighting schemes in a manually controlled recognition task with different types of noise. Next, we compare different criteria to estimate the reliability of the audio stream. Based on this, a mapping between the measurements and the free parameter of the fusion process is derived and its applicability is demonstrated. Finally, the possibilities and limitations of adaptive weighting are compared and discussed.
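
    The adaptive weighting step can be sketched as follows: per-class scores from the audio and video classifiers are combined with a weight on the audio stream mapped from an estimate of its reliability. The SNR-to-weight mapping and the class scores below are invented for illustration and are not the mapping derived in the paper.

      import numpy as np

      def combine(log_p_audio, log_p_video, snr_db):
          """Weighted fusion of per-class scores, in the spirit of Separate Integration."""
          # Hypothetical mapping from estimated SNR to the audio stream weight in [0, 1]:
          # trust audio fully above ~20 dB, not at all below ~-5 dB, linearly in between.
          w_audio = np.clip((snr_db + 5.0) / 25.0, 0.0, 1.0)
          return w_audio * log_p_audio + (1.0 - w_audio) * log_p_video

      # Example: three candidate classes; audio favours class 0, video favours class 2.
      log_p_audio = np.log(np.array([0.70, 0.20, 0.10]))
      log_p_video = np.log(np.array([0.15, 0.25, 0.60]))

      for snr in (25.0, 5.0, -10.0):
          scores = combine(log_p_audio, log_p_video, snr)
          print(f"SNR {snr:6.1f} dB -> decision: class {int(np.argmax(scores))}")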