video audio text: Topics by WorldWideScience.org

Sample records for video audio text

Big Data Analytics: Challenges And Applications For Text, Audio, Video, And Social Media Data

OpenAIRE

Jai Prakash Verma; Smita Agrawal; Bankim Patel; Atul Patel

2016-01-01

All types of machine automated systems are generating large amount of data in different forms like statistical, text, audio, video, sensor, and bio-metric data that emerges the term Big Data. In this paper we are discussing issues, challenges, and application of these types of Big Data with the consideration of big data dimensions. Here we are discussing social media data analytics, content based analytics, text data analytics, audio, and video data analytics their issues and expected applica...
Wavelet-based audio embedding and audio/video compression

Science.gov (United States)

Mendenhall, Michael J.; Claypoole, Roger L., Jr.

2001-12-01

Watermarking, traditionally used for copyright protection, is used in a new and exciting way. An efficient wavelet-based watermarking technique embeds audio information into a video signal. Several effective compression techniques are applied to compress the resulting audio/video signal in an embedded fashion. This wavelet-based compression algorithm incorporates bit-plane coding, index coding, and Huffman coding. To demonstrate the potential of this audio embedding and audio/video compression algorithm, we embed an audio signal into a video signal and then compress. Results show that overall compression rates of 15:1 can be achieved. The video signal is reconstructed with a median PSNR of nearly 33 dB. Finally, the audio signal is extracted from the compressed audio/video signal without error.
Mobile Message Services Using Text, Audio or Video for Improving the Learning Infrastructure in Higher Education

Directory of Open Access Journals (Sweden)

BjÃƒÂ¶rn Olof Hedin

2006-06-01

Full Text Available This study examines how media files sent to mobile phones can be used to improve education at universities, and describes a prototype implement of such a system using standard components. To accomplish this, university students were equipped with mobile phones and software that allowed teachers to send text-based, audio-based and video-based messages to the students. Data was collected using questionnaires, focus groups and log files. The conclusions were that students preferred to have information and learning content sent as text, rather than audio or video. Text messages sent to phones should be no longer than 2000 characters. The most appreciated services were notifications of changes in course schedules, short lecture introductions and reminders. The prototype showed that this functionality is easy to implement using standard components.
Categorizing Video Game Audio

DEFF Research Database (Denmark)

Westerberg, Andreas Rytter; Schoenau-Fog, Henrik

2015-01-01

they can use audio in video games. The conclusion of this study is that the current models' view of the diegetic spaces, used to categorize video game audio, is not t to categorize all sounds. This can however possibly be changed though a rethinking of how the player interprets audio.......This paper dives into the subject of video game audio and how it can be categorized in order to deliver a message to a player in the most precise way. A new categorization, with a new take on the diegetic spaces, can be used a tool of inspiration for sound- and game-designers to rethink how...
Web Audio/Video Streaming Tool

Science.gov (United States)

Guruvadoo, Eranna K.

2003-01-01

In order to promote NASA-wide educational outreach program to educate and inform the public of space exploration, NASA, at Kennedy Space Center, is seeking efficient ways to add more contents to the web by streaming audio/video files. This project proposes a high level overview of a framework for the creation, management, and scheduling of audio/video assets over the web. To support short-term goals, the prototype of a web-based tool is designed and demonstrated to automate the process of streaming audio/video files. The tool provides web-enabled users interfaces to manage video assets, create publishable schedules of video assets for streaming, and schedule the streaming events. These operations are performed on user-defined and system-derived metadata of audio/video assets stored in a relational database while the assets reside on separate repository. The prototype tool is designed using ColdFusion 5.0.
Automated processing of massive audio/video content using FFmpeg

Directory of Open Access Journals (Sweden)

Kia Siang Hock

2014-01-01

Full Text Available Audio and video content forms an integral, important and expanding part of the digital collections in libraries and archives world-wide. While these memory institutions are familiar and well-versed in the management of more conventional materials such as books, periodicals, ephemera and images, the handling of audio (e.g., oral history recordings and video content (e.g., audio-visual recordings, broadcast content requires additional toolkits. In particular, a robust and comprehensive tool that provides a programmable interface is indispensable when dealing with tens of thousands of hours of audio and video content. FFmpeg is comprehensive and well-established open source software that is capable of the full-range of audio/video processing tasks (such as encode, decode, transcode, mux, demux, stream and filter. It is also capable of handling a wide-range of audio and video formats, a unique challenge in memory institutions. It comes with a command line interface, as well as a set of developer libraries that can be incorporated into applications.
Mobile video-to-audio transducer and motion detection for sensory substitution

Directory of Open Access Journals (Sweden)

Maxime eAmbard

2015-10-01

Full Text Available Visuo-auditory sensory substitution systems are augmented reality devices that translate a video stream into an audio stream in order to help the blind in daily tasks requiring visuo-spatial information. In this work, we present both a new mobile device and a transcoding method specifically designed to sonify moving objects. Frame differencing is used to extract spatial features from the video stream and two-dimensional spatial information is converted into audio cues using pitch, interaural time difference and interaural level difference. Using numerical methods, we attempt to reconstruct visuo-spatial information based on audio signals generated from various video stimuli. We show that despite a contrasted visual background and a highly lossy encoding method, the information in the audio signal is sufficient to allow object localization, object trajectory evaluation, object approach detection, and spatial separation of multiple objects. We also show that this type of audio signal can be interpreted by human users by asking ten subjects to discriminate trajectories based on generated audio signals.
Robust audio-visual speech recognition under noisy audio-video conditions.

Science.gov (United States)

Stewart, Darryl; Seymour, Rowan; Pass, Adrian; Ming, Ji

2014-02-01

This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments, where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements and can be used alongside many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances with corruption added in either/both the video and audio streams using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach and also compared to any fixed-weighted integration approach in both clean conditions or when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
PENGEMBANGAN MULTIMEDIA PEMBELAJARAN FISIKA BERBASIS AUDIO-VIDEO EKSPERIMEN LISTRIK DINAMIS DI SMP

Directory of Open Access Journals (Sweden)

P. Rante

2013-10-01

Full Text Available Penelitian pengembangan ini dilakukan dengan tujuan untuk melihat profil pengembangan multimedia pembelajaran fisika berbasis audio-video eksperimen listrik dinamis yang dapat menjadi solusi ketidakterlaksanaan praktikum di sekolah. Hasil penelitian menunjukkan bahwa propil multimedia berbasis audio-video eksperimen dari segi tampilan menarik, fasilitas runtut, sistematis dan praktis digunakan serta menjadi solusi ketidakterlaksanaan praktikum di sekolah. Produk akhir adalah sebuah paket CD autorun multimedia pembelajaran interaktif sebagai media pembelajaran mandiri dan sebagai media presentase yang dilengkapi perangkat pembelajaran untuk guru.Â This research aims to see the profile of multimedia learning development on physics based audio-video on the topic dynamic electricity experiment that may become a solution of practicum that not mastered well in the school. The result shows that the profile of develop multimedia based audio-video experiment has interesting display, harmonious facilities, systematic and practical in used as well as become a solution of the practicum that not mastered yet. The final product produced an auto run CD package of interactive learning multimedia as a self learning media and as a representation of media that equipped with teaching and learning media for teacher.
Design of batch audio/video conversion platform based on JavaEE

Science.gov (United States)

Cui, Yansong; Jiang, Lianpin

2018-03-01

With the rapid development of digital publishing industry, the direction of audio / video publishing shows the diversity of coding standards for audio and video files, massive data and other significant features. Faced with massive and diverse data, how to quickly and efficiently convert to a unified code format has brought great difficulties to the digital publishing organization. In view of this demand and present situation in this paper, basing on the development architecture of Sptring+SpringMVC+Mybatis, and combined with the open source FFMPEG format conversion tool, a distributed online audio and video format conversion platform with a B/S structure is proposed. Based on the Java language, the key technologies and strategies designed in the design of platform architecture are analyzed emphatically in this paper, designing and developing a efficient audio and video format conversion system, which is composed of “Front display system”, "core scheduling server " and " conversion server ". The test results show that, compared with the ordinary audio and video conversion scheme, the use of batch audio and video format conversion platform can effectively improve the conversion efficiency of audio and video files, and reduce the complexity of the work. Practice has proved that the key technology discussed in this paper can be applied in the field of large batch file processing, and has certain practical application value.
ENERGY STAR Certified Audio Video

Data.gov (United States)

U.S. Environmental Protection Agency — Certified models meet all ENERGY STAR requirements as listed in the Version 3.0 ENERGY STAR Program Requirements for Audio Video Equipment that are effective as of...
Hierarchical structure for audio-video based semantic classification of sports video sequences

Science.gov (United States)

Kolekar, M. H.; Sengupta, S.

2005-07-01

A hierarchical structure for sports event classification based on audio and video content analysis is proposed in this paper. Compared to the event classifications in other games, those of cricket are very challenging and yet unexplored. We have successfully solved cricket video classification problem using a six level hierarchical structure. The first level performs event detection based on audio energy and Zero Crossing Rate (ZCR) of short-time audio signal. In the subsequent levels, we classify the events based on video features using a Hidden Markov Model implemented through Dynamic Programming (HMM-DP) using color or motion as a likelihood function. For some of the game-specific decisions, a rule-based classification is also performed. Our proposed hierarchical structure can easily be applied to any other sports. Our results are very promising and we have moved a step forward towards addressing semantic classification problems in general.
Linking Video and Text via Representations of Narrative

OpenAIRE

Salway, Andrew; Graham, Mike; Tomadaki, Eleftheria; Xu, Yan

2003-01-01

The ongoing TIWO project is investigating the synthesis of language technologies, like information extraction and corpus-based text analysis, video data modeling and knowledge representation. The aim is to develop a computational account of how video and text can be integrated by representations of narrative in multimedia systems. The multimedia domain is that of film and audio description – an emerging text type that is produced specifically to be informative about the events and objects dep...
News video story segmentation method using fusion of audio-visual features

Science.gov (United States)

Wen, Jun; Wu, Ling-da; Zeng, Pu; Luan, Xi-dao; Xie, Yu-xiang

2007-11-01

News story segmentation is an important aspect for news video analysis. This paper presents a method for news video story segmentation. Different form prior works, which base on visual features transform, the proposed technique uses audio features as baseline and fuses visual features with it to refine the results. At first, it selects silence clips as audio features candidate points, and selects shot boundaries and anchor shots as two kinds of visual features candidate points. Then this paper selects audio feature candidates as cues and develops different fusion method, which effectively using diverse type visual candidates to refine audio candidates, to get story boundaries. Experiment results show that this method has high efficiency and adaptability to different kinds of news video.
On the relative importance of audio and video in the presence of packet losses

DEFF Research Database (Denmark)

Korhonen, Jari; Reiter, Ulrich; Myakotnykh, Eugene

2010-01-01

In streaming applications, unequal protection of audio and video tracks may be necessary to maintain the optimal perceived overall quality. For this purpose, the application should be aware of the relative importance of audio and video in an audiovisual sequence. In this paper, we propose...... a subjective test arrangement for finding the optimal tradeoff between subjective audio and video qualities in situations when it is not possible to have perfect quality for both modalities concurrently. Our results show that content poses a significant impact on the preferred compromise between audio...... and video quality, but also that the currently used classification criteria for content are not sufficient to predict the users’ preference...
Audio scene segmentation for video with generic content

Science.gov (United States)

Niu, Feng; Goela, Naveen; Divakaran, Ajay; Abdel-Mottaleb, Mohamed

2008-01-01

In this paper, we present a content-adaptive audio texture based method to segment video into audio scenes. The audio scene is modeled as a semantically consistent chunk of audio data. Our algorithm is based on "semantic audio texture analysis." At first, we train GMM models for basic audio classes such as speech, music, etc. Then we define the semantic audio texture based on those classes. We study and present two types of scene changes, those corresponding to an overall audio texture change and those corresponding to a special "transition marker" used by the content creator, such as a short stretch of music in a sitcom or silence in dramatic content. Unlike prior work using genre specific heuristics, such as some methods presented for detecting commercials, we adaptively find out if such special transition markers are being used and if so, which of the base classes are being used as markers without any prior knowledge about the content. Our experimental results show that our proposed audio scene segmentation works well across a wide variety of broadcast content genres.
Semantic Indexing of Multimedia Content Using Visual, Audio, and Text Cues

Directory of Open Access Journals (Sweden)

W. H. Adams

2003-02-01

Full Text Available We present a learning-based approach to the semantic indexing of multimedia content using cues derived from audio, visual, and text features. We approach the problem by developing a set of statistical models for a predefined lexicon. Novel concepts are then mapped in terms of the concepts in the lexicon. To achieve robust detection of concepts, we exploit features from multiple modalities, namely, audio, video, and text. Concept representations are modeled using Gaussian mixture models (GMM, hidden Markov models (HMM, and support vector machines (SVM. Models such as Bayesian networks and SVMs are used in a late-fusion approach to model concepts that are not explicitly modeled in terms of features. Our experiments indicate promise in the proposed classification and fusion methodologies: our proposed fusion scheme achieves more than 10% relative improvement over the best unimodal concept detector.
Streaming Audio and Video: New Challenges and Opportunities for Museums.

Science.gov (United States)

Spadaccini, Jim

Streaming audio and video present new challenges and opportunities for museums. Streaming media is easier to author and deliver to Internet audiences than ever before; digital video editing is commonplace now that the tools--computers, digital video cameras, and hard drives--are so affordable; the cost of serving video files across the Internet…
A scheme for racquet sports video analysis with the combination of audio-visual information

Science.gov (United States)

Xing, Liyuan; Ye, Qixiang; Zhang, Weigang; Huang, Qingming; Yu, Hua

2005-07-01

As a very important category in sports video, racquet sports video, e.g. table tennis, tennis and badminton, has been paid little attention in the past years. Considering the characteristics of this kind of sports video, we propose a new scheme for structure indexing and highlight generating based on the combination of audio and visual information. Firstly, a supervised classification method is employed to detect important audio symbols including impact (ball hit), audience cheers, commentator speech, etc. Meanwhile an unsupervised algorithm is proposed to group video shots into various clusters. Then, by taking advantage of temporal relationship between audio and visual signals, we can specify the scene clusters with semantic labels including rally scenes and break scenes. Thirdly, a refinement procedure is developed to reduce false rally scenes by further audio analysis. Finally, an exciting model is proposed to rank the detected rally scenes from which many exciting video clips such as game (match) points can be correctly retrieved. Experiments on two types of representative racquet sports video, table tennis video and tennis video, demonstrate encouraging results.
Agency Video, Audio and Imagery Library

Science.gov (United States)

Grubbs, Rodney

2015-01-01

The purpose of this presentation was to inform the ISS International Partners of the new NASA Agency Video, Audio and Imagery Library (AVAIL) website. AVAIL is a new resource for the public to search for and download NASA-related imagery, and is not intended to replace the current process by which the International Partners receive their Space Station imagery products.

Hierarchical vs non-hierarchical audio indexation and classification for video genres

Science.gov (United States)

Dammak, Nouha; BenAyed, Yassine

2018-04-01

In this paper, Support Vector Machines (SVMs) are used for segmenting and indexing video genres based on only audio features extracted at block level, which has a prominent asset by capturing local temporal information. The main contribution of our study is to show the wide effect on the classification accuracies while using an hierarchical categorization structure based on Mel Frequency Cepstral Coefficients (MFCC) audio descriptor. In fact, the classification consists in three common video genres: sports videos, music clips and news scenes. The sub-classification may divide each genre into several multi-speaker and multi-dialect sub-genres. The validation of this approach was carried out on over 360 minutes of video span yielding a classification accuracy of over 99%.
On the definition of adapted audio/video profiles for high-quality video calling services over LTE/4G

Science.gov (United States)

Ndiaye, Maty; Quinquis, Catherine; Larabi, Mohamed Chaker; Le Lay, Gwenael; Saadane, Hakim; Perrine, Clency

2014-01-01

During the last decade, the important advances and widespread availability of mobile technology (operating systems, GPUs, terminal resolution and so on) have encouraged a fast development of voice and video services like video-calling. While multimedia services have largely grown on mobile devices, the generated increase of data consumption is leading to the saturation of mobile networks. In order to provide data with high bit-rates and maintain performance as close as possible to traditional networks, the 3GPP (The 3rd Generation Partnership Project) worked on a high performance standard for mobile called Long Term Evolution (LTE). In this paper, we aim at expressing recommendations related to audio and video media profiles (selection of audio and video codecs, bit-rates, frame-rates, audio and video formats) for a typical video-calling services held over LTE/4G mobile networks. These profiles are defined according to targeted devices (smartphones, tablets), so as to ensure the best possible quality of experience (QoE). Obtained results indicate that for a CIF format (352 x 288 pixels) which is usually used for smartphones, the VP8 codec provides a better image quality than the H.264 codec for low bitrates (from 128 to 384 kbps). However sequences with high motion, H.264 in slow mode is preferred. Regarding audio, better results are globally achieved using wideband codecs offering good quality except for opus codec (at 12.2 kbps).
BDVC (Bimodal Database of Violent Content): A database of violent audio and video

Science.gov (United States)

Rivera Martínez, Jose Luis; Mijes Cruz, Mario Humberto; Rodríguez Vázqu, Manuel Antonio; Rodríguez Espejo, Luis; Montoya Obeso, Abraham; García Vázquez, Mireya Saraí; Ramírez Acosta, Alejandro Álvaro

2017-09-01

Nowadays there is a trend towards the use of unimodal databases for multimedia content description, organization and retrieval applications of a single type of content like text, voice and images, instead bimodal databases allow to associate semantically two different types of content like audio-video, image-text, among others. The generation of a bimodal database of audio-video implies the creation of a connection between the multimedia content through the semantic relation that associates the actions of both types of information. This paper describes in detail the used characteristics and methodology for the creation of the bimodal database of violent content; the semantic relationship is stablished by the proposed concepts that describe the audiovisual information. The use of bimodal databases in applications related to the audiovisual content processing allows an increase in the semantic performance only and only if these applications process both type of content. This bimodal database counts with 580 audiovisual annotated segments, with a duration of 28 minutes, divided in 41 classes. Bimodal databases are a tool in the generation of applications for the semantic web.
Assessing the importance of audio/video synchronization for simultaneous translation of video sequences

OpenAIRE

Staelens, Nicolas; De Meulenaere, Jonas; Bleumers, Lizzy; Van Wallendael, Glenn; De Cock, Jan; Geeraert, Koen; Vercammen, Nick; Van den Broeck, Wendy; Vermeulen, Brecht; Van de Walle, Rik; Demeester, Piet

2012-01-01

Lip synchronization is considered a key parameter during interactive communication. In the case of video conferencing and television broadcasting, the differential delay between audio and video should remain below certain thresholds, as recommended by several standardization bodies. However, further research has also shown that these thresholds can be relaxed, depending on the targeted application and use case. In this article, we investigate the influence of lip sync on the ability to perfor...
Real-Time Transmission and Storage of Video, Audio, and Health Data in Emergency and Home Care Situations

Directory of Open Access Journals (Sweden)

Riccardo Stagnaro

2007-01-01

Full Text Available The increase in the availability of bandwidth for wireless links, network integration, and the computational power on fixed and mobile platforms at affordable costs allows nowadays for the handling of audio and video data, their quality making them suitable for medical application. These information streams can support both continuous monitoring and emergency situations. According to this scenario, the authors have developed and implemented the mobile communication system which is described in this paper. The system is based on ITU-T H.323 multimedia terminal recommendation, suitable for real-time data/video/audio and telemedical applications. The audio and video codecs, respectively, H.264 and G723.1, were implemented and optimized in order to obtain high performance on the system target processors. Offline media streaming storage and retrieval functionalities were supported by integrating a relational database in the hospital central system. The system is based on low-cost consumer technologies such as general packet radio service (GPRS and wireless local area network (WLAN or WiFi for lowband data/video transmission. Implementation and testing were carried out for medical emergency and telemedicine application. In this paper, the emergency case study is described.
Exploring Meaning Negotiation Patterns in Synchronous Audio and Video Conferencing English Classes in China

Science.gov (United States)

Li, Chenxi; Wu, Ligao; Li, Chen; Tang, Jinlan

2017-01-01

This work-in-progress doctoral research project aims to identify meaning negotiation patterns in synchronous audio and video Computer-Mediated Communication (CMC) environments based on the model of CMC text chat proposed by Smith (2003). The study was conducted in the Institute of Online Education at Beijing Foreign Studies University. Four dyads…
Non Audio-Video gesture recognition system

DEFF Research Database (Denmark)

Craciunescu, Razvan; Mihovska, Albena Dimitrova; Kyriazakos, Sofoklis

2016-01-01

Gesture recognition is a topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms. Gestures can originate from any bodily motion or state but commonly originate from the face or hand. Current research focus includes on the emotion...... recognition from the face and hand gesture recognition. Gesture recognition enables humans to communicate with the machine and interact naturally without any mechanical devices. This paper investigates the possibility to use non-audio/video sensors in order to design a low-cost gesture recognition device...
Semantic Analysis of Multimedial Information Usign Both Audio and Visual Clues

Directory of Open Access Journals (Sweden)

Andrej Lukac

2008-01-01

Full Text Available Nowadays, there is a lot of information in databases (text, audio/video form, etc.. It is important to be able to describe this data for better orientation in them. It is necessary to apply audio/video properties, which are used for metadata management, segmenting the document into semantically meaningful units, classifying each unit into a predefined scene type, indexing, summarizing the document for efficient retrieval and browsing. Data can be used for system that automatically searches for a specific person in a sequence also for special video sequences. Audio/video properties are presented by descriptors and description schemes. There are many features that can be used to characterize multimedial signals. We can analyze audio and video sequences jointly or considered them completely separately. Our aim is oriented to possibilities of combining multimedial features. Focus is direct into discussion programs, because there are more decisions how to combine audio features with video sequences.
Interactive video audio system: communication server for INDECT portal

Science.gov (United States)

Mikulec, Martin; Voznak, Miroslav; Safarik, Jakub; Partila, Pavol; Rozhon, Jan; Mehic, Miralem

2014-05-01

The paper deals with presentation of the IVAS system within the 7FP EU INDECT project. The INDECT project aims at developing the tools for enhancing the security of citizens and protecting the confidentiality of recorded and stored information. It is a part of the Seventh Framework Programme of European Union. We participate in INDECT portal and the Interactive Video Audio System (IVAS). This IVAS system provides a communication gateway between police officers working in dispatching centre and police officers in terrain. The officers in dispatching centre have capabilities to obtain information about all online police officers in terrain, they can command officers in terrain via text messages, voice or video calls and they are able to manage multimedia files from CCTV cameras or other sources, which can be interesting for officers in terrain. The police officers in terrain are equipped by smartphones or tablets. Besides common communication, they can reach pictures or videos sent by commander in office and they can respond to the command via text or multimedia messages taken by their devices. Our IVAS system is unique because we are developing it according to the special requirements from the Police of the Czech Republic. The IVAS communication system is designed to use modern Voice over Internet Protocol (VoIP) services. The whole solution is based on open source software including linux and android operating systems. The technical details of our solution are presented in the paper.
Design and Implementation of a Video-Zoom Driven Digital Audio-Zoom System for Portable Digital Imaging Devices

Science.gov (United States)

Park, Nam In; Kim, Seon Man; Kim, Hong Kook; Kim, Ji Woon; Kim, Myeong Bo; Yun, Su Won

In this paper, we propose a video-zoom driven audio-zoom algorithm in order to provide audio zooming effects in accordance with the degree of video-zoom. The proposed algorithm is designed based on a super-directive beamformer operating with a 4-channel microphone system, in conjunction with a soft masking process that considers the phase differences between microphones. Thus, the audio-zoom processed signal is obtained by multiplying an audio gain derived from a video-zoom level by the masked signal. After all, a real-time audio-zoom system is implemented on an ARM-CORETEX-A8 having a clock speed of 600 MHz after different levels of optimization are performed such as algorithmic level, C-code, and memory optimizations. To evaluate the complexity of the proposed real-time audio-zoom system, test data whose length is 21.3 seconds long is sampled at 48 kHz. As a result, it is shown from the experiments that the processing time for the proposed audio-zoom system occupies 14.6% or less of the ARM clock cycles. It is also shown from the experimental results performed in a semi-anechoic chamber that the signal with the front direction can be amplified by approximately 10 dB compared to the other directions.
La trasmissione di audio-video in diretta su larga scala: una breve rassegna

Directory of Open Access Journals (Sweden)

Franco Tommasi

2011-09-01

Full Text Available ItIn quanti modi è possibile trasmettere audio e video in diretta a un gran numero di destinatari? Se da una parte il grosso pubblico può ritenere banale il problema, uno sguardo più attento rivela senza difficoltà che è vero l'esatto opposto. L'articolo passa in rassegna i principali metodi attualmente usati ed evidenzia i problemi che essi comportano. Infine viene brevemente presentata una proposta sviluppata nel laboratorio dell'autore che cerca di sfruttare Internet ed il satellite per aggirare le difficoltà dei metodi passati in rassegna.EnHow many ways to transmit live audio and video to a very large audience are available?While the layman may consider such problem a trivial one, the exact opposite is true. A short review of the answers to the question is given in the following, together with a brief description of the drawbacks associated with each of them. The article ends with a short presentation of a proposal developed in the author's lab which puts to work the Internet and the satellite to get around the difficulties of the reviewed methods.
VideoSET: Video Summary Evaluation through Text

OpenAIRE

Yeung, Serena; Fathi, Alireza; Fei-Fei, Li

2014-01-01

In this paper we present VideoSET, a method for Video Summary Evaluation through Text that can evaluate how well a video summary is able to retain the semantic information contained in its original video. We observe that semantics is most easily expressed in words, and develop a text-based approach for the evaluation. Given a video summary, a text representation of the video summary is first generated, and an NLP-based metric is then used to measure its semantic distance to ground-truth text ...
Speaker detection for conversational robots using synchrony between audio and video

NARCIS (Netherlands)

Noulas, A.; Englebienne, G.; Terwijn, B.; Kröse, B.; Hanheide, M.; Zender, H.

2010-01-01

This paper compares different methods for detecting the speaking person when multiple persons are interacting with a robot. We evaluate the state-of-the-art speaker detection methods on the iCat robot. These methods use the synchrony between audio and video to locate the most probable speaker. We
Parameter and state estimation using audio and video signals

OpenAIRE

Evestedt, Magnus

2005-01-01

The complexity of industrial systems and the mathematical models to describe them increases. In many cases point sensors are no longer sufficient to provide controllers and monitoring instruments with the information necessary for operation. The need for other types of information, such as audio and video, has grown. Suitable applications range in a broad spectrum from microelectromechanical systems and bio-medical engineering to papermaking and steel production. This thesis is divided into f...
TEACHING LEARNING MATERIALS: THE REVIEWS COURSEBOOKS, GAMES, WORKSHEETS, AUDIO VIDEO FILES

Directory of Open Access Journals (Sweden)

Anak Agung Sagung Shanti Sari Dewi

2016-11-01

Full Text Available Teaching learning materials (TLM has been widely recognised as one of most important components in language teaching to support the success of language learning. TLM is essential for teachers in planning their lessons, assisting them in their professional duty, and use them as rosources to describe instructions. This writing reviews 10 (ten teaching learning materials in the form of cousebooks, games, worksheets, and audio video files. The materials were chosen randomly and were analysed qualitatively. The discussion of the materials is done individually by presenting their target learners, how they are applied by teachers and students, the aims of the use of the materials, and the role of teachers and learners in different kind of TLM.
Low-cost synchronization of high-speed audio and video recordings in bio-acoustic experiments.

Science.gov (United States)

Laurijssen, Dennis; Verreycken, Erik; Geipel, Inga; Daems, Walter; Peremans, Herbert; Steckel, Jan

2018-02-27

In this paper, we present a method for synchronizing high-speed audio and video recordings of bio-acoustic experiments. By embedding a random signal into the recorded video and audio data, robust synchronization of a diverse set of sensor streams can be performed without the need to keep detailed records. The synchronization can be performed using recording devices without dedicated synchronization inputs. We demonstrate the efficacy of the approach in two sets of experiments: behavioral experiments on different species of echolocating bats and the recordings of field crickets. We present the general operating principle of the synchronization method, discuss its synchronization strength and provide insights into how to construct such a device using off-the-shelf components. © 2018. Published by The Company of Biologists Ltd.
Correspondence between audio and visual deep models for musical instrument detection in video recordings

OpenAIRE

Slizovskaia, Olga; Gómez, Emilia; Haro, Gloria

2017-01-01

This work aims at investigating cross-modal connections between audio and video sources in the task of musical instrument recognition. We also address in this work the understanding of the representations learned by convolutional neural networks (CNNs) and we study feature correspondence between audio and visual components of a multimodal CNN architecture. For each instrument category, we select the most activated neurons and investigate exist- ing cross-correlations between neurons from the ...
The implementation of Project-Based Learning in courses Audio Video to Improve Employability Skills

Science.gov (United States)

Sulistiyo, Edy; Kustono, Djoko; Purnomo; Sutaji, Eddy

2018-04-01

This paper presents a project-based learning (PjBL) in subjects with Audio Video the Study Programme Electro Engineering Universitas Negeri Surabaya which consists of two ways namely the design of the prototype audio-video and assessment activities project-based learning tailored to the skills of the 21st century in the form of employability skills. The purpose of learning innovation is applying the lab work obtained in the theory classes. The PjBL aims to motivate students, centering on the problems of teaching in accordance with the world of work. Measures of learning include; determine the fundamental questions, designs, develop a schedule, monitor the learners and progress, test the results, evaluate the experience, project assessment, and product assessment. The results of research conducted showed the level of mastery of the ability to design tasks (of 78.6%), technical planning (39,3%), creativity (42,9%), innovative (46,4%), problem solving skills (the 57.1%), skill to communicate (75%), oral expression (75%), searching and understanding information (to 64.3%), collaborative work skills (71,4%), and classroom conduct (of 78.6%). In conclusion, instructors have to do the reflection and make improvements in some of the aspects that have a level of mastery of the skills less than 60% both on the application of project-based learning courses, audio video.
Automatic processing of CERN video, audio and photo archives

International Nuclear Information System (INIS)

Kwiatek, M

2008-01-01

The digitalization of CERN audio-visual archives, a major task currently in progress, will generate over 40 TB of video, audio and photo files. Storing these files is one issue, but a far more important challenge is to provide long-time coherence of the archive and to make these files available on-line with minimum manpower investment. An infrastructure, based on standard CERN services, has been implemented, whereby master files, stored in the CERN Distributed File System (DFS), are discovered and scheduled for encoding into lightweight web formats based on predefined profiles. Changes in master files, conversion profiles or in the metadata database (read from CDS, the CERN Document Server) are automatically detected and the media re-encoded whenever necessary. The encoding processes are run on virtual servers provided on-demand by the CERN Server Self Service Centre, so that new servers can be easily configured to adapt to higher load. Finally, the generated files are made available from the CERN standard web servers with streaming implemented using Windows Media Services
Automatic processing of CERN video, audio and photo archives

Energy Technology Data Exchange (ETDEWEB)

Kwiatek, M [CERN, Geneva (Switzerland)], E-mail: Michal.Kwiatek@cem.ch

2008-07-15

The digitalization of CERN audio-visual archives, a major task currently in progress, will generate over 40 TB of video, audio and photo files. Storing these files is one issue, but a far more important challenge is to provide long-time coherence of the archive and to make these files available on-line with minimum manpower investment. An infrastructure, based on standard CERN services, has been implemented, whereby master files, stored in the CERN Distributed File System (DFS), are discovered and scheduled for encoding into lightweight web formats based on predefined profiles. Changes in master files, conversion profiles or in the metadata database (read from CDS, the CERN Document Server) are automatically detected and the media re-encoded whenever necessary. The encoding processes are run on virtual servers provided on-demand by the CERN Server Self Service Centre, so that new servers can be easily configured to adapt to higher load. Finally, the generated files are made available from the CERN standard web servers with streaming implemented using Windows Media Services.

Combining Video, Audio and Lexical Indicators of Affect in Spontaneous Conversation via Particle Filtering.

Science.gov (United States)

Savran, Arman; Cao, Houwei; Shah, Miraj; Nenkova, Ani; Verma, Ragini

2012-01-01

We present experiments on fusing facial video, audio and lexical indicators for affect estimation during dyadic conversations. We use temporal statistics of texture descriptors extracted from facial video, a combination of various acoustic features, and lexical features to create regression based affect estimators for each modality. The single modality regressors are then combined using particle filtering, by treating these independent regression outputs as measurements of the affect states in a Bayesian filtering framework, where previous observations provide prediction about the current state by means of learned affect dynamics. Tested on the Audio-visual Emotion Recognition Challenge dataset, our single modality estimators achieve substantially higher scores than the official baseline method for every dimension of affect. Our filtering-based multi-modality fusion achieves correlation performance of 0.344 (baseline: 0.136) and 0.280 (baseline: 0.096) for the fully continuous and word level sub challenges, respectively.
Multiple frequency audio signal communication as a mechanism for neurophysiology and video data synchronization.

Science.gov (United States)

Topper, Nicholas C; Burke, Sara N; Maurer, Andrew Porter

2014-12-30

Current methods for aligning neurophysiology and video data are either prepackaged, requiring the additional purchase of a software suite, or use a blinking LED with a stationary pulse-width and frequency. These methods lack significant user interface for adaptation, are expensive, or risk a misalignment of the two data streams. A cost-effective means to obtain high-precision alignment of behavioral and neurophysiological data is obtained by generating an audio-pulse embedded with two domains of information, a low-frequency binary-counting signal and a high, randomly changing frequency. This enabled the derivation of temporal information while maintaining enough entropy in the system for algorithmic alignment. The sample to frame index constructed using the audio input correlation method described in this paper enables video and data acquisition to be aligned at a sub-frame level of precision. Traditionally, a synchrony pulse is recorded on-screen via a flashing diode. The higher sampling rate of the audio input of the camcorder enables the timing of an event to be detected with greater precision. While on-line analysis and synchronization using specialized equipment may be the ideal situation in some cases, the method presented in the current paper presents a viable, low cost alternative, and gives the flexibility to interface with custom off-line analysis tools. Moreover, the ease of constructing and implements this set-up presented in the current paper makes it applicable to a wide variety of applications that require video recording. Copyright © 2014 Elsevier B.V. All rights reserved.
Service provider perceptions of transitioning from audio to video capability in a telehealth system: a qualitative evaluation

OpenAIRE

Clay-Williams, Robyn; Baysari, Melissa; Taylor, Natalie; Zalitis, Dianne; Georgiou, Andrew; Robinson, Maureen; Braithwaite, Jeffrey; Westbrook, Johanna

2017-01-01

Background Telephone consultation and triage services are increasingly being used to deliver health advice. Availability of high speed internet services in remote areas allows healthcare providers to move from telephone to video telehealth services. Current approaches for assessing video services have limitations. This study aimed to identify the challenges for service providers associated with transitioning from audio to video technology. Methods Using a mixed-method, qualitative approach, w...
Use of Effective Audio in E-learning Courseware

OpenAIRE

Ray, Kisor

2015-01-01

E-Learning uses electronic media, information & communication technologies to provide education to the masses. E-learning deliver hypertext, text, audio, images, animation and videos using desktop standalone computer, local area network based intranet and internet based contents. While producing an e-learning content or course-ware, a major decision making factor is whether to use audio for the benefit of the end users. Generally, three types of audio can be used in e-learning: narration, mus...
“A Real China” on User-Generated Videos? Audio-Visual Narratives of Confucianism

Directory of Open Access Journals (Sweden)

Jianxiu Hao

2014-03-01

Full Text Available Beneath the “Chinese successful story”, social stratification, class polarization, and cultural displacement have been accelerated. The Chinese Communist Party has not found a coherent solution to the challenges of reconciling social interests, since Communism has been more and more becoming mere “lip service”. However, it has been claimed that Confucian values can provide sources to dissolve the downsides of modernization in contemporary Chinese society. This study intends to investigate the revival of Confucianism, as a source for criticism and construction in Chinese socio-culture, as portrayed in user-generated videos which are produced/consumed by the largest Internet using population in the world, under the Chinese authoritarian regime which controls over communication. By means of a thematic audio-visual narrative analysis, this study has investigated 20 hours of Youku Paike videos published between 2007 and 2013. It has been detected: (1 about one third of the user-generated videos can be interpreted as Confucian thematic narratives; and there is a slightly increasing trend portraying Confucian values; (2 Confucianism can become a source for the formation of a new online socio-culture, in the circumstances of China’s modernization and cyberization, to advocate social actors’ cultivation and humanity’s flourishing.
Digital video and audio broadcasting technology a practical engineering guide

CERN Document Server

Fischer, Walter

2010-01-01

Digital Video and Audio Broadcasting Technology - A Practical Engineering Guide' deals with all the most important digital television, sound radio and multimedia standards such as MPEG, DVB, DVD, DAB, ATSC, T-DMB, DMB-T, DRM and ISDB-T. The book provides an in-depth look at these subjects in terms of practical experience. In addition it contains chapters on the basics of technologies such as analog television, digital modulation, COFDM or mathematical transformations between time and frequency domains. The attention in the respective field under discussion is focussed on aspects of measuring t
General Considerations Regarding the Interceptions and Audio-video Registrations Related to the Judicial Practice and Present Legislation

Directory of Open Access Journals (Sweden)

Sandra Gradinaru

2011-05-01

Full Text Available The present paper tries to analyze the controversy of the admissibility of theinterceptions and audio-video registrations in the phase of the precursory documentscannot be admitted. The fact that interceptions and registrations can be disposed evenbefore starting the criminal prosecution, respectively before starting the criminal processor even before committing an offence is to bring severe prejudices to the right of a fairprocess and the right to a private life in the way in which these are stipulated in theConstitution and in the European Convention of the Human rights.
ABOUT SOUNDS IN VIDEO GAMES

Directory of Open Access Journals (Sweden)

Denikin Anton A.

2012-12-01

Full Text Available The article considers the aesthetical and practical possibilities for sounds (sound design in video games and interactive applications. Outlines the key features of the game sound, such as simulation, representativeness, interactivity, immersion, randomization, and audio-visuality. The author defines the basic terminology in study of game audio, as well as identifies significant aesthetic differences between film sounds and sounds in video game projects. It is an attempt to determine the techniques of art analysis for the approaches in study of video games including aesthetics of their sounds. The article offers a range of research methods, considering the video game scoring as a contemporary creative practice.
Designing a large-scale video chat application

OpenAIRE

Scholl, Jeremiah; Parnes, Peter; McCarthy, John D.; Sasse, Angela

2005-01-01

Studies of video conferencing systems generally focus on scenarios where users communicate using an audio channel. However, text chat serves users in a wide variety of contexts, and is commonly included in multimedia conferencing systems as a complement to the audio channel. This paper introduces a prototype application which integrates video and text communication, and describes a formative evaluation of the prototype with 53 users in a social setting. We focus the evaluation on bandwidth an...
Audio Networking in the Music Industry

Directory of Open Access Journals (Sweden)

Glebs Kuzmics

2018-01-01

Full Text Available This paper surveys the rôle of computer networking technologies in the music industry. A comparison of their relevant technologies, their defining advantages and disadvantages; analyses and discussion of the situation in the market of network enabled audio products followed by a discussion of different devices are presented. The idea of replacing a proprietary solution with open-source and freeware software programs has been chosen as the fundamental concept of this research. The technologies covered include: native IEEE AVnu Alliance Audio Video Bridging (AVB, CobraNet®, Audinate Dante™ and Harman BLU Link.
Audio Description as a Pedagogical Tool

Directory of Open Access Journals (Sweden)

Georgina Kleege

2015-05-01

Full Text Available Audio description is the process of translating visual information into words for people who are blind or have low vision. Typically such description has focused on films, museum exhibitions, images and video on the internet, and live theater. Because it allows people with visual impairments to experience a variety of cultural and educational texts that would otherwise be inaccessible, audio description is a mandated aspect of disability inclusion, although it remains markedly underdeveloped and underutilized in our classrooms and in society in general. Along with increasing awareness of disability, audio description pushes students to practice close reading of visual material, deepen their analysis, and engage in critical discussions around the methodology, standards and values, language, and role of interpretation in a variety of academic disciplines. We outline a few pedagogical interventions that can be customized to different contexts to develop students' writing and critical thinking skills through guided description of visual material.
pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis.

Science.gov (United States)

Giannakopoulos, Theodoros

2015-01-01

Audio information plays a rather important role in the increasing digital content that is available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automations and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g. audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures including: feature extraction, classification of audio signals, supervised and unsupervised segmentation and content visualization. pyAudioAnalysis is licensed under the Apache License and is available at GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the wide range of the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has been already used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation and health applications (e.g. monitoring eating habits). The feedback provided from all these particular audio applications has led to practical enhancement of the library.
Implementing Audio-CASI on Windows’ Platforms

Science.gov (United States)

Cooley, Philip C.; Turner, Charles F.

2011-01-01

Audio computer-assisted self interviewing (Audio-CASI) technologies have recently been shown to provide important and sometimes dramatic improvements in the quality of survey measurements. This is particularly true for measurements requiring respondents to divulge highly sensitive information such as their sexual, drug use, or other sensitive behaviors. However, DOS-based Audio-CASI systems that were designed and adopted in the early 1990s have important limitations. Most salient is the poor control they provide for manipulating the video presentation of survey questions. This article reports our experiences adapting Audio-CASI to Microsoft Windows 3.1 and Windows 95 platforms. Overall, our Windows-based system provided the desired control over video presentation and afforded other advantages including compatibility with a much wider array of audio devices than our DOS-based Audio-CASI technologies. These advantages came at the cost of increased system requirements --including the need for both more RAM and larger hard disks. While these costs will be an issue for organizations converting large inventories of PCS to Windows Audio-CASI today, this will not be a serious constraint for organizations and individuals with small inventories of machines to upgrade or those purchasing new machines today. PMID:22081743
Checking Interceptions and Audio Video Recordings by the Court after Referral

Directory of Open Access Journals (Sweden)

Sandra Grădinaru

2012-05-01

Full Text Available In any event, the prosecutor and the judiciary should pay particular attention to the risk of theirfalsification, which can be achieved by taking only parts of conversations or communications that took place in thepast and are declared to be registered recently, or by removing parts of conversations or communications, or evenby the translation or removal of images. This is why the legislature provided an express provision for theirverification. Provisions of art. 916 Paragraph 1 Criminal Procedure Code offers the possibility of a technicalexpertise regarding the originality and continuity of the records, at the prosecutor's request, the parties or exofficio, where there are doubts about the correctness of the registration in whole or in part, especially if notsupported by all the evidence. Therefore, audio or video recordings serve themselves as evidence in criminalproceedings, if not appealed or confirmed by technical expertise, if there were doubts about their conformity withreality. In the event that there is lack of expertise from the authenticity of records, they will not be accepted asevidence in solving a criminal case, thus eliminating any probative value of the intercepted conversations andcommunications in that case, by applying article 64 Par. 2 Criminal Procedure Code.
Audio wiring guide how to wire the most popular audio and video connectors

CERN Document Server

Hechtman, John

2012-01-01

Whether you're a pro or an amateur, a musician or into multimedia, you can't afford to guess about audio wiring. The Audio Wiring Guide is a comprehensive, easy-to-use guide that explains exactly what you need to know. No matter the size of your wiring project or installation, this handy tool provides you with the essential information you need and the techniques to use it. Using The Audio Wiring Guide is like having an expert at your side. By following the clear, step-by-step directions, you can do professional-level work at a fraction of the cost.
Audio-Visual Classification of Sports Types

DEFF Research Database (Denmark)

Gade, Rikke; Abou-Zleikha, Mohamed; Christensen, Mads Græsbøll

2015-01-01

In this work we propose a method for classification of sports types from combined audio and visual features ex- tracted from thermal video. From audio Mel Frequency Cepstral Coefficients (MFCC) are extracted, and PCA are applied to reduce the feature space to 10 dimensions. From the visual modali...
Fusion of audio and visual cues for laughter detection

NARCIS (Netherlands)

Petridis, Stavros; Pantic, Maja

Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audio- visual approach to distinguishing laughter from speech and we show that integrating the information from audio and video channels leads to improved performance over single-modal
Removable Watermarking Sebagai Pengendalian Terhadap Cyber Crime Pada Audio Digital

Directory of Open Access Journals (Sweden)

Reyhani Lian Putri

2017-08-01

Full Text Available Perkembangan teknologi informasi yang pesat menuntut penggunanya untuk lebih berhati-hati seiring semakin meningkatnya cyber crime.Banyak pihak telah mengembangkan berbagai teknik perlindungan data digital, salah satunya adalah watermarking. Teknologi watermarking berfungsi untuk memberikan identitas, melindungi, atau menandai data digital, baik audio, citra, ataupun video, yang mereka miliki. Akan tetapi, teknik tersebut masih dapat diretas oleh oknum-oknum yang tidak bertanggung jawab.Pada penelitian ini, proses watermarking diterapkan pada audio digital dengan menyisipkan watermark yang terdengar jelas oleh indera pendengaran manusia (perceptible pada audio host.Hal ini bertujuan agar data audio dapat terlindungi dan apabila ada pihak lain yang ingin mendapatkan data audio tersebut harus memiliki “kunci” untuk menghilangkan watermark. Proses removable watermarking ini dilakukan pada data watermark yang sudah diketahui metode penyisipannya, agar watermark dapat dihilangkan sehingga kualitas audio menjadi lebih baik. Dengan menggunakan metode ini diperoleh kinerja audio watermarking pada nilai distorsi tertinggi dengan rata-rata nilai SNR sebesar7,834 dB dan rata-rata nilai ODG sebesar -3,77.Kualitas audio meningkat setelah watermark dihilangkan, di mana rata-rata SNR menjadi sebesar 24,986 dB dan rata-rata ODG menjadi sebesar -1,064 serta nilai MOS sebesar 4,40.
ObamaNet: Photo-realistic lip-sync from text

OpenAIRE

Kumar, Rithesh; Sotelo, Jose; Kumar, Kundan; de Brebisson, Alexandre; Bengio, Yoshua

2017-01-01

We present ObamaNet, the first architecture that generates both audio and synchronized photo-realistic lip-sync videos from any new text. Contrary to other published lip-sync approaches, ours is only composed of fully trainable neural modules and does not rely on any traditional computer graphics methods. More precisely, we use three main modules: a text-to-speech network based on Char2Wav, a time-delayed LSTM to generate mouth-keypoints synced to the audio, and a network based on Pix2Pix to ...
Impairment-Factor-Based Audiovisual Quality Model for IPTV: Influence of Video Resolution, Degradation Type, and Content Type

Directory of Open Access Journals (Sweden)

Garcia MN

2011-01-01

Full Text Available This paper presents an audiovisual quality model for IPTV services. The model estimates the audiovisual quality of standard and high definition video as perceived by the user. The model is developed for applications such as network planning and packet-layer quality monitoring. It mainly covers audio and video compression artifacts and impairments due to packet loss. The quality tests conducted for model development demonstrate a mutual influence of the perceived audio and video quality, and the predominance of the video quality for the overall audiovisual quality. The balance between audio quality and video quality, however, depends on the content, the video format, and the audio degradation type. The proposed model is based on impairment factors which quantify the quality-impact of the different degradations. The impairment factors are computed from parameters extracted from the bitstream or packet headers. For high definition video, the model predictions show a correlation with unknown subjective ratings of 95%. For comparison, we have developed a more classical audiovisual quality model which is based on the audio and video qualities and their interaction. Both quality- and impairment-factor-based models are further refined by taking the content-type into account. At last, the different model variants are compared with modeling approaches described in the literature.

Automatic summarization of soccer highlights using audio-visual descriptors.

Science.gov (United States)

Raventós, A; Quijada, R; Torres, Luis; Tarrés, Francesc

2015-01-01

Automatic summarization generation of sports video content has been object of great interest for many years. Although semantic descriptions techniques have been proposed, many of the approaches still rely on low-level video descriptors that render quite limited results due to the complexity of the problem and to the low capability of the descriptors to represent semantic content. In this paper, a new approach for automatic highlights summarization generation of soccer videos using audio-visual descriptors is presented. The approach is based on the segmentation of the video sequence into shots that will be further analyzed to determine its relevance and interest. Of special interest in the approach is the use of the audio information that provides additional robustness to the overall performance of the summarization system. For every video shot a set of low and mid level audio-visual descriptors are computed and lately adequately combined in order to obtain different relevance measures based on empirical knowledge rules. The final summary is generated by selecting those shots with highest interest according to the specifications of the user and the results of relevance measures. A variety of results are presented with real soccer video sequences that prove the validity of the approach.
The Use of Videos in Teaching - Some Experiences From the University of Copenhagen

Directory of Open Access Journals (Sweden)

Henrik Bregnhøj

2016-11-01

Full Text Available This paper covers videos created and used in different learning patterns. The videos are grouped according to the teaching or learning activities in which they are used. One group of videos are used by the teacher for one-way communication, including: online lectures, experts interacting with one another, instruction videos and introduction videos. Further videos are teacher-student interactive videos, including: feedback on student deliveries, student productions and interactive videos. Examples from different courses at different faculties at The University of Copenhagen of different types of videos (screencasts, pencasts and different kinds of camera recordings, from quick-and-dirty videos made by teachers at their own computer to professionally produced studio recordings as well as audio files are presented with links, as an empirical basis for the discussion. The paper is very practically oriented and looks at e.g. which course design and teaching situation is suitable for which type of video; at which point is an audio file preferable to a video file; and how to produce videos easily and without specialized equipment, if you don’t have access to (or time for professional assistance. In the article, we also point out how a small amount of tips & tricks regarding planning, design and presentation technique can improve recordings made by teachers themselves. We argue that the way to work with audio and video is to start by analyzing the pedagogical needs, in this way adapting the type and use of audio and video to the pedagogical context.
Search the Audio, Browse the Video—A Generic Paradigm for Video Collections

Directory of Open Access Journals (Sweden)

Efrat Alon

2003-01-01

Full Text Available The amount of digital video being shot, captured, and stored is growing at a rate faster than ever before. The large amount of stored video is not penetrable without efficient video indexing, retrieval, and browsing technology. Most prior work in the field can be roughly categorized into two classes. One class is based on image processing techniques, often called content-based image and video retrieval, in which video frames are indexed and searched for visual content. The other class is based on spoken document retrieval, which relies on automatic speech recognition and text queries. Both approaches have major limitations. In the first approach, semantic queries pose a great challenge, while the second, speech-based approach, does not support efficient video browsing. This paper describes a system where speech is used for efficient searching and visual data for efficient browsing, a combination that takes advantage of both approaches. A fully automatic indexing and retrieval system has been developed and tested. Automated speech recognition and phonetic speech indexing support text-to-speech queries. New browsable views are generated from the original video. A special synchronized browser allows instantaneous, context-preserving switching from one view to another. The system was successfully used to produce searchable-browsable video proceedings for three local conferences.
Procedural Audio in Computer Games Using Motion Controllers: An Evaluation on the Effect and Perception

Directory of Open Access Journals (Sweden)

Niels Böttcher

2013-01-01

Full Text Available A study has been conducted into whether the use of procedural audio affects players in computer games using motion controllers. It was investigated whether or not (1 players perceive a difference between detailed and interactive procedural audio and prerecorded audio, (2 the use of procedural audio affects their motor-behavior, and (3 procedural audio affects their perception of control. Three experimental surveys were devised, two consisting of game sessions and the third consisting of watching videos of gameplay. A skiing game controlled by a Nintendo Wii balance board and a sword-fighting game controlled by a Wii remote were implemented with two versions of sound, one sample based and the other procedural based. The procedural models were designed using a perceptual approach and by alternative combinations of well-known synthesis techniques. The experimental results showed that, when being actively involved in playing or purely observing a video recording of a game, the majority of participants did not notice any difference in sound. Additionally, it was not possible to show that the use of procedural audio caused any consistent change in the motor behavior. In the skiing experiment, a portion of players perceived the control of the procedural version as being more sensitive.
Audio Papers

DEFF Research Database (Denmark)

Groth, Sanne Krogh; Samson, Kristine

2016-01-01

With this special issue of Seismograf we are happy to present a new format of articles: Audio Papers. Audio papers resemble the regular essay or the academic text in that they deal with a certain topic of interest, but presented in the form of an audio production. The audio paper is an extension...
Deception Detection in Videos

OpenAIRE

Wu, Zhe; Singh, Bharat; Davis, Larry S.; Subrahmanian, V. S.

2017-01-01

We present a system for covert automated deception detection in real-life courtroom trial videos. We study the importance of different modalities like vision, audio and text for this task. On the vision side, our system uses classifiers trained on low level video features which predict human micro-expressions. We show that predictions of high-level micro-expressions can be used as features for deception prediction. Surprisingly, IDT (Improved Dense Trajectory) features which have been widely ...
An Interactive Concert Program Based on Infrared Watermark and Audio Synthesis

Science.gov (United States)

Wang, Hsi-Chun; Lee, Wen-Pin Hope; Liang, Feng-Ju

The objective of this research is to propose a video/audio system which allows the user to listen the typical music notes in the concert program under infrared detection. The system synthesizes audio with different pitches and tempi in accordance with the encoded data in a 2-D barcode embedded in the infrared watermark. The digital halftoning technique has been used to fabricate the infrared watermark composed of halftone dots by both amplitude modulation (AM) and frequency modulation (FM). The results show that this interactive system successfully recognizes the barcode and synthesizes audio under infrared detection of a concert program which is also valid for human observation of the contents. This interactive video/audio system has greatly expanded the capability of the printout paper to audio display and also has many potential value-added applications.
Sound for digital video

CERN Document Server

Holman, Tomlinson

2013-01-01

Achieve professional quality sound on a limited budget! Harness all new, Hollywood style audio techniques to bring your independent film and video productions to the next level.In Sound for Digital Video, Second Edition industry experts Tomlinson Holman and Arthur Baum give you the tools and knowledge to apply recent advances in audio capture, video recording, editing workflow, and mixing to your own film or video with stunning results. This fresh edition is chockfull of techniques, tricks, and workflow secrets that you can apply to your own projects from preproduction
Summarizing Audiovisual Contents of a Video Program

Science.gov (United States)

Gong, Yihong

2003-12-01

In this paper, we focus on video programs that are intended to disseminate information and knowledge such as news, documentaries, seminars, etc, and present an audiovisual summarization system that summarizes the audio and visual contents of the given video separately, and then integrating the two summaries with a partial alignment. The audio summary is created by selecting spoken sentences that best present the main content of the audio speech while the visual summary is created by eliminating duplicates/redundancies and preserving visually rich contents in the image stream. The alignment operation aims to synchronize each spoken sentence in the audio summary with its corresponding speaker's face and to preserve the rich content in the visual summary. A Bipartite Graph-based audiovisual alignment algorithm is developed to efficiently find the best alignment solution that satisfies these alignment requirements. With the proposed system, we strive to produce a video summary that: (1) provides a natural visual and audio content overview, and (2) maximizes the coverage for both audio and visual contents of the original video without having to sacrifice either of them.
Using Video in Web-Based Listening Tests

Directory of Open Access Journals (Sweden)

Cristina Pardo-Ballester

2016-07-01

Full Text Available With sophisticated multimedia technology, there is a renewed interest in the relationship between visual and auditory channels in assessing listening comprehension (LC. Research on the use of visuals in assessing listening has emerged with inconclusive results. Some learners perform better on tests which include visual input (Wagner, 2007 while others have found no difference in the performance of participants on the two test formats (Batty, 2015. These mixed results make it necessary to examine the role of using audio and video in LC as measured by L2 listening tests. The current study examined the effects of two different types of listening support on L2 learners’ comprehension: (a visual aid in a video with input modified with redundancy and (b no visuals (audio-only input with input modified with redundancy. The participants of this study included 246 Spanish students enrolled in two different intermediate Spanish courses at a large Midwestern university who participated in four listening tasks either with video or with audio. Findings of whether the video serves as a listening support device and whether the course formats differ on intermediate-level Spanish learners’ comprehension will be shared as well as participants’ preferences with respect to listening support.
Automated Indexing and Search of Video Data in Large Collections with inVideo

Directory of Open Access Journals (Sweden)

Shuangbao Paul Wang

2017-08-01

Full Text Available In this paper, we present a novel system, inVideo, for automatically indexing and searching videos based on the keywords spoken in the audio track and the visual content of the video frames. Using the highly efficient video indexing engine we developed, inVideo is able to analyze videos using machine learning and pattern recognition without the need for initial viewing by a human. The time-stamped commenting and tagging features refine the accuracy of search results. The cloud-based implementation makes it possible to conduct elastic search, augmented search, and data analytics. Our research shows that inVideo presents an efficient tool in processing and analyzing videos and increasing interactions in video-based online learning environment. Data from a cybersecurity program with more than 500 students show that applying inVideo to current video material, interactions between student-student and student-faculty increased significantly across 24 sections program-wide.
Huffman coding in advanced audio coding standard

Science.gov (United States)

Brzuchalski, Grzegorz

2012-05-01

This article presents several hardware architectures of Advanced Audio Coding (AAC) Huffman noiseless encoder, its optimisations and working implementation. Much attention has been paid to optimise the demand of hardware resources especially memory size. The aim of design was to get as short binary stream as possible in this standard. The Huffman encoder with whole audio-video system has been implemented in FPGA devices.
Analisis Pengaruh Kualitas Video Rolling terhadap Kinerja Stasiun Relay Trans7 Pontianak

OpenAIRE

Fajar Maulana

2013-01-01

Video Rolling is caused by the exciter which is the generator of audio signal as well as video signal which is produced from the receiver unit, that is a part of the internal disturbance. Video Rolling happens because of the unsyncrhonization between the audio signal and the video signal. The unsyncrhonization is caused by the audio signal is faster than the video signal, so that the place of sound information surpass the destination of video information and the video signal is faster than au...
Realization on the interactive remote video conference system based on multi-Agent

Directory of Open Access Journals (Sweden)

Zheng Yan

2016-01-01

Full Text Available To make people at different places participate in the same conference, speak and discuss freely, the interactive remote video conferencing system is designed and realized based on multi-Agent collaboration. FEC (forward error correction and tree P2P technology are firstly used to build a live conference structure to transfer audio and video data; then the branch conference port can participate to speak and discuss through the application of becoming a interactive focus; the introduction of multi-Agent collaboration technology improve the system robustness. The experiments showed that, under normal network conditions, the system can support 350 branch conference node simultaneously to make live broadcasting. The audio and video quality is smooth. It can carry out large-scale remote video conference.
Video Links from Prison: Permeability and the Carceral World

Directory of Open Access Journals (Sweden)

Carolyn McKay

2016-03-01

Full Text Available As audio visual communication technologies are installed in prisons, these spaces of incarceration are networked with courtrooms and other non-contiguous spaces, potentially facilitating a process of permeability. Jurisdictions around the world are embracing video conferencing and the technology is becoming a major interface for prisoners’ interactions with courts and legal advisers. In this paper, I draw on fieldwork interviews with prisoners from two correction centres in New South Wales, Australia, to understand their subjective and sensorial experiences of using video links as a portal to the outside world. These interviews raised many issues including audio permeability: a soundtrack of incarceration sometimes infiltrates into the prison video studio and then the remote courtroom, framing the prisoner in the context of their detention, intruding on legal process, and affecting prisoners’ comprehension and participation.
Audio-Visual Aid in Teaching "Fatty Liver"

Science.gov (United States)

Dash, Sambit; Kamath, Ullas; Rao, Guruprasad; Prakash, Jay; Mishra, Snigdha

2016-01-01

Use of audio visual tools to aid in medical education is ever on a rise. Our study intends to find the efficacy of a video prepared on "fatty liver," a topic that is often a challenge for pre-clinical teachers, in enhancing cognitive processing and ultimately learning. We prepared a video presentation of 11:36 min, incorporating various…
Amping it up on a small budget: Transforming inexpensive, commercial audio and video components into a useful charged particle spectrometer

Science.gov (United States)

Pallone, Arthur

Necessity often leads to inspiration. Such was the case when a traditional amplifier quit working during the collection of an alpha particle spectrum. I had a 15 battery-powered audio amplifier in my box of project electronics so I connected it between the preamplifier and the multichannel analyzer. The alpha particle spectrum that appeared on the computer screen matched expectations even without correcting for impedance mismatches. Encouraged by this outcome, I have begun to systematically replace each of the parts in a traditional charged particle spectrometer with audio and video components available through consumer electronics stores with the goal of producing an inexpensive charged particle spectrometer for use in education and research. Hopefully my successes, setbacks, and results to date described in this presentation will inform and inspire others.
Video Conferencing for a Virtual Seminar Room

DEFF Research Database (Denmark)

Forchhammer, Søren; Fosgerau, A.; Hansen, Peter Søren K.

2002-01-01

A PC-based video conferencing system for a virtual seminar room is presented. The platform is enhanced with DSPs for audio and video coding and processing. A microphone array is used to facilitate audio based speaker tracking, which is used for adaptive beam-forming and automatic camera...
The audio expert everything you need to know about audio

CERN Document Server

Winer, Ethan

2012-01-01

The Audio Expert is a comprehensive reference that covers all aspects of audio, with many practical, as well as theoretical, explanations. Providing in-depth descriptions of how audio really works, using common sense plain-English explanations and mechanical analogies with minimal math, the book is written for people who want to understand audio at the deepest, most technical level, without needing an engineering degree. It's presented in an easy-to-read, conversational tone, and includes more than 400 figures and photos augmenting the text.The Audio Expert takes th
PVR system design of advanced video navigation reinforced with audible sound

NARCIS (Netherlands)

Eerenberg, O.; Aarts, R.; De With, P.N.

2014-01-01

This paper presents an advanced video navigation concept for Personal Video Recording (PVR), based on jointly using the primary image and a Picture-in-Picture (PiP) image, featuring combined rendering of normal-play video fragments with audio and fast-search video. The hindering loss of audio during

[Intermodal timing cues for audio-visual speech recognition].

Science.gov (United States)

Hashimoto, Masahiro; Kumashiro, Masaharu

2004-06-01

The purpose of this study was to investigate the limitations of lip-reading advantages for Japanese young adults by desynchronizing visual and auditory information in speech. In the experiment, audio-visual speech stimuli were presented under the six test conditions: audio-alone, and audio-visually with either 0, 60, 120, 240 or 480 ms of audio delay. The stimuli were the video recordings of a face of a female Japanese speaking long and short Japanese sentences. The intelligibility of the audio-visual stimuli was measured as a function of audio delays in sixteen untrained young subjects. Speech intelligibility under the audio-delay condition of less than 120 ms was significantly better than that under the audio-alone condition. On the other hand, the delay of 120 ms corresponded to the mean mora duration measured for the audio stimuli. The results implied that audio delays of up to 120 ms would not disrupt lip-reading advantage, because visual and auditory information in speech seemed to be integrated on a syllabic time scale. Potential applications of this research include noisy workplace in which a worker must extract relevant speech from all the other competing noises.
New musical organology : the audio-games

OpenAIRE

Zénouda , Hervé

2012-01-01

International audience; This article aims to shed light on a new and emerging creative field: " Audio Games, " a crossroad between video games and computer music. Today, a plethora of tiny applications, which propose entertaining audiovisual experiences with a preponderant sound dimension, are available for game consoles, computers, and mobile phones. These experiences represent a new universe where the gameplay of video games is applied to musical composition, hence creating new links betwee...
Detection of goal events in soccer videos

Science.gov (United States)

Kim, Hyoung-Gook; Roeber, Steffen; Samour, Amjad; Sikora, Thomas

2005-01-01

In this paper, we present an automatic extraction of goal events in soccer videos by using audio track features alone without relying on expensive-to-compute video track features. The extracted goal events can be used for high-level indexing and selective browsing of soccer videos. The detection of soccer video highlights using audio contents comprises three steps: 1) extraction of audio features from a video sequence, 2) event candidate detection of highlight events based on the information provided by the feature extraction Methods and the Hidden Markov Model (HMM), 3) goal event selection to finally determine the video intervals to be included in the summary. For this purpose we compared the performance of the well known Mel-scale Frequency Cepstral Coefficients (MFCC) feature extraction method vs. MPEG-7 Audio Spectrum Projection feature (ASP) extraction method based on three different decomposition methods namely Principal Component Analysis( PCA), Independent Component Analysis (ICA) and Non-Negative Matrix Factorization (NMF). To evaluate our system we collected five soccer game videos from various sources. In total we have seven hours of soccer games consisting of eight gigabytes of data. One of five soccer games is used as the training data (e.g., announcers' excited speech, audience ambient speech noise, audience clapping, environmental sounds). Our goal event detection results are encouraging.
The Success of Free to Play Games and Possibilities of Audio Monetization

OpenAIRE

Hahl, Kalle

2014-01-01

Video games are a huge business – nearly four times greater than film and music business combined. Free to play is the fastest growing category in video gaming. Game audio is part of the development of every game having a direct correlation between the growth of gaming industry and the growth of gaming audio industry. Games have inherently different goals for the players and the developers. Players are consumers seeking for entertainment. Developers are content producers trying to moneti...
Identifying sports videos using replay, text, and camera motion features

Science.gov (United States)

Kobla, Vikrant; DeMenthon, Daniel; Doermann, David S.

1999-12-01

Automated classification of digital video is emerging as an important piece of the puzzle in the design of content management systems for digital libraries. The ability to classify videos into various classes such as sports, news, movies, or documentaries, increases the efficiency of indexing, browsing, and retrieval of video in large databases. In this paper, we discuss the extraction of features that enable identification of sports videos directly from the compressed domain of MPEG video. These features include detecting the presence of action replays, determining the amount of scene text in vide, and calculating various statistics on camera and/or object motion. The features are derived from the macroblock, motion,and bit-rate information that is readily accessible from MPEG video with very minimal decoding, leading to substantial gains in processing speeds. Full-decoding of selective frames is required only for text analysis. A decision tree classifier built using these features is able to identify sports clips with an accuracy of about 93 percent.
Audio-Visual Perception System for a Humanoid Robotic Head

Directory of Open Access Journals (Sweden)

Raquel Viciana-Abad

2014-05-01

Full Text Available One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can be equipped. Besides, within the scope of interactive autonomous robots, there is a lack in terms of evaluating the benefits of audio-visual attention mechanisms, compared to only audio or visual approaches, in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with a Bayes inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared via considering the technical limitations of unimodal systems. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework.
Video genre classification using multimodal features

Science.gov (United States)

Jin, Sung Ho; Bae, Tae Meon; Choo, Jin Ho; Ro, Yong Man

2003-12-01

We propose a video genre classification method using multimodal features. The proposed method is applied for the preprocessing of automatic video summarization or the retrieval and classification of broadcasting video contents. Through a statistical analysis of low-level and middle-level audio-visual features in video, the proposed method can achieve good performance in classifying several broadcasting genres such as cartoon, drama, music video, news, and sports. In this paper, we adopt MPEG-7 audio-visual descriptors as multimodal features of video contents and evaluate the performance of the classification by feeding the features into a decision tree-based classifier which is trained by CART. The experimental results show that the proposed method can recognize several broadcasting video genres with a high accuracy and the classification performance with multimodal features is superior to the one with unimodal features in the genre classification.
Patient perceptions of text-messages, email, and video in dermatologic surgery patients.

Science.gov (United States)

Hawkins, Spencer D; Barilla, Steven; Williford, Phillip Williford M; Feldman, Steven R; Pearce, Daniel J

2017-04-14

We developed dermatology patient education videos and a post-operative text message service that could be accessed universally via web based applications. A secondary outcome of the study was to assess patient opinions of text-messages, email, and video in the health care setting which is reported here. An investigator-blinded, randomized, controlled intervention was evaluated in 90 nonmelanoma MMS patients at Wake Forest Baptist Dermatology. Patients were randomized 1:1:1:1 for exposure to: 1) videos with text messages, 2) videos only, 3) text messages-only, or 4) standard of care. Assessment measures were obtained by the use of REDCap survey questions during the follow up visit. 1) 67% would like to receive an email with information about the procedure beforehand 2) 98% of patients reported they would like other doctors to use educational videos as a form of patient education 3) 88% of our patients think it is appropriate for physicians to communicate to patients via text message in certain situations. Nearly all patients desired physicians to use text-messages and video in their practice and the majority of patients preferred to receive an email with information about their procedure beforehand.
The definitive guide to HTML 5 video

CERN Document Server

Pfeiffer, Silvia

2010-01-01

Plugins will soon be a thing of the past. The Definitive Guide to HTML5 Video is the first authoritative book on HTML5 video, the new web standard that allows browsers to support audio and video elements natively. This makes it very easy for web developers to publish audio and video, integrating both within the general presentation of web pages. For example, media elements can be styled using CSS (style sheets), integrated into SVG (scalable vector graphics), and manipulated in a Canvas. The book offers techniques for providing accessibility to media elements, enabling consistent handling of a
Video equipment of tele dosimetry and audio

International Nuclear Information System (INIS)

Ojeda R, M.A.; Padilla C, I.

2007-01-01

To develop a work in an area with high radiation, it requires of a detailed knowledge of the surroundings work, a communication and effective vision, a near dosimetric control. In a work where the spaces variables and reduced accesses exist, noise that hinders the communication, defendant operative condition, radiation field and taking of decision, it is necessary to have tools that allow a total control of the environment to make opportune and effective decisions, there where the task is developed. Under this elementary concept, it was developed in the Laguna Verde Central a project that it allowed a mechanism, interactive of control in spaces complex; to see, to hear, to speak, to measure. This concept takes to the creation of an equipped system with closed circuit of television, wireless communication systems, tele dosimetry wireless systems, VHS and DVD recording equipment, uninterrupted energy units. The system requires of an electric power socket, and the installation of two cables by CCTV camera. The system is mobilized by a person. He puts on in operation in 5 minutes using a verification list. The concept was developed in the project denominated VETA-1, (Video Equipment of Tele dosimetry and Audio). It is objective of this work to present before the society the development of the VETA-1 tool that conclude in their first prototype in May of the present year. The VETA-1 project arises by a necessity of optimizing dose, it is an ALARA tool, with a countless applications, like it was proven in the 12 recharge stop of the Unit 1. The VETA-1 project integrate a recording system, with the primary end of analyzing in the place where the task is developed the details for an effective and opportune decision, but the resulting information is of utility for the personnel's training and the planning of future works. The VETA-1 system is an ALARA tool of quick response control. (Author)
Audio-Visual Fusion for Sound Source Localization and Improved Attention

Energy Technology Data Exchange (ETDEWEB)

Lee, Byoung Gi; Choi, Jong Suk; Yoon, Sang Suk; Choi, Mun Taek; Kim, Mun Sang [Korea Institute of Science and Technology, Daejeon (Korea, Republic of); Kim, Dai Jin [Pohang University of Science and Technology, Pohang (Korea, Republic of)

2011-07-15

Service robots are equipped with various sensors such as vision camera, sonar sensor, laser scanner, and microphones. Although these sensors have their own functions, some of them can be made to work together and perform more complicated functions. AudioFvisual fusion is a typical and powerful combination of audio and video sensors, because audio information is complementary to visual information and vice versa. Human beings also mainly depend on visual and auditory information in their daily life. In this paper, we conduct two studies using audioFvision fusion: one is on enhancing the performance of sound localization, and the other is on improving robot attention through sound localization and face detection.
Audio-Visual Fusion for Sound Source Localization and Improved Attention

International Nuclear Information System (INIS)

Lee, Byoung Gi; Choi, Jong Suk; Yoon, Sang Suk; Choi, Mun Taek; Kim, Mun Sang; Kim, Dai Jin

2011-01-01

Service robots are equipped with various sensors such as vision camera, sonar sensor, laser scanner, and microphones. Although these sensors have their own functions, some of them can be made to work together and perform more complicated functions. AudioFvisual fusion is a typical and powerful combination of audio and video sensors, because audio information is complementary to visual information and vice versa. Human beings also mainly depend on visual and auditory information in their daily life. In this paper, we conduct two studies using audioFvision fusion: one is on enhancing the performance of sound localization, and the other is on improving robot attention through sound localization and face detection
Evaluation on the use of animated narrative video in teaching narrative text

Directory of Open Access Journals (Sweden)

Soe’oed Rahmat

2018-01-01

Full Text Available In the 21st century, our life is strongly affected by the information technology. Educational technology has been rapidly improved by the development of audiovisual tools. Teachers may choose a number of different types of resources for teaching purposes, including videos and movies. Therefore, this study is aimed at evaluating animated narrative videos from YouTube for the teaching narrative text and identifying potential factors which influence the quality of educational videos. The videos were examined by using assessment rubric to see the quality and suitability of animated narrative videos which might be used in the teaching narrative text. The rubric was adapted from Prince Edward Island (PEI Department of Education: Evaluation and Selection of Learning Resources. It consists of four criteria, content, structure, instructional design, and technical design In addition, the study presents critical awareness of how these aspects can be interpreted to measure animated narrative videos and at the same time the engagement of the teachers in exploring animated narrative videos used in classroom.
Aurally Aided Visual Search Performance Comparing Virtual Audio Systems

DEFF Research Database (Denmark)

Larsen, Camilla Horne; Lauritsen, David Skødt; Larsen, Jacob Junker

2014-01-01

Due to increased computational power, reproducing binaural hearing in real-time applications, through usage of head-related transfer functions (HRTFs), is now possible. This paper addresses the differences in aurally-aided visual search performance between a HRTF enhanced audio system (3D) and an...... with white dots. The results indicate that 3D audio yields faster search latencies than panning audio, especially with larger amounts of distractors. The applications of this research could fit virtual environments such as video games or virtual simulations.......Due to increased computational power, reproducing binaural hearing in real-time applications, through usage of head-related transfer functions (HRTFs), is now possible. This paper addresses the differences in aurally-aided visual search performance between a HRTF enhanced audio system (3D...
Aurally Aided Visual Search Performance Comparing Virtual Audio Systems

DEFF Research Database (Denmark)

Larsen, Camilla Horne; Lauritsen, David Skødt; Larsen, Jacob Junker

2014-01-01

Due to increased computational power reproducing binaural hearing in real-time applications, through usage of head-related transfer functions (HRTFs), is now possible. This paper addresses the differences in aurally-aided visual search performance between an HRTF enhanced audio system (3D) and an...... with white dots. The results indicate that 3D audio yields faster search latencies than panning audio, especially with larger amounts of distractors. The applications of this research could fit virtual environments such as video games or virtual simulations.......Due to increased computational power reproducing binaural hearing in real-time applications, through usage of head-related transfer functions (HRTFs), is now possible. This paper addresses the differences in aurally-aided visual search performance between an HRTF enhanced audio system (3D...
ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION

Directory of Open Access Journals (Sweden)

D.V. Ivanko

2016-05-01

Full Text Available The paper deals with analytical review, covering the latest achievements in the field of audio-visual (AV fusion (integration of multimodal information. We discuss the main challenges and report on approaches to address them. One of the most important tasks of the AV integration is to understand how the modalities interact and influence each other. The paper addresses this problem in the context of AV speech processing and speech recognition. In the first part of the review we set out the basic principles of AV speech recognition and give the classification of audio and visual features of speech. Special attention is paid to the systematization of the existing techniques and the AV data fusion methods. In the second part we provide a consolidated list of tasks and applications that use the AV fusion based on carried out analysis of research area. We also indicate used methods, techniques, audio and video features. We propose classification of the AV integration, and discuss the advantages and disadvantages of different approaches. We draw conclusions and offer our assessment of the future in the field of AV fusion. In the further research we plan to implement a system of audio-visual Russian continuous speech recognition using advanced methods of multimodal fusion.
Applications of Scalable Multipoint Video and Audio Using the Public Internet

Directory of Open Access Journals (Sweden)

Robert D. Gaglianello

2000-01-01

Full Text Available This paper describes a scalable multipoint video system, designed for efficient generation and display of high quality, multiple resolution, multiple compressed video streams over IP-based networks. We present our experiences using the system over the public Internet for several real-world applications, including distance learning, virtual theater, and virtual collaboration. The trials were a combined effort of Bell Laboratories and the Gertrude Stein Repertory Theatre (TGSRT. We also present current advances in the conferencing system since the trials, new areas for application and future applications.
User-Generated Video and Intertextuality

DEFF Research Database (Denmark)

Christensen, Lars Holmgaard; Rasmussen, Tove Arendt; Kofoed, Peter

This paper discusses the changing relationship between texts, producers and audiences and tries to understand user-generated audio-visual content or to be more precise, intertextuality in user-generated videos in relation to distribution formats, cultural form and genres. Continuing on from...... the work of John Fiske and the notion of vertical and horizontal intertextuality this paper tries to develop Fiske's original ideas so that his model incorporates the changing relationship between producers and their audiences, the text generated by mainstream media and the text generated by ordinary...
Ranking Highlights in Personal Videos by Analyzing Edited Videos.

Science.gov (United States)

Sun, Min; Farhadi, Ali; Chen, Tseng-Hung; Seitz, Steve

2016-11-01

We present a fully automatic system for ranking domain-specific highlights in unconstrained personal videos by analyzing online edited videos. A novel latent linear ranking model is proposed to handle noisy training data harvested online. Specifically, given a targeted domain such as "surfing," our system mines the YouTube database to find pairs of raw and their corresponding edited videos. Leveraging the assumption that an edited video is more likely to contain highlights than the trimmed parts of the raw video, we obtain pair-wise ranking constraints to train our model. The learning task is challenging due to the amount of noise and variation in the mined data. Hence, a latent loss function is incorporated to mitigate the issues caused by the noise. We efficiently learn the latent model on a large number of videos (about 870 min in total) using a novel EM-like procedure. Our latent ranking model outperforms its classification counterpart and is fairly competitive compared with a fully supervised ranking system that requires labels from Amazon Mechanical Turk. We further show that a state-of-the-art audio feature mel-frequency cepstral coefficients is inferior to a state-of-the-art visual feature. By combining both audio-visual features, we obtain the best performance in dog activity, surfing, skating, and viral video domains. Finally, we show that impressive highlights can be detected without additional human supervision for seven domains (i.e., skating, surfing, skiing, gymnastics, parkour, dog activity, and viral video) in unconstrained personal videos.
Designing Promotion Strategy of Malang Raya’s Tourism Destination Branding through Audio Visual Media

Directory of Open Access Journals (Sweden)

Chanira Nuansa

2014-04-01

Full Text Available This study examines the suitability concept of destination branding with existing models of Malang tourism promotion. This research is qualitative by taking the data directly in the form of existing promotional models of Malang, namely: information portal sites, blogs, social networking, and video via the Internet. This study used SWOT analysis to find strengths, weaknesses, opportunities, and threats on existing models of the tourism promotion. The data is analyzed based on destination branding’s concept indicators. Results of analysis are used as a basis in designing solutions for Malang tourism promotion through a new integrated tourism advertising model. Through the analysis we found that video is the most suitable media that used to promote Malang tourism in the form of advertisements. Videos are able to show the objectivity of the fact that intact better through audio-visual form, making it easier to associate the viewer thoughts on the phenomenon of destination. Moreover, video creation of Malang tourism as well as conceptualized ad is still rare. This is an opportunity, because later models of audio-visual advertisements made of this study is expected to be an example for concerned parties to conceptualize the next Malang tourism advertising.Keywords: Advertise, SWOT Analysis, Malang City, tourism promotion

A conceptual framework for audio-visual museum media

DEFF Research Database (Denmark)

Kirkedahl Lysholm Nielsen, Mikkel

2017-01-01

In today's history museums, the past is communicated through many other means than original artefacts. This interdisciplinary and theoretical article suggests a new approach to studying the use of audio-visual media, such as film, video and related media types, in a museum context. The centre...... and museum studies, existing case studies, and real life observations, the suggested framework instead stress particular characteristics of contextual use of audio-visual media in history museums, such as authenticity, virtuality, interativity, social context and spatial attributes of the communication...
About subjective evaluation of adaptive video streaming

Science.gov (United States)

Tavakoli, Samira; Brunnström, Kjell; Garcia, Narciso

2015-03-01

The usage of HTTP Adaptive Streaming (HAS) technology by content providers is increasing rapidly. Having available the video content in multiple qualities, using HAS allows to adapt the quality of downloaded video to the current network conditions providing smooth video-playback. However, the time-varying video quality by itself introduces a new type of impairment. The quality adaptation can be done in different ways. In order to find the best adaptation strategy maximizing users perceptual quality it is necessary to investigate about the subjective perception of adaptation-related impairments. However, the novelties of these impairments and their comparably long time duration make most of the standardized assessment methodologies fall less suited for studying HAS degradation. Furthermore, in traditional testing methodologies, the quality of the video in audiovisual services is often evaluated separated and not in the presence of audio. Nevertheless, the requirement of jointly evaluating the audio and the video within a subjective test is a relatively under-explored research field. In this work, we address the research question of determining the appropriate assessment methodology to evaluate the sequences with time-varying quality due to the adaptation. This was done by studying the influence of different adaptation related parameters through two different subjective experiments using a methodology developed to evaluate long test sequences. In order to study the impact of audio presence on quality assessment by the test subjects, one of the experiments was done in the presence of audio stimuli. The experimental results were subsequently compared with another experiment using the standardized single stimulus Absolute Category Rating (ACR) methodology.
Add Audio and Video to Your Site

CERN Document Server

MacDonald, Matthew

2010-01-01

Nothing spices up websites like cool sound effects (think ker-thunk as visitors press a button) or embedded videos. Think you need a programmer to add sizzle to your site? Think again. This hands-on guide gives you the techniques you need to add video, music, animated GIFs, and sound effects to your site. This Mini Missing Manual is excerpted from Creating a Web Site: The Missing Manual.
78 FR 40421 - Inquiry Regarding Video Description in Video Programming Distributed on Television and on the...

Science.gov (United States)

2013-07-05

... description services for television are provided on a secondary audio stream, and typically a consumer can... box. The Commission recently adopted rules requiring apparatus that is designed to receive, play back, or record video programming transmitted simultaneously with sound to make secondary audio streams...
Video Pedagogy as Political Activity.

Science.gov (United States)

Higgins, John W.

1991-01-01

Asserts that the education of students in the technology of video and audio production is a political act. Discusses the structure and style of production, and the ideologies and values contained therein. Offers alternative approaches to critical video pedagogy. (PRA)
Computationally Efficient Clustering of Audio-Visual Meeting Data

Science.gov (United States)

Hung, Hayley; Friedland, Gerald; Yeo, Chuohao

This chapter presents novel computationally efficient algorithms to extract semantically meaningful acoustic and visual events related to each of the participants in a group discussion using the example of business meeting recordings. The recording setup involves relatively few audio-visual sensors, comprising a limited number of cameras and microphones. We first demonstrate computationally efficient algorithms that can identify who spoke and when, a problem in speech processing known as speaker diarization. We also extract visual activity features efficiently from MPEG4 video by taking advantage of the processing that was already done for video compression. Then, we present a method of associating the audio-visual data together so that the content of each participant can be managed individually. The methods presented in this article can be used as a principal component that enables many higher-level semantic analysis tasks needed in search, retrieval, and navigation.
Dual-Layer Video Encryption using RSA Algorithm

Science.gov (United States)

Chadha, Aman; Mallik, Sushmit; Chadha, Ankit; Johar, Ravdeep; Mani Roja, M.

2015-04-01

This paper proposes a video encryption algorithm using RSA and Pseudo Noise (PN) sequence, aimed at applications requiring sensitive video information transfers. The system is primarily designed to work with files encoded using the Audio Video Interleaved (AVI) codec, although it can be easily ported for use with Moving Picture Experts Group (MPEG) encoded files. The audio and video components of the source separately undergo two layers of encryption to ensure a reasonable level of security. Encryption of the video component involves applying the RSA algorithm followed by the PN-based encryption. Similarly, the audio component is first encrypted using PN and further subjected to encryption using the Discrete Cosine Transform. Combining these techniques, an efficient system, invulnerable to security breaches and attacks with favorable values of parameters such as encryption/decryption speed, encryption/decryption ratio and visual degradation; has been put forth. For applications requiring encryption of sensitive data wherein stringent security requirements are of prime concern, the system is found to yield negligible similarities in visual perception between the original and the encrypted video sequence. For applications wherein visual similarity is not of major concern, we limit the encryption task to a single level of encryption which is accomplished by using RSA, thereby quickening the encryption process. Although some similarity between the original and encrypted video is observed in this case, it is not enough to comprehend the happenings in the video.
Roundtable Audio Discussion

Directory of Open Access Journals (Sweden)

Chris Bigum

2007-01-01

Full Text Available RoundTable on Technology, Teaching and Tools. This is a roundtable audio interview conducted by James Farmer, founder of Edublogs, with Anne Bartlett-Bragg (University of Technology Sydney and Chris Bigum (Deakin University. Skype was used to make and record the audio conference and the resulting sound file was edited by Andrew McLauchlan.
If a Picture Is Worth a Thousand Words Is Video Worth a Million? Differences in Affective and Cognitive Processing of Video and Text Cases

Science.gov (United States)

Yadav, Aman; Phillips, Michael M.; Lundeberg, Mary A.; Koehler, Matthew J.; Hilden, Katherine; Dirkin, Kathryn H.

2011-01-01

In this investigation we assessed whether different formats of media (video, text, and video + text) influenced participants' engagement, cognitive processing and recall of non-fiction cases of people diagnosed with HIV/AIDS. For each of the cases used in the study, we designed three informationally-equivalent versions: video, text, and video +…
Stochastic modeling of soundtrack for efficient segmentation and indexing of video

Science.gov (United States)

Naphade, Milind R.; Huang, Thomas S.

1999-12-01

Tools for efficient and intelligent management of digital content are essential for digital video data management. An extremely challenging research area in this context is that of multimedia analysis and understanding. The capabilities of audio analysis in particular for video data management are yet to be fully exploited. We present a novel scheme for indexing and segmentation of video by analyzing the audio track. This analysis is then applied to the segmentation and indexing of movies. We build models for some interesting events in the motion picture soundtrack. The models built include music, human speech and silence. We propose the use of hidden Markov models to model the dynamics of the soundtrack and detect audio-events. Using these models we segment and index the soundtrack. A practical problem in motion picture soundtracks is that the audio in the track is of a composite nature. This corresponds to the mixing of sounds from different sources. Speech in foreground and music in background are common examples. The coexistence of multiple individual audio sources forces us to model such events explicitly. Experiments reveal that explicit modeling gives better result than modeling individual audio events separately.
OAS :: Videos

Science.gov (United States)

subscriptions Videos Photos Live Webcast Social Media Facebook @oasofficial Facebook Twitter @oas_official Audios Photos Social Media Facebook Twitter Newsletters Press and Communications Department Contact us at Rights Actions against Corruption C Children Civil Registry Civil Society Contact Us Culture Cyber
Audio/visual analysis for high-speed TV advertisement detection from MPEG bitstream

OpenAIRE

Sadlier, David A.

2002-01-01

Advertisement breaks dunng or between television programmes are typically flagged by senes of black-and-silent video frames, which recurrendy occur in order to audio-visually separate individual advertisement spots from one another. It is the regular prevalence of these flags that enables automatic differentiauon between what is programme content and what is advertisement break. Detection of these audio-visual depressions within broadcast television content provides a basis on which advertise...
Economic and legal aspects of introducing novel ICT instruments: integrating sound into social media marketing - from audio branding to soundscaping

Directory of Open Access Journals (Sweden)

Daj, A.

2013-12-01

Full Text Available The pervasive expansion and implementation of ICT based marketing instruments imposes a new economic investigation of business models and regulatory solutions. Moreover, the current status of Social Media research indicates that the use of social networking and collaboration technologies is deeply changing the way people communicate, consume and cooperate with each other. Against the backdrop of widespread availability of digital audio-video content and the growing number of “smart” mobile devices, business professionals have developed new strategies for achieving customer involvement and retention through digitally linking audio stimuli to the powerful networking environment of Social Media.
Vocabulary Learning through Viewing Video: The Effect of Two Enhancement Techniques

Science.gov (United States)

Montero Perez, Maribel; Peters, Elke; Desmet, Piet

2018-01-01

While most studies on L2 vocabulary learning through input have addressed learners' vocabulary uptake from written text, this study focuses on audio-visual input. In particular, we investigate the effects of enhancing video by (1) adding different types of L2 subtitling (i.e. no captioning, full captioning, keyword captioning, and glossed keyword…
Simulating Auditory Hallucinations in a Video Game

DEFF Research Database (Denmark)

Weinel, Jonathan; Cunningham, Stuart

2017-01-01

In previous work the authors have proposed the concept of 'ASC Simulations': including audio-visual installations and experiences, as well as interactive video game systems, which simulate altered states of consciousness (ASCs) such as dreams and hallucinations. Building on the discussion...... of the authors' previous paper, where a large-scale qualitative study explored the changes to auditory perception that users of various intoxicating substances report, here the authors present three prototype audio mechanisms for simulating hallucinations in a video game. These were designed in the Unity video...... that make up the player character, and in future developments of this type of work we foresee a more advanced, standardised interface that models the senses, emotions and state of consciousness of player avatars....
Audio-Visual Integration Modifies Emotional Judgment in Music

Directory of Open Access Journals (Sweden)

Shen-Yuan Su

2011-10-01

Full Text Available The conventional view that perceived emotion in music is derived mainly from auditory signals has led to neglect of the contribution of visual image. In this study, we manipulated mode (major vs. minor and examined the influence of a video image on emotional judgment in music. Melodies in either major or minor mode were controlled for tempo and rhythm and played to the participants. We found that Taiwanese participants, like Westerners, judged major melodies as expressing positive, and minor melodies negative, emotions. The major or minor melodies were then paired with video images of the singers, which were either emotionally congruent or incongruent with their modes. Results showed that participants perceived stronger positive or negative emotions with congruent audio-visual stimuli. Compared to listening to music alone, stronger emotions were perceived when an emotionally congruent video image was added and weaker emotions were perceived when an incongruent image was added. We therefore demonstrate that mode is important to perceive the emotional valence in music and that treating musical art as a purely auditory event might lose the enhanced emotional strength perceived in music, since going to a concert may lead to stronger perceived emotion than listening to the CD at home.
The Impact of Videos Presenting Speakers’ Gestures and Facial Clues on Iranian EFL Learners’ Listening Comprehension

Directory of Open Access Journals (Sweden)

Somayeh Karbalaie Safarali

2012-11-01

Full Text Available The current research sought to explore the effectiveness of using videos presenting speakers’ gestures and facial clues on Iranian EFL learners’ listening comprehension proficiency. It was carried out at Ayandeh English Institute among 60 advanced female learners with the age range of 17-30 through a quasi-experimental research design. The researcher administered a TOEFL test to determine the homogeneity of the participants regarding both their general English language proficiency level and listening comprehension ability. Participants were randomly assigned into two groups. After coming up with the conclusion that the two groups were homogeneous, during 10 sessions of treatment, they received two different listening comprehension techniques, i.e. audio-visual group watching the video was equipped with the speaker’s gestures and facial clues, while the audio-only group could just listen to speaker’s voice and no additional clue was presented. Meanwhile, the participants were supposed to answer the questions related to each video. At the end of the treatment, both groups participated in the listening comprehension test of the Longman TOEFL test as the post-test. A t-test was used to compare the mean scores of the two groups, the result of which showed that the learners’ mean score in the audio-visual group was significantly higher than the learners’ mean score in the audio-only group. In conclusion, the result of this study suggests that foreign language pedagogy, especially for adult English learners, would benefit from applying videos presenting speakers’ gestures and facial clues.
Video Streaming in Online Learning

Science.gov (United States)

Hartsell, Taralynn; Yuen, Steve Chi-Yin

2006-01-01

The use of video in teaching and learning is a common practice in education today. As learning online becomes more of a common practice in education, streaming video and audio will play a bigger role in delivering course materials to online learners. This form of technology brings courses alive by allowing online learners to use their visual and…
Intelligent keyframe extraction for video printing

Science.gov (United States)

Zhang, Tong

2004-10-01

Nowadays most digital cameras have the functionality of taking short video clips, with the length of video ranging from several seconds to a couple of minutes. The purpose of this research is to develop an algorithm which extracts an optimal set of keyframes from each short video clip so that the user could obtain proper video frames to print out. In current video printing systems, keyframes are normally obtained by evenly sampling the video clip over time. Such an approach, however, may not reflect highlights or regions of interest in the video. Keyframes derived in this way may also be improper for video printing in terms of either content or image quality. In this paper, we present an intelligent keyframe extraction approach to derive an improved keyframe set by performing semantic analysis of the video content. For a video clip, a number of video and audio features are analyzed to first generate a candidate keyframe set. These features include accumulative color histogram and color layout differences, camera motion estimation, moving object tracking, face detection and audio event detection. Then, the candidate keyframes are clustered and evaluated to obtain a final keyframe set. The objective is to automatically generate a limited number of keyframes to show different views of the scene; to show different people and their actions in the scene; and to tell the story in the video shot. Moreover, frame extraction for video printing, which is a rather subjective problem, is considered in this work for the first time, and a semi-automatic approach is proposed.
Video as a technology for interpersonal communications: a new perspective

Science.gov (United States)

Whittaker, Steve

1995-03-01

Some of the most challenging multimedia applications have involved real- time conferencing, using audio and video to support interpersonal communication. Here we re-examine assumptions about the role, importance and implementation of video information in such systems. Rather than focussing on novel technologies, we present evaluation data relevant to both the classes of real-time multimedia applications we should develop and their design and implementation. Evaluations of videoconferencing systems show that previous work has overestimated the importance of video at the expense of audio. This has strong implications for the implementation of bandwidth allocation and synchronization. Furthermore our recent studies of workplace interaction show that prior work has neglected another potentially vital function of visual information: in assessing the communication availability of others. In this new class of application, rather than providing a supplement to audio information, visual information is used to promote the opportunistic communications that are prevalent in face-to-face settings. We discuss early experiments with such connection applications and identify outstanding design and implementation issues. Finally we examine a different class of application 'video-as-data', where the video image is used to transmit information about the work objects themselves, rather than information about interactants.

Scratch's Third Body: Video Talks Back to Television

Directory of Open Access Journals (Sweden)

Leo Goldsmith

2015-12-01

Full Text Available Emerging in the UK in the 1980s, Scratch Video established a paradoxical union of mass-media critique, Left-wing politics, and music-video and advertising aesthetics with its use of moving-image appropriation in the medium of videotape. Enabled by innovative professional and consumer video technologies, artists like George Barber, The Gorilla Tapes, and Sandra Goldbacher and Kim Flitcroft deployed a style characterized by the rapid sampling and manipulation of dissociated images drawn from broadcast television. Inspired by the cut-up methods of William Burroughs and the audio sampling practiced by contemporary black American musicians, these artists developed strategies for intervening in the audiovisual archive of television and disseminating its images in new contexts: in galleries and nightclubs, and on home video. Reconceptualizing video's “body,” Scratch's appropriation of televisual images of the human form imagined a new hybrid image of the post-industrial body, a “third body” representing a new convergence of human and machine.
Teacher’s Voice on Metacognitive Strategy Based Instruction Using Audio Visual Aids for Listening

Directory of Open Access Journals (Sweden)

Salasiah Salasiah

2018-02-01

Full Text Available The paper primarily stresses on exploring the teacher’s voice toward the application of metacognitive strategy with audio-visual aid in improving listening comprehension. The metacognitive strategy model applied in the study was inspired from Vandergrift and Tafaghodtari (2010 instructional model. Thus it is modified in the procedure and applied with audio-visual aids for improving listening comprehension. The study’s setting was at SMA Negeri 2 Parepare, South Sulawesi Province, Indonesia. The population of the research was the teacher of English at tenth grade at SMAN 2. The sample was taken by using random sampling technique. The data was collected by using in depth interview during the research, recorded, and analyzed using qualitative analysis. This study explored the teacher’s response toward the modified model of metacognitive strategy with audio visual aids in class of listening which covers positive and negative response toward the strategy applied during the teaching of listening. The result of data showed that this strategy helped the teacher a lot in teaching listening comprehension as the procedure has systematic steps toward students’ listening comprehension. Also, it eases the teacher to teach listening by empowering audio visual aids such as video taken from youtube.
BILINGUAL MULTIMODAL SYSTEM FOR TEXT-TO-AUDIOVISUAL SPEECH AND SIGN LANGUAGE SYNTHESIS

Directory of Open Access Journals (Sweden)

A. A. Karpov

2014-09-01

Full Text Available We present a conceptual model, architecture and software of a multimodal system for audio-visual speech and sign language synthesis by the input text. The main components of the developed multimodal synthesis system (signing avatar are: automatic text processor for input text analysis; simulation 3D model of human's head; computer text-to-speech synthesizer; a system for audio-visual speech synthesis; simulation 3D model of human’s hands and upper body; multimodal user interface integrating all the components for generation of audio, visual and signed speech. The proposed system performs automatic translation of input textual information into speech (audio information and gestures (video information, information fusion and its output in the form of multimedia information. A user can input any grammatically correct text in Russian or Czech languages to the system; it is analyzed by the text processor to detect sentences, words and characters. Then this textual information is converted into symbols of the sign language notation. We apply international «Hamburg Notation System» - HamNoSys, which describes the main differential features of each manual sign: hand shape, hand orientation, place and type of movement. On their basis the 3D signing avatar displays the elements of the sign language. The virtual 3D model of human’s head and upper body has been created using VRML virtual reality modeling language, and it is controlled by the software based on OpenGL graphical library. The developed multimodal synthesis system is a universal one since it is oriented for both regular users and disabled people (in particular, for the hard-of-hearing and visually impaired, and it serves for multimedia output (by audio and visual modalities of input textual information.
Newnes audio and Hi-Fi engineer's pocket book

CERN Document Server

Capel, Vivian

2013-01-01

Newnes Audio and Hi-Fi Engineer's Pocket Book, Second Edition provides concise discussion of several audio topics. The book is comprised of 10 chapters that cover different audio equipment. The coverage of the text includes microphones, gramophones, compact discs, and tape recorders. The book also covers high-quality radio, amplifiers, and loudspeakers. The book then reviews the concepts of sound and acoustics, and presents some facts and formulas relevant to audio. The text will be useful to sound engineers and other professionals whose work involves sound systems.
Video-based multimedia designs: A research study testing learning effectiveness

Directory of Open Access Journals (Sweden)

David Reiss

2008-05-01

Full Text Available This paper summarizes research conducted on three computer-based video models’ effectiveness for learning based on memory and comprehension. In this quantitative study, a two-minute video presentation was created and played back in three different types of media players, for a sample of eighty-seven college freshman. The three players evaluated include a standard QuickTime video/audio player, a QuickTime player with embedded triggers that launched HTML-based study guide pages, and a Macromedia Flash-based video/audio player with a text field, with user activated links to the study guides as well as other interactive on-line resources. An assumption guiding this study was that the enhanced designs presenting different types of related information would reinforce the material and produce better comprehension and retention. However, findings indicate that the standard video player was the most effective overall, which suggests that media designs able to control the focus of a learner’s attention to one specific stream of information, a single-stream focused approach, may be the most effective way to present media-based content. Résumé: Cet article résume une étude vérifiant l’efficacité de l’apprentissage basé sur la mémorisation et la compréhension, conduite à partir de trois modèles basés sur la vidéo informatisée. Dans cette étude quantitative, une vidéo de deux minutes a été créée et lue sur trois types de lecteurs différents, pour un échantillon de 87 étudiants universitaires de première année. Les trois lecteurs évalués comprenaient un lecteur standard audio/vidéo Quicktime, un lecteur Quicktime avec déclencheurs intégrés qui lançait un guide d’étude en HTML, et un lecteur audio/vidéo Flash Macromedia avec un champ texte, comprenant des liens activés par l’usager vers des guides d’étude et d’autres ressources interactives en ligne. Une supposition guidant cette étude était que les designs
Use of Audio Cuing to Expand Employment Opportunities for Adolescents with Autism Spectrum Disorders and Intellectual Disabilities

Science.gov (United States)

Allen, Keith D.; Burke, Raymond V.; Howard, Monica R.; Wallace, Dustin P.; Bowen, Scott L.

2012-01-01

We evaluated audio cuing to facilitate community employment of individuals with autism and intellectual disability. The job required promoting products in retail stores by wearing an air-inflated WalkAround[R] costume of a popular commercial character. Three adolescents, ages 16-18, were initially trained with video modeling. Audio cuing was then…
Video- or text-based e-learning when teaching clinical procedures? A randomized controlled trial

Directory of Open Access Journals (Sweden)

Buch SV

2014-08-01

Full Text Available Steen Vigh Buch,1 Frederik Philip Treschow,2 Jesper Brink Svendsen,3 Bjarne Skjødt Worm4 1Department of Vascular Surgery, Rigshospitalet, Copenhagen, Denmark; 2Department of Anesthesia and Intensive Care, Herlev Hospital, Copenhagen, Denmark; 3Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; 4Department of Anesthesia and Intensive Care, Bispebjerg Hospital, Copenhagen, Denmark Background and aims: This study investigated the effectiveness of two different levels of e-learning when teaching clinical skills to medical students. Materials and methods: Sixty medical students were included and randomized into two comparable groups. The groups were given either a video- or text/picture-based e-learning module and subsequently underwent both theoretical and practical examination. A follow-up test was performed 1 month later. Results: The students in the video group performed better than the illustrated text-based group in the practical examination, both in the primary test (P<0.001 and in the follow-up test (P<0.01. Regarding theoretical knowledge, no differences were found between the groups on the primary test, though the video group performed better on the follow-up test (P=0.04. Conclusion: Video-based e-learning is superior to illustrated text-based e-learning when teaching certain practical clinical skills. Keywords: e-learning, video versus text, medicine, clinical skills
Subtitled video tutorials, an accessible teaching material

Directory of Open Access Journals (Sweden)

Luis Bengochea

2012-11-01

Full Text Available The use of short-lived audio-visual tutorials constitutes an educational resource very attractive for young students, widely familiar with this type of format similar to YouTube clips. Considered as "learning pills", these tutorials are intended to strengthen the understanding of complex concepts that because their dynamic nature can’t be represented through texts or diagrams. However, the inclusion of this type of content in eLearning platforms presents accessibility problems for students with visual or hearing disabilities. This paper describes this problem and shows the way in which a teacher could add captions and subtitles to their videos.
VOX POPULI: Automatic Generation of Biased Video Sequences

NARCIS (Netherlands)

S. Bocconi; F.-M. Nack (Frank)

2004-01-01

textabstractWe describe our experimental rhetoric engine Vox Populi that generates biased video-sequences from a repository of video interviews and other related audio-visual web sources. Users are thus able to explore their own opinions on controversial topics covered by the repository. The
VOX POPULI: automatic generation of biased video sequences

NARCIS (Netherlands)

S. Bocconi; F.-M. Nack (Frank)

2004-01-01

textabstractWe describe our experimental rhetoric engine Vox Populi that generates biased video-sequences from a repository of video interviews and other related audio-visual web sources. Users are thus able to explore their own opinions on controversial topics covered by the repository. The
Multimodal Semantics Extraction from User-Generated Videos

Directory of Open Access Journals (Sweden)

Francesco Cricri

2012-01-01

Full Text Available User-generated video content has grown tremendously fast to the point of outpacing professional content creation. In this work we develop methods that analyze contextual information of multiple user-generated videos in order to obtain semantic information about public happenings (e.g., sport and live music events being recorded in these videos. One of the key contributions of this work is a joint utilization of different data modalities, including such captured by auxiliary sensors during the video recording performed by each user. In particular, we analyze GPS data, magnetometer data, accelerometer data, video- and audio-content data. We use these data modalities to infer information about the event being recorded, in terms of layout (e.g., stadium, genre, indoor versus outdoor scene, and the main area of interest of the event. Furthermore we propose a method that automatically identifies the optimal set of cameras to be used in a multicamera video production. Finally, we detect the camera users which fall within the field of view of other cameras recording at the same public happening. We show that the proposed multimodal analysis methods perform well on various recordings obtained in real sport events and live music performances.
Smartphone audio port data collection cookbook

Directory of Open Access Journals (Sweden)

Kyle Forinash

2018-06-01

Full Text Available The audio port of a smartphone is designed to send and receive audio but can be harnessed for portable, economical, and accurate data collection from a variety of sources. While smartphones have internal sensors to measure a number of physical phenomena such as acceleration, magnetism and illumination levels, measurement of other phenomena such as voltage, external temperature, or accurate timing of moving objects are excluded. The audio port cannot be only employed to sense external phenomena. It has the additional advantage of timing precision; because audio is recorded or played at a controlled rate separated from other smartphone activities, timings based on audio can be highly accurate. The following outlines unpublished details of the audio port technical elements for data collection, a general data collection recipe and an example timing application for Android devices.
Video- or text-based e-learning when teaching clinical procedures? A randomized controlled trial.

Science.gov (United States)

Buch, Steen Vigh; Treschow, Frederik Philip; Svendsen, Jesper Brink; Worm, Bjarne Skjødt

2014-01-01

This study investigated the effectiveness of two different levels of e-learning when teaching clinical skills to medical students. Sixty medical students were included and randomized into two comparable groups. The groups were given either a video- or text/picture-based e-learning module and subsequently underwent both theoretical and practical examination. A follow-up test was performed 1 month later. The students in the video group performed better than the illustrated text-based group in the practical examination, both in the primary test (Pvideo group performed better on the follow-up test (P=0.04). Video-based e-learning is superior to illustrated text-based e-learning when teaching certain practical clinical skills.
Text Chat during Video/Audio Conferencing Lessons: Scaffolding or Getting in the Way?

Science.gov (United States)

Kozar, Olga

2016-01-01

Private online language tutoring is growing in popularity. An important prerequisite for development of effective pedagogies in this context is a good understanding of how different modalities can be combined. This study provides a detailed account of how several experienced private online teachers use text chat in their Skype-based English…
Automated music selection of video ads

Directory of Open Access Journals (Sweden)

Wiesener Oliver

2017-07-01

Full Text Available The importance of video ads on social media platforms can be measured by views. For instance, Samsung’s commercial ad for one of its new smartphones reached more than 46 million viewers at Youtube. A video ad addresses the visual as well as the auditive sense of users. Often the visual sense is busy in the sense that users focus other screens than the screen with the video ad. This is called the second screen syndrome. Therefore, the importance of the audio channel seems to grow. To get back the visual attention of users that are deflected from other visual impulses it appears reasonable to adapt the music to the target group. Additionally, it appears useful to adapt the music to content of the video. Thus, the overall success of a video ad could by increased by increasing the attention of the users. Humans typically make the decision about the music of a video ad. If there is a correlation between music, products and target groups, a digitization of the music selection process seems to be possible. Since the digitization progress in the music sector is mainly focused on music composing this article strives for making a first step towards the digitization of the music selection.
Advanced text and video analytics for proactive decision making

Science.gov (United States)

Bowman, Elizabeth K.; Turek, Matt; Tunison, Paul; Porter, Reed; Thomas, Steve; Gintautas, Vadas; Shargo, Peter; Lin, Jessica; Li, Qingzhe; Gao, Yifeng; Li, Xiaosheng; Mittu, Ranjeev; Rosé, Carolyn Penstein; Maki, Keith; Bogart, Chris; Choudhari, Samrihdi Shree

2017-05-01

Today's warfighters operate in a highly dynamic and uncertain world, and face many competing demands. Asymmetric warfare and the new focus on small, agile forces has altered the framework by which time critical information is digested and acted upon by decision makers. Finding and integrating decision-relevant information is increasingly difficult in data-dense environments. In this new information environment, agile data algorithms, machine learning software, and threat alert mechanisms must be developed to automatically create alerts and drive quick response. Yet these advanced technologies must be balanced with awareness of the underlying context to accurately interpret machine-processed indicators and warnings and recommendations. One promising approach to this challenge brings together information retrieval strategies from text, video, and imagery. In this paper, we describe a technology demonstration that represents two years of tri-service research seeking to meld text and video for enhanced content awareness. The demonstration used multisource data to find an intelligence solution to a problem using a common dataset. Three technology highlights from this effort include 1) Incorporation of external sources of context into imagery normalcy modeling and anomaly detection capabilities, 2) Automated discovery and monitoring of targeted users from social media text, regardless of language, and 3) The concurrent use of text and imagery to characterize behaviour using the concept of kinematic and text motifs to detect novel and anomalous patterns. Our demonstration provided a technology baseline for exploiting heterogeneous data sources to deliver timely and accurate synopses of data that contribute to a dynamic and comprehensive worldview.
An Introduction to Boiler Water Chemistry for the Marine Engineer: A Text of Audio-Tutorial Instruction.

Science.gov (United States)

Schlenker, Richard M.; And Others

Presented is a manuscript for an introductory boiler water chemistry course for marine engineer education. The course is modular, self-paced, audio-tutorial, contract graded and combined lecture-laboratory instructed. Lectures are presented to students individually via audio-tapes and 35 mm slides. The course consists of a total of 17 modules -…
Making the Switch to Digital Audio

Directory of Open Access Journals (Sweden)

Shannon Gwin Mitchell

2004-12-01

Full Text Available In this article, the authors describe the process of converting from analog to digital audio data. They address the step-by-step decisions that they made in selecting hardware and software for recording and converting digital audio, issues of system integration, and cost considerations. The authors present a brief description of how digital audio is being used in their current research project and how it has enhanced the “quality” of their qualitative research.
Audio Recording of Children with Dyslalia

Directory of Open Access Journals (Sweden)

Stefan Gheorghe Pentiuc

2008-01-01

Full Text Available In this paper we present our researches regarding automat parsing of audio recordings. These recordings are obtained from children with dyslalia and are necessary for an accurate identification of speech problems. We develop a software application that helps parsing audio, real time, recordings.
A Method to Detect AAC Audio Forgery

Directory of Open Access Journals (Sweden)

Qingzhong Liu

2015-08-01

Full Text Available Advanced Audio Coding (AAC, a standardized lossy compression scheme for digital audio, which was designed to be the successor of the MP3 format, generally achieves better sound quality than MP3 at similar bit rates. While AAC is also the default or standard audio format for many devices and AAC audio files may be presented as important digital evidences, the authentication of the audio files is highly needed but relatively missing. In this paper, we propose a scheme to expose tampered AAC audio streams that are encoded at the same encoding bit-rate. Specifically, we design a shift-recompression based method to retrieve the differential features between the re-encoded audio stream at each shifting and original audio stream, learning classifier is employed to recognize different patterns of differential features of the doctored forgery files and original (untouched audio files. Experimental results show that our approach is very promising and effective to detect the forgery of the same encoding bit-rate on AAC audio streams. Our study also shows that shift recompression-based differential analysis is very effective for detection of the MP3 forgery at the same bit rate.

WLAN Technologies for Audio Delivery

Directory of Open Access Journals (Sweden)

Nicolas-Alexander Tatlas

2007-01-01

Full Text Available Audio delivery and reproduction for home or professional applications may greatly benefit from the adoption of digital wireless local area network (WLAN technologies. The most challenging aspect of such integration relates the synchronized and robust real-time streaming of multiple audio channels to multipoint receivers, for example, wireless active speakers. Here, it is shown that current WLAN solutions are susceptible to transmission errors. A detailed study of the IEEE802.11e protocol (currently under ratification is also presented and all relevant distortions are assessed via an analytical and experimental methodology. A novel synchronization scheme is also introduced, allowing optimized playback for multiple receivers. The perceptual audio performance is assessed for both stereo and 5-channel applications based on either PCM or compressed audio signals.
Modified BTC Algorithm for Audio Signal Coding

Directory of Open Access Journals (Sweden)

TOMIC, S.

2016-11-01

Full Text Available This paper describes modification of a well-known image coding algorithm, named Block Truncation Coding (BTC and its application in audio signal coding. BTC algorithm was originally designed for black and white image coding. Since black and white images and audio signals have different statistical characteristics, the application of this image coding algorithm to audio signal presents a novelty and a challenge. Several implementation modifications are described in this paper, while the original idea of the algorithm is preserved. The main modifications are performed in the area of signal quantization, by designing more adequate quantizers for audio signal processing. The result is a novel audio coding algorithm, whose performance is presented and analyzed in this research. The performance analysis indicates that this novel algorithm can be successfully applied in audio signal coding.
DML in VIDEO-CONFERENCING APPLICATIONS

OpenAIRE

Giske, Mats Andreas

2012-01-01

Today's audio in video-conference rooms do not in general have high quality audio standards.Most of the set-ups are PC-Speakers mounted on the wall, with a microphone on the table. With this, strong room modes are often excited from the speakers. When speaking to someone which also have bad equipment, the speech-intelligibility can be really bad. One solution presented with this Master thesis is to improve the loudspeaker set-up by mounting a DML(Distributed Mode Loudspeaker) above and parall...
Investigating the quality of video consultations performed using fourth generation (4G) mobile telecommunications.

Science.gov (United States)

Caffery, Liam J; Smith, Anthony C

2015-09-01

The use of fourth-generation (4G) mobile telecommunications to provide real-time video consultations were investigated in this study with the aims of determining if 4G is a suitable telecommunications technology; and secondly, to identify if variation in perceived audio and video quality were due to underlying network performance. Three patient end-points that used 4G Internet connections were evaluated. Consulting clinicians recorded their perception of audio and video quality using the International Telecommunications Union scales during clinics with these patient end-points. These scores were used to calculate a mean opinion score (MOS). The network performance metrics were obtained for each session and the relationships between these metrics and the session's quality scores were tested. Clinicians scored the quality of 50 hours of video consultations, involving 36 clinic sessions. The MOS for audio was 4.1 ± 0.62 and the MOS for video was 4.4 ± 0.22. Image impairment and effort to listen were also rated favourably. There was no correlation between audio or video quality and the network metrics of packet loss or jitter. These findings suggest that 4G networks are an appropriate telecommunication technology to deliver real-time video consultations. Variations in quality scores observed during this study were not explained by the packet loss and jitter in the underlying network. Before establishing a telemedicine service, the performance of the 4G network should be assessed at the location of the proposed service. This is due to known variability in performance of 4G networks. © The Author(s) 2015.
Semantic Context Detection Using Audio Event Fusion

Directory of Open Access Journals (Sweden)

Cheng Wen-Huang

2006-01-01

Full Text Available Semantic-level content analysis is a crucial issue in achieving efficient content retrieval and management. We propose a hierarchical approach that models audio events over a time series in order to accomplish semantic context detection. Two levels of modeling, audio event and semantic context modeling, are devised to bridge the gap between physical audio features and semantic concepts. In this work, hidden Markov models (HMMs are used to model four representative audio events, that is, gunshot, explosion, engine, and car braking, in action movies. At the semantic context level, generative (ergodic hidden Markov model and discriminative (support vector machine (SVM approaches are investigated to fuse the characteristics and correlations among audio events, which provide cues for detecting gunplay and car-chasing scenes. The experimental results demonstrate the effectiveness of the proposed approaches and provide a preliminary framework for information mining by using audio characteristics.
Text Mining for Information Systems Researchers: An Annotated Topic Modeling Tutorial

DEFF Research Database (Denmark)

Debortoli, Stefan; Müller, Oliver; Junglas, Iris

2016-01-01

, such as manual coding. Yet, the size of text data setsobtained from the Internet makes manual analysis virtually impossible. In this tutorial, we discuss the challengesencountered when applying automated text-mining techniques in information systems research. In particular, weshowcase the use of probabilistic...... researchers,this tutorial provides some guidance for conducting text mining studies on their own and for evaluating the quality ofothers.......t is estimated that more than 80 percent of today’s data is stored in unstructured form (e.g., text, audio, image, video);and much of it is expressed in rich and ambiguous natural language. Traditionally, the analysis of natural languagehas prompted the use of qualitative data analysis approaches...
Unsupervised topic modelling on South African parliament audio data

CSIR Research Space (South Africa)

Kleynhans, N

2014-11-01

Full Text Available Using a speech recognition system to convert spoken audio to text can enable the structuring of large collections of spoken audio data. A convenient means to summarise or cluster spoken data is to identify the topic under discussion. There are many...
Intelligent audio analysis

CERN Document Server

Schuller, Björn W

2013-01-01

This book provides the reader with the knowledge necessary for comprehension of the field of Intelligent Audio Analysis. It firstly introduces standard methods and discusses the typical Intelligent Audio Analysis chain going from audio data to audio features to audio recognition. Further, an introduction to audio source separation, and enhancement and robustness are given. After the introductory parts, the book shows several applications for the three types of audio: speech, music, and general sound. Each task is shortly introduced, followed by a description of the specific data and methods applied, experiments and results, and a conclusion for this specific task. The books provides benchmark results and standardized test-beds for a broader range of audio analysis tasks. The main focus thereby lies on the parallel advancement of realism in audio analysis, as too often today’s results are overly optimistic owing to idealized testing conditions, and it serves to stimulate synergies arising from transfer of ...
Video2vec Embeddings Recognize Events When Examples Are Scarce.

Science.gov (United States)

Habibian, Amirhossein; Mensink, Thomas; Snoek, Cees G M

2017-10-01

This paper aims for event recognition when video examples are scarce or even completely absent. The key in such a challenging setting is a semantic video representation. Rather than building the representation from individual attribute detectors and their annotations, we propose to learn the entire representation from freely available web videos and their descriptions using an embedding between video features and term vectors. In our proposed embedding, which we call Video2vec, the correlations between the words are utilized to learn a more effective representation by optimizing a joint objective balancing descriptiveness and predictability. We show how learning the Video2vec embedding using a multimodal predictability loss, including appearance, motion and audio features, results in a better predictable representation. We also propose an event specific variant of Video2vec to learn a more accurate representation for the words, which are indicative of the event, by introducing a term sensitive descriptiveness loss. Our experiments on three challenging collections of web videos from the NIST TRECVID Multimedia Event Detection and Columbia Consumer Videos datasets demonstrate: i) the advantages of Video2vec over representations using attributes or alternative embeddings, ii) the benefit of fusing video modalities by an embedding over common strategies, iii) the complementarity of term sensitive descriptiveness and multimodal predictability for event recognition. By its ability to improve predictability of present day audio-visual video features, while at the same time maximizing their semantic descriptiveness, Video2vec leads to state-of-the-art accuracy for both few- and zero-example recognition of events in video.
Paper-Based Textbooks with Audio Support for Print-Disabled Students.

Science.gov (United States)

Fujiyoshi, Akio; Ohsawa, Akiko; Takaira, Takuya; Tani, Yoshiaki; Fujiyoshi, Mamoru; Ota, Yuko

2015-01-01

Utilizing invisible 2-dimensional codes and digital audio players with a 2-dimensional code scanner, we developed paper-based textbooks with audio support for students with print disabilities, called "multimodal textbooks." Multimodal textbooks can be read with the combination of the two modes: "reading printed text" and "listening to the speech of the text from a digital audio player with a 2-dimensional code scanner." Since multimodal textbooks look the same as regular textbooks and the price of a digital audio player is reasonable (about 30 euro), we think multimodal textbooks are suitable for students with print disabilities in ordinary classrooms.
Testing music selection automation possibilities for video ads

Directory of Open Access Journals (Sweden)

Wiesener Oliver

2017-09-01

Full Text Available The importance of video ads on social media platforms can be measured by the number of views. For instance, Samsung’s commercial ad for one of its new smartphones reached more than 46 million viewers at Youtube. Video ads address users both visually and aurally. Often, the visual sense is engaged by users focusing on other screens, rather than on the screen with the video ad, which is referred to as the second screen syndrome. Therefore, the importance of the audio channel seems to gain more importance. To get back the visual attention of users that are deflected from other visual impulses it appears reasonable to adapt the music to the target group. Additionally, it appears useful to adapt the music to the content of the video. Thus, the overall success of a video ad could be improved by increasing the attention of the users. Humans typically decide which music is to be used in a video ad. If there is a correlation between music, products and target groups, a digitization of the music selection process appears to be possible. Since the digitization progress in the music sector is currently mainly focused on music composing this article strives for taking a first step towards the digitization of the music selection.
Mining Contextual Information for Ephemeral Digital Video Preservation

Directory of Open Access Journals (Sweden)

Chirag Shah

2009-06-01

Full Text Available Normal 0 For centuries the archival community has understood and practiced the art of adding contextual information while preserving an artifact. The question now is how these practices can be transferred to the digital domain. With the growing expansion of production and consumption of digital objects (documents, audio, video, etc. it has become essential to identify and study issues related to their representation. A curator in the digital realm may be said to have the same responsibilities as one in a traditional archival domain. However, with the mass production and spread of digital objects, it may be difficult to do all the work manually. In the present article this problem is considered in the area of digital video preservation. We show how this problem can be formulated and propose a framework for capturing contextual information for ephemeral digital video preservation. This proposal is realized in a system called ContextMiner, which allows us to cater to a digital curator's needs with its four components: digital video curation, collection visualization, browsing interfaces, and video harvesting and monitoring. While the issues and systems described here are geared toward digital videos, they can easily be applied to other kinds of digital objects.
The Effect of Audio and Animation in Multimedia Instruction

Science.gov (United States)

Koroghlanian, Carol; Klein, James D.

2004-01-01

This study investigated the effects of audio, animation, and spatial ability in a multimedia computer program for high school biology. Participants completed a multimedia program that presented content by way of text or audio with lean text. In addition, several instructional sequences were presented either with static illustrations or animations.…
Multimodal indexing of digital audio-visual documents: A case study for cultural heritage data

NARCIS (Netherlands)

Carmichael, J.; Larson, M.; Marlow, J.; Newman, E.; Clough, P.; Oomen, J.; Sav, S.

2008-01-01

This paper describes a multimedia multimodal information access sub-system (MIAS) for digital audio-visual documents, typically presented in streaming media format. The system is designed to provide both professional and general users with entry points into video documents that are relevant to their
Affective video retrieval: violence detection in Hollywood movies by large-scale segmental feature extraction.

Directory of Open Access Journals (Sweden)

Florian Eyben

Full Text Available Without doubt general video and sound, as found in large multimedia archives, carry emotional information. Thus, audio and video retrieval by certain emotional categories or dimensions could play a central role for tomorrow's intelligent systems, enabling search for movies with a particular mood, computer aided scene and sound design in order to elicit certain emotions in the audience, etc. Yet, the lion's share of research in affective computing is exclusively focusing on signals conveyed by humans, such as affective speech. Uniting the fields of multimedia retrieval and affective computing is believed to lend to a multiplicity of interesting retrieval applications, and at the same time to benefit affective computing research, by moving its methodology "out of the lab" to real-world, diverse data. In this contribution, we address the problem of finding "disturbing" scenes in movies, a scenario that is highly relevant for computer-aided parental guidance. We apply large-scale segmental feature extraction combined with audio-visual classification to the particular task of detecting violence. Our system performs fully data-driven analysis including automatic segmentation. We evaluate the system in terms of mean average precision (MAP on the official data set of the MediaEval 2012 evaluation campaign's Affect Task, which consists of 18 original Hollywood movies, achieving up to .398 MAP on unseen test data in full realism. An in-depth analysis of the worth of individual features with respect to the target class and the system errors is carried out and reveals the importance of peak-related audio feature extraction and low-level histogram-based video analysis.
Animation, audio, and spatial ability: Optimizing multimedia for scientific explanations

Science.gov (United States)

Koroghlanian, Carol May

This study investigated the effects of audio, animation and spatial ability in a computer based instructional program for biology. The program presented instructional material via text or audio with lean text and included eight instructional sequences presented either via static illustrations or animations. High school students enrolled in a biology course were blocked by spatial ability and randomly assigned to one of four treatments (Text-Static Illustration Audio-Static Illustration, Text-Animation, Audio-Animation). The study examined the effects of instructional mode (Text vs. Audio), illustration mode (Static Illustration vs. Animation) and spatial ability (Low vs. High) on practice and posttest achievement, attitude and time. Results for practice achievement indicated that high spatial ability participants achieved more than low spatial ability participants. Similar results for posttest achievement and spatial ability were not found. Participants in the Static Illustration treatments achieved the same as participants in the Animation treatments on both the practice and posttest. Likewise, participants in the Text treatments achieved the same as participants in the Audio treatments on both the practice and posttest. In terms of attitude, participants responded favorably to the computer based instructional program. They found the program interesting, felt the static illustrations or animations made the explanations easier to understand and concentrated on learning the material. Furthermore, participants in the Animation treatments felt the information was easier to understand than participants in the Static Illustration treatments. However, no difference for any attitude item was found for participants in the Text as compared to those in the Audio treatments. Significant differences were found by Spatial Ability for three attitude items concerning concentration and interest. In all three items, the low spatial ability participants responded more positively
76 FR 8659 - Structure and Practices of the Video Relay Service Program

Science.gov (United States)

2011-02-15

... Practices of the Video Relay Service Program AGENCY: Federal Communications Commission. ACTION: Final rule... with the Commission's Structure and Practices of the Video Relay Service Program, Declaratory Ruling... accessible formats for people with disabilities (Braille, large print, electronic files, audio format), send...
AudioMUD: a multiuser virtual environment for blind people.

Science.gov (United States)

Sánchez, Jaime; Hassler, Tiago

2007-03-01

A number of virtual environments have been developed during the last years. Among them there are some applications for blind people based on different type of audio, from simple sounds to 3-D audio. In this study, we pursued a different approach. We designed AudioMUD by using spoken text to describe the environment, navigation, and interaction. We have also introduced some collaborative features into the interaction between blind users. The core of a multiuser MUD game is a networked textual virtual environment. We developed AudioMUD by adding some collaborative features to the basic idea of a MUD and placed a simulated virtual environment inside the human body. This paper presents the design and usability evaluation of AudioMUD. Blind learners were motivated when interacted with AudioMUD and helped to improve the interaction through audio and interface design elements.
AUTOMATIC SEGMENTATION OF BROADCAST AUDIO SIGNALS USING AUTO ASSOCIATIVE NEURAL NETWORKS

Directory of Open Access Journals (Sweden)

P. Dhanalakshmi

2010-12-01

Full Text Available In this paper, we describe automatic segmentation methods for audio broadcast data. Today, digital audio applications are part of our everyday lives. Since there are more and more digital audio databases in place these days, the importance of effective management for audio databases have become prominent. Broadcast audio data is recorded from the Television which comprises of various categories of audio signals. Efficient algorithms for segmenting the audio broadcast data into predefined categories are proposed. Audio features namely Linear prediction coefficients (LPC, Linear prediction cepstral coefficients, and Mel frequency cepstral coefficients (MFCC are extracted to characterize the audio data. Auto Associative Neural Networks are used to segment the audio data into predefined categories using the extracted features. Experimental results indicate that the proposed algorithms can produce satisfactory results.
Videosorveglianza come supporto interattivo / La vidéosurveillance comme support intéractif / Video surveillance as an interactive support

Directory of Open Access Journals (Sweden)

Dischi Franco

2010-03-01

Full Text Available Video surveillance is not and cannot be considered a system of image acquistions “end in itself”.The acquired audio-visual “product”, in addition to surveillance and security, provides a useful source of information in case of storage and automatic analysis of data in urban planning to optimise land resources and means of support, for example environmental monitoring to protect habitat, land and ecosystem.These are behavioural precognitive models of video analysis, for a perceptive context of the situation of danger.

Photo-acoustic and video-acoustic methods for sensing distant sound sources

Science.gov (United States)

Slater, Dan; Kozacik, Stephen; Kelmelis, Eric

2017-05-01

Long range telescopic video imagery of distant terrestrial scenes, aircraft, rockets and other aerospace vehicles can be a powerful observational tool. But what about the associated acoustic activity? A new technology, Remote Acoustic Sensing (RAS), may provide a method to remotely listen to the acoustic activity near these distant objects. Local acoustic activity sometimes weakly modulates the ambient illumination in a way that can be remotely sensed. RAS is a new type of microphone that separates an acoustic transducer into two spatially separated components: 1) a naturally formed in situ acousto-optic modulator (AOM) located within the distant scene and 2) a remote sensing readout device that recovers the distant audio. These two elements are passively coupled over long distances at the speed of light by naturally occurring ambient light energy or other electromagnetic fields. Stereophonic, multichannel and acoustic beam forming are all possible using RAS techniques and when combined with high-definition video imagery it can help to provide a more cinema like immersive viewing experience. A practical implementation of a remote acousto-optic readout device can be a challenging engineering problem. The acoustic influence on the optical signal is generally weak and often with a strong bias term. The optical signal is further degraded by atmospheric seeing turbulence. In this paper, we consider two fundamentally different optical readout approaches: 1) a low pixel count photodiode based RAS photoreceiver and 2) audio extraction directly from a video stream. Most of our RAS experiments to date have used the first method for reasons of performance and simplicity. But there are potential advantages to extracting audio directly from a video stream. These advantages include the straight forward ability to work with multiple AOMs (useful for acoustic beam forming), simpler optical configurations, and a potential ability to use certain preexisting video recordings. However
Audio-visual aid in teaching "fatty liver".

Science.gov (United States)

Dash, Sambit; Kamath, Ullas; Rao, Guruprasad; Prakash, Jay; Mishra, Snigdha

2016-05-06

Use of audio visual tools to aid in medical education is ever on a rise. Our study intends to find the efficacy of a video prepared on "fatty liver," a topic that is often a challenge for pre-clinical teachers, in enhancing cognitive processing and ultimately learning. We prepared a video presentation of 11:36 min, incorporating various concepts of the topic, while keeping in view Mayer's and Ellaway guidelines for multimedia presentation. A pre-post test study on subject knowledge was conducted for 100 students with the video shown as intervention. A retrospective pre study was conducted as a survey which inquired about students understanding of the key concepts of the topic and a feedback on our video was taken. Students performed significantly better in the post test (mean score 8.52 vs. 5.45 in pre-test), positively responded in the retrospective pre-test and gave a positive feedback for our video presentation. Well-designed multimedia tools can aid in cognitive processing and enhance working memory capacity as shown in our study. In times when "smart" device penetration is high, information and communication tools in medical education, which can act as essential aid and not as replacement for traditional curriculums, can be beneficial to the students. © 2015 by The International Union of Biochemistry and Molecular Biology, 44:241-245, 2016. © 2015 The International Union of Biochemistry and Molecular Biology.
Low Latency Audio Video: Potentials for Collaborative Music Making through Distance Learning

Science.gov (United States)

Riley, Holly; MacLeod, Rebecca B.; Libera, Matthew

2016-01-01

The primary purpose of this study was to examine the potential of LOw LAtency (LOLA), a low latency audio visual technology designed to allow simultaneous music performance, as a distance learning tool for musical styles in which synchronous playing is an integral aspect of the learning process (e.g., jazz, folk styles). The secondary purpose was…
Perceptual Audio Hashing Functions

Directory of Open Access Journals (Sweden)

Emin Anarım

2005-07-01

Full Text Available Perceptual hash functions provide a tool for fast and reliable identification of content. We present new audio hash functions based on summarization of the time-frequency spectral characteristics of an audio document. The proposed hash functions are based on the periodicity series of the fundamental frequency and on singular-value description of the cepstral frequencies. They are found, on one hand, to perform very satisfactorily in identification and verification tests, and on the other hand, to be very resilient to a large variety of attacks. Moreover, we address the issue of security of hashes and propose a keying technique, and thereby a key-dependent hash function.
The value of filmed interviews: issues of visualization, visual transcrips and the reading of visual texts

NARCIS (Netherlands)

Witteveen, L.; Lie, R.

2013-01-01

The increased access to video technology advances the use of visual methodologies in research. Following these developments, researchers are confronted with new challenges. Video recordings of interviews, as compared to audio recordings, are gaining interest in qualitative field research in the
A new video programme

CERN Multimedia

CERN video productions

2011-01-01

"What's new @ CERN?", a new monthly video programme, will be broadcast on the Monday of every month on webcast.cern.ch. Aimed at the general public, the programme will cover the latest CERN news, with guests and explanatory features. Tune in on Monday 3 October at 4 pm (CET) to see the programme in English, and then at 4:20 pm (CET) for the French version. var flash_video_player=get_video_player_path(); insert_player_for_external('Video/Public/Movies/2011/CERN-MOVIE-2011-129/CERN-MOVIE-2011-129-0753-kbps-640x360-25-fps-audio-64-kbps-44-kHz-stereo', 'mms://mediastream.cern.ch/MediaArchive/Video/Public/Movies/2011/CERN-MOVIE-2011-129/CERN-MOVIE-2011-129-Multirate-200-to-753-kbps-640x360-25-fps.wmv', 'false', 480, 360, 'https://mediastream.cern.ch/MediaArchive/Video/Public/Movies/2011/CERN-MOVIE-2011-129/CERN-MOVIE-2011-129-posterframe-640x360-at-10-percent.jpg', '1383406', true, 'Video/Public/Movies/2011/CERN-MOVIE-2011-129/CERN-MOVIE-2011-129-0600-kbps-maxH-360-25-fps-...
78 FR 31800 - Accessible Emergency Information, and Apparatus Requirements for Emergency Information and Video...

Science.gov (United States)

2013-05-24

... television receivers do not properly handle two audio tracks identified as English, and thus to ensure... recognize that broadcasters and MVPDs have not yet developed a solution pursuant to which tagging the video... secondary audio stream on all equipment, including older equipment. In the absence of an industry solution...
Audio Journal in an ELT Context

Directory of Open Access Journals (Sweden)

Neşe Aysin Siyli

2012-09-01

Full Text Available It is widely acknowledged that one of the most serious problems students of English as a foreign language face is their deprivation of practicing the language outside the classroom. Generally, the classroom is the sole environment where they can practice English, which by its nature does not provide rich setting to help students develop their competence by putting the language into practice. Motivated by this need, this descriptive study investigated the impact of audio dialog journals on students’ speaking skills. It also aimed to gain insights into students’ and teacher’s opinions on keeping audio dialog journals outside the class. The data of the study developed from student and teacher audio dialog journals, student written feedbacks, interviews held with the students, and teacher observations. The descriptive analysis of the data revealed that audio dialog journals served a number of functions ranging from cognitive to linguistic, from pedagogical to psychological, and social. The findings and pedagogical implications of the study are discussed in detail.
Virtual Microphones for Multichannel Audio Resynthesis

Directory of Open Access Journals (Sweden)

Athanasios Mouchtaris

2003-09-01

Full Text Available Multichannel audio offers significant advantages for music reproduction, including the ability to provide better localization and envelopment, as well as reduced imaging distortion. On the other hand, multichannel audio is a demanding media type in terms of transmission requirements. Often, bandwidth limitations prohibit transmission of multiple audio channels. In such cases, an alternative is to transmit only one or two reference channels and recreate the rest of the channels at the receiving end. Here, we propose a system capable of synthesizing the required signals from a smaller set of signals recorded in a particular venue. These synthesized Ã‚Â“virtualÃ‚Â” microphone signals can be used to produce multichannel recordings that accurately capture the acoustics of that venue. Applications of the proposed system include transmission of multichannel audio over the current Internet infrastructure and, as an extension of the methods proposed here, remastering existing monophonic and stereophonic recordings for multichannel rendering.
Audio Conferencing Enhancements

OpenAIRE

VESTERINEN, LEENA

2006-01-01

Audio conferencing allows multiple people in distant locations to interact in a single voice call. Whilst it can be very useful service it also has several key disadvantages. This thesis study investigated the options for improving the user experience of the mobile teleconferencing applications. In particular, the use of 3D, spatial audio and visualinteractive functionality was investigated as the means of improving the intelligibility and audio perception during the audio...
Augmenting Environmental Interaction in Audio Feedback Systems

Directory of Open Access Journals (Sweden)

Seunghun Kim

2016-04-01

Full Text Available Audio feedback is defined as a positive feedback of acoustic signals where an audio input and output form a loop, and may be utilized artistically. This article presents new context-based controls over audio feedback, leading to the generation of desired sonic behaviors by enriching the influence of existing acoustic information such as room response and ambient noise. This ecological approach to audio feedback emphasizes mutual sonic interaction between signal processing and the acoustic environment. Mappings from analyses of the received signal to signal-processing parameters are designed to emphasize this specificity as an aesthetic goal. Our feedback system presents four types of mappings: approximate analyses of room reverberation to tempo-scale characteristics, ambient noise to amplitude and two different approximations of resonances to timbre. These mappings are validated computationally and evaluated experimentally in different acoustic conditions.
PhD students at Science & Technology exploring student learning in a collaborative video-circle

DEFF Research Database (Denmark)

Nielsen, Birgitte Lund; Hougaard, Rikke F.

examined using video, audio and questionnaires. The TAs reported a high level of outcomes and referred to the importance of the video as evidence supporting the discussions. The importance of the collaboration between peers and staff (educational developers) was emphasized: highlighting the benefit...
Detection Of Alterations In Audio Files Using Spectrograph Analysis

Directory of Open Access Journals (Sweden)

Anandha Krishnan G

2015-08-01

Full Text Available The corresponding study was carried out to detect changes in audio file using spectrograph. An audio file format is a file format for storing digital audio data on a computer system. A sound spectrograph is a laboratory instrument that displays a graphical representation of the strengths of the various component frequencies of a sound as time passes. The objectives of the study were to find the changes in spectrograph of audio after altering them to compare altering changes with spectrograph of original files and to check for similarity and difference in mp3 and wav. Five different alterations were carried out on each audio file to analyze the differences between the original and the altered file. For altering the audio file MP3 or WAV by cutcopy the file was opened in Audacity. A different audio was then pasted to the audio file. This new file was analyzed to view the differences. By adjusting the necessary parameters the noise was reduced. The differences between the new file and the original file were analyzed. By adjusting the parameters from the dialog box the necessary changes were made. The edited audio file was opened in the software named spek where after analyzing a graph is obtained of that particular file which is saved for further analysis. The original audio graph received was combined with the edited audio file graph to see the alterations.
Multi-modal highlight generation for sports videos using an information-theoretic excitability measure

Science.gov (United States)

Hasan, Taufiq; Bořil, Hynek; Sangwan, Abhijeet; L Hansen, John H.

2013-12-01

The ability to detect and organize `hot spots' representing areas of excitement within video streams is a challenging research problem when techniques rely exclusively on video content. A generic method for sports video highlight selection is presented in this study which leverages both video/image structure as well as audio/speech properties. Processing begins where the video is partitioned into small segments and several multi-modal features are extracted from each segment. Excitability is computed based on the likelihood of the segmental features residing in certain regions of their joint probability density function space which are considered both exciting and rare. The proposed measure is used to rank order the partitioned segments to compress the overall video sequence and produce a contiguous set of highlights. Experiments are performed on baseball videos based on signal processing advancements for excitement assessment in the commentators' speech, audio energy, slow motion replay, scene cut density, and motion activity as features. Detailed analysis on correlation between user excitability and various speech production parameters is conducted and an effective scheme is designed to estimate the excitement level of commentator's speech from the sports videos. Subjective evaluation of excitability and ranking of video segments demonstrate a higher correlation with the proposed measure compared to well-established techniques indicating the effectiveness of the overall approach.
IELTS speaking instruction through audio/voice conferencing

Directory of Open Access Journals (Sweden)

Hamed Ghaemi

2012-02-01

Full Text Available The currentstudyaimsatinvestigatingtheimpactofAudio/Voiceconferencing,asanewapproachtoteaching speaking, on the speakingperformanceand/orspeakingband score ofIELTScandidates.Experimentalgroupsubjectsparticipated in an audio conferencing classwhile those of the control group enjoyed attending in a traditional IELTS Speakingclass. At the endofthestudy,allsubjectsparticipatedinanIELTSExaminationheldonNovemberfourthin Tehran,Iran.To compare thegroupmeansforthestudy,anindependentt-testanalysiswasemployed.Thedifferencebetween experimental and control groupwasconsideredtobestatisticallysignificant(P<0.01.Thatisthecandidates in experimental group have outperformed the ones in control group in IELTS Speaking test scores.
Audio-visual identification of place of articulation and voicing in white and babble noise.

Science.gov (United States)

Alm, Magnus; Behne, Dawn M; Wang, Yue; Eg, Ragnhild

2009-07-01

Research shows that noise and phonetic attributes influence the degree to which auditory and visual modalities are used in audio-visual speech perception (AVSP). Research has, however, mainly focused on white noise and single phonetic attributes, thus neglecting the more common babble noise and possible interactions between phonetic attributes. This study explores whether white and babble noise differentially influence AVSP and whether these differences depend on phonetic attributes. White and babble noise of 0 and -12 dB signal-to-noise ratio were added to congruent and incongruent audio-visual stop consonant-vowel stimuli. The audio (A) and video (V) of incongruent stimuli differed either in place of articulation (POA) or voicing. Responses from 15 young adults show that, compared to white noise, babble resulted in more audio responses for POA stimuli, and fewer for voicing stimuli. Voiced syllables received more audio responses than voiceless syllables. Results can be attributed to discrepancies in the acoustic spectra of both the noise and speech target. Voiced consonants may be more auditorily salient than voiceless consonants which are more spectrally similar to white noise. Visual cues contribute to identification of voicing, but only if the POA is visually salient and auditorily susceptible to the noise type.
A System for the Semantic Multimodal Analysis of News Audio-Visual Content

Directory of Open Access Journals (Sweden)

Michael G. Strintzis

2010-01-01

Full Text Available News-related content is nowadays among the most popular types of content for users in everyday applications. Although the generation and distribution of news content has become commonplace, due to the availability of inexpensive media capturing devices and the development of media sharing services targeting both professional and user-generated news content, the automatic analysis and annotation that is required for supporting intelligent search and delivery of this content remains an open issue. In this paper, a complete architecture for knowledge-assisted multimodal analysis of news-related multimedia content is presented, along with its constituent components. The proposed analysis architecture employs state-of-the-art methods for the analysis of each individual modality (visual, audio, text separately and proposes a novel fusion technique based on the particular characteristics of news-related content for the combination of the individual modality analysis results. Experimental results on news broadcast video illustrate the usefulness of the proposed techniques in the automatic generation of semantic annotations.
EVALUASI KEPUASAN PENGGUNA TERHADAP APLIKASI AUDIO BOOKS

Directory of Open Access Journals (Sweden)

Raditya Maulana Anuraga

2017-02-01

Full Text Available Listeno is the first application audio books in Indonesia so that the users can get the book in audio form like listen to music, Listeno have problems in a feature request Listeno offline mode that have not been released, a security problem mp3 files that must be considered, and the target Listeno not yet reached 100,000 active users. This research has the objective to evaluate user satisfaction to Audio Books with research method approach, Nielsen. The analysis in this study using Importance Performance Analysis (IPA is combined with the index of User Satisfaction (IKP based on the indicators used are: Benefit (Usefulness, Utility (Utility, Usability (Usability, easy to understand (Learnability, Efficient (efficiency , Easy to remember (Memorability, Error (Error, and satisfaction (satisfaction. The results showed Applications User Satisfaction Audio books are quite satisfied with the results of the calculation IKP 69.58%..
Perancangan Sistem Audio Mobil Berbasiskan Sistem Pakar dan Web

Directory of Open Access Journals (Sweden)

Djunaidi Santoso

2011-12-01

Full Text Available Designing car audio that fits user’s needs is a fun activity. However, the design often consumes more time and costly since it should be consulted to the experts several times. For easy access to information in designing a car audio system as well as error prevention, an car audio system based on expert system and web is designed for those who do not have sufficient time and expense to consult directly to experts. This system consists of tutorial modules designed using the HyperText Preprocessor (PHP and MySQL as database. This car audio system design is evaluated uses black box testing method which focuses on the functional needs of the application. Tests are performed by providing inputs and produce outputs corresponding to the function of each module. The test results prove the correspondence between input and output, which means that the program meet the initial goals of the design.
Teaching the blind to find their way by playing video games.

Directory of Open Access Journals (Sweden)

Lotfi B Merabet

Full Text Available Computer based video games are receiving great interest as a means to learn and acquire new skills. As a novel approach to teaching navigation skills in the blind, we have developed Audio-based Environment Simulator (AbES; a virtual reality environment set within the context of a video game metaphor. Despite the fact that participants were naïve to the overall purpose of the software, we found that early blind users were able to acquire relevant information regarding the spatial layout of a previously unfamiliar building using audio based cues alone. This was confirmed by a series of behavioral performance tests designed to assess the transfer of acquired spatial information to a large-scale, real-world indoor navigation task. Furthermore, learning the spatial layout through a goal directed gaming strategy allowed for the mental manipulation of spatial information as evidenced by enhanced navigation performance when compared to an explicit route learning strategy. We conclude that the immersive and highly interactive nature of the software greatly engages the blind user to actively explore the virtual environment. This in turn generates an accurate sense of a large-scale three-dimensional space and facilitates the learning and transfer of navigation skills to the physical world.

The presentation of expert testimony via live audio-visual communication.

Science.gov (United States)

Miller, R D

1991-01-01

As part of a national effort to improve efficiency in court procedures, the American Bar Association has recommended, on the basis of a number of pilot studies, increased use of current audio-visual technology, such as telephone and live video communication, to eliminate delays caused by unavailability of participants in both civil and criminal procedures. Although these recommendations were made to facilitate court proceedings, and for the convenience of attorneys and judges, they also have the potential to save significant time for clinical expert witnesses as well. The author reviews the studies of telephone testimony that were done by the American Bar Association and other legal research groups, as well as the experience in one state forensic evaluation and treatment center. He also reviewed the case law on the issue of remote testimony. He then presents data from a national survey of state attorneys general concerning the admissibility of testimony via audio-visual means, including video depositions. Finally, he concludes that the option to testify by telephone provides a significant savings in precious clinical time for forensic clinicians in public facilities, and urges that such clinicians work actively to convince courts and/or legislatures in states that do not permit such testimony (currently the majority), to consider accepting it, to improve the effective use of scarce clinical resources in public facilities.
Low Delay Video Streaming on the Internet of Things Using Raspberry Pi

Directory of Open Access Journals (Sweden)

Ulf Jennehag

2016-09-01

Full Text Available The Internet of Things is predicted to consist of over 50 billion devices aiming to solve problems in most areas of our digital society. A large part of the data communicated is expected to consist of various multimedia contents, such as live audio and video. This article presents a solution for the communication of high definition video in low-delay scenarios (<200 ms under the constraints of devices with limited hardware resources, such as the Raspberry Pi. We verify that it is possible to enable low delay video streaming between Raspberry Pi devices using a distributed Internet of Things system called the SensibleThings platform. Specifically, our implementation transfers a 6 Mbps H.264 video stream of 1280 × 720 pixels at 25 frames per second between devices with a total delay of 181 ms on the public Internet, of which the overhead of the distributed Internet of Things communication platform only accounts for 18 ms of this delay. We have found that the most significant bottleneck of video transfer on limited Internet of Things devices is the video coding and not the distributed communication platform, since the video coding accounts for 90% of the total delay.
Editing Audio with Audacity

Directory of Open Access Journals (Sweden)

Brandon Walsh

2016-08-01

Full Text Available For those interested in audio, basic sound editing skills go a long way. Being able to handle and manipulate the materials can help you take control of your object of study: you can zoom in and extract particular moments to analyze, process the audio, and upload the materials to a server to compliment a blog post on the topic. On a more practical level, these skills could also allow you to record and package recordings of yourself or others for distribution. That guest lecture taking place in your department? Record it and edit it yourself! Doing so is a lightweight way to distribute resources among various institutions, and it also helps make the materials more accessible for readers and listeners with a wide variety of learning needs. In this lesson you will learn how to use Audacity to load, record, edit, mix, and export audio files. Sound editing platforms are often expensive and offer extensive capabilities that can be overwhelming to the first-time user, but Audacity is a free and open source alternative that offers powerful capabilities for sound editing with a low barrier for entry. For this lesson we will work with two audio files: a recording of Bach’s Goldberg Variations available from MusOpen and another recording of your own voice that will be made in the course of the lesson. This tutorial uses Audacity 2.1.2, released January 2016.
EMPOWERING STUDENTS OF INTERNATIONAL CLASS PROGRAMIAIN SALATIGA IN THE PRODUCTION OF PRAYING TUTORIAL VIDEO

Directory of Open Access Journals (Sweden)

Rifqi Aulia Erlangga

2017-02-01

responses from students. They feel that by watching the video and see the audio and visual movements of the prayer contained in the video can make them able to remember more clearly how to do praying, especially for praying that is not done everyday.
Audio Twister

DEFF Research Database (Denmark)

Cermak, Daniel; Moreno Garcia, Rodrigo; Monastiridis, Stefanos

2015-01-01

Daniel Cermak-Sassenrath, Rodrigo Moreno Garcia, Stefanos Monastiridis. Audio Twister. Installation. P-Hack Copenhagen 2015, Copenhagen, DK, Apr 24, 2015.......Daniel Cermak-Sassenrath, Rodrigo Moreno Garcia, Stefanos Monastiridis. Audio Twister. Installation. P-Hack Copenhagen 2015, Copenhagen, DK, Apr 24, 2015....
Back to basics audio

CERN Document Server

Nathan, Julian

1998-01-01

Back to Basics Audio is a thorough, yet approachable handbook on audio electronics theory and equipment. The first part of the book discusses electrical and audio principles. Those principles form a basis for understanding the operation of equipment and systems, covered in the second section. Finally, the author addresses planning and installation of a home audio system.Julian Nathan joined the audio service and manufacturing industry in 1954 and moved into motion picture engineering and production in 1960. He installed and operated recording theaters in Sydney, Austra
A Content-Adaptive Analysis and Representation Framework for Audio Event Discovery from "Unscripted" Multimedia

Science.gov (United States)

Radhakrishnan, Regunathan; Divakaran, Ajay; Xiong, Ziyou; Otsuka, Isao

2006-12-01

We propose a content-adaptive analysis and representation framework to discover events using audio features from "unscripted" multimedia such as sports and surveillance for summarization. The proposed analysis framework performs an inlier/outlier-based temporal segmentation of the content. It is motivated by the observation that "interesting" events in unscripted multimedia occur sparsely in a background of usual or "uninteresting" events. We treat the sequence of low/mid-level features extracted from the audio as a time series and identify subsequences that are outliers. The outlier detection is based on eigenvector analysis of the affinity matrix constructed from statistical models estimated from the subsequences of the time series. We define the confidence measure on each of the detected outliers as the probability that it is an outlier. Then, we establish a relationship between the parameters of the proposed framework and the confidence measure. Furthermore, we use the confidence measure to rank the detected outliers in terms of their departures from the background process. Our experimental results with sequences of low- and mid-level audio features extracted from sports video show that "highlight" events can be extracted effectively as outliers from a background process using the proposed framework. We proceed to show the effectiveness of the proposed framework in bringing out suspicious events from surveillance videos without any a priori knowledge. We show that such temporal segmentation into background and outliers, along with the ranking based on the departure from the background, can be used to generate content summaries of any desired length. Finally, we also show that the proposed framework can be used to systematically select "key audio classes" that are indicative of events of interest in the chosen domain.
Analisa Kinerja Jaringan Provider untuk Aplikasi Video Chatting (Studi Kasus di Daerah Marpoyan)

OpenAIRE

Danur, Jeri Dwi; ', Febrizal '

2016-01-01

Video chat is a communication that allows users to chat and can see the other person's face directly. Video chatting requires a good network to transmit audio and video data in a short time. This paper examines the performance of three mobile network provider (Provider A and Provider B Provider C) in a video chat in the area Marpoyan and the effect of time when doing video chat on the performance of each provider. The method used is the method using packet sniffing software wireshark to see Q...
Anatomy of a Cancer Treatment Scam

Medline Plus

Full Text Available ... Calendar All Events Weekly Calendar Weekly Calendar Archive Speeches Audio/Video Featured Videos FTC Events For Consumers ... Press Releases Commission Actions Media Resources Events Calendar Speeches Audio/Video Social Media Blogs Contests Enforcement Cases ...
Modular Short Form Videos for Library Instruction

Directory of Open Access Journals (Sweden)

Cindy Craig

2017-10-01

Full Text Available In Brief Expensive software isn’t necessary to create effective tutorials. Quick, unedited tutorials created on social media, such as on Instagram or Snapchat, may be more effective. These short form videos (SFVs combine the advantages of animated GIFs with the advantages of screencasts: modularity, repetition of steps, and animated visuals supported by pertinent audio. SFVs are cheap (or free and easy to make with materials libraries already possess, such as Internet access, computers, and smartphones. They are easily replaceable if the subject changes. The short form forces librarians to get right to the point. Finally, SFVs are easily disseminated on social media and have the potential to go viral.
Music Genre Classification Using MIDI and Audio Features

Directory of Open Access Journals (Sweden)

Abdullah Sonmez

2007-01-01

Full Text Available We report our findings on using MIDI files and audio features from MIDI, separately and combined together, for MIDI music genre classification. We use McKay and Fujinaga's 3-root and 9-leaf genre data set. In order to compute distances between MIDI pieces, we use normalized compression distance (NCD. NCD uses the compressed length of a string as an approximation to its Kolmogorov complexity and has previously been used for music genre and composer clustering. We convert the MIDI pieces to audio and then use the audio features to train different classifiers. MIDI and audio from MIDI classifiers alone achieve much smaller accuracies than those reported by McKay and Fujinaga who used not NCD but a number of domain-based MIDI features for their classification. Combining MIDI and audio from MIDI classifiers improves accuracy and gets closer to, but still worse, accuracies than McKay and Fujinaga's. The best root genre accuracies achieved using MIDI, audio, and combination of them are 0.75, 0.86, and 0.93, respectively, compared to 0.98 of McKay and Fujinaga. Successful classifier combination requires diversity of the base classifiers. We achieve diversity through using certain number of seconds of the MIDI file, different sample rates and sizes for the audio file, and different classification algorithms.
Formal usability evaluation of audio track widget graphical representation for two-dimensional stage audio mixing interface

OpenAIRE

Dewey, Christopher; Wakefield, Jonathan P.

2017-01-01

The two-dimensional stage paradigm (2DSP) has been suggested as an alternative audio mixing interface (AMI). This study seeks to refine the 2DSP by formally evaluating graphical track visualisation styles. Track visualisations considered were text only, circles containing text, individually coloured circles containing text, circles colour coded by instrument type with text, icons with text superimposed, circles with RMS related dynamic opacity and a traditional AMI. The usability evaluation f...
Topological mappings of video and audio data.

Science.gov (United States)

Fyfe, Colin; Barbakh, Wesam; Ooi, Wei Chuan; Ko, Hanseok

2008-12-01

We review a new form of self-organizing map which is based on a nonlinear projection of latent points into data space, identical to that performed in the Generative Topographic Mapping (GTM).(1) But whereas the GTM is an extension of a mixture of experts, this model is an extension of a product of experts.(2) We show visualisation and clustering results on a data set composed of video data of lips uttering 5 Korean vowels. Finally we note that we may dispense with the probabilistic underpinnings of the product of experts and derive the same algorithm as a minimisation of mean squared error between the prototypes and the data. This leads us to suggest a new algorithm which incorporates local and global information in the clustering. Both ot the new algorithms achieve better results than the standard Self-Organizing Map.
Evaluating the Use of Problem-Based Video Podcasts to Teach Mathematics in Higher Education

Science.gov (United States)

Kay, Robin; Kletskin, Ilona

2012-01-01

Problem-based video podcasts provide short, web-based, audio-visual explanations of how to solve specific procedural problems in subject areas such as mathematics or science. A series of 59 problem-based video podcasts covering five key areas (operations with functions, solving equations, linear functions, exponential and logarithmic functions,…
The Simple Video Coder: A free tool for efficiently coding social video data.

Science.gov (United States)

Barto, Daniel; Bird, Clark W; Hamilton, Derek A; Fink, Brandi C

2017-08-01

Videotaping of experimental sessions is a common practice across many disciplines of psychology, ranging from clinical therapy, to developmental science, to animal research. Audio-visual data are a rich source of information that can be easily recorded; however, analysis of the recordings presents a major obstacle to project completion. Coding behavior is time-consuming and often requires ad-hoc training of a student coder. In addition, existing software is either prohibitively expensive or cumbersome, which leaves researchers with inadequate tools to quickly process video data. We offer the Simple Video Coder-free, open-source software for behavior coding that is flexible in accommodating different experimental designs, is intuitive for students to use, and produces outcome measures of event timing, frequency, and duration. Finally, the software also offers extraction tools to splice video into coded segments suitable for training future human coders or for use as input for pattern classification algorithms.
Identifying Unsafe Videos on Online Public Media using Real-time Crowdsourcing

OpenAIRE

Mridha, Sankar Kumar; Sarkar, Braznev; Chatterjee, Sujoy; Bhattacharyya, Malay

2017-01-01

Due to the significant growth of social networking and human activities through the web in recent years, attention to analyzing big data using real-time crowdsourcing has increased. This data may appear in the form of streaming images, audio or videos. In this paper, we address the problem of deciding the appropriateness of streaming videos in public media with the help of crowdsourcing in real-time.
The effect of music video exposure on students' perceived clinical applications of popular music in the field of music therapy: a pilot study.

Science.gov (United States)

Gooding, Lori F; Mori-Inoue, Satoko

2011-01-01

The purpose of this study was to examine the effect of video exposure on music therapy students' perceptions of clinical applications of popular music in the field of music therapy. Fifty-one participants were randomly divided into two groups and exposed to a popular song in either audio-only or music video format. Participants were asked to indicate clinical applications; specifically, participants chose: (a) possible population(s), (b) most appropriate population(s), (c) possible age range(s), (d) most appropriate age ranges, (e) possible goal area(s) and (f) most appropriate goal area. Data for each of these categories were compiled and analyzed, with no significant differences found in the choices made by the audio-only and video groups. Three items, (a) selection of the bereavement population, (b) selection of bereavement as the most appropriate population and (c) selection of the age ranges of pre teen/mature adult, were additionally selected for further analysis due to their relationship to the video content. Analysis results revealed a significant difference between the video and audio-only groups for the selection of these specific items, with the video group's selections more closely aligned to the video content. Results of this pilot study suggest that music video exposure to popular music can impact how students choose to implement popular songs in the field of music therapy.
Digital Augmented Reality Audio Headset

Directory of Open Access Journals (Sweden)

Jussi Rämö

2012-01-01

Full Text Available Augmented reality audio (ARA combines virtual sound sources with the real sonic environment of the user. An ARA system can be realized with a headset containing binaural microphones. Ideally, the ARA headset should be acoustically transparent, that is, it should not cause audible modification to the surrounding sound. A practical implementation of an ARA mixer requires a low-latency headphone reproduction system with additional equalization to compensate for the attenuation and the modified ear canal resonances caused by the headphones. This paper proposes digital IIR filters to realize the required equalization and evaluates a real-time prototype ARA system. Measurements show that the throughput latency of the digital prototype ARA system can be less than 1.4 ms, which is sufficiently small in practice. When the direct and processed sounds are combined in the ear, a comb filtering effect is brought about and appears as notches in the frequency response. The comb filter effect in speech and music signals was studied in a listening test and it was found to be inaudible when the attenuation is 20 dB. Insert ARA headphones have a sufficient attenuation at frequencies above about 1 kHz. The proposed digital ARA system enables several immersive audio applications, such as a virtual audio tourist guide and audio teleconferencing.
AUDIO CRYPTANALYSIS- AN APPLICATION OF SYMMETRIC KEY CRYPTOGRAPHY AND AUDIO STEGANOGRAPHY

Directory of Open Access Journals (Sweden)

Smita Paira

2016-09-01

Full Text Available In the recent trend of network and technology, “Cryptography” and “Steganography” have emerged out as the essential elements of providing network security. Although Cryptography plays a major role in the fabrication and modification of the secret message into an encrypted version yet it has certain drawbacks. Steganography is the art that meets one of the basic limitations of Cryptography. In this paper, a new algorithm has been proposed based on both Symmetric Key Cryptography and Audio Steganography. The combination of a randomly generated Symmetric Key along with LSB technique of Audio Steganography sends a secret message unrecognizable through an insecure medium. The Stego File generated is almost lossless giving a 100 percent recovery of the original message. This paper also presents a detailed experimental analysis of the algorithm with a brief comparison with other existing algorithms and a future scope. The experimental verification and security issues are promising.
Frequency Hopping Method for Audio Watermarking

Directory of Open Access Journals (Sweden)

A. Anastasijević

2012-11-01

Full Text Available This paper evaluates the degradation of audio content for a perceptible removable watermark. Two different approaches to embedding the watermark in the spectral domain were investigated. The frequencies for watermark embedding are chosen according to a pseudorandom sequence making the methods robust. Consequentially, the lower quality audio can be used for promotional purposes. For a fee, the watermark can be removed with a secret watermarking key. Objective and subjective testing was conducted in order to measure degradation level for the watermarked music samples and to examine residual distortion for different parameters of the watermarking algorithm and different music genres.

The Use of Audio and Animation in Computer Based Instruction.

Science.gov (United States)

Koroghlanian, Carol; Klein, James D.

This study investigated the effects of audio, animation, and spatial ability in a computer-based instructional program for biology. The program presented instructional material via test or audio with lean text and included eight instructional sequences presented either via static illustrations or animations. High school students enrolled in a…
Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features

Directory of Open Access Journals (Sweden)

Petar S. Aleksic

2002-11-01

Full Text Available We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs supported by the MPEG-4 standard for the visual representation of speech. We also describe a robust and automatic algorithm we have developed to extract FAPs from visual data, which does not require hand labeling or extensive training procedures. The principal component analysis (PCA was performed on the FAPs in order to decrease the dimensionality of the visual feature vectors, and the derived projection weights were used as visual features in the audio-visual automatic speech recognition (ASR experiments. Both single-stream and multistream hidden Markov models (HMMs were used to model the ASR system, integrate audio and visual information, and perform a relatively large vocabulary (approximately 1000 words speech recognition experiments. The experiments performed use clean audio data and audio data corrupted by stationary white Gaussian noise at various SNRs. The proposed system reduces the word error rate (WER by 20% to 23% relatively to audio-only speech recognition WERs, at various SNRs (0Ã¢Â€Â“30 dB with additive white Gaussian noise, and by 19% relatively to audio-only speech recognition WER under clean audio conditions.
Using Text Mining to Uncover Students' Technology-Related Problems in Live Video Streaming

Science.gov (United States)

Abdous, M'hammed; He, Wu

2011-01-01

Because of their capacity to sift through large amounts of data, text mining and data mining are enabling higher education institutions to reveal valuable patterns in students' learning behaviours without having to resort to traditional survey methods. In an effort to uncover live video streaming (LVS) students' technology related-problems and to…
Audio-visual biofeedback for respiratory-gated radiotherapy: Impact of audio instruction and audio-visual biofeedback on respiratory-gated radiotherapy

International Nuclear Information System (INIS)

George, Rohini; Chung, Theodore D.; Vedam, Sastry S.; Ramakrishnan, Viswanathan; Mohan, Radhe; Weiss, Elisabeth; Keall, Paul J.

2006-01-01

Purpose: Respiratory gating is a commercially available technology for reducing the deleterious effects of motion during imaging and treatment. The efficacy of gating is dependent on the reproducibility within and between respiratory cycles during imaging and treatment. The aim of this study was to determine whether audio-visual biofeedback can improve respiratory reproducibility by decreasing residual motion and therefore increasing the accuracy of gated radiotherapy. Methods and Materials: A total of 331 respiratory traces were collected from 24 lung cancer patients. The protocol consisted of five breathing training sessions spaced about a week apart. Within each session the patients initially breathed without any instruction (free breathing), with audio instructions and with audio-visual biofeedback. Residual motion was quantified by the standard deviation of the respiratory signal within the gating window. Results: Audio-visual biofeedback significantly reduced residual motion compared with free breathing and audio instruction. Displacement-based gating has lower residual motion than phase-based gating. Little reduction in residual motion was found for duty cycles less than 30%; for duty cycles above 50% there was a sharp increase in residual motion. Conclusions: The efficiency and reproducibility of gating can be improved by: incorporating audio-visual biofeedback, using a 30-50% duty cycle, gating during exhalation, and using displacement-based gating
COMPOSITIONAL AND CONTENT-RELATED PARTICULARITIES OF POLITICAL MEDIA TEXTS (THROUGH THE EXAMPLE OF THE TEXTS OF POLITICAL VIDEO CLIPS ISSUED BY THE CANDIDATES FOR PRESIDENCY IN FRANCE IN 2017

Directory of Open Access Journals (Sweden)

Dmitrieva, A.V.

2017-09-01

Full Text Available The article examines the texts of political advertising video clips issued by the candidates for presidency in France during the campaign before the first round of elections in 2017. The mentioned examples of media texts are analysed from the compositional point of view as well as from that of the content particularities which are directly connected to the text structure. In general, the majority of the studied clips have a similar structure and consist of three parts: introduction, main part and conclusion. However, as a result of the research, a range of advantages marking well-structured videos was revealed. These include: addressing the voters and stating the speech topic clearly at the beginning of the clip, a relevant attention-grabbing opening phrase, consistency and clarity of the information presentation, appropriate use of additional video plots, conclusion at the end of the clip.
Anatomy of a Cancer Treatment Scam

Medline Plus

Full Text Available ... Speeches Audio/Video Featured Videos FTC Events For Consumers For Business En Español Social Media FTC Social Media Chats Tweeting FTC Events Blogs ... Actions Media Resources Events Calendar Speeches Audio/Video Social ... Sentinel Network Criminal Liaison Unit Policy Advocacy Advisory ...
Perceived Audio Quality Analysis in Digital Audio Broadcasting Plus System Based on PEAQ

Directory of Open Access Journals (Sweden)

K. Ulovec

2018-04-01

Full Text Available Broadcasters need to decide on bitrates of the services in the multiplex transmitted via Digital Audio Broadcasting Plus system. The bitrate should be set as low as possible for maximal number of services, but with high quality, not lower than in conventional analog systems. In this paper, the objective method Perceptual Evaluation of Audio Quality is used to analyze the perceived audio quality for appropriate codecs --- MP2 and AAC offering three profiles. The main aim is to determine dependencies on the type of signal --- music and speech, the number of channels --- stereo and mono, and the bitrate. Results indicate that only MP2 codec and AAC Low Complexity profile reach imperceptible quality loss. The MP2 codec needs higher bitrate than AAC Low Complexity profile for the same quality. For the both versions of AAC High-Efficiency profiles, the limit bitrates are determined above which less complex profiles outperform the more complex ones and higher bitrates above these limits are not worth using. It is shown that stereo music has worse quality than stereo speech generally, whereas for mono, the dependencies vary upon the codec/profile. Furthermore, numbers of services satisfying various quality criteria are presented.
The Fungible Audio-Visual Mapping and its Experience

Directory of Open Access Journals (Sweden)

Adriana Sa

2014-12-01

Full Text Available This article draws a perceptual approach to audio-visual mapping. Clearly perceivable cause and effect relationships can be problematic if one desires the audience to experience the music. Indeed perception would bias those sonic qualities that fit previous concepts of causation, subordinating other sonic qualities, which may form the relations between the sounds themselves. The question is, how can an audio-visual mapping produce a sense of causation, and simultaneously confound the actual cause-effect relationships. We call this a fungible audio-visual mapping. Our aim here is to glean its constitution and aspect. We will report a study, which draws upon methods from experimental psychology to inform audio-visual instrument design and composition. The participants are shown several audio-visual mapping prototypes, after which we pose quantitative and qualitative questions regarding their sense of causation, and their sense of understanding the cause-effect relationships. The study shows that a fungible mapping requires both synchronized and seemingly non-related components – sufficient complexity to be confusing. As the specific cause-effect concepts remain inconclusive, the sense of causation embraces the whole.
Forensic analysis of video steganography tools

Directory of Open Access Journals (Sweden)

Thomas Sloan

2015-05-01

Full Text Available Steganography is the art and science of concealing information in such a way that only the sender and intended recipient of a message should be aware of its presence. Digital steganography has been used in the past on a variety of media including executable files, audio, text, games and, notably, images. Additionally, there is increasing research interest towards the use of video as a media for steganography, due to its pervasive nature and diverse embedding capabilities. In this work, we examine the embedding algorithms and other security characteristics of several video steganography tools. We show how all feature basic and severe security weaknesses. This is potentially a very serious threat to the security, privacy and anonymity of their users. It is important to highlight that most steganography users have perfectly legal and ethical reasons to employ it. Some common scenarios would include citizens in oppressive regimes whose freedom of speech is compromised, people trying to avoid massive surveillance or censorship, political activists, whistle blowers, journalists, etc. As a result of our findings, we strongly recommend ceasing any use of these tools, and to remove any contents that may have been hidden, and any carriers stored, exchanged and/or uploaded online. For many of these tools, carrier files will be trivial to detect, potentially compromising any hidden data and the parties involved in the communication. We finish this work by presenting our steganalytic results, that highlight a very poor current state of the art in practical video steganography tools. There is unfortunately a complete lack of secure and publicly available tools, and even commercial tools offer very poor security. We therefore encourage the steganography community to work towards the development of more secure and accessible video steganography tools, and make them available for the general public. The results presented in this work can also be seen as a useful
Effectiveness of a Video-Versus Text-Based Computer-Tailored Intervention for Obesity Prevention after One Year: A Randomized Controlled Trial

Directory of Open Access Journals (Sweden)

Kei Long Cheung

2017-10-01

Full Text Available Computer-tailored programs may help to prevent overweight and obesity, which are worldwide public health problems. This study investigated (1 the 12-month effectiveness of a video- and text-based computer-tailored intervention on energy intake, physical activity, and body mass index (BMI, and (2 the role of educational level in intervention effects. A randomized controlled trial in The Netherlands was conducted, in which adults were allocated to a video-based condition, text-based condition, or control condition, with baseline, 6 months, and 12 months follow-up. Outcome variables were self-reported BMI, physical activity, and energy intake. Mixed-effects modelling was used to investigate intervention effects and potential interaction effects. Compared to the control group, the video intervention group was effective regarding energy intake after 6 months (least squares means (LSM difference = −205.40, p = 0.00 and 12 months (LSM difference = −128.14, p = 0.03. Only video intervention resulted in lower average daily energy intake after one year (d = 0.12. Educational role and BMI did not seem to interact with this effect. No intervention effects on BMI and physical activity were found. The video computer-tailored intervention was effective on energy intake after one year. This effect was not dependent on educational levels or BMI categories, suggesting that video tailoring can be effective for a broad range of risk groups and may be preferred over text tailoring.
El Digital Audio Tape Recorder. Contra autores y creadores

Directory of Open Access Journals (Sweden)

Jun Ono

2015-01-01

Full Text Available La llamada "DAT" (abreviatura por "digital audio tape recorder" / grabadora digital de audio ha recibido cobertura durante mucho tiempo en los medios masivos de Japón y otros países, como un producto acústico electrónico nuevo y controversial de la industria japonesa de artefactos electrónicos. ¿Qué ha pasado con el objeto de esta controversia?
Digital signal processor for silicon audio playback devices; Silicon audio saisei kikiyo digital signal processor

Energy Technology Data Exchange (ETDEWEB)

NONE

2000-03-01

The digital audio signal processor (DSP) TC9446F series has been developed silicon audio playback devices with a memory medium of, e.g., flash memory, DVD players, and AV devices, e.g., TV sets. It corresponds to AAC (advanced audio coding) (2ch) and MP3 (MPEG1 Layer3), as the audio compressing techniques being used for transmitting music through an internet. It also corresponds to compressed types, e.g., Dolby Digital, DTS (digital theater system) and MPEG2 audio, being adopted for, e.g., DVDs. It can carry a built-in audio signal processing program, e.g., Dolby ProLogic, equalizer, sound field controlling, and 3D sound. TC9446XB has been lined up anew. It adopts an FBGA (fine pitch ball grid array) package for portable audio devices. (translated by NEDO)
Network Degradation Effects on Different Codec Types and Characteristics of Video Streaming

Directory of Open Access Journals (Sweden)

Jaroslav Frnda

2014-01-01

Full Text Available Nowadays, there is a quickly growing demand for the transmission of voice, video and data over an IP based network. Multimedia, whether we are talking about broadcast, audio and video transmission and others, from a global perspective is growing exponentially with time. With incoming requests from users, new technologies for data transfer are continually developing. Data must be delivered reliably and with the fewest losses at such high speed. Video quality as part of multimedia technology has a very important role nowadays. It is influenced by several factors, where each of them can have many forms and processing. Network performance is the major degradation effect that influences the quality of resulting image. Poor network performance (lack of link capacity, high network loadâ€¦ causes data packet losses or different delivery time for each packet. This work focuses exactly on these network phenomena. It examines the impact of different delays and packet losses on the quality parameters of triple play services, to evaluate the results using objective methods. The aim of this work is to bring a detailed view on the performance of video streaming over IP-based networks.
Perbandingan Algoritma Enkripsi GGHN(8,8 dan RC4 Untuk Video Conference

Directory of Open Access Journals (Sweden)

Satriyo Satriyo

2012-01-01

Full Text Available Encryption is a solutions to secure the information transmitted through the medium of communication that can not be intercepted by unauthorized parties. This study implemented two algorithms i.e. RC4 and GGHN (8.8 in video conference. This study compared the speed of both encryption and decryption of the encryption algorithm, and perform measurements of bandwidth . From the results of measurements of the speed can be concluded that the RC4 algorithm is 1.53 times and 1.34 times faster than GGHN(8,8 for encryption and decryption audio data. The encryption and decryption video data RC4 are 1.44 times and 1.09 times faster than GHN(8,8. Bandwidth usage for video conferencing without encryption and bandwidth usage on a video conference with RC4 encryption and GGHN (8.8 is the same.Keywords: Video Conference; RC4; GGHN(8,8; Real Time Protocol.
Analytical Features: A Knowledge-Based Approach to Audio Feature Generation

Directory of Open Access Journals (Sweden)

Pachet François

2009-01-01

Full Text Available We present a feature generation system designed to create audio features for supervised classification tasks. The main contribution to feature generation studies is the notion of analytical features (AFs, a construct designed to support the representation of knowledge about audio signal processing. We describe the most important aspects of AFs, in particular their dimensional type system, on which are based pattern-based random generators, heuristics, and rewriting rules. We show how AFs generalize or improve previous approaches used in feature generation. We report on several projects using AFs for difficult audio classification tasks, demonstrating their advantage over standard audio features. More generally, we propose analytical features as a paradigm to bring raw signals into the world of symbolic computation.
A new visual navigation system for exploring biomedical Open Educational Resource (OER) videos.

Science.gov (United States)

Zhao, Baoquan; Xu, Songhua; Lin, Shujin; Luo, Xiaonan; Duan, Lian

2016-04-01

Biomedical videos as open educational resources (OERs) are increasingly proliferating on the Internet. Unfortunately, seeking personally valuable content from among the vast corpus of quality yet diverse OER videos is nontrivial due to limitations of today's keyword- and content-based video retrieval techniques. To address this need, this study introduces a novel visual navigation system that facilitates users' information seeking from biomedical OER videos in mass quantity by interactively offering visual and textual navigational clues that are both semantically revealing and user-friendly. The authors collected and processed around 25 000 YouTube videos, which collectively last for a total length of about 4000 h, in the broad field of biomedical sciences for our experiment. For each video, its semantic clues are first extracted automatically through computationally analyzing audio and visual signals, as well as text either accompanying or embedded in the video. These extracted clues are subsequently stored in a metadata database and indexed by a high-performance text search engine. During the online retrieval stage, the system renders video search results as dynamic web pages using a JavaScript library that allows users to interactively and intuitively explore video content both efficiently and effectively.ResultsThe authors produced a prototype implementation of the proposed system, which is publicly accessible athttps://patentq.njit.edu/oer To examine the overall advantage of the proposed system for exploring biomedical OER videos, the authors further conducted a user study of a modest scale. The study results encouragingly demonstrate the functional effectiveness and user-friendliness of the new system for facilitating information seeking from and content exploration among massive biomedical OER videos. Using the proposed tool, users can efficiently and effectively find videos of interest, precisely locate video segments delivering personally valuable
Medical Student and Tutor Perceptions of Video Versus Text in an Interactive Online Virtual Patient for Problem-Based Learning: A Pilot Study

Science.gov (United States)

Ellaway, Rachel H; Round, Jonathan; Vaughan, Sophie; Poulton, Terry; Zary, Nabil

2015-01-01

Background The impact of the use of video resources in primarily paper-based problem-based learning (PBL) settings has been widely explored. Although it can provide many benefits, the use of video can also hamper the critical thinking of learners in contexts where learners are developing clinical reasoning. However, the use of video has not been explored in the context of interactive virtual patients for PBL. Objective A pilot study was conducted to explore how undergraduate medical students interpreted and evaluated information from video- and text-based materials presented in the context of a branched interactive online virtual patient designed for PBL. The goal was to inform the development and use of virtual patients for PBL and to inform future research in this area. Methods An existing virtual patient for PBL was adapted for use in video and provided as an intervention to students in the transition year of the undergraduate medicine course at St George’s, University of London. Survey instruments were used to capture student and PBL tutor experiences and perceptions of the intervention, and a formative review meeting was run with PBL tutors. Descriptive statistics were generated for the structured responses and a thematic analysis was used to identify emergent themes in the unstructured responses. Results Analysis of student responses (n=119) and tutor comments (n=18) yielded 8 distinct themes relating to the perceived educational efficacy of information presented in video and text formats in a PBL context. Although some students found some characteristics of the videos beneficial, when asked to express a preference for video or text the majority of those that responded to the question (65%, 65/100) expressed a preference for text. Student responses indicated that the use of video slowed the pace of PBL and impeded students’ ability to review and critically appraise the presented information. Conclusions Our findings suggest that text was perceived to be a
Medical Student and Tutor Perceptions of Video Versus Text in an Interactive Online Virtual Patient for Problem-Based Learning: A Pilot Study.

Science.gov (United States)

Woodham, Luke A; Ellaway, Rachel H; Round, Jonathan; Vaughan, Sophie; Poulton, Terry; Zary, Nabil

2015-06-18

The impact of the use of video resources in primarily paper-based problem-based learning (PBL) settings has been widely explored. Although it can provide many benefits, the use of video can also hamper the critical thinking of learners in contexts where learners are developing clinical reasoning. However, the use of video has not been explored in the context of interactive virtual patients for PBL. A pilot study was conducted to explore how undergraduate medical students interpreted and evaluated information from video- and text-based materials presented in the context of a branched interactive online virtual patient designed for PBL. The goal was to inform the development and use of virtual patients for PBL and to inform future research in this area. An existing virtual patient for PBL was adapted for use in video and provided as an intervention to students in the transition year of the undergraduate medicine course at St George's, University of London. Survey instruments were used to capture student and PBL tutor experiences and perceptions of the intervention, and a formative review meeting was run with PBL tutors. Descriptive statistics were generated for the structured responses and a thematic analysis was used to identify emergent themes in the unstructured responses. Analysis of student responses (n=119) and tutor comments (n=18) yielded 8 distinct themes relating to the perceived educational efficacy of information presented in video and text formats in a PBL context. Although some students found some characteristics of the videos beneficial, when asked to express a preference for video or text the majority of those that responded to the question (65%, 65/100) expressed a preference for text. Student responses indicated that the use of video slowed the pace of PBL and impeded students' ability to review and critically appraise the presented information. Our findings suggest that text was perceived to be a better source of information than video in virtual
Video Tutorial of Continental Food

Science.gov (United States)

Nurani, A. S.; Juwaedah, A.; Mahmudatussa'adah, A.

2018-02-01

This research is motivated by the belief in the importance of media in a learning process. Media as an intermediary serves to focus on the attention of learners. Selection of appropriate learning media is very influential on the success of the delivery of information itself both in terms of cognitive, affective and skills. Continental food is a course that studies food that comes from Europe and is very complex. To reduce verbalism and provide more real learning, then the tutorial media is needed. Media tutorials that are audio visual can provide a more concrete learning experience. The purpose of this research is to develop tutorial media in the form of video. The method used is the development method with the stages of analyzing the learning objectives, creating a story board, validating the story board, revising the story board and making video tutorial media. The results show that the making of storyboards should be very thorough, and detailed in accordance with the learning objectives to reduce errors in video capture so as to save time, cost and effort. In video capturing, lighting, shooting angles, and soundproofing make an excellent contribution to the quality of tutorial video produced. In shooting should focus more on tools, materials, and processing. Video tutorials should be interactive and two-way.
Optimized Audio Classification and Segmentation Algorithm by Using Ensemble Methods

Directory of Open Access Journals (Sweden)

Saadia Zahid

2015-01-01

Full Text Available Audio segmentation is a basis for multimedia content analysis which is the most important and widely used application nowadays. An optimized audio classification and segmentation algorithm is presented in this paper that segments a superimposed audio stream on the basis of its content into four main audio types: pure-speech, music, environment sound, and silence. An algorithm is proposed that preserves important audio content and reduces the misclassification rate without using large amount of training data, which handles noise and is suitable for use for real-time applications. Noise in an audio stream is segmented out as environment sound. A hybrid classification approach is used, bagged support vector machines (SVMs with artificial neural networks (ANNs. Audio stream is classified, firstly, into speech and nonspeech segment by using bagged support vector machines; nonspeech segment is further classified into music and environment sound by using artificial neural networks and lastly, speech segment is classified into silence and pure-speech segments on the basis of rule-based classifier. Minimum data is used for training classifier; ensemble methods are used for minimizing misclassification rate and approximately 98% accurate segments are obtained. A fast and efficient algorithm is designed that can be used with real-time multimedia applications.

High-Fidelity Piezoelectric Audio Device

Science.gov (United States)

Woodward, Stanley E.; Fox, Robert L.; Bryant, Robert G.

2003-01-01

ModalMax is a very innovative means of harnessing the vibration of a piezoelectric actuator to produce an energy efficient low-profile device with high-bandwidth high-fidelity audio response. The piezoelectric audio device outperforms many commercially available speakers made using speaker cones. The piezoelectric device weighs substantially less (4 g) than the speaker cones which use magnets (10 g). ModalMax devices have extreme fabrication simplicity. The entire audio device is fabricated by lamination. The simplicity of the design lends itself to lower cost. The piezoelectric audio device can be used without its acoustic chambers and thereby resulting in a very low thickness of 0.023 in. (0.58 mm). The piezoelectric audio device can be completely encapsulated, which makes it very attractive for use in wet environments. Encapsulation does not significantly alter the audio response. Its small size (see Figure 1) is applicable to many consumer electronic products, such as pagers, portable radios, headphones, laptop computers, computer monitors, toys, and electronic games. The audio device can also be used in automobile or aircraft sound systems.
Effectiveness of a Video-Versus Text-Based Computer-Tailored Intervention for Obesity Prevention after One Year: A Randomized Controlled Trial

Science.gov (United States)

Cheung, Kei Long; Schwabe, Inga; Walthouwer, Michel J. L.; Oenema, Anke; de Vries, Hein

2017-01-01

Computer-tailored programs may help to prevent overweight and obesity, which are worldwide public health problems. This study investigated (1) the 12-month effectiveness of a video- and text-based computer-tailored intervention on energy intake, physical activity, and body mass index (BMI), and (2) the role of educational level in intervention effects. A randomized controlled trial in The Netherlands was conducted, in which adults were allocated to a video-based condition, text-based condition, or control condition, with baseline, 6 months, and 12 months follow-up. Outcome variables were self-reported BMI, physical activity, and energy intake. Mixed-effects modelling was used to investigate intervention effects and potential interaction effects. Compared to the control group, the video intervention group was effective regarding energy intake after 6 months (least squares means (LSM) difference = −205.40, p = 0.00) and 12 months (LSM difference = −128.14, p = 0.03). Only video intervention resulted in lower average daily energy intake after one year (d = 0.12). Educational role and BMI did not seem to interact with this effect. No intervention effects on BMI and physical activity were found. The video computer-tailored intervention was effective on energy intake after one year. This effect was not dependent on educational levels or BMI categories, suggesting that video tailoring can be effective for a broad range of risk groups and may be preferred over text tailoring. PMID:29065545
Modality of Input and Vocabulary Acquisition

Directory of Open Access Journals (Sweden)

Tetyana Sydorenko

2010-06-01

Full Text Available This study examines the effect of input modality (video, audio, and captions, i.e., on-screen text in the same language as audio on (a the learning of written and aural word forms, (b overall vocabulary gains, (c attention to input, and (d vocabulary learning strategies of beginning L2 learners. Twenty-six second-semester learners of Russian participated in this study. Group one (N = 8 saw video with audio and captions (VAC; group two (N = 9 saw video with audio (VA; group three (N = 9 saw video with captions (VC. All participants completed written and aural vocabulary tests and a final questionnaire.The results indicate that groups with captions (VAC and VC scored higher on written than on aural recognition of word forms, while the reverse applied to the VA group. The VAC group learned more word meanings than the VA group. Results from the questionnaire suggest that learners paid most attention to captions, followed by video and audio, and acquired most words by associating them with visual images. Pedagogical implications of this study are that captioned video tends to aid recognition of written word forms and the learning of word meaning, while non-captioned video tends to improve listening comprehension as it facilitates recognition of aural word forms.
A Joint Audio-Visual Approach to Audio Localization

DEFF Research Database (Denmark)

Jensen, Jesper Rindom; Christensen, Mads Græsbøll

2015-01-01

Localization of audio sources is an important research problem, e.g., to facilitate noise reduction. In the recent years, the problem has been tackled using distributed microphone arrays (DMA). A common approach is to apply direction-of-arrival (DOA) estimation on each array (denoted as nodes), a...... time-of-flight cameras. Moreover, we propose an optimal method for weighting such DOA and range information for audio localization. Our experiments on both synthetic and real data show that there is a clear, potential advantage of using the joint audiovisual localization framework....
Modeling Audio Fingerprints : Structure, Distortion, Capacity

NARCIS (Netherlands)

Doets, P.J.O.

2010-01-01

An audio fingerprint is a compact low-level representation of a multimedia signal. An audio fingerprint can be used to identify audio files or fragments in a reliable way. The use of audio fingerprints for identification consists of two phases. In the enrollment phase known content is fingerprinted,
All About Audio Equalization: Solutions and Frontiers

Directory of Open Access Journals (Sweden)

Vesa Välimäki

2016-05-01

Full Text Available Audio equalization is a vast and active research area. The extent of research means that one often cannot identify the preferred technique for a particular problem. This review paper bridges those gaps, systemically providing a deep understanding of the problems and approaches in audio equalization, their relative merits and applications. Digital signal processing techniques for modifying the spectral balance in audio signals and applications of these techniques are reviewed, ranging from classic equalizers to emerging designs based on new advances in signal processing and machine learning. Emphasis is placed on putting the range of approaches within a common mathematical and conceptual framework. The application areas discussed herein are diverse, and include well-defined, solvable problems of filter design subject to constraints, as well as newly emerging challenges that touch on problems in semantics, perception and human computer interaction. Case studies are given in order to illustrate key concepts and how they are applied in practice. We also recommend preferred signal processing approaches for important audio equalization problems. Finally, we discuss current challenges and the uncharted frontiers in this field. The source code for methods discussed in this paper is made available at https://code.soundsoftware.ac.uk/projects/allaboutaudioeq.
Introduction to audio analysis a MATLAB approach

CERN Document Server

Giannakopoulos, Theodoros

2014-01-01

Introduction to Audio Analysis serves as a standalone introduction to audio analysis, providing theoretical background to many state-of-the-art techniques. It covers the essential theory necessary to develop audio engineering applications, but also uses programming techniques, notably MATLAB®, to take a more applied approach to the topic. Basic theory and reproducible experiments are combined to demonstrate theoretical concepts from a practical point of view and provide a solid foundation in the field of audio analysis. Audio feature extraction, audio classification, audio segmentation, au
Audiovisual focus of attention and its application to Ultra High Definition video compression

Science.gov (United States)

Rerabek, Martin; Nemoto, Hiromi; Lee, Jong-Seok; Ebrahimi, Touradj

2014-02-01

Using Focus of Attention (FoA) as a perceptual process in image and video compression belongs to well-known approaches to increase coding efficiency. It has been shown that foveated coding, when compression quality varies across the image according to region of interest, is more efficient than the alternative coding, when all region are compressed in a similar way. However, widespread use of such foveated compression has been prevented due to two main conflicting causes, namely, the complexity and the efficiency of algorithms for FoA detection. One way around these is to use as much information as possible from the scene. Since most video sequences have an associated audio, and moreover, in many cases there is a correlation between the audio and the visual content, audiovisual FoA can improve efficiency of the detection algorithm while remaining of low complexity. This paper discusses a simple yet efficient audiovisual FoA algorithm based on correlation of dynamics between audio and video signal components. Results of audiovisual FoA detection algorithm are subsequently taken into account for foveated coding and compression. This approach is implemented into H.265/HEVC encoder producing a bitstream which is fully compliant to any H.265/HEVC decoder. The influence of audiovisual FoA in the perceived quality of high and ultra-high definition audiovisual sequences is explored and the amount of gain in compression efficiency is analyzed.
Advances in audio source seperation and multisource audio content retrieval

Science.gov (United States)

Vincent, Emmanuel

2012-06-01

Audio source separation aims to extract the signals of individual sound sources from a given recording. In this paper, we review three recent advances which improve the robustness of source separation in real-world challenging scenarios and enable its use for multisource content retrieval tasks, such as automatic speech recognition (ASR) or acoustic event detection (AED) in noisy environments. We present a Flexible Audio Source Separation Toolkit (FASST) and discuss its advantages compared to earlier approaches such as independent component analysis (ICA) and sparse component analysis (SCA). We explain how cues as diverse as harmonicity, spectral envelope, temporal fine structure or spatial location can be jointly exploited by this toolkit. We subsequently present the uncertainty decoding (UD) framework for the integration of audio source separation and audio content retrieval. We show how the uncertainty about the separated source signals can be accurately estimated and propagated to the features. Finally, we explain how this uncertainty can be efficiently exploited by a classifier, both at the training and the decoding stage. We illustrate the resulting performance improvements in terms of speech separation quality and speaker recognition accuracy.
A combined video and synchronous VSAT data network

Science.gov (United States)

Rowse, William

Private Satellite Network currently operates Business Television networks for Fortune 500 companies. Several of these satellite-based networks, using VSAT technology, are combining the transmission of video with the broadcast of one-way data. This is made possible by use of the PSN Business Television Terminal which incorporates Scientific Atlanta's B-MAC system. In addition to providing high quality video, B-MAC can provide six channels of 204.5 kbs audio. Four of the six channels may be used to directly carry up to 19.2 kbs of asynchronous data or up to 56 kbs of synchronous data using circuitry jointly developed by PSN and Scientific Atlanta. The approach PSN has taken to provide one network customer in the financial industry with both video and broadcast data is described herein.
The Safeguard of Audio Collections: A Computer Science Based Approach to Quality Control—The Case of the Sound Archive of the Arena di Verona

Directory of Open Access Journals (Sweden)

Federica Bressan

2013-01-01

Full Text Available In the field of multimedia, very little attention is given to the activities involved in the preservation of audio documents. At the same time, more and more archives storing audio and video documents face the problem of obsolescing and degrading media, which could largely benefit from the instruments and the methodologies of research in multimedia. This paper presents the methodology and the results of the Italian project REVIVAL, aimed at the development of a hardware/software platform to support the active preservation of the audio collection of the Fondazione Arena di Verona, one of the finest in Europe for the operatic genre, with a special attention on protocols and tools for quality control. On the scientific side, the most significant objectives achieved by the project are (i the setup of a working environment inside the archive, (ii the knowledge transfer to the archival personnel, (iii the realization of chemical analyses on magnetic tapes in collaboration with experts in the fields of materials science and chemistry, and (iv the development of original open-source software tools. On the cultural side, the recovery, the safeguard, and the access to unique copies of unpublished live recordings of artists the calibre of Domingo and Pavarotti are of great musicological and economical value.
Classification of dual language audio-visual content: Introduction to the VideoCLEF 2008 pilot benchmark evaluation task

NARCIS (Netherlands)

Larson, M.; Newman, E.; Jones, G.J.F.; Köhler, J.; Larson, M.; de Jong, F.M.G.; Kraaij, W.; Ordelman, R.J.F.

2008-01-01

VideoCLEF is a new track for the CLEF 2008 campaign. This track aims to develop and evaluate tasks in analyzing multilingual video content. A pilot of a Vid2RSS task involving assigning thematic class labels to video kicks off the VideoCLEF track in 2008. Task participants deliver classification
Improving Students� Ability in Writing Hortatory Exposition Texts by Using Process-Genre Based Approach with YouTube Videos as the Media

Directory of Open Access Journals (Sweden)

fifin naili rizkiyah

2017-06-01

Full Text Available Abstract: This research is aimed at finding out how Process-Genre Based Approach strategy with YouTube Videos as the media are employed to improve the students� ability in writing hortatory exposition texts. This study uses collaborative classroom action research design following the procedures namely planning, implementing, observing, and reflecting. The procedures of carrying out the strategy are: (1 relating several issues/ cases to the students� background knowledge and introducing the generic structures and linguistic features of hortatory exposition text as the BKoF stage, (2 analyzing the generic structure and the language features used in the text and getting model on how to write a hortatory exposition text by using the YouTube Video as the MoT stage, (3 writing a hortatory exposition text collaboratively in a small group and in pairs through process writing as the JCoT stage, and (4 writing a hortatory exposition text individually as the ICoT stage. The result shows that the use of Process-Genre Based Approach and YouTube Videos can improve the students� ability in writing hortatory exposition texts. The percentage of the students achieving the score above the minimum passing grade (70 had improved from only 15.8% (3 out of 19 students in the preliminary study to 100% (22 students in the Cycle 1. Besides, the score of each aspect; content, organization, vocabulary, grammar, and mechanics also improved. � Key Words: writing ability, hortatory exposition text, process-genre based approach, youtube video
Audible Aliasing Distortion in Digital Audio Synthesis

Directory of Open Access Journals (Sweden)

J. Schimmel

2012-04-01

Full Text Available This paper deals with aliasing distortion in digital audio signal synthesis of classic periodic waveforms with infinite Fourier series, for electronic musical instruments. When these waveforms are generated in the digital domain then the aliasing appears due to its unlimited bandwidth. There are several techniques for the synthesis of these signals that have been designed to avoid or reduce the aliasing distortion. However, these techniques have high computing demands. One can say that today's computers have enough computing power to use these methods. However, we have to realize that today’s computer-aided music production requires tens of multi-timbre voices generated simultaneously by software synthesizers and the most of the computing power must be reserved for hard-disc recording subsystem and real-time audio processing of many audio channels with a lot of audio effects. Trivially generated classic analog synthesizer waveforms are therefore still effective for sound synthesis. We cannot avoid the aliasing distortion but spectral components produced by the aliasing can be masked with harmonic components and thus made inaudible if sufficient oversampling ratio is used. This paper deals with the assessment of audible aliasing distortion with the help of a psychoacoustic model of simultaneous masking and compares the computing demands of trivial generation using oversampling with those of other methods.
Could Audio-Described Films Benefit from Audio Introductions? An Audience Response Study

Science.gov (United States)

Romero-Fresco, Pablo; Fryer, Louise

2013-01-01

Introduction: Time constraints limit the quantity and type of information conveyed in audio description (AD) for films, in particular the cinematic aspects. Inspired by introductory notes for theatre AD, this study developed audio introductions (AIs) for "Slumdog Millionaire" and "Man on Wire." Each AI comprised 10 minutes of…
Location audio simplified capturing your audio and your audience

CERN Document Server

Miles, Dean

2014-01-01

From the basics of using camera, handheld, lavalier, and shotgun microphones to camera calibration and mixer set-ups, Location Audio Simplified unlocks the secrets to clean and clear broadcast quality audio no matter what challenges you face. Author Dean Miles applies his twenty-plus years of experience as a professional location operator to teach the skills, techniques, tips, and secrets needed to produce high-quality production sound on location. Humorous and thoroughly practical, the book covers a wide array of topics, such as:* location selection* field mixing* boo
Audio-visual temporal recalibration can be constrained by content cues regardless of spatial overlap

Directory of Open Access Journals (Sweden)

Warrick eRoseboom

2013-04-01

Full Text Available It has now been well established that the point of subjective synchrony for audio and visual events can be shifted following exposure to asynchronous audio-visual presentations, an effect often referred to as temporal recalibration. Recently it was further demonstrated that it is possible to concurrently maintain two such recalibrated, and opposing, estimates of audio-visual temporal synchrony. However, it remains unclear precisely what defines a given audio-visual pair such that it is possible to maintain a temporal relationship distinct from other pairs. It has been suggested that spatial separation of the different audio-visual pairs is necessary to achieve multiple distinct audio-visual synchrony estimates. Here we investigated if this was necessarily true. Specifically, we examined whether it is possible to obtain two distinct temporal recalibrations for stimuli that differed only in featural content. Using both complex (audio visual speech; Experiment 1 and simple stimuli (high and low pitch audio matched with either vertically or horizontally oriented Gabors; Experiment 2 we found concurrent, and opposite, recalibrations despite there being no spatial difference in presentation location at any point throughout the experiment. This result supports the notion that the content of an audio-visual pair can be used to constrain distinct audio-visual synchrony estimates regardless of spatial overlap.
A Psychoacoustic-Based Multiple Audio Object Coding Approach via Intra-Object Sparsity

Directory of Open Access Journals (Sweden)

Maoshen Jia

2017-12-01

Full Text Available Rendering spatial sound scenes via audio objects has become popular in recent years, since it can provide more flexibility for different auditory scenarios, such as 3D movies, spatial audio communication and virtual classrooms. To facilitate high-quality bitrate-efficient distribution for spatial audio objects, an encoding scheme based on intra-object sparsity (approximate k-sparsity of the audio object itself is proposed in this paper. The statistical analysis is presented to validate the notion that the audio object has a stronger sparseness in the Modified Discrete Cosine Transform (MDCT domain than in the Short Time Fourier Transform (STFT domain. By exploiting intra-object sparsity in the MDCT domain, multiple simultaneously occurring audio objects are compressed into a mono downmix signal with side information. To ensure a balanced perception quality of audio objects, a Psychoacoustic-based time-frequency instants sorting algorithm and an energy equalized Number of Preserved Time-Frequency Bins (NPTF allocation strategy are proposed, which are employed in the underlying compression framework. The downmix signal can be further encoded via Scalar Quantized Vector Huffman Coding (SQVH technique at a desirable bitrate, and the side information is transmitted in a lossless manner. Both objective and subjective evaluations show that the proposed encoding scheme outperforms the Sparsity Analysis (SPA approach and Spatial Audio Object Coding (SAOC in cases where eight objects were jointly encoded.
Turkish Music Genre Classification using Audio and Lyrics Features

Directory of Open Access Journals (Sweden)

Önder ÇOBAN

2017-05-01

Full Text Available Music Information Retrieval (MIR has become a popular research area in recent years. In this context, researchers have developed music information systems to find solutions for such major problems as automatic playlist creation, hit song detection, and music genre or mood classification. Meta-data information, lyrics, or melodic content of music are used as feature resource in previous works. However, lyrics do not often used in MIR systems and the number of works in this field is not enough especially for Turkish. In this paper, firstly, we have extended our previously created Turkish MIR (TMIR dataset, which comprises of Turkish lyrics, by including the audio file of each song. Secondly, we have investigated the effect of using audio and textual features together or separately on automatic Music Genre Classification (MGC. We have extracted textual features from lyrics using different feature extraction models such as word2vec and traditional Bag of Words. We have conducted our experiments on Support Vector Machine (SVM algorithm and analysed the impact of feature selection and different feature groups on MGC. We have considered lyrics based MGC as a text classification task and also investigated the effect of term weighting method. Experimental results show that textual features can also be effective as well as audio features for Turkish MGC, especially when a supervised term weighting method is employed. We have achieved the highest success rate as 99,12\\% by using both audio and textual features together.
Design And Construction Of 300W Audio Power Amplifier For Classroom

Directory of Open Access Journals (Sweden)

Shune Lei Aung

2015-07-01

Full Text Available Abstract This paper describes the design and construction of 300W audio power amplifier for classroom. In the construction of this amplifier microphone preamplifier tone preamplifier equalizer line amplifier output power amplifier and sound level indicator are included. The output power amplifier is designed as O.C.L system and constructed by using Class B among many types of amplifier classes. There are two types in O.C.L system quasi system and complementary system. Between them the complementary system is used in the construction of 300W audio power amplifier. The Multisim software is utilized for the construction of audio power amplifier.

Créer des ressources audio pour le cours de FLE

Directory of Open Access Journals (Sweden)

Florence Gérard Lojacono

2010-01-01

Full Text Available These last ten years, web applicationshave gained ascendency over the consumersociety as shown by the success of iTunesand the increase of podcasting. The academicworld, particularly in the field oflanguage teaching, could take advantage ofthis massive use of audio files. The creationand the diffusion of customized ad hocaudio files and the broadcast of these resourcesthrough educational podcasts addressthe upcoming challenges of a knowledgebased society. Teaching and learningwith audio files also meet the recommendationsof the European Higher EducationArea (EHEA. This paper will provide languageteachers, especially French teachers,with the tools to create, edit, upload andplay their own audio files. No specific computerskills are required.
Structure Learning in Audio

DEFF Research Database (Denmark)

Nielsen, Andreas Brinch

By having information about the setting a user is in, a computer is able to make decisions proactively to facilitate tasks for the user. Two approaches are taken in this thesis to achieve more information about an audio environment. One approach is that of classifying audio, and a new approach...... investigated. A fast and computationally simple approach that compares recordings and classifies if they are from the same audio environment have been developed, and shows very high accuracy and the ability to synchronize recordings in the case of recording devices which are not connected. A more general model...
Local Control of Audio Environment: A Review of Methods and Applications

Directory of Open Access Journals (Sweden)

Jussi Kuutti

2014-02-01

Full Text Available The concept of a local audio environment is to have sound playback locally restricted such that, ideally, adjacent regions of an indoor or outdoor space could exhibit their own individual audio content without interfering with each other. This would enable people to listen to their content of choice without disturbing others next to them, yet, without any headphones to block conversation. In practice, perfect sound containment in free air cannot be attained, but a local audio environment can still be satisfactorily approximated using directional speakers. Directional speakers may be based on regular audible frequencies or they may employ modulated ultrasound. Planar, parabolic, and array form factors are commonly used. The directivity of a speaker improves as its surface area and sound frequency increases, making these the main design factors for directional audio systems. Even directional speakers radiate some sound outside the main beam, and sound can also reflect from objects. Therefore, directional speaker systems perform best when there is enough ambient noise to mask the leaking sound. Possible areas of application for local audio include information and advertisement audio feed in commercial facilities, guiding and narration in museums and exhibitions, office space personalization, control room messaging, rehabilitation environments, and entertainment audio systems.
Music Identification System Using MPEG-7 Audio Signature Descriptors

Directory of Open Access Journals (Sweden)

Shingchern D. You

2013-01-01

Full Text Available This paper describes a multiresolution system based on MPEG-7 audio signature descriptors for music identification. Such an identification system may be used to detect illegally copied music circulated over the Internet. In the proposed system, low-resolution descriptors are used to search likely candidates, and then full-resolution descriptors are used to identify the unknown (query audio. With this arrangement, the proposed system achieves both high speed and high accuracy. To deal with the problem that a piece of query audio may not be inside the system’s database, we suggest two different methods to find the decision threshold. Simulation results show that the proposed method II can achieve an accuracy of 99.4% for query inputs both inside and outside the database. Overall, it is highly possible to use the proposed system for copyright control.
Technical Evaluation Report 31: Internet Audio Products (3/ 3

Directory of Open Access Journals (Sweden)

Jim Rudolph

2004-08-01

Full Text Available Two contrasting additions to the online audio market are reviewed: iVocalize, a browser-based audio-conferencing software, and Skype, a PC-to-PC Internet telephone tool. These products are selected for review on the basis of their success in gaining rapid popular attention and usage during 2003-04. The iVocalize review emphasizes the product’s role in the development of a series of successful online audio communities – notably several serving visually impaired users. The Skype review stresses the ease with which the product may be used for simultaneous PC-to-PC communication among up to five users. Editor’s Note: This paper serves as an introduction to reports about online community building, and reviews of online products for disabled persons, in the next ten reports in this series. JPB, Series Ed.
A video for teaching english tenses

Directory of Open Access Journals (Sweden)

Frida Unsiah

2017-04-01

Students of English Language Education Program in Faculty of Cultural Studies Universitas Brawijaya ideally master Grammar before taking the degree of Sarjana Pendidikan. However, the fact shows that they are still weak in Grammar especially tenses. Therefore, the researchers initiate to develop a video as a media to teach tenses. Objectively, by using video, students get better understanding on tenses so that they can communicate using English accurately and contextually. To develop the video, the researchers used ADDIE model (Analysis, Design, Development, Implementation, Evaluation. First, the researchers analyzed the students’ learning need to determine the product that would be developed, in this case was a movie about English tenses. Then, the researchers developed a video as the product. The product then was validated by media expert who validated attractiveness, typography, audio, image, and usefulness and content expert and validated by a content expert who validated the language aspects and tenses of English used by the actors in the video dealing with the grammar content, pronunciation, and fluency performed by the actors. The result of validation shows that the video developed was considered good. Theoretically, it is appropriate to be used English Grammar classes. However, the media expert suggests that it still needs some improvement for the next development especially dealing with the synchronization between lips movement and sound on the scenes while the content expert suggests that the Grammar content of the video should focus on one tense only to provide more detailed concept of the tense.
A centralized audio presentation manager

Energy Technology Data Exchange (ETDEWEB)

Papp, A.L. III; Blattner, M.M.

1994-05-16

The centralized audio presentation manager addresses the problems which occur when multiple programs running simultaneously attempt to use the audio output of a computer system. Time dependence of sound means that certain auditory messages must be scheduled simultaneously, which can lead to perceptual problems due to psychoacoustic phenomena. Furthermore, the combination of speech and nonspeech audio is examined; each presents its own problems of perceptibility in an acoustic environment composed of multiple auditory streams. The centralized audio presentation manager receives abstract parameterized message requests from the currently running programs, and attempts to create and present a sonic representation in the most perceptible manner through the use of a theoretically and empirically designed rule set.
Instrumental Landing Using Audio Indication

Science.gov (United States)

Burlak, E. A.; Nabatchikov, A. M.; Korsun, O. N.

2018-02-01

The paper proposes an audio indication method for presenting to a pilot the information regarding the relative positions of an aircraft in the tasks of precision piloting. The implementation of the method is presented, the use of such parameters of audio signal as loudness, frequency and modulation are discussed. To confirm the operability of the audio indication channel the experiments using modern aircraft simulation facility were carried out. The simulated performed the instrument landing using the proposed audio method to indicate the aircraft deviations in relation to the slide path. The results proved compatible with the simulated instrumental landings using the traditional glidescope pointers. It inspires to develop the method in order to solve other precision piloting tasks.
Bit rates in audio source coding

NARCIS (Netherlands)

Veldhuis, Raymond N.J.

1992-01-01

The goal is to introduce and solve the audio coding optimization problem. Psychoacoustic results such as masking and excitation pattern models are combined with results from rate distortion theory to formulate the audio coding optimization problem. The solution of the audio optimization problem is a
Descriptive analysis of YouTube music therapy videos.

Science.gov (United States)

Gooding, Lori F; Gregory, Dianne

2011-01-01

The purpose of this study was to conduct a descriptive analysis of music therapy-related videos on YouTube. Preliminary searches using the keywords music therapy, music therapy session, and "music therapy session" resulted in listings of 5000, 767, and 59 videos respectively. The narrowed down listing of 59 videos was divided between two investigators and reviewed in order to determine their relationship to actual music therapy practice. A total of 32 videos were determined to be depictions of music therapy sessions. These videos were analyzed using a 16-item investigator-created rubric that examined both video specific information and therapy specific information. Results of the analysis indicated that audio and visual quality was adequate, while narrative descriptions and identification information were ineffective in the majority of the videos. The top 5 videos (based on the highest number of viewings in the sample) were selected for further analysis in order to investigate demonstration of the Professional Level of Practice Competencies set forth in the American Music Therapy Association (AMTA) Professional Competencies (AMTA, 2008). Four of the five videos met basic competency criteria, with the quality of the fifth video precluding evaluation of content. Of particular interest is the fact that none of the videos included credentialing information. Results of this study suggest the need to consider ways to ensure accurate dissemination of music therapy-related information in the YouTube environment, ethical standards when posting music therapy session videos, and the possibility of creating AMTA standards for posting music therapy related video.
Selective attention modulates the direction of audio-visual temporal recalibration.

Directory of Open Access Journals (Sweden)

Nara Ikumi

Full Text Available Temporal recalibration of cross-modal synchrony has been proposed as a mechanism to compensate for timing differences between sensory modalities. However, far from the rich complexity of everyday life sensory environments, most studies to date have examined recalibration on isolated cross-modal pairings. Here, we hypothesize that selective attention might provide an effective filter to help resolve which stimuli are selected when multiple events compete for recalibration. We addressed this question by testing audio-visual recalibration following an adaptation phase where two opposing audio-visual asynchronies were present. The direction of voluntary visual attention, and therefore to one of the two possible asynchronies (flash leading or flash lagging, was manipulated using colour as a selection criterion. We found a shift in the point of subjective audio-visual simultaneity as a function of whether the observer had focused attention to audio-then-flash or to flash-then-audio groupings during the adaptation phase. A baseline adaptation condition revealed that this effect of endogenous attention was only effective toward the lagging flash. This hints at the role of exogenous capture and/or additional endogenous effects producing an asymmetry toward the leading flash. We conclude that selective attention helps promote selected audio-visual pairings to be combined and subsequently adjusted in time but, stimulus organization exerts a strong impact on recalibration. We tentatively hypothesize that the resolution of recalibration in complex scenarios involves the orchestration of top-down selection mechanisms and stimulus-driven processes.
Producing a College Video: The Sweat (and Success) Is in the Details.

Science.gov (United States)

Hays, Tim

1994-01-01

Introduces specifics related to production elements and message elements of college videos. Outlines aspects of lighting, audio, narration, backing music, and performance music. Discusses elements of pace, physical plant, people, and programs with regard to marketing. Suggests the goal is to create a unified vision to attract the target audience.…
Affective video retrieval: violence detection in Hollywood movies by large-scale segmental feature extraction.

Science.gov (United States)

Eyben, Florian; Weninger, Felix; Lehment, Nicolas; Schuller, Björn; Rigoll, Gerhard

2013-01-01

Without doubt general video and sound, as found in large multimedia archives, carry emotional information. Thus, audio and video retrieval by certain emotional categories or dimensions could play a central role for tomorrow's intelligent systems, enabling search for movies with a particular mood, computer aided scene and sound design in order to elicit certain emotions in the audience, etc. Yet, the lion's share of research in affective computing is exclusively focusing on signals conveyed by humans, such as affective speech. Uniting the fields of multimedia retrieval and affective computing is believed to lend to a multiplicity of interesting retrieval applications, and at the same time to benefit affective computing research, by moving its methodology "out of the lab" to real-world, diverse data. In this contribution, we address the problem of finding "disturbing" scenes in movies, a scenario that is highly relevant for computer-aided parental guidance. We apply large-scale segmental feature extraction combined with audio-visual classification to the particular task of detecting violence. Our system performs fully data-driven analysis including automatic segmentation. We evaluate the system in terms of mean average precision (MAP) on the official data set of the MediaEval 2012 evaluation campaign's Affect Task, which consists of 18 original Hollywood movies, achieving up to .398 MAP on unseen test data in full realism. An in-depth analysis of the worth of individual features with respect to the target class and the system errors is carried out and reveals the importance of peak-related audio feature extraction and low-level histogram-based video analysis.
Training of audio descriptors: the cinematographic aesthetics as basis for the learning of the audio description aesthetics – materials, methods and products

Directory of Open Access Journals (Sweden)

Soraya Ferreira Alves

2016-12-01

Full Text Available Audio description (AD, a resource used to make theater, cinema, TV, and visual works of art accessible to people with visual impairments, is slowly being implemented in Brazil and demanding qualified professionals. Based on this statement, this article reports the results of a research developed during post-doctoral studies. The study is dedicated to the confrontation of film aesthetics with audio description techniques to check how the knowledge of the former can contribute to audiodescritor training. Through action research, a short film adapted from a Mario de Andrade’s, a Brazilian writer, short story called O Peru de Natal (Christmas Turkey was produced. The film as well as its audio description were carried out involving students and teachers from the discipline Intersemiotic Translation at the State University of Ceará. Thus, we intended to suggest pedagogical procedures generated by the students experiences by evaluating their choices and their implications.
Comparative evaluation of audio and audio - tactile methods to improve oral hygiene status of visually impaired school children

OpenAIRE

R Krishnakumar; Swarna Swathi Silla; Sugumaran K Durai; Mohan Govindarajan; Syed Shaheed Ahamed; Logeshwari Mathivanan

2016-01-01

Background: Visually impaired children are unable to maintain good oral hygiene, as their tactile abilities are often underdeveloped owing to their visual disturbances. Conventional brushing techniques are often poorly comprehended by these children and hence, it was decided to evaluate the effectiveness of audio and audio-tactile methods in improving the oral hygiene of these children. Objective: To evaluate and compare the effectiveness of audio and audio-tactile methods in improving oral h...
Audio Frequency Analysis in Mobile Phones

Science.gov (United States)

Aguilar, Horacio Munguía

2016-01-01

A new experiment using mobile phones is proposed in which its audio frequency response is analyzed using the audio port for inputting external signal and getting a measurable output. This experiment shows how the limited audio bandwidth used in mobile telephony is the main cause of the poor speech quality in this service. A brief discussion is…
Comparing a Video and Text Version of a Web-Based Computer-Tailored Intervention for Obesity Prevention: A Randomized Controlled Trial.

Science.gov (United States)

Walthouwer, Michel Jean Louis; Oenema, Anke; Lechner, Lilian; de Vries, Hein

2015-10-19

Web-based computer-tailored interventions often suffer from small effect sizes and high drop-out rates, particularly among people with a low level of education. Using videos as a delivery format can possibly improve the effects and attractiveness of these interventions The main aim of this study was to examine the effects of a video and text version of a Web-based computer-tailored obesity prevention intervention on dietary intake, physical activity, and body mass index (BMI) among Dutch adults. A second study aim was to examine differences in appreciation between the video and text version. The final study aim was to examine possible differences in intervention effects and appreciation per educational level. A three-armed randomized controlled trial was conducted with a baseline and 6 months follow-up measurement. The intervention consisted of six sessions, lasting about 15 minutes each. In the video version, the core tailored information was provided by means of videos. In the text version, the same tailored information was provided in text format. Outcome variables were self-reported and included BMI, physical activity, energy intake, and appreciation of the intervention. Multiple imputation was used to replace missing values. The effect analyses were carried out with multiple linear regression analyses and adjusted for confounders. The process evaluation data were analyzed with independent samples t tests. The baseline questionnaire was completed by 1419 participants and the 6 months follow-up measurement by 1015 participants (71.53%). No significant interaction effects of educational level were found on any of the outcome variables. Compared to the control condition, the video version resulted in lower BMI (B=-0.25, P=.049) and lower average daily energy intake from energy-dense food products (B=-175.58, PWeb-based computer-tailored obesity prevention intervention was the most effective intervention and most appreciated. Future research needs to examine if the
AKTIVITAS SEKUNDER AUDIO UNTUK MENJAGA KEWASPADAAN PENGEMUDI MOBIL INDONESIA

Directory of Open Access Journals (Sweden)

Iftikar Zahedi Sutalaksana

2013-03-01

Full Text Available Tingkat kecelakaan lalu lintas yang melibatkan mobil di Indonesia semakin mengkhawatirkan. Tingginya peran faktor manusia sebagai penyebab utama kejadian kecelakaan patut diperhatikan. Penurunan kewaspadaan saat mengemudi akibat kantuk atau kelelahan merupakan salah satu kondisi yang mendorong terjadinya kecelakaan. Tulisan ini memaparkan aplikasi audio response test sebagai aktivitas sekunder dalam mengemudikan mobil. Response test yang dimaksud merupakan seperangkat aplikasi pada dashboard mobil yang menuntut respon pengemudi setiap stimulus suara bekerja. Audio response test ini diusulkan sebagai pemantau tingkat kewaspadaan pengemudi selama berkendara. Kewaspadaan pengemudi merupakan kondisi selama berkendara yang terjaga, awas, dan mampu memproses semua stimulus dengan baik. Hasil studi ini menghasilkan suatu bentuk audio response test yang terintegrasi dengan sistem berkendara di dalam mobil. Sumber bunyi diperdengarkan dengan intensitas konstan antara 80-85 dB. Bunyi akan berhenti jika pengemudi memberikan respon atas stimulus suara tersebut. Response test ini dirancang untuk mampu memantau tingkat kewaspadaan pengemudi selama berkendara. Penerapannya diharapkan mampu membantu menekan tingkat kecelakaan lalu lintas di Indonesia. Kata kunci: mengemudi, aktivitas sekunder, audio, kewaspadaan, response test Abstract The level of traffic accidents involving cars in Indonesia increasingly alarming. The high role of the human factor as the main cause of accident noteworthy. Decreased alertness while driving due to sleepiness or fatigue is one of the conditions that led to the accident. This paper describes an audio application response test as a secondary activity of driving a car. Response test is a set of applications on the dashboard of a car that demands a response driver each stimulus voice work. Audio response was proposed as test monitors the driver's level of alertness while driving. Vigilance driver was driving conditions during
Presence and the utility of audio spatialization

DEFF Research Database (Denmark)

Bormann, Karsten

2005-01-01

The primary concern of this paper is whether the utility of audio spatialization, as opposed to the fidelity of audio spatialization, impacts presence. An experiment is reported that investigates the presence-performance relationship by decoupling spatial audio fidelity (realism) from task...... performance by varying the spatial fidelity of the audio independently of its relevance to performance on the search task that subjects were to perform. This was achieved by having conditions in which subjects searched for a music-playing radio (an active sound source) and having conditions in which...... supplied only nonattenuated audio was detrimental to performance. Even so, this group of subjects consistently had the largest increase in presence scores over the baseline experiment. Further, the Witmer and Singer (1998) presence questionnaire was more sensitive to whether the audio source was active...
Digital audio watermarking fundamentals, techniques and challenges

CERN Document Server

Xiang, Yong; Yan, Bin

2017-01-01

This book offers comprehensive coverage on the most important aspects of audio watermarking, from classic techniques to the latest advances, from commonly investigated topics to emerging research subdomains, and from the research and development achievements to date, to current limitations, challenges, and future directions. It also addresses key topics such as reversible audio watermarking, audio watermarking with encryption, and imperceptibility control methods. The book sets itself apart from the existing literature in three main ways. Firstly, it not only reviews classical categories of audio watermarking techniques, but also provides detailed descriptions, analysis and experimental results of the latest work in each category. Secondly, it highlights the emerging research topic of reversible audio watermarking, including recent research trends, unique features, and the potentials of this subdomain. Lastly, the joint consideration of audio watermarking and encryption is also reviewed. With the help of this...

Audio-visual speech timing sensitivity is enhanced in cluttered conditions.

Directory of Open Access Journals (Sweden)

Warrick Roseboom

2011-04-01

Full Text Available Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.
Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

Directory of Open Access Journals (Sweden)

Koji Iwano

2007-03-01

Full Text Available This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.
47 CFR 73.3617 - Information available on the Internet.

Science.gov (United States)

2010-10-01

... include copies of public notices and texts of recent decisions. The Media Bureau's address is http://www.fcc.gov/mb/; the Audio Division's address is http://www.fcc.gov/mmb/audio; the Video Division's address is http://www.fcc.gov/mb/video; the Policy Division's address is http://www.fcc.gov/mb/policy; the...
Effect of Cartoon Illustrations on the Comprehension and Evaluation of Information Presented in the Print and Audio Mode.

Science.gov (United States)

Sewell, Edward H., Jr.

This study investigates the effects of cartoon illustrations on female and male college student comprehension and evaluation of information presented in several combinations of print, audio, and visual formats. Subjects were assigned to one of five treatment groups: printed text, printed text with cartoons, audiovisual presentations, audio only…
Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?

Directory of Open Access Journals (Sweden)

Héctor Delgado

2015-12-01

Full Text Available This article presents an overview of the technological components used in the process of audio description, and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used in order to obtain a semi-automatic transcription of the audio description track. The technical process is presented and experimental results are summarized.
Distortion Estimation in Compressed Music Using Only Audio Fingerprints

NARCIS (Netherlands)

Doets, P.J.O.; Lagendijk, R.L.

2008-01-01

An audio fingerprint is a compact yet very robust representation of the perceptually relevant parts of an audio signal. It can be used for content-based audio identification, even when the audio is severely distorted. Audio compression changes the fingerprint slightly. We show that these small
Detecting double compression of audio signal

Science.gov (United States)

Yang, Rui; Shi, Yun Q.; Huang, Jiwu

2010-01-01

MP3 is the most popular audio format nowadays in our daily life, for example music downloaded from the Internet and file saved in the digital recorder are often in MP3 format. However, low bitrate MP3s are often transcoded to high bitrate since high bitrate ones are of high commercial value. Also audio recording in digital recorder can be doctored easily by pervasive audio editing software. This paper presents two methods for the detection of double MP3 compression. The methods are essential for finding out fake-quality MP3 and audio forensics. The proposed methods use support vector machine classifiers with feature vectors formed by the distributions of the first digits of the quantized MDCT (modified discrete cosine transform) coefficients. Extensive experiments demonstrate the effectiveness of the proposed methods. To the best of our knowledge, this piece of work is the first one to detect double compression of audio signal.
PSQM-based RR and NR video quality metrics

Science.gov (United States)

Lu, Zhongkang; Lin, Weisi; Ong, Eeping; Yang, Xiaokang; Yao, Susu

2003-06-01

This paper presents a new and general concept, PQSM (Perceptual Quality Significance Map), to be used in measuring the visual distortion. It makes use of the selectivity characteristic of HVS (Human Visual System) that it pays more attention to certain area/regions of visual signal due to one or more of the following factors: salient features in image/video, cues from domain knowledge, and association of other media (e.g., speech or audio). PQSM is an array whose elements represent the relative perceptual-quality significance levels for the corresponding area/regions for images or video. Due to its generality, PQSM can be incorporated into any visual distortion metrics: to improve effectiveness or/and efficiency of perceptual metrics; or even to enhance a PSNR-based metric. A three-stage PQSM estimation method is also proposed in this paper, with an implementation of motion, texture, luminance, skin-color and face mapping. Experimental results show the scheme can improve the performance of current image/video distortion metrics.
Mining Contextual Information for Ephemeral Digital Video Preservation

OpenAIRE

Shah, Chirag

2009-01-01

For centuries the archival community has understood and practiced the art of adding contextual information while preserving an artifact. The question now is how these practices can be transferred to the digital domain. With the growing expansion of production and consumption of digital objects (documents, audio, video, etc.) it has become essential to identify and study issues related to their representation. A curator in the digital realm may be said to have the same responsibilities as on...
Elicitation of attributes for the evaluation of audio-on audio-interference

DEFF Research Database (Denmark)

Francombe, Jon; Mason, R.; Dewhirst, M.

2014-01-01

procedure was used to reduce these phrases into a comprehensive set of attributes. Groups of experienced and inexperienced listeners determined nine and eight attributes, respectively. These attribute sets were combined by the listeners to produce a final set of 12 attributes: masking, calming, distraction......An experiment to determine the perceptual attributes of the experience of listening to a target audio program in the presence of an audio interferer was performed. The first stage was a free elicitation task in which a total of 572 phrases were produced. In the second stage, a consensus vocabulary...
Video retrieval by still-image analysis with ImageMiner

Science.gov (United States)

Kreyss, Jutta; Roeper, M.; Alshuth, Peter; Hermes, Thorsten; Herzog, Otthein

1997-01-01

The large amount of available multimedia information (e.g. videos, audio, images) requires efficient and effective annotation and retrieval methods. As videos start playing a more important role in the frame of multimedia, we want to make these available for content-based retrieval. The ImageMiner-System, which was developed at the University of Bremen in the AI group, is designed for content-based retrieval of single images by a new combination of techniques and methods from computer vision and artificial intelligence. In our approach to make videos available for retrieval in a large database of videos and images there are two necessary steps: First, the detection and extraction of shots from a video, which is done by a histogram based method and second, the construction of the separate frames in a shot to one still single images. This is performed by a mosaicing-technique. The resulting mosaiced image gives a one image visualization of the shot and can be analyzed by the ImageMiner-System. ImageMiner has been tested on several domains, (e.g. landscape images, technical drawings), which cover a wide range of applications.
CERN automatic audio-conference service

CERN Multimedia

Sierra Moral, R

2009-01-01

Scientists from all over the world need to collaborate with CERN on a daily basis. They must be able to communicate effectively on their joint projects at any time; as a result telephone conferences have become indispensable and widely used. Managed by 6 operators, CERN already has more than 20000 hours and 5700 audio-conferences per year. However, the traditional telephone based audio-conference system needed to be modernized in three ways. Firstly, to provide the participants with more autonomy in the organization of their conferences; secondly, to eliminate the constraints of manual intervention by operators; and thirdly, to integrate the audio-conferences into a collaborative working framework. The large number, and hence cost, of the conferences prohibited externalization and so the CERN telecommunications team drew up a specification to implement a new system. It was decided to use a new commercial collaborative audio-conference solution based on the SIP protocol. The system was tested as the first Euro...
CERN automatic audio-conference service

CERN Document Server

Sierra Moral, R

2010-01-01

Scientists from all over the world need to collaborate with CERN on a daily basis. They must be able to communicate effectively on their joint projects at any time; as a result telephone conferences have become indispensable and widely used. Managed by 6 operators, CERN already has more than 20000 hours and 5700 audio-conferences per year. However, the traditional telephone based audio-conference system needed to be modernized in three ways. Firstly, to provide the participants with more autonomy in the organization of their conferences; secondly, to eliminate the constraints of manual intervention by operators; and thirdly, to integrate the audio-conferences into a collaborative working framework. The large number, and hence cost, of the conferences prohibited externalization and so the CERN telecommunications team drew up a specification to implement a new system. It was decided to use a new commercial collaborative audio-conference solution based on the SIP protocol. The system was tested as the first Euro...
Debugging of Class-D Audio Power Amplifiers

DEFF Research Database (Denmark)

Crone, Lasse; Pedersen, Jeppe Arnsdorf; Mønster, Jakob Døllner

2012-01-01

Determining and optimizing the performance of a Class-D audio power amplier can be very dicult without knowledge of the use of audio performance measuring equipment and of how the various noise and distortion sources in uence the audio performance. This paper gives an introduction on how to measure...
Audio segmentation of broadcast news in the Albayzin-2010 evaluation: overview, results, and discussion

Directory of Open Access Journals (Sweden)

Butko Taras

2011-01-01

Full Text Available Abstract Recently, audio segmentation has attracted research interest because of its usefulness in several applications like audio indexing and retrieval, subtitling, monitoring of acoustic scenes, etc. Moreover, a previous audio segmentation stage may be useful to improve the robustness of speech technologies like automatic speech recognition and speaker diarization. In this article, we present the evaluation of broadcast news audio segmentation systems carried out in the context of the Albayzín-2010 evaluation campaign. That evaluation consisted of segmenting audio from the 3/24 Catalan TV channel into five acoustic classes: music, speech, speech over music, speech over noise, and the other. The evaluation results displayed the difficulty of this segmentation task. In this article, after presenting the database and metric, as well as the feature extraction methods and segmentation techniques used by the submitted systems, the experimental results are analyzed and compared, with the aim of gaining an insight into the proposed solutions, and looking for directions which are promising.
Design of a WAV audio player based on K20

Directory of Open Access Journals (Sweden)

Xu Yu

2016-01-01

Full Text Available The designed player uses the Freescale Company’s MK20DX128VLH7 as the core control ship, and its hardware platform is equipped with VS1003 audio decoder, OLED display interface, USB interface and SD card slot. The player uses the open source embedded real-time operating system μC/OS-II, Freescale USB Stack V4.1.1 and FATFS, and a graphical user interface is developed to improve the user experience based on CGUI. In general, the designed WAV audio player has a strong applicability and a good practical value.
Design of an audio advertisement dataset

Science.gov (United States)

Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting

2015-12-01

Since more and more advertisements swarm into radios, it is necessary to establish an audio advertising dataset which could be used to analyze and classify the advertisement. A method of how to establish a complete audio advertising dataset is presented in this paper. The dataset is divided into four different kinds of advertisements. Each advertisement's sample is given in *.wav file format, and annotated with a txt file which contains its file name, sampling frequency, channel number, broadcasting time and its class. The classifying rationality of the advertisements in this dataset is proved by clustering the different advertisements based on Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for correlative audio advertisement experimental studies.
One Size Does Not Fit All: Older Adults Benefit From Redundant Text in Multimedia Instruction

Directory of Open Access Journals (Sweden)

Barbara eFenesi

2015-07-01

Full Text Available The multimedia design of presentations typically ignores that younger and older adults have varying cognitive strengths and weaknesses. We examined whether differential instructional design may enhance learning in these populations. Younger and older participants viewed one of three computer-based presentations: Audio only (narration, Redundant (audio narration with redundant text, or Complementary (audio narration with non–redundant text and images. Younger participants learned better when audio narration was paired with relevant images compared to when audio narration was paired with redundant text. However, older participants learned best when audio narration was paired with redundant text. Younger adults, who presumably have a higher working memory capacity, appear to benefit more from complementary information that may drive deeper conceptual processing. In contrast, older adults learn better from presentations that support redundant coding across modalities, which may help mitigate the effects of age-related decline in working memory capacity. Additionally, several misconceptions of design quality appeared across age groups: both younger and older participants positively rated less effective designs. Findings suggest that one-size does not fit all, with older adults requiring unique multimedia design tailored to their cognitive abilities for effective learning.
Students' Learning Experiences from Didactic Teaching Sessions Including Patient Case Examples as Either Text or Video: A Qualitative Study.

Science.gov (United States)

Pedersen, Kamilla; Moeller, Martin Holdgaard; Paltved, Charlotte; Mors, Ole; Ringsted, Charlotte; Morcke, Anne Mette

2017-10-06

The aim of this study was to explore medical students' learning experiences from the didactic teaching formats using either text-based patient cases or video-based patient cases with similar content. The authors explored how the two different patient case formats influenced students' perceptions of psychiatric patients and students' reflections on meeting and communicating with psychiatric patients. The authors conducted group interviews with 30 medical students who volunteered to participate in interviews and applied inductive thematic content analysis to the transcribed interviews. Students taught with text-based patient cases emphasized excitement and drama towards the personal clinical narratives presented by the teachers during the course, but never referred to the patient cases. Authority and boundary setting were regarded as important in managing patients. Students taught with video-based patient cases, in contrast, often referred to the patient cases when highlighting new insights, including the importance of patient perspectives when communicating with patients. The format of patient cases included in teaching may have a substantial impact on students' patient-centeredness. Video-based patient cases are probably more effective than text-based patient cases in fostering patient-centered perspectives in medical students. Teachers sharing stories from their own clinical experiences stimulates both engagement and excitement, but may also provoke unintended stigma and influence an authoritative approach in medical students towards managing patients in clinical psychiatry.
Efficient Audio Power Amplification - Challenges

DEFF Research Database (Denmark)

Andersen, Michael Andreas E.

2005-01-01

For more than a decade efficient audio power amplification has evolved and today switch-mode audio power amplification in various forms are the state-of-the-art. The technical steps that lead to this evolution are described and in addition many of the challenges still to be faced and where...

A video, text, and speech-driven realistic 3-d virtual head for human-machine interface.

Science.gov (United States)

Yu, Jun; Wang, Zeng-Fu

2015-05-01

A multiple inputs-driven realistic facial animation system based on 3-D virtual head for human-machine interface is proposed. The system can be driven independently by video, text, and speech, thus can interact with humans through diverse interfaces. The combination of parameterized model and muscular model is used to obtain a tradeoff between computational efficiency and high realism of 3-D facial animation. The online appearance model is used to track 3-D facial motion from video in the framework of particle filtering, and multiple measurements, i.e., pixel color value of input image and Gabor wavelet coefficient of illumination ratio image, are infused to reduce the influence of lighting and person dependence for the construction of online appearance model. The tri-phone model is used to reduce the computational consumption of visual co-articulation in speech synchronized viseme synthesis without sacrificing any performance. The objective and subjective experiments show that the system is suitable for human-machine interaction.
Consequence of audio visual collection in school libraries

OpenAIRE

Kuri, Ramesh

2016-01-01

The collection of Audio-Visual in library plays important role in teaching and learning. The importance of audio visual (AV) technology in education should not be underestimated. If audio-visual collection in library is carefully planned and designed, it can provide a rich learning environment. In this article, an author discussed the consequences of Audio-Visual collection in libraries especially for students of school library
SDI HDTV video transmission unit; SDI high vision eizo denso sochi

Energy Technology Data Exchange (ETDEWEB)

NONE

1999-03-01

The number is increasing of broadcasting stations who use HDTV (high-definition television) signals for program formulation for the improvement of video quality for general broadcasting programs. The above-named unit, which transmits HDTV signals which are broadcasting materials over a long distance with their high quality well preserved, is capable of the optical transmission of SDI (serial digital interface) signals meeting the standards of BTA (Broadcasting Technologies Association) on 1 video channel and 2 digital audio channels. It can also work on the light wavelength multiplex system. It was developed under the guidance of TTNet (Tokyo Telecommunication Network Co., Inc.), and gained public favor when it successfully transmitted programs over a distance of approximately 400km from Nagano to Tokyo during the Nagano Olympic Games. It now transmits various high-quality video signals for instance in the relay of baseball games, thereby contributing to the improvement of video quality for broadcasting. (translated by NEDO)
Automating Commercial Video Game Development using Computational Intelligence

OpenAIRE

Tse G. Tan; Jason Teo; Patricia Anthony

2011-01-01

Problem statement: The retail sales of computer and video games have grown enormously during the last few years, not just in United States (US), but also all over the world. This is the reason a lot of game developers and academic researchers have focused on game related technologies, such as graphics, audio, physics and Artificial Intelligence (AI) with the goal of creating newer and more fun games. In recent years, there has been an increasing interest in game AI for pro...
A review of video security training and assessment-systems and their applications

International Nuclear Information System (INIS)

Cellucci, J.; Hall, R.J.

1991-01-01

This paper reports that during the last 10 years computer-aided video data collection and playback systems have been used as nuclear facility security training and assessment tools with varying degrees of success. These mobile systems have been used by trained security personnel for response force training, vulnerability assessment, force-on-force exercises and crisis management. Typically, synchronous recordings from multiple video cameras, communications audio, and digital sensor inputs; are played back to the exercise participants and then edited for training and briefing. Factors that have influence user acceptance include: frequency of use, the demands placed on security personnel, fear of punishment, user training requirements and equipment cost. The introduction of S-VHS video and new software for scenario planning, video editing and data reduction; should bring about a wider range of security applications and supply the opportunity for significant cost sharing with other user groups
New audio applications of beryllium metal

International Nuclear Information System (INIS)

Sato, M.

1977-01-01

The major applications of beryllium metal in the field of audio appliances are for the vibrating cones for the two types of speakers 'TWITTER' for high range sound and 'SQUAWKER' for mid range sound, and also for beryllium cantilever tube assembled in stereo cartridge. These new applications are based on the characteristic property of beryllium having high ratio of modulus of elasticity to specific gravity. The production of these audio parts is described, and the audio response is shown. (author)
The Transfer of Learning Associated with Audio Feedback on Written Work

Directory of Open Access Journals (Sweden)

Tanya Martini

2014-11-01

Full Text Available This study examined whether audio feedback provided to undergraduates (N=51 about one paper would prove beneficial in terms of improving their grades on another, unrelated paper of the same type. We examined this issue both in terms of student beliefs about learning transfer, as well as their actual ability to transfer what had been learned on one assignment to another, subsequent assignment. Results indicated that students believed that they would be able to transfer what they had learned via audio feedback. Moreover, results also suggested that students actually did generalize the overarching comments about content and structure made in the audio files to a subsequent paper, the content of which differed substantially from the initial one. Both students and teaching assistants demonstrated very favourable responses to this type of feedback, suggesting that it was both clear and comprehensive.
Vertigo with sudden hearing loss: audio-vestibular characteristics.

Science.gov (United States)

Pogson, Jacob M; Taylor, Rachael L; Young, Allison S; McGarvie, Leigh A; Flanagan, Sean; Halmagyi, G Michael; Welgampola, Miriam S

2016-10-01

Acute vertigo with sudden sensorineural hearing loss (SSNHL) is a rare clinical emergency. Here, we report the audio-vestibular test profiles of 27 subjects who presented with these symptoms. The vestibular test battery consisted of a three-dimensional video head impulse test (vHIT) of semicircular canal function and recording ocular and cervical vestibular-evoked myogenic potentials (oVEMP, cVEMP) to test otolith dysfunction. Unlike vestibular neuritis, where the horizontal and anterior canals with utricular function are more frequently impaired, 74 % of subjects with vertigo and SSNHL demonstrated impairment of the posterior canal gain (0.45 ± 0.20). Only 41 % showed impairment of the horizontal canal gains (0.78 ± 0.27) and 30 % of the anterior canal gains (0.79 ± 0.26), while 38 % of oVEMPs [asymmetry ratio (AR) = 41.0 ± 41.3 %] and 33 % of cVEMPs (AR = 47.3 ± 41.2 %) were significantly asymmetrical. Twenty-three subjects were diagnosed with labyrinthitis/labyrinthine infarction in the absence of evidence for an underlying pathology. Four subjects had a definitive diagnosis [Ramsay Hunt Syndrome, vestibular schwannoma, anterior inferior cerebellar artery (AICA) infarction, and traction injury]. Ischemia involving the common-cochlear or vestibulo-cochlear branches of the labyrinthine artery could be the simplest explanation for vertigo with SSNHL. Audio-vestibular tests did not provide easy separation between ischaemic and non-ischaemic causes of vertigo with SSNHL.
Audio watermarking robust against D/A and A/D conversions

Directory of Open Access Journals (Sweden)

Xiang Shijun

2011-01-01

Full Text Available Abstract Digital audio watermarking robust against digital-to-analog (D/A and analog-to-digital (A/D conversions is an important issue. In a number of watermark application scenarios, D/A and A/D conversions are involved. In this article, we first investigate the degradation due to DA/AD conversions via sound cards, which can be decomposed into volume change, additional noise, and time-scale modification (TSM. Then, we propose a solution for DA/AD conversions by considering the effect of the volume change, additional noise and TSM. For the volume change, we introduce relation-based watermarking method by modifying groups of the energy relation of three adjacent DWT coefficient sections. For the additional noise, we pick up the lowest-frequency coefficients for watermarking. For the TSM, the synchronization technique (with synchronization codes and an interpolation processing operation is exploited. Simulation tests show the proposed audio watermarking algorithm provides a satisfactory performance to DA/AD conversions and those common audio processing manipulations.
Identifying Engagement Patterns with Video Annotation Activities: A Case Study in Professional Development

Science.gov (United States)

Mirriahi, Negin; Jovanovic, Jelena; Dawson, Shane; Gaševic, Dragan; Pardo, Abelardo

2018-01-01

The rapid growth of blended and online learning models in higher education has resulted in a parallel increase in the use of audio-visual resources among students and teachers. Despite the heavy adoption of video resources, there have been few studies investigating their effect on learning processes and even less so in the context of academic…
Peningkatan Hasil Belajar melalui Pembelajaran Kooperatif Tipe STAD pada Pembelajaran Dasar Sinyal Video

Directory of Open Access Journals (Sweden)

Santi Utami

2016-01-01

Full Text Available This study aims at improving the student’s learning outcomes through the cooperative learning strategy with Student Teams Achievement Division (STAD at SMKN 1 Saptosari. This was classroom action research (CAR consisting of planning, action, observation, and reflection. The research subjects were the grade X students of Audio Video Engineering A. The data collection technique used the documentation of daily tests to show the improvement. The data was analyzed with descriptive statistics. This study consisted of three cycles. The mean of test results for the first, second and third cycles were 7.06, 5.9 and 7.09 respectively. It showed that the cooperative learning strategy with STAD could improve the students’ achievement to fulfill the Minimum Mastery Criterion (MMC.
Efficient audio power amplification - challenges

Energy Technology Data Exchange (ETDEWEB)

Andersen, Michael A.E.

2005-07-01

For more than a decade efficient audio power amplification has evolved and today switch-mode audio power amplification in various forms are the state-of-the-art. The technical steps that lead to this evolution are described and in addition many of the challenges still to be faced and where extensive research and development are needed is covered. (au)
Publicación de materiales audiovisuales a través de un servidor de video-streaming Publication of audio-visual materials through a streaming video server

Directory of Open Access Journals (Sweden)

Acevedo Clavijo Edwin Jovanny

2010-07-01

Full Text Available Esta propuesta tiene como objetivo estudiar varias alternativas de servidores Streaming para determinar la mejor herramienta para el desarrollo de la publicación de material audiovisual educativo. Se evaluaron las plataformas más utilizadas teniendo en cuenta sus características y beneficios que tiene cada servidor entre las los cuales están: Hélix Universal Server, Windows Media Server de Microsoft, Peer Cast y Darwin Server. implementando un servidor con mayores capacidades y beneficios para la publicación de videos con fines académicos a través de la intranet de la Universidad Cooperativa de Colombia seccional Barrancabermeja This proposal has as an principal objective to study different alternatives for streaming servers to determine the best tool in the project’s development. Platforms most used were evaluated features and benefits in each served such as: Helix Universal Server, Microsoft Windows Media Server, Peer Cast and Darwin Server. Implementing a server with more capabilities and benefits for the publication of videos for academic purposes through the intranet of the Cooperative University of Colombia Barrancabermeja’s sectional
Audio segmentation using Flattened Local Trimmed Range for ecological acoustic space analysis

Directory of Open Access Journals (Sweden)

Giovany Vega

2016-06-01

Full Text Available The acoustic space in a given environment is filled with footprints arising from three processes: biophony, geophony and anthrophony. Bioacoustic research using passive acoustic sensors can result in thousands of recordings. An important component of processing these recordings is to automate signal detection. In this paper, we describe a new spectrogram-based approach for extracting individual audio events. Spectrogram-based audio event detection (AED relies on separating the spectrogram into background (i.e., noise and foreground (i.e., signal classes using a threshold such as a global threshold, a per-band threshold, or one given by a classifier. These methods are either too sensitive to noise, designed for an individual species, or require prior training data. Our goal is to develop an algorithm that is not sensitive to noise, does not need any prior training data and works with any type of audio event. To do this, we propose: (1 a spectrogram filtering method, the Flattened Local Trimmed Range (FLTR method, which models the spectrogram as a mixture of stationary and non-stationary energy processes and mitigates the effect of the stationary processes, and (2 an unsupervised algorithm that uses the filter to detect audio events. We measured the performance of the algorithm using a set of six thoroughly validated audio recordings and obtained a sensitivity of 94% and a positive predictive value of 89%. These sensitivity and positive predictive values are very high, given that the validated recordings are diverse and obtained from field conditions. The algorithm was then used to extract audio events in three datasets. Features of these audio events were plotted and showed the unique aspects of the three acoustic communities.
Automatic Detection and Classification of Audio Events for Road Surveillance Applications

Directory of Open Access Journals (Sweden)

Noor Almaadeed

2018-06-01

Full Text Available This work investigates the problem of detecting hazardous events on roads by designing an audio surveillance system that automatically detects perilous situations such as car crashes and tire skidding. In recent years, research has shown several visual surveillance systems that have been proposed for road monitoring to detect accidents with an aim to improve safety procedures in emergency cases. However, the visual information alone cannot detect certain events such as car crashes and tire skidding, especially under adverse and visually cluttered weather conditions such as snowfall, rain, and fog. Consequently, the incorporation of microphones and audio event detectors based on audio processing can significantly enhance the detection accuracy of such surveillance systems. This paper proposes to combine time-domain, frequency-domain, and joint time-frequency features extracted from a class of quadratic time-frequency distributions (QTFDs to detect events on roads through audio analysis and processing. Experiments were carried out using a publicly available dataset. The experimental results conform the effectiveness of the proposed approach for detecting hazardous events on roads as demonstrated by 7% improvement of accuracy rate when compared against methods that use individual temporal and spectral features.
One size does not fit all: older adults benefit from redundant text in multimedia instruction.

Science.gov (United States)

Fenesi, Barbara; Vandermorris, Susan; Kim, Joseph A; Shore, David I; Heisz, Jennifer J

2015-01-01

The multimedia design of presentations typically ignores that younger and older adults have varying cognitive strengths and weaknesses. We examined whether differential instructional design may enhance learning in these populations. Younger and older participants viewed one of three computer-based presentations: Audio only (narration), Redundant (audio narration with redundant text), or Complementary (audio narration with non-redundant text and images). Younger participants learned better when audio narration was paired with relevant images compared to when audio narration was paired with redundant text. However, older participants learned best when audio narration was paired with redundant text. Younger adults, who presumably have a higher working memory capacity (WMC), appear to benefit more from complementary information that may drive deeper conceptual processing. In contrast, older adults learn better from presentations that support redundant coding across modalities, which may help mitigate the effects of age-related decline in WMC. Additionally, several misconceptions of design quality appeared across age groups: both younger and older participants positively rated less effective designs. Findings suggest that one-size does not fit all, with older adults requiring unique multimedia design tailored to their cognitive abilities for effective learning.
Perceptual Coding of Audio Signals Using Adaptive Time-Frequency Transform

Directory of Open Access Journals (Sweden)

Umapathy Karthikeyan

2007-01-01

Full Text Available Wide band digital audio signals have a very high data-rate associated with them due to their complex nature and demand for high-quality reproduction. Although recent technological advancements have significantly reduced the cost of bandwidth and miniaturized storage facilities, the rapid increase in the volume of digital audio content constantly compels the need for better compression algorithms. Over the years various perceptually lossless compression techniques have been introduced, and transform-based compression techniques have made a significant impact in recent years. In this paper, we propose one such transform-based compression technique, where the joint time-frequency (TF properties of the nonstationary nature of the audio signals were exploited in creating a compact energy representation of the signal in fewer coefficients. The decomposition coefficients were processed and perceptually filtered to retain only the relevant coefficients. Perceptual filtering (psychoacoustics was applied in a novel way by analyzing and performing TF specific psychoacoustics experiments. An added advantage of the proposed technique is that, due to its signal adaptive nature, it does not need predetermined segmentation of audio signals for processing. Eight stereo audio signal samples of different varieties were used in the study. Subjective (mean opinion score—MOS listening tests were performed and the subjective difference grades (SDG were used to compare the performance of the proposed coder with MP3, AAC, and HE-AAC encoders. Compression ratios in the range of 8 to 40 were achieved by the proposed technique with subjective difference grades (SDG ranging from –0.53 to –2.27.
Feature Fusion Based Audio-Visual Speaker Identification Using Hidden Markov Model under Different Lighting Variations

Directory of Open Access Journals (Sweden)

Md. Rabiul Islam

2014-01-01

Full Text Available The aim of the paper is to propose a feature fusion based Audio-Visual Speaker Identification (AVSI system with varied conditions of illumination environments. Among the different fusion strategies, feature level fusion has been used for the proposed AVSI system where Hidden Markov Model (HMM is used for learning and classification. Since the feature set contains richer information about the raw biometric data than any other levels, integration at feature level is expected to provide better authentication results. In this paper, both Mel Frequency Cepstral Coefficients (MFCCs and Linear Prediction Cepstral Coefficients (LPCCs are combined to get the audio feature vectors and Active Shape Model (ASM based appearance and shape facial features are concatenated to take the visual feature vectors. These combined audio and visual features are used for the feature-fusion. To reduce the dimension of the audio and visual feature vectors, Principal Component Analysis (PCA method is used. The VALID audio-visual database is used to measure the performance of the proposed system where four different illumination levels of lighting conditions are considered. Experimental results focus on the significance of the proposed audio-visual speaker identification system with various combinations of audio and visual features.
One size does not fit all: older adults benefit from redundant text in multimedia instruction

OpenAIRE

Fenesi, Barbara; Vandermorris, Susan; Kim, Joseph A.; Shore, David I.; Heisz, Jennifer J.

2015-01-01

The multimedia design of presentations typically ignores that younger and older adults have varying cognitive strengths and weaknesses. We examined whether differential instructional design may enhance learning in these populations. Younger and older participants viewed one of three computer-based presentations: Audio only (narration), Redundant (audio narration with redundant text), or Complementary (audio narration with non–redundant text and images). Younger participants learned better ...
Survey of WBSNs for Pre-Hospital Assistance: Trends to Maximize the Network Lifetime and Video Transmission Techniques

Directory of Open Access Journals (Sweden)

Enrique Gonzalez

2015-05-01

Full Text Available This survey aims to encourage the multidisciplinary communities to join forces for innovation in the mobile health monitoring area. Specifically, multidisciplinary innovations in medical emergency scenarios can have a significant impact on the effectiveness and quality of the procedures and practices in the delivery of medical care. Wireless body sensor networks (WBSNs are a promising technology capable of improving the existing practices in condition assessment and care delivery for a patient in a medical emergency. This technology can also facilitate the early interventions of a specialist physician during the pre-hospital period. WBSNs make possible these early interventions by establishing remote communication links with video/audio support and by providing medical information such as vital signs, electrocardiograms, etc. in real time. This survey focuses on relevant issues needed to understand how to setup a WBSN for medical emergencies. These issues are: monitoring vital signs and video transmission, energy efficient protocols, scheduling, optimization and energy consumption on a WBSN.

MRI-compatible audio/visual system: impact on pediatric sedation

International Nuclear Information System (INIS)

Harned, R.K. II; Strain, J.D.

2001-01-01

Background. While sedation is necessary for much pediatric imaging, there are new alternatives that may help patients hold still without medication. Objective. We examined the effect of an audio/visual system consisting of video goggles and earphones on the need for sedation during magnetic resonance imaging (MRI). Materials and methods. All MRI examinations from May 1999 to October 1999 performed after installation of the MRVision 2000 (Resonance Technology, Inc.) were compared to the same 6-month period in 1998. Imaging and sedation protocols remained constant. Data collected included: patient age, type of examination, use of intravenous contrast enhancement, and need for sedation. The average supply charge and nursing cost per sedated patient were calculated. Results. The 955 patients from 1998 and 1,112 patients from 1999 were similar in demographics and examination distribution. There was an overall reduction in the percent of patients requiring sedation in the group using the video goggle system from 49 to 40 % (P < 0.001). There was no significant change for 0-2 years (P = 0.805), but there was a reduction from 53 to 40 % for age 3-10 years (P < 0.001) and 16 to 8 % for those older than 10 years (P < 0.001). There was a 17 % decrease in MRI room time for those patients whose examinations could be performed without sedation. Sedation costs per patient were $80 for nursing and $29 for supplies. Conclusion. The use of this video system reduced the number of children requiring sedation for MRI examination by 18 %. In addition to reducing patient risk, this can potentially reduce cost. (orig.)
Audio Recording of Children with Dyslalia

OpenAIRE

Stefan Gheorghe Pentiuc; Maria D. Schipor; Ovidiu A. Schipor

2008-01-01

In this paper we present our researches regarding automat parsing of audio recordings. These recordings are obtained from children with dyslalia and are necessary for an accurate identification of speech problems. We develop a software application that helps parsing audio, real time, recordings.
Comparison of Linear Prediction Models for Audio Signals

Directory of Open Access Journals (Sweden)

2009-03-01

Full Text Available While linear prediction (LP has become immensely popular in speech modeling, it does not seem to provide a good approach for modeling audio signals. This is somewhat surprising, since a tonal signal consisting of a number of sinusoids can be perfectly predicted based on an (all-pole LP model with a model order that is twice the number of sinusoids. We provide an explanation why this result cannot simply be extrapolated to LP of audio signals. If noise is taken into account in the tonal signal model, a low-order all-pole model appears to be only appropriate when the tonal components are uniformly distributed in the Nyquist interval. Based on this observation, different alternatives to the conventional LP model can be suggested. Either the model should be changed to a pole-zero, a high-order all-pole, or a pitch prediction model, or the conventional LP model should be preceded by an appropriate frequency transform, such as a frequency warping or downsampling. By comparing these alternative LP models to the conventional LP model in terms of frequency estimation accuracy, residual spectral flatness, and perceptual frequency resolution, we obtain several new and promising approaches to LP-based audio modeling.
Audio Visual Center

Data.gov (United States)

Federal Laboratory Consortium — The Audiovisual Services Center provides still photographic documentation with laboratory support, video documentation, video editing, video duplication, photo/video...
A Novel Robust Audio Watermarking Algorithm by Modifying the Average Amplitude in Transform Domain

Directory of Open Access Journals (Sweden)

Qiuling Wu

2018-05-01

Full Text Available In order to improve the robustness and imperceptibility in practical application, a novel audio watermarking algorithm with strong robustness is proposed by exploring the multi-resolution characteristic of discrete wavelet transform (DWT and the energy compaction capability of discrete cosine transform (DCT. The human auditory system is insensitive to the minor changes in the frequency components of the audio signal, so the watermarks can be embedded by slightly modifying the frequency components of the audio signal. The audio fragments segmented from the cover audio signal are decomposed by DWT to obtain several groups of wavelet coefficients with different frequency bands, and then the fourth level detail coefficient is selected to be divided into the former packet and the latter packet, which are executed for DCT to get two sets of transform domain coefficients (TDC respectively. Finally, the average amplitudes of the two sets of TDC are modified to embed the binary image watermark according to the special embedding rule. The watermark extraction is blind without the carrier audio signal. Experimental results confirm that the proposed algorithm has good imperceptibility, large payload capacity and strong robustness when resisting against various attacks such as MP3 compression, low-pass filtering, re-sampling, re-quantization, amplitude scaling, echo addition and noise corruption.
Economic and legal aspects of introducing novel ICT instruments: integrating sound into social media marketing - from audio branding to soundscaping

OpenAIRE

Daj, A.

2013-01-01

The pervasive expansion and implementation of ICT based marketing instruments imposes a new economic investigation of business models and regulatory solutions. Moreover, the current status of Social Media research indicates that the use of social networking and collaboration technologies is deeply changing the way people communicate, consume and cooperate with each other. Against the backdrop of widespread availability of digital audio-video content and the growing number of “smart” mobile de...
FPGAs Implementation of fast algorithms oriented to mp3 audio decompression

Directory of Open Access Journals (Sweden)

Antonio Benavides

2012-01-01

Full Text Available La ejecución de los algoritmos de descompresión de audio exige procesadores potentes con alto nivel de desempeño, sin embargo, dichos algoritmos no son apropiados para aplicaciones óptimas en dispositivos móviles. En este trabajo se lleva a cabo una exploración de algunos algoritmos cuya implementación en hardware permite mejorar el desempeño de los procesadores usados en dispositivos móviles que ejecutan tareas de descompresión de audio. Se presentan algunos resultados experimentales y análisis comparativos.
Parametric time-frequency domain spatial audio

CERN Document Server

Delikaris-Manias, Symeon; Politis, Archontis

2018-01-01

This book provides readers with the principles and best practices in spatial audio signal processing. It describes how sound fields and their perceptual attributes are captured and analyzed within the time-frequency domain, how essential representation parameters are coded, and how such signals are efficiently reproduced for practical applications. The book is split into four parts starting with an overview of the fundamentals. It then goes on to explain the reproduction of spatial sound before offering an examination of signal-dependent spatial filtering. The book finishes with coverage of both current and future applications and the direction that spatial audio research is heading in. Parametric Time-frequency Domain Spatial Audio focuses on applications in entertainment audio, including music, home cinema, and gaming--covering the capturing and reproduction of spatial sound as well as its generation, transduction, representation, transmission, and perception. This book will teach readers the tools needed...
Predicting the Overall Spatial Quality of Automotive Audio Systems

Science.gov (United States)

Koya, Daisuke

The spatial quality of automotive audio systems is often compromised due to their unideal listening environments. Automotive audio systems need to be developed quickly due to industry demands. A suitable perceptual model could evaluate the spatial quality of automotive audio systems with similar reliability to formal listening tests but take less time. Such a model is developed in this research project by adapting an existing model of spatial quality for automotive audio use. The requirements for the adaptation were investigated in a literature review. A perceptual model called QESTRAL was reviewed, which predicts the overall spatial quality of domestic multichannel audio systems. It was determined that automotive audio systems are likely to be impaired in terms of the spatial attributes that were not considered in developing the QESTRAL model, but metrics are available that might predict these attributes. To establish whether the QESTRAL model in its current form can accurately predict the overall spatial quality of automotive audio systems, MUSHRA listening tests using headphone auralisation with head tracking were conducted to collect results to be compared against predictions by the model. Based on guideline criteria, the model in its current form could not accurately predict the overall spatial quality of automotive audio systems. To improve prediction performance, the QESTRAL model was recalibrated and modified using existing metrics of the model, those that were proposed from the literature review, and newly developed metrics. The most important metrics for predicting the overall spatial quality of automotive audio systems included those that were interaural cross-correlation (IACC) based, relate to localisation of the frontal audio scene, and account for the perceived scene width in front of the listener. Modifying the model for automotive audio systems did not invalidate its use for domestic audio systems. The resulting model predicts the overall spatial
Underdetermined Blind Audio Source Separation Using Modal Decomposition

Directory of Open Access Journals (Sweden)

Abdeldjalil Aïssa-El-Bey

2007-03-01

Full Text Available This paper introduces new algorithms for the blind separation of audio sources using modal decomposition. Indeed, audio signals and, in particular, musical signals can be well approximated by a sum of damped sinusoidal (modal components. Based on this representation, we propose a two-step approach consisting of a signal analysis (extraction of the modal components followed by a signal synthesis (grouping of the components belonging to the same source using vector clustering. For the signal analysis, two existing algorithms are considered and compared: namely the EMD (empirical mode decomposition algorithm and a parametric estimation algorithm using ESPRIT technique. A major advantage of the proposed method resides in its validity for both instantaneous and convolutive mixtures and its ability to separate more sources than sensors. Simulation results are given to compare and assess the performance of the proposed algorithms.
Underdetermined Blind Audio Source Separation Using Modal Decomposition

Directory of Open Access Journals (Sweden)

Aïssa-El-Bey Abdeldjalil

2007-01-01

Full Text Available This paper introduces new algorithms for the blind separation of audio sources using modal decomposition. Indeed, audio signals and, in particular, musical signals can be well approximated by a sum of damped sinusoidal (modal components. Based on this representation, we propose a two-step approach consisting of a signal analysis (extraction of the modal components followed by a signal synthesis (grouping of the components belonging to the same source using vector clustering. For the signal analysis, two existing algorithms are considered and compared: namely the EMD (empirical mode decomposition algorithm and a parametric estimation algorithm using ESPRIT technique. A major advantage of the proposed method resides in its validity for both instantaneous and convolutive mixtures and its ability to separate more sources than sensors. Simulation results are given to compare and assess the performance of the proposed algorithms.
NEI You Tube Videos: Amblyopia

Medline Plus

Full Text Available ... YouTube Videos » NEI YouTube Videos: Amblyopia Listen NEI YouTube Videos YouTube Videos Home Age-Related Macular Degeneration ... Retinopathy of Prematurity Science Spanish Videos Webinars NEI YouTube Videos: Amblyopia Embedded video for NEI YouTube Videos: ...
Use and Effectiveness of a Video- and Text-Driven Web-Based Computer-Tailored Intervention: Randomized Controlled Trial.

Science.gov (United States)

Walthouwer, Michel Jean Louis; Oenema, Anke; Lechner, Lilian; de Vries, Hein

2015-09-25

Many Web-based computer-tailored interventions are characterized by high dropout rates, which limit their potential impact. This study had 4 aims: (1) examining if the use of a Web-based computer-tailored obesity prevention intervention can be increased by using videos as the delivery format, (2) examining if the delivery of intervention content via participants' preferred delivery format can increase intervention use, (3) examining if intervention effects are moderated by intervention use and matching or mismatching intervention delivery format preference, (4) and identifying which sociodemographic factors and intervention appreciation variables predict intervention use. Data were used from a randomized controlled study into the efficacy of a video and text version of a Web-based computer-tailored obesity prevention intervention consisting of a baseline measurement and a 6-month follow-up measurement. The intervention consisted of 6 weekly sessions and could be used for 3 months. ANCOVAs were conducted to assess differences in use between the video and text version and between participants allocated to a matching and mismatching intervention delivery format. Potential moderation by intervention use and matching/mismatching delivery format on self-reported body mass index (BMI), physical activity, and energy intake was examined using regression analyses with interaction terms. Finally, regression analysis was performed to assess determinants of intervention use. In total, 1419 participants completed the baseline questionnaire (follow-up response=71.53%, 1015/1419). Intervention use declined rapidly over time; the first 2 intervention sessions were completed by approximately half of the participants and only 10.9% (104/956) of the study population completed all 6 sessions of the intervention. There were no significant differences in use between the video and text version. Intervention use was significantly higher among participants who were allocated to an
Blood Sampling in Newborns: A Systematic Review of YouTube Videos.

Science.gov (United States)

Bueno, Mariana; Nishi, Érika Tihemi; Costa, Taine; Freire, Laís Machado; Harrison, Denise

Objective of this study was to conduct a systematic review of YouTube videos showing neonatal blood sampling, and to evaluate pain management and comforting interventions used. Selected videos were consumer- or professional-produced videos showing human newborns undergoing heel lancing or venipuncture for blood sampling, videos showing the entire blood sampling procedure (from the first attempt or puncture to the time of application of a cotton ball or bandage), publication date prior to October 2014, Portuguese titles, available audio. Search terms included "neonate," "newborn," "neonatal screening," and "blood collection." Two reviewers independently screened the videos and extracted the following data. A total of 13 140 videos were retrieved, of which 1354 were further evaluated, and 68 were included. Videos were mostly consumer produced (97%). Heel lancing was performed in 62 (91%). Forty-nine infants (72%) were held by an adult during the procedure. Median pain score immediately after puncture was 4 (interquartile range [IQR] = 0-5), and median length of cry throughout the procedure was 61 seconds (IQR = 88). Breastfeeding (3%) and swaddling (1.5%) were rarely implemented. Posted YouTube videos in Portuguese of newborns undergoing blood collection demonstrate minimal use of pain treatment, and maximal distress during procedures. Knowledge translation strategies are needed to implement effective measures for neonatal pain relief and comfort.
Fusion for Audio-Visual Laughter Detection

NARCIS (Netherlands)

Reuderink, B.

2007-01-01

Laughter is a highly variable signal, and can express a spectrum of emotions. This makes the automatic detection of laughter a challenging but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is performed
AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis

OpenAIRE

Sager, Sebastian; Elizalde, Benjamin; Borth, Damian; Schulze, Christian; Raj, Bhiksha; Lane, Ian

2016-01-01

Recently, sound recognition has been used to identify sounds, such as car and river. However, sounds have nuances that may be better described by adjective-noun pairs such as slow car, and verb-noun pairs such as flying insects, which are under explored. Therefore, in this work we investigate the relation between audio content and both adjective-noun pairs and verb-noun pairs. Due to the lack of datasets with these kinds of annotations, we collected and processed the AudioPairBank corpus cons...
Tourism research and audio methods

DEFF Research Database (Denmark)

Jensen, Martin Trandberg

2016-01-01

• Audio methods enriches sensuous tourism ethnographies. • The note suggests five research avenues for future auditory scholarship. • Sensuous tourism research has neglected the role of sounds in embodied tourism experiences.......• Audio methods enriches sensuous tourism ethnographies. • The note suggests five research avenues for future auditory scholarship. • Sensuous tourism research has neglected the role of sounds in embodied tourism experiences....
Selective encryption for H.264/AVC video coding

Science.gov (United States)

Shi, Tuo; King, Brian; Salama, Paul

2006-02-01

Due to the ease with which digital data can be manipulated and due to the ongoing advancements that have brought us closer to pervasive computing, the secure delivery of video and images has become a challenging problem. Despite the advantages and opportunities that digital video provide, illegal copying and distribution as well as plagiarism of digital audio, images, and video is still ongoing. In this paper we describe two techniques for securing H.264 coded video streams. The first technique, SEH264Algorithm1, groups the data into the following blocks of data: (1) a block that contains the sequence parameter set and the picture parameter set, (2) a block containing a compressed intra coded frame, (3) a block containing the slice header of a P slice, all the headers of the macroblock within the same P slice, and all the luma and chroma DC coefficients belonging to the all the macroblocks within the same slice, (4) a block containing all the ac coefficients, and (5) a block containing all the motion vectors. The first three are encrypted whereas the last two are not. The second method, SEH264Algorithm2, relies on the use of multiple slices per coded frame. The algorithm searches the compressed video sequence for start codes (0x000001) and then encrypts the next N bits of data.
DAFX Digital Audio Effects

CERN Document Server

Zö

2011-01-01

The rapid development in various fields of Digital Audio Effects, or DAFX, has led to new algorithms and this second edition of the popular book, DAFX: Digital Audio Effects has been updated throughout to reflect progress in the field. It maintains a unique approach to DAFX with a lecture-style introduction into the basics of effect processing. Each effect description begins with the presentation of the physical and acoustical phenomena, an explanation of the signal processing techniques to achieve the effect, followed by a discussion of musical applications and the control of effect parameter
Technical Skills Training for Veterinary Students: A Comparison of Simulators and Video for Teaching Standardized Cardiac Dissection.

Science.gov (United States)

Allavena, Rachel E; Schaffer-White, Andrea B; Long, Hanna; Alawneh, John I

The goal of the study was to evaluate alternative student-centered approaches that could replace autopsy sessions and live demonstration and to explore refinements in assessment procedures for standardized cardiac dissection. Simulators and videos were identified as feasible, economical, student-centered teaching methods for technical skills training in medical contexts, and a direct comparison was undertaken. A low-fidelity anatomically correct simulator approximately the size of a horse's heart with embedded dissection pathways was constructed and used with a series of laminated photographs of standardized cardiac dissection. A video of a standardized cardiac dissection of a normal horse's heart was recorded and presented with audio commentary. Students were allowed to nominate a preference for learning method, and students who indicated no preference were randomly allocated to keep group numbers even. Objective performance data from an objective structure assessment criterion and student perception data on confidence and competency from surveys showed both innovations were similarly effective. Evaluator reflections as well as usage logs to track patterns of student use were both recorded. A strong selection preference was identified for kinesthetic learners choosing the simulator and visual learners choosing the video. Students in the video cohort were better at articulating the reasons for dissection procedures and sequence due to the audio commentary, and student satisfaction was higher with the video. The major conclusion of this study was that both methods are effective tools for technical skills training, but consideration should be given to the preferred learning style of adult learners to maximize educational outcomes.

47 CFR 10.520 - Common audio attention signal.

Science.gov (United States)

2010-10-01

... 47 Telecommunication 1 2010-10-01 2010-10-01 false Common audio attention signal. 10.520 Section... Equipment Requirements § 10.520 Common audio attention signal. A Participating CMS Provider and equipment manufacturers may only market devices for public use under part 10 that include an audio attention signal that...
Video Retrieval Berdasarkan Teks dan Gambar

Directory of Open Access Journals (Sweden)

Rahmi Hidayati

2013-01-01

Abstract Retrieval video has been used to search a video based on the query entered by user which were text and image. This system could increase the searching ability on video browsing and expected to reduce the video’s retrieval time. The research purposes were designing and creating a software application of retrieval video based on the text and image on the video. The index process for the text is tokenizing, filtering (stopword, stemming. The results of stemming to saved in the text index table. Index process for the image is to create an image color histogram and compute the mean and standard deviation at each primary color red, green and blue (RGB of each image. The results of feature extraction is stored in the image table The process of video retrieval using the query text, images or both. To text query system to process the text query by looking at the text index tables. If there is a text query on the index table system will display information of the video according to the text query. To image query system to process the image query by finding the value of the feature extraction means red, green means, means blue, red standard deviation, standard deviation and standard deviation of blue green. If the value of the six features extracted query image on the index table image will display the video information system according to the query image. To query text and query images, the system will display the video information if the query text and query images have a relationship that is query text and query image has the same film title. Keywords— video, index, retrieval, text, image
Teaching the blind to find their way by playing video games.

Science.gov (United States)

Merabet, Lotfi B; Connors, Erin C; Halko, Mark A; Sánchez, Jaime

2012-01-01

Computer based video games are receiving great interest as a means to learn and acquire new skills. As a novel approach to teaching navigation skills in the blind, we have developed Audio-based Environment Simulator (AbES); a virtual reality environment set within the context of a video game metaphor. Despite the fact that participants were naïve to the overall purpose of the software, we found that early blind users were able to acquire relevant information regarding the spatial layout of a previously unfamiliar building using audio based cues alone. This was confirmed by a series of behavioral performance tests designed to assess the transfer of acquired spatial information to a large-scale, real-world indoor navigation task. Furthermore, learning the spatial layout through a goal directed gaming strategy allowed for the mental manipulation of spatial information as evidenced by enhanced navigation performance when compared to an explicit route learning strategy. We conclude that the immersive and highly interactive nature of the software greatly engages the blind user to actively explore the virtual environment. This in turn generates an accurate sense of a large-scale three-dimensional space and facilitates the learning and transfer of navigation skills to the physical world.
Music Genre Classification Using MIDI and Audio Features

Science.gov (United States)

Cataltepe, Zehra; Yaslan, Yusuf; Sonmez, Abdullah

2007-12-01

We report our findings on using MIDI files and audio features from MIDI, separately and combined together, for MIDI music genre classification. We use McKay and Fujinaga's 3-root and 9-leaf genre data set. In order to compute distances between MIDI pieces, we use normalized compression distance (NCD). NCD uses the compressed length of a string as an approximation to its Kolmogorov complexity and has previously been used for music genre and composer clustering. We convert the MIDI pieces to audio and then use the audio features to train different classifiers. MIDI and audio from MIDI classifiers alone achieve much smaller accuracies than those reported by McKay and Fujinaga who used not NCD but a number of domain-based MIDI features for their classification. Combining MIDI and audio from MIDI classifiers improves accuracy and gets closer to, but still worse, accuracies than McKay and Fujinaga's. The best root genre accuracies achieved using MIDI, audio, and combination of them are 0.75, 0.86, and 0.93, respectively, compared to 0.98 of McKay and Fujinaga. Successful classifier combination requires diversity of the base classifiers. We achieve diversity through using certain number of seconds of the MIDI file, different sample rates and sizes for the audio file, and different classification algorithms.
Efficiently Synchronized Spread-Spectrum Audio Watermarking with Improved Psychoacoustic Model

Directory of Open Access Journals (Sweden)

Xing He

2008-01-01

Full Text Available This paper presents an audio watermarking scheme which is based on an efficiently synchronized spread-spectrum technique and a new psychoacoustic model computed using the discrete wavelet packet transform. The psychoacoustic model takes advantage of the multiresolution analysis of a wavelet transform, which closely approximates the standard critical band partition. The goal of this model is to include an accurate time-frequency analysis and to calculate both the frequency and temporal masking thresholds directly in the wavelet domain. Experimental results show that this watermarking scheme can successfully embed watermarks into digital audio without introducing audible distortion. Several common watermark attacks were applied and the results indicate that the method is very robust to those attacks.
Distortion-Free 1-Bit PWM Coding for Digital Audio Signals

Directory of Open Access Journals (Sweden)

John Mourjopoulos

2007-01-01

Full Text Available Although uniformly sampled pulse width modulation (UPWM represents a very efficient digital audio coding scheme for digital-to-analog conversion and full-digital amplification, it suffers from strong harmonic distortions, as opposed to benign non-harmonic artifacts present in analog PWM (naturally sampled PWM, NPWM. Complete elimination of these distortions usually requires excessive oversampling of the source PCM audio signal, which results to impractical realizations of digital PWM systems. In this paper, a description of digital PWM distortion generation mechanism is given and a novel principle for their minimization is proposed, based on a process having some similarity to the dithering principle employed in multibit signal quantization. This conditioning signal is termed “jither” and it can be applied either in the PCM amplitude or the PWM time domain. It is shown that the proposed method achieves significant decrement of the harmonic distortions, rendering digital PWM performance equivalent to that of source PCM audio, for mild oversampling (e.g., ×4 resulting to typical PWM clock rates of 90 MHz.
Distortion-Free 1-Bit PWM Coding for Digital Audio Signals

Directory of Open Access Journals (Sweden)

Mourjopoulos John

2007-01-01

Full Text Available Although uniformly sampled pulse width modulation (UPWM represents a very efficient digital audio coding scheme for digital-to-analog conversion and full-digital amplification, it suffers from strong harmonic distortions, as opposed to benign non-harmonic artifacts present in analog PWM (naturally sampled PWM, NPWM. Complete elimination of these distortions usually requires excessive oversampling of the source PCM audio signal, which results to impractical realizations of digital PWM systems. In this paper, a description of digital PWM distortion generation mechanism is given and a novel principle for their minimization is proposed, based on a process having some similarity to the dithering principle employed in multibit signal quantization. This conditioning signal is termed "jither" and it can be applied either in the PCM amplitude or the PWM time domain. It is shown that the proposed method achieves significant decrement of the harmonic distortions, rendering digital PWM performance equivalent to that of source PCM audio, for mild oversampling (e.g., resulting to typical PWM clock rates of 90 MHz.
Perceptual Coding of Audio Signals Using Adaptive Time-Frequency Transform

Directory of Open Access Journals (Sweden)

Karthikeyan Umapathy

2007-08-01

Full Text Available Wide band digital audio signals have a very high data-rate associated with them due to their complex nature and demand for high-quality reproduction. Although recent technological advancements have significantly reduced the cost of bandwidth and miniaturized storage facilities, the rapid increase in the volume of digital audio content constantly compels the need for better compression algorithms. Over the years various perceptually lossless compression techniques have been introduced, and transform-based compression techniques have made a significant impact in recent years. In this paper, we propose one such transform-based compression technique, where the joint time-frequency (TF properties of the nonstationary nature of the audio signals were exploited in creating a compact energy representation of the signal in fewer coefficients. The decomposition coefficients were processed and perceptually filtered to retain only the relevant coefficients. Perceptual filtering (psychoacoustics was applied in a novel way by analyzing and performing TF specific psychoacoustics experiments. An added advantage of the proposed technique is that, due to its signal adaptive nature, it does not need predetermined segmentation of audio signals for processing. Eight stereo audio signal samples of different varieties were used in the study. Subjective (mean opinion scoreÃ¢Â€Â”MOS listening tests were performed and the subjective difference grades (SDG were used to compare the performance of the proposed coder with MP3, AAC, and HE-AAC encoders. Compression ratios in the range of 8 to 40 were achieved by the proposed technique with subjective difference grades (SDG ranging from Ã¢Â€Â“0.53 to Ã¢Â€Â“2.27.
Realtime Audio with Garbage Collection

OpenAIRE

Matheussen, Kjetil Svalastog

2010-01-01

Two non-moving concurrent garbage collectors tailored for realtime audio processing are described. Both collectors work on copies of the heap to avoid cache misses and audio-disruptive synchronizations. Both collectors are targeted at multiprocessor personal computers. The first garbage collector works in uncooperative environments, and can replace Hans Boehm's conservative garbage collector for C and C++. The collector does not access the virtual memory system. Neither doe...
Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users.

Directory of Open Access Journals (Sweden)

Georgios Mantokoudis

Full Text Available OBJECTIVE: To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI users. METHODS: Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px, frame rates (30, 20, 10, 7, 5 frames per second (fps, speech velocities (three different speakers, webcameras (Logitech Pro9000, C600 and C500 and image/sound delays (0-500 ms. All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. RESULTS: Higher frame rate (>7 fps, higher camera resolution (>640 × 480 px and shorter picture/sound delay (<100 ms were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009 in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11 showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032. CONCLUSION: Webcameras have the potential to improve telecommunication of hearing-impaired individuals.
CERN automatic audio-conference service

International Nuclear Information System (INIS)

Sierra Moral, Rodrigo

2010-01-01

Scientists from all over the world need to collaborate with CERN on a daily basis. They must be able to communicate effectively on their joint projects at any time; as a result telephone conferences have become indispensable and widely used. Managed by 6 operators, CERN already has more than 20000 hours and 5700 audio-conferences per year. However, the traditional telephone based audio-conference system needed to be modernized in three ways. Firstly, to provide the participants with more autonomy in the organization of their conferences; secondly, to eliminate the constraints of manual intervention by operators; and thirdly, to integrate the audio-conferences into a collaborative working framework. The large number, and hence cost, of the conferences prohibited externalization and so the CERN telecommunications team drew up a specification to implement a new system. It was decided to use a new commercial collaborative audio-conference solution based on the SIP protocol. The system was tested as the first European pilot and several improvements (such as billing, security, redundancy...) were implemented based on CERN's recommendations. The new automatic conference system has been operational since the second half of 2006. It is very popular for the users and has doubled the number of conferences in the past two years.
CERN automatic audio-conference service

Energy Technology Data Exchange (ETDEWEB)

Sierra Moral, Rodrigo, E-mail: Rodrigo.Sierra@cern.c [CERN, IT Department 1211 Geneva-23 (Switzerland)

2010-04-01

Scientists from all over the world need to collaborate with CERN on a daily basis. They must be able to communicate effectively on their joint projects at any time; as a result telephone conferences have become indispensable and widely used. Managed by 6 operators, CERN already has more than 20000 hours and 5700 audio-conferences per year. However, the traditional telephone based audio-conference system needed to be modernized in three ways. Firstly, to provide the participants with more autonomy in the organization of their conferences; secondly, to eliminate the constraints of manual intervention by operators; and thirdly, to integrate the audio-conferences into a collaborative working framework. The large number, and hence cost, of the conferences prohibited externalization and so the CERN telecommunications team drew up a specification to implement a new system. It was decided to use a new commercial collaborative audio-conference solution based on the SIP protocol. The system was tested as the first European pilot and several improvements (such as billing, security, redundancy...) were implemented based on CERN's recommendations. The new automatic conference system has been operational since the second half of 2006. It is very popular for the users and has doubled the number of conferences in the past two years.
CERN automatic audio-conference service

Science.gov (United States)

Sierra Moral, Rodrigo

2010-04-01

Scientists from all over the world need to collaborate with CERN on a daily basis. They must be able to communicate effectively on their joint projects at any time; as a result telephone conferences have become indispensable and widely used. Managed by 6 operators, CERN already has more than 20000 hours and 5700 audio-conferences per year. However, the traditional telephone based audio-conference system needed to be modernized in three ways. Firstly, to provide the participants with more autonomy in the organization of their conferences; secondly, to eliminate the constraints of manual intervention by operators; and thirdly, to integrate the audio-conferences into a collaborative working framework. The large number, and hence cost, of the conferences prohibited externalization and so the CERN telecommunications team drew up a specification to implement a new system. It was decided to use a new commercial collaborative audio-conference solution based on the SIP protocol. The system was tested as the first European pilot and several improvements (such as billing, security, redundancy...) were implemented based on CERN's recommendations. The new automatic conference system has been operational since the second half of 2006. It is very popular for the users and has doubled the number of conferences in the past two years.
Training value of laparoscopic colorectal videos on the World Wide Web: a pilot study on the educational quality of laparoscopic right hemicolectomy videos.

Science.gov (United States)

Celentano, V; Browning, M; Hitchins, C; Giglio, M C; Coleman, M G

2017-11-01

Instructive laparoscopy videos with appropriate exposition could be ideal for initial training in laparoscopic surgery, but unfortunately there are no guidelines for annotating these videos or agreed methods to measure the educational content and the safety of the procedure presented. Aim of this study is to systematically search the World Wide Web to determine the availability of laparoscopic colorectal surgery videos and to objectively establish their potential training value. A search for laparoscopic right hemicolectomy videos was performed on the three most used English language web search engines Google.com, Bing.com, and Yahoo.com; moreover, a survey among 25 local trainees was performed to identify additional websites for inclusion. All laparoscopic right hemicolectomy videos with an English language title were included. Videos of open surgery, single incision laparoscopic surgery, robotic, and hand-assisted surgery were excluded. The safety of the demonstrated procedure was assessed with a validated competency assessment tool specifically designed for laparoscopic colorectal surgery and data on the educational content of the video were extracted. Thirty-one websites were identified and 182 surgical videos were included. One hundred and seventy-three videos (95%) detailed the year of publication; this demonstrated a significant increase in the number of videos published per year from 2009. Characteristics of the patient were rarely presented, only 10 videos (5.4%) reported operating time and only 6 videos (3.2%) reported 30-day morbidity; 34 videos (18.6%) underwent a peer-review process prior to publication. Formal case presentation, the presence of audio narration, the use of diagrams, and snapshots and a step-by-step approach are all characteristics of peer-reviewed videos but no significant difference was found in the safety of the procedure. Laparoscopic videos can be a useful adjunct to operative training. There is a large and increasing amount of
Near-field Localization of Audio

DEFF Research Database (Denmark)

Jensen, Jesper Rindom; Christensen, Mads Græsbøll

2014-01-01

Localization of audio sources using microphone arrays has been an important research problem for more than two decades. Many traditional methods for solving the problem are based on a two-stage procedure: first, information about the audio source, such as time differences-of-arrival (TDOAs......) and gain ratios-of-arrival (GROAs) between microphones is estimated, and, second, this knowledge is used to localize the audio source. These methods often have a low computational complexity, but this comes at the cost of a limited estimation accuracy. Therefore, we propose a new localization approach......, where the desired signal is modeled using TDOAs and GROAs, which are determined by the source location. This facilitates the derivation of one-stage, maximum likelihood methods under a white Gaussian noise assumption that is applicable in both near- and far-field scenarios. Simulations show...
Automatic defect detection in video archives: application to Montreux Jazz Festival digital archives

Science.gov (United States)

Hanhart, Philippe; Rerabek, Martin; Ivanov, Ivan; Dufaux, Alain; Jones, Caryl; Delidais, Alexandre; Ebrahimi, Touradj

2013-09-01

Archival of audio-visual databases has become an important discipline in multimedia. Various defects are typ- ically present in such archives. Among those, one can mention recording related defects such as interference between audio and video signals, optical related artifacts, recording and play out artifacts such as horizontal lines, and dropouts, as well as those due to digitization such as diagonal lines. An automatic or semi-automatic detection to identify such defects is useful, especially for large databases. In this paper, we propose two auto- matic algorithms for detection of horizontal and diagonal lines, as well as dropouts that are among the most typical artifacts encountered. We then evaluate the performance of these algorithms by making use of ground truth scores obtained by human subjects.
Musical Audio Synthesis Using Autoencoding Neural Nets

OpenAIRE

Sarroff, Andy; Casey, Michael A.

2014-01-01

With an optimal network topology and tuning of hyperpa-\\ud rameters, artificial neural networks (ANNs) may be trained\\ud to learn a mapping from low level audio features to one\\ud or more higher-level representations. Such artificial neu-\\ud ral networks are commonly used in classification and re-\\ud gression settings to perform arbitrary tasks. In this work\\ud we suggest repurposing autoencoding neural networks as\\ud musical audio synthesizers. We offer an interactive musi-\\ud cal audio synt...
14 CFR 382.69 - What requirements must carriers meet concerning the accessibility of videos, DVDs, and other...

Science.gov (United States)

2010-01-01

... concerning the accessibility of videos, DVDs, and other audio-visual presentations shown on-aircraft to individuals who are deaf or hard of hearing? 382.69 Section 382.69 Aeronautics and Space OFFICE OF THE... BASIS OF DISABILITY IN AIR TRAVEL Accessibility of Aircraft § 382.69 What requirements must carriers...
Spatial audio reproduction with primary ambient extraction

CERN Document Server

He, JianJun

2017-01-01

This book first introduces the background of spatial audio reproduction, with different types of audio content and for different types of playback systems. A literature study on the classical and emerging Primary Ambient Extraction (PAE) techniques is presented. The emerging techniques aim to improve the extraction performance and also enhance the robustness of PAE approaches in dealing with more complex signals encountered in practice. The in-depth theoretical study helps readers to understand the rationales behind these approaches. Extensive objective and subjective experiments validate the feasibility of applying PAE in spatial audio reproduction systems. These experimental results, together with some representative audio examples and MATLAB codes of the key algorithms, illustrate clearly the differences among various approaches and also help readers gain insights on selecting different approaches for different applications.
An Analysis of Quality of Service (QoS In Live Video Streaming Using Evolved HSPA Network Media

Directory of Open Access Journals (Sweden)

Achmad Zakaria Azhar

2016-10-01

Full Text Available Evolved High Speed Packet Access (HSPA+ is a mobile telecommunication system technology and the evolution of HSPA technology. This technology has a packet data based service with downlink speeds up to 21.1 Mbps and uplink speed up to 11.5 Mbps on the bandwidth 5MHz. This technology is expected to fulfill and support the needs for information that involves all aspects of multimedia such as video and audio, especially live video streaming. By utilizing this technology it will facilitate communicating the information, for example to monitoring the situation of the house, the news coverage at some certain area, and other events in real time. This thesis aims to identify and test the Quality of Service (QoS performance on the network that is used for live video streaming with the parameters of throughput, delay, jitter and packet loss. The software used for monitoring the data traffic of the live video streaming network is wireshark network analyzer. From the test results it is obtained that the average throughput of provider B is 5,295 Kbps bigger than the provider A, the average delay of provider B is 0.618 ms smaller than the provider A, the average jitter of provider B is 0.420 ms smaller than the provider A and the average packet loss of provider B is 0.451% smaller than the provider A.

Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?

Directory of Open Access Journals (Sweden)

Héctor Delgado

2015-06-01

This article presents an overview of the technological components used in the process of audio description, and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used in order to obtain a semi-automatic transcription of the audio description track. The technical process is presented and experimental results are summarized.
Collusion-resistant audio fingerprinting system in the modulated complex lapped transform domain.

Directory of Open Access Journals (Sweden)

Jose Juan Garcia-Hernandez

Full Text Available Collusion-resistant fingerprinting paradigm seems to be a practical solution to the piracy problem as it allows media owners to detect any unauthorized copy and trace it back to the dishonest users. Despite the billionaire losses in the music industry, most of the collusion-resistant fingerprinting systems are devoted to digital images and very few to audio signals. In this paper, state-of-the-art collusion-resistant fingerprinting ideas are extended to audio signals and the corresponding parameters and operation conditions are proposed. Moreover, in order to carry out fingerprint detection using just a fraction of the pirate audio clip, block-based embedding and its corresponding detector is proposed. Extensive simulations show the robustness of the proposed system against average collusion attack. Moreover, by using an efficient Fast Fourier Transform core and standard computer machines it is shown that the proposed system is suitable for real-world scenarios.
Pre-recorded instructional audio vs. dispatchers’ conversational assistance in telephone cardiopulmonary resuscitation: A randomized controlled simulation study

Science.gov (United States)

Birkun, Alexei; Glotov, Maksim; Ndjamen, Herman Franklin; Alaiye, Esther; Adeleke, Temidara; Samarin, Sergey

2018-01-01

BACKGROUND: To assess the effectiveness of the telephone chest-compression-only cardiopulmonary resuscitation (CPR) guided by a pre-recorded instructional audio when compared with dispatcher-assisted resuscitation. METHODS: It was a prospective, blind, randomised controlled study involving 109 medical students without previous CPR training. In a standardized mannequin scenario, after the step of dispatcher-assisted cardiac arrest recognition, the participants performed compression-only resuscitation guided over the telephone by either: (1) the pre-recorded instructional audio (n=57); or (2) verbal dispatcher assistance (n=52). The simulation video records were reviewed to assess the CPR performance using a 13-item checklist. The interval from call reception to the first compression, total number and rate of compressions, total number and duration of pauses after the first compression were also recorded. RESULTS: There were no significant differences between the recording-assisted and dispatcher-assisted groups based on the overall performance score (5.6±2.2 vs. 5.1±1.9, P>0.05) or individual criteria of the CPR performance checklist. The recording-assisted group demonstrated significantly shorter time interval from call receipt to the first compression (86.0±14.3 vs. 91.2±14.2 s, PCPR. Future studies are warranted to further assess feasibility of using instructional audio aid as a potential alternative to dispatcher assistance.
Current-Driven Switch-Mode Audio Power Amplifiers

DEFF Research Database (Denmark)

Knott, Arnold; Buhl, Niels Christian; Andersen, Michael A. E.

2012-01-01

The conversion of electrical energy into sound waves by electromechanical transducers is proportional to the current through the coil of the transducer. However virtually all audio power amplifiers provide a controlled voltage through the interface to the transducer. This paper is presenting...... a switch-mode audio power amplifier not only providing controlled current but also being supplied by current. This results in an output filter size reduction by a factor of 6. The implemented prototype shows decent audio performance with THD + N below 0.1 %....
Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

Directory of Open Access Journals (Sweden)

Yue Zhao

2012-12-01

Full Text Available Audio-visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although multi-stream Dynamic Bayesian Network and coupled HMM are widely used for audio-visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN to perform unsupervised extraction of spatial-temporal multimodal features from Tibetan audio-visual speech data and build an accurate audio-visual speech recognition model under a no frame-independency assumption. The experiment results on Tibetan speech data from some real-world environments showed the proposed DDBN outperforms the state-of-art methods in word recognition accuracy.
High-Order Sparse Linear Predictors for Audio Processing

DEFF Research Database (Denmark)

Giacobello, Daniele; van Waterschoot, Toon; Christensen, Mads Græsbøll

2010-01-01

Linear prediction has generally failed to make a breakthrough in audio processing, as it has done in speech processing. This is mostly due to its poor modeling performance, since an audio signal is usually an ensemble of different sources. Nevertheless, linear prediction comes with a whole set...... of interesting features that make the idea of using it in audio processing not far fetched, e.g., the strong ability of modeling the spectral peaks that play a dominant role in perception. In this paper, we provide some preliminary conjectures and experiments on the use of high-order sparse linear predictors...... in audio processing. These predictors, successfully implemented in modeling the short-term and long-term redundancies present in speech signals, will be used to model tonal audio signals, both monophonic and polyphonic. We will show how the sparse predictors are able to model efﬁciently the different...
Audio Mining with emphasis on Music Genre Classification

DEFF Research Database (Denmark)

Meng, Anders

2004-01-01

Audio is an important part of our daily life, basically it increases our impression of the world around us whether this is communication, music, danger detection etc. Currently the field of Audio Mining, which here includes areas of music genre, music recognition / retrieval, playlist generation...... the world the problem of detecting environments from the input audio is researched as to increase the life quality of hearing-impaired. Basically there is a lot of work within the field of audio mining. The presentation will mainly focus on music genre classification where we have a fixed amount of genres...... to choose from. Basically every audio mining system is more or less consisting of the same stages as for the music genre setting. My research so far has mainly focussed on finding relevant features for music genre classification living at different timescales using early and late information fusion. It has...
Coupled auralization and virtual video for immersive multimedia displays

Science.gov (United States)

Henderson, Paul D.; Torres, Rendell R.; Shimizu, Yasushi; Radke, Richard; Lonsway, Brian

2003-04-01

The implementation of maximally-immersive interactive multimedia in exhibit spaces requires not only the presentation of realistic visual imagery but also the creation of a perceptually accurate aural experience. While conventional implementations treat audio and video problems as essentially independent, this research seeks to couple the visual sensory information with dynamic auralization in order to enhance perceptual accuracy. An implemented system has been developed for integrating accurate auralizations with virtual video techniques for both interactive presentation and multi-way communication. The current system utilizes a multi-channel loudspeaker array and real-time signal processing techniques for synthesizing the direct sound, early reflections, and reverberant field excited by a moving sound source whose path may be interactively defined in real-time or derived from coupled video tracking data. In this implementation, any virtual acoustic environment may be synthesized and presented in a perceptually-accurate fashion to many participants over a large listening and viewing area. Subject tests support the hypothesis that the cross-modal coupling of aural and visual displays significantly affects perceptual localization accuracy.
Subjective and Objective Assessment of Perceived Audio Quality of Current Digital Audio Broadcasting Systems and Web-Casting Applications

NARCIS (Netherlands)

Pocta, P.; Beerends, J.G.

2015-01-01

This paper investigates the impact of different audio codecs typically deployed in current digital audio broadcasting (DAB) systems and web-casting applications, which represent a main source of quality impairment in these systems and applications, on the quality perceived by the end user. Both
Investigating the impact of audio instruction and audio-visual biofeedback for lung cancer radiation therapy

Science.gov (United States)

George, Rohini

Lung cancer accounts for 13% of all cancers in the Unites States and is the leading cause of deaths among both men and women. The five-year survival for lung cancer patients is approximately 15%.(ACS facts & figures) Respiratory motion decreases accuracy of thoracic radiotherapy during imaging and delivery. To account for respiration, generally margins are added during radiation treatment planning, which may cause a substantial dose delivery to normal tissues and increase the normal tissue toxicity. To alleviate the above-mentioned effects of respiratory motion, several motion management techniques are available which can reduce the doses to normal tissues, thereby reducing treatment toxicity and allowing dose escalation to the tumor. This may increase the survival probability of patients who have lung cancer and are receiving radiation therapy. However the accuracy of these motion management techniques are inhibited by respiration irregularity. The rationale of this thesis was to study the improvement in regularity of respiratory motion by breathing coaching for lung cancer patients using audio instructions and audio-visual biofeedback. A total of 331 patient respiratory motion traces, each four minutes in length, were collected from 24 lung cancer patients enrolled in an IRB-approved breathing-training protocol. It was determined that audio-visual biofeedback significantly improved the regularity of respiratory motion compared to free breathing and audio instruction, thus improving the accuracy of respiratory gated radiotherapy. It was also observed that duty cycles below 30% showed insignificant reduction in residual motion while above 50% there was a sharp increase in residual motion. The reproducibility of exhale based gating was higher than that of inhale base gating. Modeling the respiratory cycles it was found that cosine and cosine 4 models had the best correlation with individual respiratory cycles. The overall respiratory motion probability distribution
Students' Learning Experiences from Didactic Teaching Sessions Including Patient Case Examples as Either Text or Video

DEFF Research Database (Denmark)

Pedersen, Kamilla; Moeller, Martin Holdgaard; Paltved, Charlotte

2017-01-01

OBJECTIVES: The aim of this study was to explore medical students' learning experiences from the didactic teaching formats using either text-based patient cases or video-based patient cases with similar content. The authors explored how the two different patient case formats influenced students......' perceptions of psychiatric patients and students' reflections on meeting and communicating with psychiatric patients. METHODS: The authors conducted group interviews with 30 medical students who volunteered to participate in interviews and applied inductive thematic content analysis to the transcribed...
Content-based video indexing and searching with wavelet transformation

Science.gov (United States)

Stumpf, Florian; Al-Jawad, Naseer; Du, Hongbo; Jassim, Sabah

2006-05-01

Biometric databases form an essential tool in the fight against international terrorism, organised crime and fraud. Various government and law enforcement agencies have their own biometric databases consisting of combination of fingerprints, Iris codes, face images/videos and speech records for an increasing number of persons. In many cases personal data linked to biometric records are incomplete and/or inaccurate. Besides, biometric data in different databases for the same individual may be recorded with different personal details. Following the recent terrorist atrocities, law enforcing agencies collaborate more than before and have greater reliance on database sharing. In such an environment, reliable biometric-based identification must not only determine who you are but also who else you are. In this paper we propose a compact content-based video signature and indexing scheme that can facilitate retrieval of multiple records in face biometric databases that belong to the same person even if their associated personal data are inconsistent. We shall assess the performance of our system using a benchmark audio visual face biometric database that has multiple videos for each subject but with different identity claims. We shall demonstrate that retrieval of relatively small number of videos that are nearest, in terms of the proposed index, to any video in the database results in significant proportion of that individual biometric data.
Transmisión de audio usando redes Zigbee

Directory of Open Access Journals (Sweden)

David Delgado León

2011-03-01

Full Text Available Normal 0 21 false false false MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Tabla normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Times New Roman"; mso-ansi-language:#0400; mso-fareast-language:#0400; mso-bidi-language:#0400;} Normal 0 21 false false false MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Tabla normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Times New Roman"; mso-ansi-language:#0400; mso-fareast-language:#0400; mso-bidi-language:#0400;} Zigbee es un protocolo de comunicaciones basado en el estándar para redes inalámbricas IEEE_802.15.4. Concebido para el control y la monitorización de redes de sensores tanto en entornos industriales, médicos, como domóticos, ha existido un creciente interés por evaluarlo en aplicaciones de multimedia. Aun sin garantizar QoS (Quality of service por su limitado ancho de banda existen un conjunto de aplicaciones para vigilancia, grupos de rescate y salvamento, seguridad en entornos domóticos, grupos desplegados en un área limitada con necesidad de comunicación donde un sistema de audio y video en tiempo real de bajo costo basado en tecnología Zigbee es una idea sumamente atractiva. Se presenta el diseño de un sistema que permita la comunicación de un grupo de usuarios desplegadas en un área limitada. Utiliza Microcontroladores RISC y tecnología Zigbee. Se investiga la factibilidad de usar la tecnología Zigbee para la transmisión de audio, se analizan
Anatomy of a Cancer Treatment Scam

Medline Plus

Full Text Available ... Video Featured Videos FTC Events For Consumers For Business En Español Social Media FTC Social Media Chats Tweeting FTC Events Blogs ... Actions Media Resources Events Calendar Speeches Audio/Video Social ... Consumers Business Center Competition Guidance I Would Like To... Submit ...
No, there is no 150 ms lead of visual speech on auditory speech, but a range of audiovisual asynchronies varying from small audio lead to large audio lag.

Directory of Open Access Journals (Sweden)

Jean-Luc Schwartz

2014-07-01

Full Text Available An increasing number of neuroscience papers capitalize on the assumption published in this journal that visual speech would be typically 150 ms ahead of auditory speech. It happens that the estimation of audiovisual asynchrony in the reference paper is valid only in very specific cases, for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call "preparatory gestures". However, when syllables are chained in sequences, as they are typically in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call "comodulatory gestures" providing auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction.
A Comparison of Video Modeling, Text-Based Instruction, and No Instruction for Creating Multiple Baseline Graphs in Microsoft Excel

Science.gov (United States)

Tyner, Bryan C.; Fienup, Daniel M.

2015-01-01

Graphing is socially significant for behavior analysts; however, graphing can be difficult to learn. Video modeling (VM) may be a useful instructional method but lacks evidence for effective teaching of computer skills. A between-groups design compared the effects of VM, text-based instruction, and no instruction on graphing performance.…
“Wrapping” X3DOM around Web Audio API

Directory of Open Access Journals (Sweden)

Andreas Stamoulias

2015-12-01

Full Text Available Spatial sound has a conceptual role in the Web3D environments, due to highly realism scenes that can provide. Lately the efforts are concentrated on the extension of the X3D/ X3DOM through spatial sound attributes. This paper presents a novel method for the introduction of spatial sound components in the X3DOM framework, based on X3D specification and Web Audio API. The proposed method incorporates the introduction of enhanced sound nodes for X3DOM which are derived by the implementation of the X3D standard components, enriched with accessional features of Web Audio API. Moreover, several examples-scenarios developed for the evaluation of our approach. The implemented examples established the achievability of new registered nodes in X3DOM, for spatial sound characteristics in Web3D virtual worlds.
Horatio Audio-Describes Shakespeare's "Hamlet": Blind and Low-Vision Theatre-Goers Evaluate an Unconventional Audio Description Strategy

Science.gov (United States)

Udo, J. P.; Acevedo, B.; Fels, D. I.

2010-01-01

Audio description (AD) has been introduced as one solution for providing people who are blind or have low vision with access to live theatre, film and television content. However, there is little research to inform the process, user preferences and presentation style. We present a study of a single live audio-described performance of Hart House…
Audio localization for mobile robots

OpenAIRE

de Guillebon, Thibaut; Grau Saldes, Antoni; Bolea Monte, Yolanda

2009-01-01

The department of the University for which I worked is developing a project based on the interaction with robots in the environment. My work was to define an audio system for the robot. This audio system that I have to realize consists on a mobile head which is able to follow the sound in its environment. This subject was treated as a research problem, with the liberty to find and develop different solutions and make them evolve in the chosen way.
Video coding standards AVS China, H.264/MPEG-4 PART 10, HEVC, VP6, DIRAC and VC-1

CERN Document Server

Rao, K R; Hwang, Jae Jeong

2014-01-01

Review by Ashraf A. Kassim, Professor, Department of Electrical & Computer Engineering, and Associate Dean, School of Engineering, National University of Singapore. The book consists of eight chapters of which the first two provide an overview of various video & image coding standards, and video formats. The next four chapters present in detail the Audio & video standard (AVS) of China, the H.264/MPEG-4 Advanced video coding (AVC) standard, High efficiency video coding (HEVC) standard and the VP6 video coding standard (now VP10) respectively. The performance of the wavelet based Dirac video codec is compared with H.264/MPEG-4 AVC in chapter 7. Finally in chapter 8, the VC-1 video coding standard is presented together with VC-2 which is based on the intra frame coding of Dirac and an outline of a H.264/AVC to VC-1 transcoder. The authors also present and discuss relevant research literature such as those which document improved methods & techniques, and also point to other related reso...

The effect of combined sensory and semantic components on audio-visual speech perception in older adults

Directory of Open Access Journals (Sweden)

Corrina eMaguinness

2011-12-01

Full Text Available Previous studies have found that perception in older people benefits from multisensory over uni-sensory information. As normal speech recognition is affected by both the auditory input and the visual lip-movements of the speaker, we investigated the efficiency of audio and visual integration in an older population by manipulating the relative reliability of the auditory and visual information in speech. We also investigated the role of the semantic context of the sentence to assess whether audio-visual integration is affected by top-down semantic processing. We presented participants with audio-visual sentences in which the visual component was either blurred or not blurred. We found that there was a greater cost in recall performance for semantically meaningless speech in the audio-visual blur compared to audio-visual no blur condition and this effect was specific to the older group. Our findings have implications for understanding how aging affects efficient multisensory integration for the perception of speech and suggests that multisensory inputs may benefit speech perception in older adults when the semantic content of the speech is unpredictable.
Robust and Reversible Audio Watermarking by Modifying Statistical Features in Time Domain

Directory of Open Access Journals (Sweden)

Shijun Xiang

2017-01-01

Full Text Available Robust and reversible watermarking is a potential technique in many sensitive applications, such as lossless audio or medical image systems. This paper presents a novel robust reversible audio watermarking method by modifying the statistic features in time domain in the way that the histogram of these statistical values is shifted for data hiding. Firstly, the original audio is divided into nonoverlapped equal-sized frames. In each frame, the use of three samples as a group generates a prediction error and a statistical feature value is calculated as the sum of all the prediction errors in the frame. The watermark bits are embedded into the frames by shifting the histogram of the statistical features. The watermark is reversible and robust to common signal processing operations. Experimental results have shown that the proposed method not only is reversible but also achieves satisfactory robustness to MP3 compression of 64 kbps and additive Gaussian noise of 35 dB.
acceleration observed in an audio air gas discharge

International Nuclear Information System (INIS)

Ragheb, M.S.

2010-01-01

an audio air gas discharge enclosed in a pyrex glass of 34 mm diameter and 25 cm long , lead to trace the occurrence of an unusual phenomenon. injected relative huge light spots of intense brightness, distributed regularly on the contour and in the center of one of the discharge electrodes, are observed. very high heat is pronounced on both electrodes, while, one of them is higher than the other it attains 660 degree C in 3-4 minutes. series of photographs and registered video films define and clarify the sequence of events that describe the observed phenomenon. the plasma is created by applying an audio power through the electrodes of an air gas discharge of 10 khz and up to 500 watts power supply. the discharge voltage is up to 900 volts: the discharge current flowing through the plasma attains 360 mA. it is found that the discharge system must attain its optimal working conditions in order to produce the amazing phenomena. the obtained plasma is classified as the maximum conditions borders of a γ-discharge type. at these conditions, the corresponding maximum electron temperature and density are 16 eV and 10 15 cm -3 respectively . the observation system succeeded to reveal and to clarify the sequence of the phenomenon events. in addition, by means of the scanning electron microscope and the energy dispersive x- ray systems, the effects on the electrodes surface are investigated and analyzed. the optical observations, in conjunction with the micrograph and surface microanalysis,demonstrate the collision occurrence, of powered agglomerations groups, to the electrode surface. detailed interpretation of that phenomenon suggests a molecular acceleration gaining their energy from the formed plasma due to optimal discharge working conditions. as a consequence, due to the ions agglomerates size this procedure could be considered as a mesoscopic acceleration technique.
Audio-Visual and Meaningful Semantic Context Enhancements in Older and Younger Adults.

Directory of Open Access Journals (Sweden)

Kirsten E Smayda

Full Text Available Speech perception is critical to everyday life. Oftentimes noise can degrade a speech signal; however, because of the cues available to the listener, such as visual and semantic cues, noise rarely prevents conversations from continuing. The interaction of visual and semantic cues in aiding speech perception has been studied in young adults, but the extent to which these two cues interact for older adults has not been studied. To investigate the effect of visual and semantic cues on speech perception in older and younger adults, we recruited forty-five young adults (ages 18-35 and thirty-three older adults (ages 60-90 to participate in a speech perception task. Participants were presented with semantically meaningful and anomalous sentences in audio-only and audio-visual conditions. We hypothesized that young adults would outperform older adults across SNRs, modalities, and semantic contexts. In addition, we hypothesized that both young and older adults would receive a greater benefit from a semantically meaningful context in the audio-visual relative to audio-only modality. We predicted that young adults would receive greater visual benefit in semantically meaningful contexts relative to anomalous contexts. However, we predicted that older adults could receive a greater visual benefit in either semantically meaningful or anomalous contexts. Results suggested that in the most supportive context, that is, semantically meaningful sentences presented in the audiovisual modality, older adults performed similarly to young adults. In addition, both groups received the same amount of visual and meaningful benefit. Lastly, across groups, a semantically meaningful context provided more benefit in the audio-visual modality relative to the audio-only modality, and the presence of visual cues provided more benefit in semantically meaningful contexts relative to anomalous contexts. These results suggest that older adults can perceive speech as well as younger
Musical examination to bridge audio data and sheet music

Science.gov (United States)

Pan, Xunyu; Cross, Timothy J.; Xiao, Liangliang; Hei, Xiali

2015-03-01

The digitalization of audio is commonly implemented for the purpose of convenient storage and transmission of music and songs in today's digital age. Analyzing digital audio for an insightful look at a specific musical characteristic, however, can be quite challenging for various types of applications. Many existing musical analysis techniques can examine a particular piece of audio data. For example, the frequency of digital sound can be easily read and identified at a specific section in an audio file. Based on this information, we could determine the musical note being played at that instant, but what if you want to see a list of all the notes played in a song? While most existing methods help to provide information about a single piece of the audio data at a time, few of them can analyze the available audio file on a larger scale. The research conducted in this work considers how to further utilize the examination of audio data by storing more information from the original audio file. In practice, we develop a novel musical analysis system Musicians Aid to process musical representation and examination of audio data. Musicians Aid solves the previous problem by storing and analyzing the audio information as it reads it rather than tossing it aside. The system can provide professional musicians with an insightful look at the music they created and advance their understanding of their work. Amateur musicians could also benefit from using it solely for the purpose of obtaining feedback about a song they were attempting to play. By comparing our system's interpretation of traditional sheet music with their own playing, a musician could ensure what they played was correct. More specifically, the system could show them exactly where they went wrong and how to adjust their mistakes. In addition, the application could be extended over the Internet to allow users to play music with one another and then review the audio data they produced. This would be particularly
A listening test system for automotive audio

DEFF Research Database (Denmark)

Christensen, Flemming; Geoff, Martin; Minnaar, Pauli

2005-01-01

This paper describes a system for simulating automotive audio through headphones for the purposes of conducting listening experiments in the laboratory. The system is based on binaural technology and consists of a component for reproducing the sound of the audio system itself and a component...
Voice activity detection using audio-visual information

DEFF Research Database (Denmark)

Petsatodis, Theodore; Pnevmatikakis, Aristodemos; Boukis, Christos

2009-01-01

An audio-visual voice activity detector that uses sensors positioned distantly from the speaker is presented. Its constituting unimodal detectors are based on the modeling of the temporal variation of audio and visual features using Hidden Markov Models; their outcomes are fused using a post...
Parametric Packet-Layer Model for Evaluation Audio Quality in Multimedia Streaming Services

Science.gov (United States)

Egi, Noritsugu; Hayashi, Takanori; Takahashi, Akira

We propose a parametric packet-layer model for monitoring audio quality in multimedia streaming services such as Internet protocol television (IPTV). This model estimates audio quality of experience (QoE) on the basis of quality degradation due to coding and packet loss of an audio sequence. The input parameters of this model are audio bit rate, sampling rate, frame length, packet-loss frequency, and average burst length. Audio bit rate, packet-loss frequency, and average burst length are calculated from header information in received IP packets. For sampling rate, frame length, and audio codec type, the values or the names used in monitored services are input into this model directly. We performed a subjective listening test to examine the relationships between these input parameters and perceived audio quality. The codec used in this test was the Advanced Audio Codec-Low Complexity (AAC-LC), which is one of the international standards for audio coding. On the basis of the test results, we developed an audio quality evaluation model. The verification results indicate that audio quality estimated by the proposed model has a high correlation with perceived audio quality.
Audio power amplifier design handbook

CERN Document Server

Self, Douglas

2013-01-01

This book is essential for audio power amplifier designers and engineers for one simple reason...it enables you as a professional to develop reliable, high-performance circuits. The Author Douglas Self covers the major issues of distortion and linearity, power supplies, overload, DC-protection and reactive loading. He also tackles unusual forms of compensation and distortion produced by capacitors and fuses. This completely updated fifth edition includes four NEW chapters including one on The XD Principle, invented by the author, and used by Cambridge Audio. Cro
Audio stream classification for multimedia database search

Science.gov (United States)

Artese, M.; Bianco, S.; Gagliardi, I.; Gasparini, F.

2013-03-01

Search and retrieval of huge archives of Multimedia data is a challenging task. A classification step is often used to reduce the number of entries on which to perform the subsequent search. In particular, when new entries of the database are continuously added, a fast classification based on simple threshold evaluation is desirable. In this work we present a CART-based (Classification And Regression Tree [1]) classification framework for audio streams belonging to multimedia databases. The database considered is the Archive of Ethnography and Social History (AESS) [2], which is mainly composed of popular songs and other audio records describing the popular traditions handed down generation by generation, such as traditional fairs, and customs. The peculiarities of this database are that it is continuously updated; the audio recordings are acquired in unconstrained environment; and for the non-expert human user is difficult to create the ground truth labels. In our experiments, half of all the available audio files have been randomly extracted and used as training set. The remaining ones have been used as test set. The classifier has been trained to distinguish among three different classes: speech, music, and song. All the audio files in the dataset have been previously manually labeled into the three classes above defined by domain experts.
Identification of Sparse Audio Tampering Using Distributed Source Coding and Compressive Sensing Techniques

Directory of Open Access Journals (Sweden)

Valenzise G

2009-01-01

Full Text Available In the past few years, a large amount of techniques have been proposed to identify whether a multimedia content has been illegally tampered or not. Nevertheless, very few efforts have been devoted to identifying which kind of attack has been carried out, especially due to the large data required for this task. We propose a novel hashing scheme which exploits the paradigms of compressive sensing and distributed source coding to generate a compact hash signature, and we apply it to the case of audio content protection. The audio content provider produces a small hash signature by computing a limited number of random projections of a perceptual, time-frequency representation of the original audio stream; the audio hash is given by the syndrome bits of an LDPC code applied to the projections. At the content user side, the hash is decoded using distributed source coding tools. If the tampering is sparsifiable or compressible in some orthonormal basis or redundant dictionary, it is possible to identify the time-frequency position of the attack, with a hash size as small as 200 bits/second; the bit saving obtained by introducing distributed source coding ranges between 20% to 70%.
Use of a verbal electronic audio reminder with a patient hand hygiene bundle to increase independent patient hand hygiene practices of older adults in an acute care setting.

Science.gov (United States)

Knighton, Shanina C; Dolansky, Mary; Donskey, Curtis; Warner, Camille; Rai, Herleen; Higgins, Patricia A

2018-03-01

We hypothesized that the addition of a novel verbal electronic audio reminder to an educational patient hand hygiene bundle would increase performance of self-managed patient hand hygiene. We conducted a 2-group comparative effectiveness study randomly assigning participants to patient hand hygiene bundle 1 (n = 41), which included a video, a handout, and a personalized verbal electronic audio reminder (EAR) that prompted hand cleansing at 3 meal times, or patient hand hygiene bundle 2 (n = 34), which included the identical video and handout, but not the EAR. The primary outcome was alcohol-based hand sanitizer use based on weighing bottles of hand sanitizer. Participants that received the EAR averaged significantly more use of hand sanitizer product over the 3 days of the study (mean ± SD, 29.97 ± 17.13 g) than participants with no EAR (mean ± SD, 10.88 ± 9.27 g; t 73 = 5.822; P ≤ .001). The addition of a novel verbal EAR to a patient hand hygiene bundle resulted in a significant increase in patient hand hygiene performance. Our results suggest that simple audio technology can be used to improve patient self-management of hand hygiene. Future research is needed to determine if the technology can be used to promote other healthy behaviors, reduce infections, and improve patient-centered care without increasing the workload of health care workers. Published by Elsevier Inc.
MP3 audio-editing software for the department of radiology

International Nuclear Information System (INIS)

Hong Qingfen; Sun Canhui; Li Ziping; Meng Quanfei; Jiang Li

2006-01-01

Objective: To evaluate the MP3 audio-editing software in the daily work in the department of radiology. Methods: The audio content of daily consultation seminar, held in the department of radiology every morning, was recorded and converted into MP3 audio format by a computer integrated recording device. The audio data were edited, archived, and eventually saved in the computer memory storage media, which was experimentally replayed and applied in the research or teaching. Results: MP3 audio-editing was a simple process and convenient for saving and searching the data. The record could be easily replayed. Conclusion: MP3 audio-editing perfectly records and saves the contents of consultation seminar, and has replaced the conventional hand writing notes. It is a valuable tool in both research and teaching in the department. (authors)
Anatomy of a Cancer Treatment Scam

Medline Plus

Full Text Available ... Calendar Weekly Calendar Archive Speeches Audio/Video Featured Videos FTC Events For Consumers For Business En Español Social Media FTC Social Media Chats Tweeting FTC Events Blogs Competition Matters Tech@FTC ...
NEI You Tube Videos: Amblyopia

Medline Plus

Full Text Available ... search for current job openings visit HHS USAJobs Home >> NEI YouTube Videos >> NEI YouTube Videos: Amblyopia Listen NEI YouTube Videos YouTube Videos Home Age-Related Macular Degeneration Amblyopia Animations Blindness Cataract ...
Neuromorphic Audio-Visual Sensor Fusion on a Sound-Localising Robot

Directory of Open Access Journals (Sweden)

Vincent Yue-Sek Chan

2012-02-01

Full Text Available This paper presents the first robotic system featuring audio-visual sensor fusion with neuromorphic sensors. We combine a pair of silicon cochleae and a silicon retina on a robotic platform to allow the robot to learn sound localisation through self-motion and visual feedback, using an adaptive ITD-based sound localisation algorithm. After training, the robot can localise sound sources (white or pink noise in a reverberant environment with an RMS error of 4 to 5 degrees in azimuth. In the second part of the paper, we investigate the source binding problem. An experiment is conducted to test the effectiveness of matching an audio event with a corresponding visual event based on their onset time. The results show that this technique can be quite effective, despite its simplicity.
Learners' Use of Communication Strategies in Text-Based and Video-Based Synchronous Computer-Mediated Communication Environments: Opportunities for Language Learning

Science.gov (United States)

Hung, Yu-Wan; Higgins, Steve

2016-01-01

This study investigates the different learning opportunities enabled by text-based and video-based synchronous computer-mediated communication (SCMC) from an interactionist perspective. Six Chinese-speaking learners of English and six English-speaking learners of Chinese were paired up as tandem (reciprocal) learning dyads. Each dyad participated…
NEI You Tube Videos: Amblyopia

Medline Plus

Full Text Available ... search for current job openings visit HHS USAJobs Home » NEI YouTube Videos » NEI YouTube Videos: Amblyopia Listen NEI YouTube Videos YouTube Videos Home Age-Related Macular Degeneration Amblyopia Animations Blindness Cataract ...
Tune in the Net with RealAudio.

Science.gov (United States)

Buchanan, Larry

1997-01-01

Describes how to connect to the RealAudio Web site to download a player that provides sound from Web pages to the computer through streaming technology. Explains hardware and software requirements and provides addresses for other RealAudio Web sites are provided, including weather information and current news. (LRW)
Semantic Labeling of Nonspeech Audio Clips

Directory of Open Access Journals (Sweden)

Xiaojuan Ma

2010-01-01

Full Text Available Human communication about entities and events is primarily linguistic in nature. While visual representations of information are shown to be highly effective as well, relatively little is known about the communicative power of auditory nonlinguistic representations. We created a collection of short nonlinguistic auditory clips encoding familiar human activities, objects, animals, natural phenomena, machinery, and social scenes. We presented these sounds to a broad spectrum of anonymous human workers using Amazon Mechanical Turk and collected verbal sound labels. We analyzed the human labels in terms of their lexical and semantic properties to ascertain that the audio clips do evoke the information suggested by their pre-defined captions. We then measured the agreement with the semantically compatible labels for each sound clip. Finally, we examined which kinds of entities and events, when captured by nonlinguistic acoustic clips, appear to be well-suited to elicit information for communication, and which ones are less discriminable. Our work is set against the broader goal of creating resources that facilitate communication for people with some types of language loss. Furthermore, our data should prove useful for future research in machine analysis/synthesis of audio, such as computational auditory scene analysis, and annotating/querying large collections of sound effects.

Anatomy of a Cancer Treatment Scam

Medline Plus

Full Text Available ... Act FOIA Reading Rooms FOIA Request Fee Regulations Freedom of Information Act Contacts Frequently Asked Questions (FAQ) ... Calendar All Events Weekly Calendar Weekly Calendar Archive Speeches Audio/Video Featured Videos FTC Events For Consumers ...
Encryption for confidentiality of the network and influence of this to the quality of streaming video through network

Science.gov (United States)

Sevcik, L.; Uhrin, D.; Frnda, J.; Voznak, M.; Toral-Cruz, Homer; Mikulec, M.; Jakovlev, Sergej

2015-05-01

Nowadays, the interest in real-time services, like audio and video, is growing. These services are mostly transmitted over packet networks, which are based on IP protocol. It leads to analyses of these services and their behavior in such networks which are becoming more frequent. Video has become the significant part of all data traffic sent via IP networks. In general, a video service is one-way service (except e.g. video calls) and network delay is not such an important factor as in a voice service. Dominant network factors that influence the final video quality are especially packet loss, delay variation and the capacity of the transmission links. Analysis of video quality concentrates on the resistance of video codecs to packet loss in the network, which causes artefacts in the video. IPsec provides confidentiality in terms of safety, integrity and non-repudiation (using HMAC-SHA1 and 3DES encryption for confidentiality and AES in CBC mode) with an authentication header and ESP (Encapsulating Security Payload). The paper brings a detailed view of the performance of video streaming over an IP-based network. We compared quality of video with packet loss and encryption as well. The measured results demonstrated the relation between the video codec type and bitrate to the final video quality.
Measuring 3D Audio Localization Performance and Speech Quality of Conferencing Calls for a Multiparty Communication System

Directory of Open Access Journals (Sweden)

Mansoor Hyder

2013-07-01

Full Text Available Communication systems which support 3D (Three Dimensional audio offer a couple of advantages to the users/customers. Firstly, within the virtual acoustic environments all participants could easily be recognized through their placement/sitting positions. Secondly, all participants can turn their focus on any particular talker when multiple participants start talking at the same time by taking advantage of the natural listening tendency which is called the Cocktail Party Effect. On the other hand, 3D audio is known as a decreasing factor for overall speech quality because of the commencement of reverberations and echoes within the listening environment. In this article, we study the tradeoff between speech quality and human natural ability of localizing audio events/or talkers within our three dimensional audio supported telephony and teleconferencing solution. Further, we performed subjective user studies by incorporating two different HRTFs (Head Related Transfer Functions, different placements of the teleconferencing participants and different layouts of the virtual environments. Moreover, subjective user studies results for audio event localization and subjective speech quality are presented in this article. This subjective user study would help the research community to optimize the existing 3D audio systems and to design new 3D audio supported teleconferencing solutions based on the quality of experience requirements of the users/customers for agriculture personal in particular and for all potential users in general.
Car audio using DSP for active sound control. DSP ni yoru active seigyo wo mochiita audio

Energy Technology Data Exchange (ETDEWEB)

Yamada, K.; Asano, S.; Furukawa, N. (Mitsubishi Motor Corp., Tokyo (Japan))

1993-06-01

In the automobile cabin, there are some unique problems which spoil the quality of sound reproduction from audio equipment, such as the narrow space and/or the background noise. The audio signal processing by using DSP (digital signal processor) makes enable a solution to these problems. A car audio with a high amenity has been successfully made by the active sound control using DSP. The DSP consists of an adder, coefficient multiplier, delay unit, and connections. For the actual processing by DSP, are used functions, such as sound field correction, response and processing of noises during driving, surround reproduction, graphic equalizer processing, etc. High effectiveness of the method was confirmed through the actual driving evaluation test. The present paper describes the actual method of sound control technology using DSP. Especially, the dynamic processing of the noise during driving is discussed in detail. 1 ref., 12 figs., 1 tab.
Portable audio electronics for impedance-based measurements in microfluidics

International Nuclear Information System (INIS)

Wood, Paul; Sinton, David

2010-01-01

We demonstrate the use of audio electronics-based signals to perform on-chip electrochemical measurements. Cell phones and portable music players are examples of consumer electronics that are easily operated and are ubiquitous worldwide. Audio output (play) and input (record) signals are voltage based and contain frequency and amplitude information. A cell phone, laptop soundcard and two compact audio players are compared with respect to frequency response; the laptop soundcard provides the most uniform frequency response, while the cell phone performance is found to be insufficient. The audio signals in the common portable music players and laptop soundcard operate in the range of 20 Hz to 20 kHz and are found to be applicable, as voltage input and output signals, to impedance-based electrochemical measurements in microfluidic systems. Validated impedance-based measurements of concentration (0.1–50 mM), flow rate (2–120 µL min −1 ) and particle detection (32 µm diameter) are demonstrated. The prevailing, lossless, wave audio file format is found to be suitable for data transmission to and from external sources, such as a centralized lab, and the cost of all hardware (in addition to audio devices) is ∼10 USD. The utility demonstrated here, in combination with the ubiquitous nature of portable audio electronics, presents new opportunities for impedance-based measurements in portable microfluidic systems. (technical note)
Reflections on academic video

Directory of Open Access Journals (Sweden)

Thommy Eriksson

2012-11-01

Full Text Available As academics we study, research and teach audiovisual media, yet rarely disseminate and mediate through it. Today, developments in production technologies have enabled academic researchers to create videos and mediate audiovisually. In academia it is taken for granted that everyone can write a text. Is it now time to assume that everyone can make a video essay? Using the online journal of academic videos Audiovisual Thinking and the videos published in it as a case study, this article seeks to reflect on the emergence and legacy of academic audiovisual dissemination. Anchoring academic video and audiovisual dissemination of knowledge in two critical traditions, documentary theory and semiotics, we will argue that academic video is in fact already present in a variety of academic disciplines, and that academic audiovisual essays are bringing trends and developments that have long been part of academic discourse to their logical conclusion.
A comparison of video modeling, text-based instruction, and no instruction for creating multiple baseline graphs in Microsoft Excel.

Science.gov (United States)

Tyner, Bryan C; Fienup, Daniel M

2015-09-01

Graphing is socially significant for behavior analysts; however, graphing can be difficult to learn. Video modeling (VM) may be a useful instructional method but lacks evidence for effective teaching of computer skills. A between-groups design compared the effects of VM, text-based instruction, and no instruction on graphing performance. Participants who used VM constructed graphs significantly faster and with fewer errors than those who used text-based instruction or no instruction. Implications for instruction are discussed. © Society for the Experimental Analysis of Behavior.
The use of animation video in teaching to enhance the imagination and visualization of student in engineering drawing

Science.gov (United States)

Ismail M., E.; Mahazir I., Irwan; Othman, H.; Amiruddin M., H.; Ariffin, A.

2017-05-01

The rapid development of information technology today has given a new breath toward usage of computer in education. One of the increasingly popular nowadays is a multimedia technology that merges a variety of media such as text, graphics, animation, video and audio controlled by a computer. With this technology, a wide range of multimedia element can be developed to improve the quality of education. For that reason, this study aims to investigate the use of multimedia element based on animated video that was developed for Engineering Drawing subject according to the syllabus of Vocational College of Malaysia. The design for this study was a survey method using a quantitative approach and involved 30 respondents from Industrial Machining students. The instruments used in study is questionnaire with correlation coefficient value (0.83), calculated on Alpha-Cronbach. Data was collected and analyzed descriptive analyzed using SPSS. The study found that multimedia element for animation video was use significant have capable to increase imagination and visualization of student. The implications of this study provide information of use of multimedia element will student effect imagination and visualization. In general, these findings contribute to the formation of multimedia element of materials appropriate to enhance the quality of learning material for engineering drawing.
Challenges in Transcribing Multimodal Data: A Case Study

Science.gov (United States)

Helm, Francesca; Dooly, Melinda

2017-01-01

Computer-mediated communication (CMC) once meant principally text-based communication mediated by computers, but rapid technological advances in recent years have heralded an era of multimodal communication with a growing emphasis on audio and video synchronous interaction. As CMC, in all its variants (text chats, video chats, forums, blogs, SMS,…
CREATING AUDIO VISUAL DIALOGUE TASK AS STUDENTS’ SELF ASSESSMENT TO ENHANCE THEIR SPEAKING ABILITY

Directory of Open Access Journals (Sweden)

Novia Trisanti

2017-04-01

Full Text Available The study is about giving overview of employing audio visual dialogue task as students creativity task and self assessment in EFL speaking class of tertiary education to enhance the students speaking ability. The qualitative research was done in one of the speaking classes at English Department, Semarang State University, Central Java, Indonesia. The results that can be seen from the rubric of self assessment show that the oral performance through audio visual recorded tasks done by the students as their self assessment gave positive evidences. The audio visual dialogue task can be very beneficial since it can motivate the students learning and increase their learning experiences. The self-assessment can be a valuable additional means to improve their speaking ability since it is one of the motives that drive self- evaluatioan, along with self- verification and self- enhancement.
Modified DCTNet for audio signals classification

Science.gov (United States)

Xian, Yin; Pu, Yunchen; Gan, Zhe; Lu, Liang; Thompson, Andrew

2016-10-01

In this paper, we investigate DCTNet for audio signal classification. Its output feature is related to Cohen's class of time-frequency distributions. We introduce the use of adaptive DCTNet (A-DCTNet) for audio signals feature extraction. The A-DCTNet applies the idea of constant-Q transform, with its center frequencies of filterbanks geometrically spaced. The A-DCTNet is adaptive to different acoustic scales, and it can better capture low frequency acoustic information that is sensitive to human audio perception than features such as Mel-frequency spectral coefficients (MFSC). We use features extracted by the A-DCTNet as input for classifiers. Experimental results show that the A-DCTNet and Recurrent Neural Networks (RNN) achieve state-of-the-art performance in bird song classification rate, and improve artist identification accuracy in music data. They demonstrate A-DCTNet's applicability to signal processing problems.
Desarrollo de herramientas de procesado y visualización para audio 3D con auriculares

OpenAIRE

Magro Sastre, Julio

2016-01-01

La Auralización o “realidad virtual acústica” es un término relativamente nuevo. Integra métodos de la física y la ingeniería acústica con la teoría de la Psicoacústica y de reproducción electroacústica [1]. El término Auralización es el análogo de la técnica de “visualización” en video 3D para el audio. En este Proyecto Fin de Carrera se describe el proceso de visualizar ciertas características, efectos o señales del sonido. Los sistemas estéreo convencionales son capaces de posicionar la im...
Learning Science Through Digital Video: Views on Watching and Creating Videos

Science.gov (United States)

Wade, P.; Courtney, A. R.

2013-12-01

In science, the use of digital video to document phenomena, experiments and demonstrations has rapidly increased during the last decade. The use of digital video for science education also has become common with the wide availability of video over the internet. However, as with using any technology as a teaching tool, some questions should be asked: What science is being learned from watching a YouTube clip of a volcanic eruption or an informational video on hydroelectric power generation? What are student preferences (e.g. multimedia versus traditional mode of delivery) with regard to their learning? This study describes 1) the efficacy of watching digital video in the science classroom to enhance student learning, 2) student preferences of instruction with regard to multimedia versus traditional delivery modes, and 3) the use of creating digital video as a project-based educational strategy to enhance learning. Undergraduate non-science majors were the primary focus group in this study. Students were asked to view video segments and respond to a survey focused on what they learned from the segments. Additionally, they were asked about their preference for instruction (e.g. text only, lecture-PowerPoint style delivery, or multimedia-video). A majority of students indicated that well-made video, accompanied with scientific explanations or demonstration of the phenomena was most useful and preferred over text-only or lecture instruction for learning scientific information while video-only delivery with little or no explanation was deemed not very useful in learning science concepts. The use of student generated video projects as learning vehicles for the creators and other class members as viewers also will be discussed.
Oral Computer-Mediated Interaction Between L2 Learners: It’s About Time!

Directory of Open Access Journals (Sweden)

Yanguas, Íñigo

2010-10-01

Full Text Available This study explores task-based, synchronous oral computer-mediated communication (CMC among intermediate-level learners of Spanish. In particular, this paper examines (a how learners in video and audio CMC groups negotiate for meaning during task-based interaction, (b possible differences between both oral CMC modes and traditional face-to-face (FTF communication, and (c how this oral computer mediated negotiation compares to that found in the text-based CMC literature. Fifteen learner-to-learner dyads were randomly assigned to an audio group, a video group, and a FTF control group to complete a jigsaw task that was seeded with 16 unknown lexical items. Experimental groups used Skype, free online communication software, to carry out the task. The transcripts of the conversations reveal that oral CMC groups do indeed negotiate for meaning in this multimedia context when non-understanding occurs between speakers. In addition, results showed differences in the way audio and video groups carry out these negotiations, which were mainly due to the lack of visual contact in the audio group. No differences were found between video and FTF groups. Furthermore, oral CMC turn-taking patterns were shown to be very similar to FTF patterns but opposite to those found in written synchronous CMC. Oral CMC interaction patterns are shown to be more versatile.
Efficiency Optimization in Class-D Audio Amplifiers

DEFF Research Database (Denmark)

Yamauchi, Akira; Knott, Arnold; Jørgensen, Ivan Harald Holger

2015-01-01

This paper presents a new power efficiency optimization routine for designing Class-D audio amplifiers. The proposed optimization procedure finds design parameters for the power stage and the output filter, and the optimum switching frequency such that the weighted power losses are minimized under...... the given constraints. The optimization routine is applied to minimize the power losses in a 130 W class-D audio amplifier based on consumer behavior investigations, where the amplifier operates at idle and low power levels most of the time. Experimental results demonstrate that the optimization method can...... lead to around 30 % of efficiency improvement at 1.3 W output power without significant effects on both audio performance and the efficiency at high power levels....
DOA Estimation of Audio Sources in Reverberant Environments

DEFF Research Database (Denmark)

Jensen, Jesper Rindom; Nielsen, Jesper Kjær; Heusdens, Richard

2016-01-01

Reverberation is well-known to have a detrimental impact on many localization methods for audio sources. We address this problem by imposing a model for the early reflections as well as a model for the audio source itself. Using these models, we propose two iterative localization methods...... that estimate the direction-of-arrival (DOA) of both the direct path of the audio source and the early reflections. In these methods, the contribution of the early reflections is essentially subtracted from the signal observations before localization of the direct path component, which may reduce the estimation...
Portable Audio Design

DEFF Research Database (Denmark)

Groth, Sanne Krogh

2014-01-01

attention to the specific genre; a grasping of the complex relationship between site and time, the actual and the virtual; and getting aquatint with the specific site’s soundscape by approaching it both intuitively and systematically. These steps will finally lead to an audio production that not only...
NEI You Tube Videos: Amblyopia

Medline Plus

Full Text Available ... YouTube Videos: Amblyopia Embedded video for NEI YouTube Videos: Amblyopia ... *PDF files require the free Adobe® Reader® software for viewing. This website is maintained by the ...
A Perceptual Model for Sinusoidal Audio Coding Based on Spectral Integration

Directory of Open Access Journals (Sweden)

Jensen Søren Holdt

2005-01-01

Full Text Available Psychoacoustical models have been used extensively within audio coding applications over the past decades. Recently, parametric coding techniques have been applied to general audio and this has created the need for a psychoacoustical model that is specifically suited for sinusoidal modelling of audio signals. In this paper, we present a new perceptual model that predicts masked thresholds for sinusoidal distortions. The model relies on signal detection theory and incorporates more recent insights about spectral and temporal integration in auditory masking. As a consequence, the model is able to predict the distortion detectability. In fact, the distortion detectability defines a (perceptually relevant norm on the underlying signal space which is beneficial for optimisation algorithms such as rate-distortion optimisation or linear predictive coding. We evaluate the merits of the model by combining it with a sinusoidal extraction method and compare the results with those obtained with the ISO MPEG-1 Layer I-II recommended model. Listening tests show a clear preference for the new model. More specifically, the model presented here leads to a reduction of more than 20% in terms of number of sinusoids needed to represent signals at a given quality level.
Pregnancy Prevention at Her Fingertips: A Text- and Mobile Video-Based Pilot Intervention to Promote Contraceptive Methods among College Women

Science.gov (United States)

Walsh-Buhi, Eric R.; Helmy, Hannah; Harsch, Kristin; Rella, Natalie; Godcharles, Cheryl; Ogunrunde, Adejoke; Lopez Castillo, Humberto

2016-01-01

Objective: This paper reports on a pilot study evaluating the feasibility and acceptability of a text- and mobile video-based intervention to educate women and men attending college about non-daily contraception, with a particular focus on long-acting reversible contraception (LARC). A secondary objective is to describe the process of intervention…

Readings on American Society. The Audio-Lingual Literary Series II.

Science.gov (United States)

Imamura, Shigeo; Ney, James W.

This text contains 11 lessons based on an adaptation of the 1964 essay "Automation: Road to Lifetime Jobs" by A.H. Raskin and 14 lessons based on an adaptation of John Fischer's 1948 essay "Unwritten Rules of American Politics." The format of the book and the lessons is the same as that of the other volumes of "The Audio-Lingual Literary Series."…
Adaptive DCTNet for Audio Signal Classification

OpenAIRE

Xian, Yin; Pu, Yunchen; Gan, Zhe; Lu, Liang; Thompson, Andrew

2016-01-01

In this paper, we investigate DCTNet for audio signal classification. Its output feature is related to Cohen's class of time-frequency distributions. We introduce the use of adaptive DCTNet (A-DCTNet) for audio signals feature extraction. The A-DCTNet applies the idea of constant-Q transform, with its center frequencies of filterbanks geometrically spaced. The A-DCTNet is adaptive to different acoustic scales, and it can better capture low frequency acoustic information that is sensitive to h...
Fall Detection Using Smartphone Audio Features.

Science.gov (United States)

Cheffena, Michael

2016-07-01

An automated fall detection system based on smartphone audio features is developed. The spectrogram, mel frequency cepstral coefficents (MFCCs), linear predictive coding (LPC), and matching pursuit (MP) features of different fall and no-fall sound events are extracted from experimental data. Based on the extracted audio features, four different machine learning classifiers: k-nearest neighbor classifier (k-NN), support vector machine (SVM), least squares method (LSM), and artificial neural network (ANN) are investigated for distinguishing between fall and no-fall events. For each audio feature, the performance of each classifier in terms of sensitivity, specificity, accuracy, and computational complexity is evaluated. The best performance is achieved using spectrogram features with ANN classifier with sensitivity, specificity, and accuracy all above 98%. The classifier also has acceptable computational requirement for training and testing. The system is applicable in home environments where the phone is placed in the vicinity of the user.
NEI You Tube Videos: Amblyopia

Medline Plus

Full Text Available ... Amaurosis Low Vision Refractive Errors Retinopathy of Prematurity Science Spanish Videos Webinars NEI YouTube Videos: Amblyopia Embedded video for NEI YouTube Videos: Amblyopia NEI Home Contact Us A-Z Site Map NEI on Social Media Information in Spanish (Información en español) Website, ...
Audio-Tutorial Instruction: A Strategy For Teaching Introductory College Geology.

Science.gov (United States)

Fenner, Peter; Andrews, Ted F.

The rationale of audio-tutorial instruction is discussed, and the history and development of the audio-tutorial botany program at Purdue University is described. Audio-tutorial programs in geology at eleven colleges and one school are described, illustrating several ways in which programs have been developed and integrated into courses. Programs…
THE COMPARISON OF DESCRIPTIVE TEXT WRITING ABILITY USING YOU TUBE DOWNLOADED VIDEO AND SERIAL PICTURES AT THE STUDENTS’OF SMPN 2 METROACADEMIC YEAR 2012/2013

Directory of Open Access Journals (Sweden)

Eka Bayu Pramanca

2013-10-01

Full Text Available This research discusses about how two different techniques affect the students’ ability in descriptive text at SMP N 2 Metro. The objectives of this research are (1 to know the difference result of using YouTube Downloaded Video and Serial Pictures media toward students’ writing ability in descriptive text and (2 to know which one is more effective of students’ writing ability in descriptive text instruction between learning by using YouTube Downloaded Video and Serial Pictures media. The implemented method is quantitative research design in that both researchers use true experimental research design. In this research , experimental and control class pre-test and post test are conducted. It is carried out at the first grade of SMP N 2 Metro in academic year 2012/2013. The population in this research is 7 different classes with total number of 224 students. 2 classes of the total population are taken as the samples; VII.1 students in experimental class and VII.2 students in control class by using cluster random sampling technique. The instruments of the research are tests, treatment and post-test. The data analyzing procedure uses t-test and results the following output. The result of ttest is 3,96 and ttable is 2,06. It means that tcount > ttable with the criterion of ttest is Ha is accepted if tcount > ttable. So, there is any difference result of students’ writing ability using YouTube Downloaded Video and Serial Pictures Media. However; Youtube Downloaded Video media is more effective media than Serial Pictures media toward students’ writing ability. This research is consistent with the previous result of the studies and thus this technique is recommended to use in writing instruction especially in descriptive text in order that students may feel fun and enjoy during the learning process.
Listened To Any Good Books Lately? The Prosodic Analysis of Audio Book Narration

Directory of Open Access Journals (Sweden)

Smiljana Komar

2006-06-01

Full Text Available The popularity of audio books is increasing. In the USA fewer people are reading books but many more are listening to them on tapes, CD’s and in MP3 format. The phenomenon is redefining the notion of reading. The purpose of the paper is to present some pros and cons of listening to books instead of reading them. The conclusions have been reached on the basis of a linguistic analysis of parts of two audio books belonging to two different literary genres: a crime novel (Dan Brown, The Da Vinci Code and a comic one (Helen Fielding, Bridget Jones: The Edge of Reason.
Illustrating Geology With Customized Video in Introductory Geoscience Courses

Science.gov (United States)

Magloughlin, J. F.

2008-12-01

For the past several years, I have been creating short videos for use in large-enrollment introductory physical geology classes. The motivation for this project included 1) lack of appropriate depth in existing videos, 2) engagement of non-science students, 3) student indifference to traditional textbooks, 4) a desire to share the visual splendor of geology through virtual field trips, and 5) a desire to meld photography, animation, narration, and videography in self-contained experiences. These (HD) videos are information-intensive but short, allowing a focus on relatively narrow topics from numerous subdisciplines, incorporation into lectures to help create variety while minimally interrupting flow and holding students' attention, and manageable file sizes. Nearly all involve one or more field locations, including sites throughout the western and central continental U.S., as well as Hawaii, Italy, New Zealand, and Scotland. The limited scope of the project and motivations mentioned preclude a comprehensive treatment of geology. Instead, videos address geologic processes, locations, features, and interactions with humans. The videos have been made available via DVD and on-line streaming. Such a project requires an array of video and audio equipment and software, a broad knowledge of geology, very good computing power, adequate time, creativity, a substantial travel budget, liability insurance, elucidation of the separation (or non-separation) between such a project and other responsibilities, and, preferably but not essentially, the support of one's supervisor or academic unit. Involving students in such projects entails risks, but involving necessary technical expertise is virtually unavoidable. In my own courses, some videos are used in class and/or made available on-line as simply another aspect of the educational experience. Student response has been overwhelmingly positive, particularly when expectations of students regarding the content of the videos is made
Rheumatoid Arthritis Educational Video Series

Medline Plus

Full Text Available ... Patient Webcasts / Rheumatoid Arthritis Educational Video Series Rheumatoid Arthritis Educational Video Series This series of five videos ... member of our patient care team. Managing Your Arthritis Managing Your Arthritis Managing Chronic Pain and Depression ...
Multimodal Feature Learning for Video Captioning

Directory of Open Access Journals (Sweden)

Sujin Lee

2018-01-01

Full Text Available Video captioning refers to the task of generating a natural language sentence that explains the content of the input video clips. This study proposes a deep neural network model for effective video captioning. Apart from visual features, the proposed model learns additionally semantic features that describe the video content effectively. In our model, visual features of the input video are extracted using convolutional neural networks such as C3D and ResNet, while semantic features are obtained using recurrent neural networks such as LSTM. In addition, our model includes an attention-based caption generation network to generate the correct natural language captions based on the multimodal video feature sequences. Various experiments, conducted with the two large benchmark datasets, Microsoft Video Description (MSVD and Microsoft Research Video-to-Text (MSR-VTT, demonstrate the performance of the proposed model.
NEI You Tube Videos: Amblyopia

Medline Plus

Full Text Available ... questions Clinical Studies Publications Catalog Photos and Images Spanish Language Information Grants and Funding Extramural Research Division ... Low Vision Refractive Errors Retinopathy of Prematurity Science Spanish Videos Webinars NEI YouTube Videos: Amblyopia Embedded video ...
Extracting meaning from audio signals - a machine learning approach

DEFF Research Database (Denmark)

Larsen, Jan

2007-01-01

* Machine learning framework for sound search * Genre classification * Music and audio separation * Wind noise suppression......* Machine learning framework for sound search * Genre classification * Music and audio separation * Wind noise suppression...
Improvements of ModalMax High-Fidelity Piezoelectric Audio Device

Science.gov (United States)

Woodard, Stanley E.

2005-01-01

ModalMax audio speakers have been enhanced by innovative means of tailoring the vibration response of thin piezoelectric plates to produce a high-fidelity audio response. The ModalMax audio speakers are 1 mm in thickness. The device completely supplants the need to have a separate driver and speaker cone. ModalMax speakers can perform the same applications of cone speakers, but unlike cone speakers, ModalMax speakers can function in harsh environments such as high humidity or extreme wetness. New design features allow the speakers to be completely submersed in salt water, making them well suited for maritime applications. The sound produced from the ModalMax audio speakers has sound spatial resolution that is readily discernable for headset users.
Advanced video coding systems

CERN Document Server

Gao, Wen

2015-01-01

This comprehensive and accessible text/reference presents an overview of the state of the art in video coding technology. Specifically, the book introduces the tools of the AVS2 standard, describing how AVS2 can help to achieve a significant improvement in coding efficiency for future video networks and applications by incorporating smarter coding tools such as scene video coding. Topics and features: introduces the basic concepts in video coding, and presents a short history of video coding technology and standards; reviews the coding framework, main coding tools, and syntax structure of AV
Efficiency in audio processing : filter banks and transcoding

NARCIS (Netherlands)

Lee, Jun Wei

2007-01-01

Audio transcoding is the conversion of digital audio from one compressed form A to another compressed form B, where A and B have different compression properties, such as a different bit-rate, sampling frequency or compression method. This is typically achieved by decoding A to an intermediate
Decision-level fusion for audio-visual laughter detection

NARCIS (Netherlands)

Reuderink, B.; Poel, M.; Truong, K.; Poppe, R.; Pantic, M.

2008-01-01

Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is
Decision-Level Fusion for Audio-Visual Laughter Detection

NARCIS (Netherlands)

Reuderink, B.; Poel, Mannes; Truong, Khiet Phuong; Poppe, Ronald Walter; Pantic, Maja; Popescu-Belis, Andrei; Stiefelhagen, Rainer

Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laugh- ter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio- visual laughter detection is
Content Discovery from Composite Audio : An unsupervised approach

NARCIS (Netherlands)

Lu, L.

2009-01-01

In this thesis, we developed and assessed a novel robust and unsupervised framework for semantic inference from composite audio signals. We focused on the problem of detecting audio scenes and grouping them into meaningful clusters. Our approach addressed all major steps in a general process of
Audio Visual Media Components in Educational Game for Elementary Students

Directory of Open Access Journals (Sweden)

Meilani Hartono

2016-12-01

Full Text Available The purpose of this research was to review and implement interactive audio visual media used in an educational game to improve elementary students’ interest in learning mathematics. The game was developed for desktop platform. The art of the game was set as 2D cartoon art with animation and audio in order to make students more interest. There were four mini games developed based on the researches on mathematics study. Development method used was Multimedia Development Life Cycle (MDLC that consists of requirement, design, development, testing, and implementation phase. Data collection methods used are questionnaire, literature study, and interview. The conclusion is elementary students interest with educational game that has fun and active (moving objects, with fast tempo of music, and carefree color like blue. This educational game is hoped to be an alternative teaching tool combined with conventional teaching method.
Design and develop a video conferencing framework for real-time telemedicine applications using secure group-based communication architecture.

Science.gov (United States)

Mat Kiah, M L; Al-Bakri, S H; Zaidan, A A; Zaidan, B B; Hussain, Muzammil

2014-10-01

One of the applications of modern technology in telemedicine is video conferencing. An alternative to traveling to attend a conference or meeting, video conferencing is becoming increasingly popular among hospitals. By using this technology, doctors can help patients who are unable to physically visit hospitals. Video conferencing particularly benefits patients from rural areas, where good doctors are not always available. Telemedicine has proven to be a blessing to patients who have no access to the best treatment. A telemedicine system consists of customized hardware and software at two locations, namely, at the patient's and the doctor's end. In such cases, the video streams of the conferencing parties may contain highly sensitive information. Thus, real-time data security is one of the most important requirements when designing video conferencing systems. This study proposes a secure framework for video conferencing systems and a complete management solution for secure video conferencing groups. Java Media Framework Application Programming Interface classes are used to design and test the proposed secure framework. Real-time Transport Protocol over User Datagram Protocol is used to transmit the encrypted audio and video streams, and RSA and AES algorithms are used to provide the required security services. Results show that the encryption algorithm insignificantly increases the video conferencing computation time.

Rheumatoid Arthritis Educational Video Series

Medline Plus

Full Text Available ... Corner / Patient Webcasts / Rheumatoid Arthritis Educational Video Series Rheumatoid Arthritis Educational Video Series This series of five videos ... Your Arthritis Managing Chronic Pain and Depression in Arthritis Nutrition & Rheumatoid Arthritis Arthritis and Health-related Quality of Life ...
The persuasive aims of Metal Gear Solid: A discourse theoretical approach to the study of argumentation in video games

DEFF Research Database (Denmark)

Stamenkovic, Dusan; Jacevic, Milan; Wildfeuer, Janina

2017-01-01

in a representative video game combine so as to convince the player to act, play, and perhaps think accordingly. The multimodal approach employed in the paper combines the notion of discourse aims and the rhetorical and argumentative structure of Metal Gear Solid (Kojima, 1998), and analyses different narrative...... strategies and verbal cues, as well as the overall interface, control, and gameplay. The results suggest that verbal and textual cues combine with audio-visual elements and highly specific gameplay strategies in order to refrain the player from killing enemies. This might indicate that video games are likely...
Extraction of Information of Audio-Visual Contents

Directory of Open Access Journals (Sweden)

Carlos Aguilar

2011-10-01

Full Text Available In this article we show how it is possible to use Channel Theory (Barwise and Seligman, 1997 for modeling the process of information extraction realized by audiences of audio-visual contents. To do this, we rely on the concepts pro- posed by Channel Theory and, especially, its treatment of representational systems. We then show how the information that an agent is capable of extracting from the content depends on the number of channels he is able to establish between the content and the set of classifications he is able to discriminate. The agent can endeavor the extraction of information through these channels from the totality of content; however, we discuss the advantages of extracting from its constituents in order to obtain a greater number of informational items that represent it. After showing how the extraction process is endeavored for each channel, we propose a method of representation of all the informative values an agent can obtain from a content using a matrix constituted by the channels the agent is able to establish on the content (source classifications, and the ones he can understand as individual (destination classifications. We finally show how this representation allows reflecting the evolution of the informative items through the evolution of audio-visual content.
Sounding ruins: reflections on the production of an ‘audio drift’

Science.gov (United States)

Gallagher, Michael

2014-01-01

This article is about the use of audio media in researching places, which I term ‘audio geography’. The article narrates some episodes from the production of an ‘audio drift’, an experimental environmental sound work designed to be listened to on a portable MP3 player whilst walking in a ruinous landscape. Reflecting on how this work functions, I argue that, as well as representing places, audio geography can shape listeners’ attention and bodily movements, thereby reworking places, albeit temporarily. I suggest that audio geography is particularly apt for amplifying the haunted and uncanny qualities of places. I discuss some of the issues raised for research ethics, epistemology and spectral geographies. PMID:29708107
Interactive 3D audio: Enhancing awareness of details in immersive soundscapes?

DEFF Research Database (Denmark)

Schmidt, Mikkel Nørgaard; Schwartz, Stephen; Larsen, Jan

2012-01-01

Spatial audio and the possibility of interacting with the audio environment is thought to increase listeners' attention to details in a soundscape. This work examines if interactive 3D audio enhances listeners' ability to recall details in a soundscape. Nine different soundscapes were constructed...
Video Quality Prediction Models Based on Video Content Dynamics for H.264 Video over UMTS Networks

Directory of Open Access Journals (Sweden)

Asiya Khan

2010-01-01

Full Text Available The aim of this paper is to present video quality prediction models for objective non-intrusive, prediction of H.264 encoded video for all content types combining parameters both in the physical and application layer over Universal Mobile Telecommunication Systems (UMTS networks. In order to characterize the Quality of Service (QoS level, a learning model based on Adaptive Neural Fuzzy Inference System (ANFIS and a second model based on non-linear regression analysis is proposed to predict the video quality in terms of the Mean Opinion Score (MOS. The objective of the paper is two-fold. First, to find the impact of QoS parameters on end-to-end video quality for H.264 encoded video. Second, to develop learning models based on ANFIS and non-linear regression analysis to predict video quality over UMTS networks by considering the impact of radio link loss models. The loss models considered are 2-state Markov models. Both the models are trained with a combination of physical and application layer parameters and validated with unseen dataset. Preliminary results show that good prediction accuracy was obtained from both the models. The work should help in the development of a reference-free video prediction model and QoS control methods for video over UMTS networks.
Personalized Audio Systems - a Bayesian Approach

DEFF Research Database (Denmark)

Nielsen, Jens Brehm; Jensen, Bjørn Sand; Hansen, Toke Jansen

2013-01-01

Modern audio systems are typically equipped with several user-adjustable parameters unfamiliar to most users listening to the system. To obtain the best possible setting, the user is forced into multi-parameter optimization with respect to the users's own objective and preference. To address this......, the present paper presents a general inter-active framework for personalization of such audio systems. The framework builds on Bayesian Gaussian process regression in which a model of the users's objective function is updated sequentially. The parameter setting to be evaluated in a given trial is selected...
Audio effects on haptics perception during drilling simulation

Directory of Open Access Journals (Sweden)

Yair Valbuena

2017-06-01

Full Text Available Virtual reality has provided immersion and interactions through computer generated environments attempting to reproduce real life experiences through sensorial stimuli. Realism can be achieved through multimodal interactions which can enhance the user’s presence within the computer generated world. The most notorious advances in virtual reality can be seen in computer graphics visuals, where photorealism is the norm thriving to overcome the uncanny valley. Other advances have followed related to sound, haptics, and in a lesser manner smell and taste feedback. Currently, virtual reality systems (multimodal immersion and interactions through visual-haptic-sound are being massively used in entertainment (e.g., cinema, video games, art, and in non-entertainment scenarios (e.g., social inclusion, educational, training, therapy, and tourism. Moreover, the cost reduction of virtual reality technologies has resulted in the availability at a consumer-level of various haptic, headsets, and motion tracking devices. Current consumer-level devices offer low-fidelity experiences due to the properties of the sensors, displays, and other electro-mechanical devices, that may not be suitable for high-precision or realistic experiences requiring dexterity. However, research has been conducted on how toovercome or compensate the lack of high fidelity to provide an engaging user experience using storytelling, multimodal interactions and gaming elements. Our work focuses on analyzing the possible effects of auditory perception on haptic feedback within a drilling scenario. Drilling involves multimodal interactions and it is a task with multiple applications in medicine, crafting, and construction. We compare two drilling scenarios were two groups of participants had to drill through wood while listening to contextual and non-contextual audios. We gathered their perception using a survey after the task completion. From the results, we believe that sound does
A high efficiency PWM CMOS class-D audio power amplifier

Energy Technology Data Exchange (ETDEWEB)

Zhu Zhangming; Liu Lianxi; Yang Yintang [Institute of Microelectronics, Xidian University, Xi' an 710071 (China); Lei Han, E-mail: zmyh@263.ne [Xi' an Power-Rail Micro Co., Ltd, Xi' an 710075 (China)

2009-02-15

Based on the difference close-loop feedback technique and the difference pre-amp, a high efficiency PWM CMOS class-D audio power amplifier is proposed. A rail-to-rail PWM comparator with window function has been embedded in the class-D audio power amplifier. Design results based on the CSMC 0.5 mum CMOS process show that the max efficiency is 90%, the PSRR is -75 dB, the power supply voltage range is 2.5-5.5 V, the THD+N in 1 kHz input frequency is less than 0.20%, the quiescent current in no load is 2.8 mA, and the shutdown current is 0.5 muA. The active area of the class-D audio power amplifier is about 1.47 x 1.52 mm{sup 2}. With the good performance, the class-D audio power amplifier can be applied to several audio power systems.
A high efficiency PWM CMOS class-D audio power amplifier

International Nuclear Information System (INIS)

Zhu Zhangming; Liu Lianxi; Yang Yintang; Lei Han

2009-01-01

Based on the difference close-loop feedback technique and the difference pre-amp, a high efficiency PWM CMOS class-D audio power amplifier is proposed. A rail-to-rail PWM comparator with window function has been embedded in the class-D audio power amplifier. Design results based on the CSMC 0.5 μm CMOS process show that the max efficiency is 90%, the PSRR is -75 dB, the power supply voltage range is 2.5-5.5 V, the THD+N in 1 kHz input frequency is less than 0.20%, the quiescent current in no load is 2.8 mA, and the shutdown current is 0.5 μA. The active area of the class-D audio power amplifier is about 1.47 x 1.52 mm 2 . With the good performance, the class-D audio power amplifier can be applied to several audio power systems.
A high efficiency PWM CMOS class-D audio power amplifier

Science.gov (United States)

Zhangming, Zhu; Lianxi, Liu; Yintang, Yang; Han, Lei

2009-02-01

Based on the difference close-loop feedback technique and the difference pre-amp, a high efficiency PWM CMOS class-D audio power amplifier is proposed. A rail-to-rail PWM comparator with window function has been embedded in the class-D audio power amplifier. Design results based on the CSMC 0.5 μm CMOS process show that the max efficiency is 90%, the PSRR is -75 dB, the power supply voltage range is 2.5-5.5 V, the THD+N in 1 kHz input frequency is less than 0.20%, the quiescent current in no load is 2.8 mA, and the shutdown current is 0.5 μA. The active area of the class-D audio power amplifier is about 1.47 × 1.52 mm2. With the good performance, the class-D audio power amplifier can be applied to several audio power systems.
Implementation and Analysis Audio Steganography Used Parity Coding for Symmetric Cryptography Key Delivery

Directory of Open Access Journals (Sweden)

Afany Zeinata Firdaus

2013-12-01

Full Text Available In today's era of communication, online data transactions is increasing. Various information even more accessible, both upload and download. Because it takes a capable security system. Blowfish cryptographic equipped with Audio Steganography is one way to secure the data so that the data can not be accessed by unauthorized parties. In this study Audio Steganography technique is implemented using parity coding method that is used to send the key cryptography blowfish in e-commerce applications based on Android. The results obtained for the average computation time on stage insertion (embedding the secret message is shorter than the average computation time making phase (extracting the secret message. From the test results can also be seen that the more the number of characters pasted the greater the noise received, where the highest SNR is obtained when a character is inserted as many as 506 characters is equal to 11.9905 dB, while the lowest SNR obtained when a character is inserted as many as 2006 characters at 5,6897 dB . Keywords: audio steganograph, parity coding, embedding, extractin, cryptography blowfih.
Harmonic Enhancement in Low Bitrate Audio Coding Using an Efficient Long-Term Predictor

Directory of Open Access Journals (Sweden)

Song Jeongook

2010-01-01

Full Text Available This paper proposes audio coding using an efficient long-term prediction method to enhance the perceptual quality of audio codecs to speech input signals at low bit-rates. The MPEG-4 AAC-LTP exploited a similar concept, but its improvement was not significant because of small prediction gain due to long prediction lags and aliased components caused by the transformation with a time-domain aliasing cancelation (TDAC technique. The proposed algorithm increases the prediction gain by employing a deharmonizing predictor and a long-term compensation filter. The look-back memory elements are first constructed by applying the de-harmonizing predictor to the input signal, then the prediction residual is encoded and decoded by transform audio coding. Finally, the long-term compensation filter is applied to the updated look-back memory of the decoded prediction residual to obtain synthesized signals. Experimental results show that the proposed algorithm has much lower spectral distortion and higher perceptual quality than conventional approaches especially for harmonic signals, such as voiced speech.
SECRETS OF SONG VIDEO

Directory of Open Access Journals (Sweden)

Chernyshov Alexander V.

2014-04-01

Full Text Available The article focuses on the origins of the song videos as TV and Internet-genre. In addition, it considers problems of screen images creation depending on the musical form and the text of a songs in connection with relevant principles of accent and phraseological video editing and filming techniques as well as with additional frames and sound elements.
Dynamically-Loaded Hardware Libraries (HLL) Technology for Audio Applications

DEFF Research Database (Denmark)

Esposito, A.; Lomuscio, A.; Nunzio, L. Di

2016-01-01

In this work, we apply hardware acceleration to embedded systems running audio applications. We present a new framework, Dynamically-Loaded Hardware Libraries or HLL, to dynamically load hardware libraries on reconfigurable platforms (FPGAs). Provided a library of application-specific processors......, we load on-the-fly the specific processor in the FPGA, and we transfer the execution from the CPU to the FPGA-based accelerator. The proposed architecture provides excellent flexibility with respect to the different audio applications implemented, high quality audio, and an energy efficient solution....
A review of lossless audio compression standards and algorithms

Science.gov (United States)

Muin, Fathiah Abdul; Gunawan, Teddy Surya; Kartiwi, Mira; Elsheikh, Elsheikh M. A.

2017-09-01

Over the years, lossless audio compression has gained popularity as researchers and businesses has become more aware of the need for better quality and higher storage demand. This paper will analyse various lossless audio coding algorithm and standards that are used and available in the market focusing on Linear Predictive Coding (LPC) specifically due to its popularity and robustness in audio compression, nevertheless other prediction methods are compared to verify this. Advanced representation of LPC such as LSP decomposition techniques are also discussed within this paper.
Medical students' perceptions of video-linked lectures and video-streaming

Directory of Open Access Journals (Sweden)

Karen Mattick

2010-12-01

Full Text Available Video-linked lectures allow healthcare students across multiple sites, and between university and hospital bases, to come together for the purposes of shared teaching. Recording and streaming video-linked lectures allows students to view them at a later date and provides an additional resource to support student learning. As part of a UK Higher Education Academy-funded Pathfinder project, this study explored medical students' perceptions of video-linked lectures and video-streaming, and their impact on learning. The methodology involved semi-structured interviews with 20 undergraduate medical students across four sites and five year groups. Several key themes emerged from the analysis. Students generally preferred live lectures at the home site and saw interaction between sites as a major challenge. Students reported that their attendance at live lectures was not affected by the availability of streamed lectures and tended to be influenced more by the topic and speaker than the technical arrangements. These findings will inform other educators interested in employing similar video technologies in their teaching.Keywords: video-linked lecture; video-streaming; student perceptions; decisionmaking; cross-campus teaching.
WebGL and web audio software lightweight components for multimedia education

Science.gov (United States)

Chang, Xin; Yuksel, Kivanc; Skarbek, Władysław

2017-08-01

The paper presents the results of our recent work on development of contemporary computing platform DC2 for multimedia education usingWebGL andWeb Audio { the W3C standards. Using literate programming paradigm the WEBSA educational tools were developed. It offers for a user (student), the access to expandable collection of WEBGL Shaders and web Audio scripts. The unique feature of DC2 is the option of literate programming, offered for both, the author and the reader in order to improve interactivity to lightweightWebGL andWeb Audio components. For instance users can define: source audio nodes including synthetic sources, destination audio nodes, and nodes for audio processing such as: sound wave shaping, spectral band filtering, convolution based modification, etc. In case of WebGL beside of classic graphics effects based on mesh and fractal definitions, the novel image processing analysis by shaders is offered like nonlinear filtering, histogram of gradients, and Bayesian classifiers.
Special Needs: Planning for Adulthood (Videos)

Medline Plus

Full Text Available ... Videos for Educators Search English Español Special Needs: Planning for Adulthood (Video) KidsHealth / For Parents / Special Needs: Planning for Adulthood (Video) Print Young adults with special ...
Multimedia Storytelling in Journalism: Exploring Narrative Techniques in Snow Fall

Directory of Open Access Journals (Sweden)

Kobie van Krieken

2018-05-01

Full Text Available News stories aim to create an immersive reading experience by virtually transporting the audience to the described scenes. In print journalism, this experience is facilitated by text-linguistic narrative techniques, such as detailed scene reconstructions, a chronological event structure, point-of-view writing, and speech and thought reports. The present study examines how these techniques are translated into journalistic multimedia stories and explores how the distinctive features of text, image, video, audio, and graphic animations are exploited to immerse the audience in otherwise distant news events. To that end, a case study of the New York Times multimedia story Snow Fall is carried out. Results show that scenes are vividly reconstructed through a combination of text, image, video, and graphic animation. The story’s event structure is expressed in text and picture, while combinations of text, video, and audio are used to represent the events from the viewpoints of news actors. Although text is still central to all narrative techniques, it is complemented with other media formats to create various multimedia combinations, each intensifying the experience of immersion.

Special Needs: Planning for Adulthood (Videos)

Medline Plus

Full Text Available ... Staying Safe Videos for Educators Search English Español Special Needs: Planning for Adulthood (Video) KidsHealth / For Parents / Special Needs: Planning for Adulthood (Video) Print Young adults with ...
Class D audio amplifiers for high voltage capacitive transducers

DEFF Research Database (Denmark)

Nielsen, Dennis

of high volume, weight, and cost. High efficient class D amplifiers are now widely available offering power densities, that their linear counterparts can not match. Unlike the technology of audio amplifiers, the loudspeaker is still based on the traditional electrodynamic transducer invented by C.W. Rice......Audio reproduction systems contains two key components, the amplifier and the loudspeaker. In the last 20 – 30 years the technology of audio amplifiers have performed a fundamental shift of paradigm. Class D audio amplifiers have replaced the linear amplifiers, suffering from the well-known issues...... with the low level of acoustical output power and complex amplifier requirements, have limited the commercial success of the technology. Horn or compression drivers are typically favoured, when high acoustic output power is required, this is however at the expense of significant distortion combined...
Streaming video - præsentation af trailere over internettet

DEFF Research Database (Denmark)

Jensen, Ole Riis; Forchhammer, Søren

1998-01-01

interaktiv tjeneste baseret på realtidsfremvisning af videotrailere, såkaldt streaming video og audio, og hvor brugeren kan få information om en film der tænkes udbudt på en pay-per-view kanal. Der i arbejdets forløb blevet opbygget en demo, der implementerer en sådan tjeneste.......Som et led i Tele Danmark Kabel TV's markedsføring ønsker man at kunne præsentere filmtrailere og andet materiale i form af levende billeder til kunderne via Wold Wide Web (WWW). Dette projekts hovedformål er at undersøge eksisterende metoder og udvikle redskaber til at præsentere en tilsvarende...
Theory and Application of Audio-Based Assessment of Cough

Directory of Open Access Journals (Sweden)

Yan Shi

2018-01-01

Full Text Available Cough is a common symptom of many respiratory diseases. Many medical literatures underline that a system for the automatic, objective, and reliable detection of cough events is important and very promising to detect pathology severity in chronic cough disease. In order to track the development status of an audio-based cough monitoring system, we briefly described the history of objective cough detection and then illustrated the cough sound generating principle. The probable endpoints of cough clinical studies, including cough frequency, intensity of coughing, and acoustic properties of cough sound, were analyzed in this paper. Finally, we introduce some successful cough monitoring equipment and their recognition algorithm in detail. It can be obtained that, firstly, acoustic variability of cough sounds within and between individuals makes it difficult to assess the intensity of coughing. Furthermore, now great progress in audio-based cough detection is being made. Moreover, accurate portable objective monitoring systems will be available and widely used in home care and clinical trials in the near future.
Impact of Interactive Video Communication Versus Text-Based Feedback on Teaching, Social, and Cognitive Presence in Online Learning Communities.

Science.gov (United States)

Seckman, Charlotte

A key element to online learning is the ability to create a sense of presence to improve learning outcomes. This quasi-experimental study evaluated the impact of interactive video communication versus text-based feedback and found a significant difference between the 2 groups related to teaching, social, and cognitive presence. Recommendations to enhance presence should focus on providing timely feedback, interactive learning experiences, and opportunities for students to establish relationships with peers and faculty.
Can audio recording improve patients' recall of outpatient consultations?

DEFF Research Database (Denmark)

Wolderslund, Maiken; Kofoed, Poul-Erik; Axboe, Mette

Introduction In order to give patients possibility to listen to their consultation again, we have designed a system which gives the patients access to digital audio recordings of their consultations. An Interactive Voice Response platform enables the audio recording and gives the patients access...... and those who have not (control).The audio recordings and the interviews are coded according to six themes: Test results, Treatment, Risks, Future tests, Advice and Plan. Afterwards the extent of patients recall is assessed by comparing the accuracy of the patient’s statements (interview...
Videos, Podcasts and Livechats

Medline Plus

Full Text Available ... Doctor Find a Provider Meet the Team Blog Articles & Stories News Resources Links Videos Podcasts Webinars For ... Doctor Find a Provider Meet the Team Blog Articles & Stories News Provider Directory Donate Resources Links Videos ...
Acoustic Neuroma Educational Video

Medline Plus

Full Text Available ... for me? Find a Group Upcoming Events Video Library Photo Gallery One-on-One Support ANetwork Peer ... me? Find a group Back Upcoming events Video Library Photo Gallery One-on-One Support Back ANetwork ...
StirMark Benchmark: audio watermarking attacks based on lossy compression

Science.gov (United States)

Steinebach, Martin; Lang, Andreas; Dittmann, Jana

2002-04-01

StirMark Benchmark is a well-known evaluation tool for watermarking robustness. Additional attacks are added to it continuously. To enable application based evaluation, in our paper we address attacks against audio watermarks based on lossy audio compression algorithms to be included in the test environment. We discuss the effect of different lossy compression algorithms like MPEG-2 audio Layer 3, Ogg or VQF on a selection of audio test data. Our focus is on changes regarding the basic characteristics of the audio data like spectrum or average power and on removal of embedded watermarks. Furthermore we compare results of different watermarking algorithms and show that lossy compression is still a challenge for most of them. There are two strategies for adding evaluation of robustness against lossy compression to StirMark Benchmark: (a) use of existing free compression algorithms (b) implementation of a generic lossy compression simulation. We discuss how such a model can be implemented based on the results of our tests. This method is less complex, as no real psycho acoustic model has to be applied. Our model can be used for audio watermarking evaluation of numerous application fields. As an example, we describe its importance for e-commerce applications with watermarking security.
Four-quadrant flyback converter for direct audio power amplification

DEFF Research Database (Denmark)

Ljusev, Petar; Andersen, Michael Andreas E.

2005-01-01

This paper presents a bidirectional, four-quadrant flyback converter for use in direct audio power amplification. When compared to the standard Class-D switching audio power amplifier with a separate power supply, the proposed four-quadrant flyback converter provides simple solution with better...
Audio Technology and Mobile Human Computer Interaction

DEFF Research Database (Denmark)

Chamberlain, Alan; Bødker, Mads; Hazzard, Adrian

2017-01-01

Audio-based mobile technology is opening up a range of new interactive possibilities. This paper brings some of those possibilities to light by offering a range of perspectives based in this area. It is not only the technical systems that are developing, but novel approaches to the design...... and understanding of audio-based mobile systems are evolving to offer new perspectives on interaction and design and support such systems to be applied in areas, such as the humanities....
Classifying laughter and speech using audio-visual feature prediction

NARCIS (Netherlands)

Petridis, Stavros; Asghar, Ali; Pantic, Maja

2010-01-01

In this study, a system that discriminates laughter from speech by modelling the relationship between audio and visual features is presented. The underlying assumption is that this relationship is different between speech and laughter. Neural networks are trained which learn the audio-to-visual and
Sistema de adquisición y procesamiento de audio

OpenAIRE

Pérez Segurado, Rubén

2015-01-01

El objetivo de este proyecto es el diseño y la implementación de una plataforma para un sistema de procesamiento de audio. El sistema recibirá una señal de audio analógica desde una fuente de audio, permitirá realizar un tratamiento digital de dicha señal y generará una señal procesada que se enviará a unos altavoces externos. Para la realización del sistema de procesamiento se empleará: - Un dispositivo FPGA de Lattice, modelo MachX02-7000-HE, en la cual estarán todas la...
MEL-IRIS: An Online Tool for Audio Analysis and Music Indexing

Directory of Open Access Journals (Sweden)

Dimitrios Margounakis

2009-01-01

Full Text Available Chroma is an important attribute of music and sound, although it has not yet been adequately defined in literature. As such, it can be used for further analysis of sound, resulting in interesting colorful representations that can be used in many tasks: indexing, classification, and retrieval. Especially in Music Information Retrieval (MIR, the visualization of the chromatic analysis can be used for comparison, pattern recognition, melodic sequence prediction, and color-based searching. MEL-IRIS is the tool which has been developed in order to analyze audio files and characterize music based on chroma. The tool implements specially designed algorithms and a unique way of visualization of the results. The tool is network-oriented and can be installed in audio servers, in order to manipulate large music collections. Several samples from world music have been tested and processed, in order to demonstrate the possible uses of such an analysis.
On Modeling Affect in Audio with Non-Linear Symbolic Dynamics

Directory of Open Access Journals (Sweden)

Pauline Mouawad

2017-09-01

Full Text Available The discovery of semantic information from complex signals is a task concerned with connecting humans’ perceptions and/or intentions with the signals content. In the case of audio signals, complex perceptions are appraised in a listener’s mind, that trigger affective responses that may be relevant for well-being and survival. In this paper we are interested in the broader question of relations between uncertainty in data as measured using various information criteria and emotions, and we propose a novel method that combines nonlinear dynamics analysis with a method of adaptive time series symbolization that finds the meaningful audio structure in terms of symbolized recurrence properties. In a first phase we obtain symbolic recurrence quantification measures from symbolic recurrence plots, without the need to reconstruct the phase space with embedding. Then we estimate symbolic dynamical invariants from symbolized time series, after embedding. The invariants are: correlation dimension, correlation entropy and Lyapunov exponent. Through their application for the logistic map, we show that our measures are in agreement with known methods from literature. We further show that one symbolic recurrence measure, namely the symbolic Shannon entropy, correlates positively with the positive Lyapunov exponents. Finally we evaluate the performance of our measures in emotion recognition through the implementation of classification tasks for different types of audio signals, and show that in some cases, they perform better than state-of-the-art methods that rely on low-level acoustic features.
How Major Depressive Disorder affects the ability to decode multimodal dynamic emotional stimuli

Directory of Open Access Journals (Sweden)

FILOMENA SCIBELLI

2016-09-01

Full Text Available Most studies investigating the processing of emotions in depressed patients reported impairments in the decoding of negative emotions. However, these studies adopted static stimuli (mostly stereotypical facial expressions corresponding to basic emotions which do not reflect the way people experience emotions in everyday life. For this reason, this work proposes to investigate the decoding of emotional expressions in patients affected by Recurrent Major Depressive Disorder (RMDDs using dynamic audio/video stimuli. RMDDs’ performance is compared with the performance of patients with Adjustment Disorder with Depressed Mood (ADs and healthy (HCs subjects. The experiments involve 27 RMDDs (16 with acute depression - RMDD-A, and 11 in a compensation phase - RMDD-C, 16 ADs and 16 HCs. The ability to decode emotional expressions is assessed through an emotion recognition task based on short audio (without video, video (without audio and audio/video clips. The results show that AD patients are significantly less accurate than HCs in decoding fear, anger, happiness, surprise and sadness. RMDD-As with acute depression are significantly less accurate than HCs in decoding happiness, sadness and surprise. Finally, no significant differences were found between HCs and RMDD-Cs in a compensation phase. The different communication channels and the types of emotion play a significant role in limiting the decoding accuracy.
Font attributes based text steganographic algorithm (FATS) for communicating images. A nuclear power plant perspective

Energy Technology Data Exchange (ETDEWEB)

Krishnan, R. Bala [Indira Gandhi Centre for Atomic Research, Kalpakkam (India). Computer Division; Thandra, Prasanth Kumar; Murty, S.A.V. Satya [Indira Gandhi Centre for Atomic Research, Kalpakkam (India); Thiyagarajan, P. [Central Univ. of Tamil Nadu, Thiruvarur (India). Dept. of Computer Science

2017-03-15

Today even sensitive organizations like nuclear power plants are connected to the internet. As information security is of utmost priority to such organizations, achieving the same is a challenging task while using the public network like the internet. Though cryptography could be used to enhance the information security, it cannot hide its own presence from the attackers. This drawback lead security professionals to look for an alternate technique like steganography. Steganography hides the secret information, stealthily, inside another innocent looking cover medium like text, image, audio and video. Considering these, this paper proposes a novel text steganographic algorithm that embeds images as codes in the font attributes such as color, character spacing and kerning. The procedure of representing an image as codes makes the proposed method independent from the resolution of the image. Experiments were conducted by embedding diverse categories of images. A comparison with the best existing method depicted that the embedding capacity attained by the proposed method is higher at 1.59-bits/character, which suits best for low bandwidth environments. Various security aspects of the proposed method have also been discussed.
Font attributes based text steganographic algorithm (FATS) for communicating images. A nuclear power plant perspective

International Nuclear Information System (INIS)

Krishnan, R. Bala; Thiyagarajan, P.

2017-01-01

Today even sensitive organizations like nuclear power plants are connected to the internet. As information security is of utmost priority to such organizations, achieving the same is a challenging task while using the public network like the internet. Though cryptography could be used to enhance the information security, it cannot hide its own presence from the attackers. This drawback lead security professionals to look for an alternate technique like steganography. Steganography hides the secret information, stealthily, inside another innocent looking cover medium like text, image, audio and video. Considering these, this paper proposes a novel text steganographic algorithm that embeds images as codes in the font attributes such as color, character spacing and kerning. The procedure of representing an image as codes makes the proposed method independent from the resolution of the image. Experiments were conducted by embedding diverse categories of images. A comparison with the best existing method depicted that the embedding capacity attained by the proposed method is higher at 1.59-bits/character, which suits best for low bandwidth environments. Various security aspects of the proposed method have also been discussed.
Statistical Analysis of Video Frame Size Distribution Originating from Scalable Video Codec (SVC

Directory of Open Access Journals (Sweden)

Sima Ahmadpour

2017-01-01

Full Text Available Designing an effective and high performance network requires an accurate characterization and modeling of network traffic. The modeling of video frame sizes is normally applied in simulation studies and mathematical analysis and generating streams for testing and compliance purposes. Besides, video traffic assumed as a major source of multimedia traffic in future heterogeneous network. Therefore, the statistical distribution of video data can be used as the inputs for performance modeling of networks. The finding of this paper comprises the theoretical definition of distribution which seems to be relevant to the video trace in terms of its statistical properties and finds the best distribution using both the graphical method and the hypothesis test. The data set used in this article consists of layered video traces generating from Scalable Video Codec (SVC video compression technique of three different movies.
Time-Scale Invariant Audio Data Embedding

Directory of Open Access Journals (Sweden)

Mansour Mohamed F

2003-01-01

Full Text Available We propose a novel algorithm for high-quality data embedding in audio. The algorithm is based on changing the relative length of the middle segment between two successive maximum and minimum peaks to embed data. Spline interpolation is used to change the lengths. To ensure smooth monotonic behavior between peaks, a hybrid orthogonal and nonorthogonal wavelet decomposition is used prior to data embedding. The possible data embedding rates are between 20 and 30 bps. However, for practical purposes, we use repetition codes, and the effective embedding data rate is around 5 bps. The algorithm is invariant after time-scale modification, time shift, and time cropping. It gives high-quality output and is robust to mp3 compression.

TECHNICAL NOTE: Portable audio electronics for impedance-based measurements in microfluidics

Science.gov (United States)

Wood, Paul; Sinton, David

2010-08-01

We demonstrate the use of audio electronics-based signals to perform on-chip electrochemical measurements. Cell phones and portable music players are examples of consumer electronics that are easily operated and are ubiquitous worldwide. Audio output (play) and input (record) signals are voltage based and contain frequency and amplitude information. A cell phone, laptop soundcard and two compact audio players are compared with respect to frequency response; the laptop soundcard provides the most uniform frequency response, while the cell phone performance is found to be insufficient. The audio signals in the common portable music players and laptop soundcard operate in the range of 20 Hz to 20 kHz and are found to be applicable, as voltage input and output signals, to impedance-based electrochemical measurements in microfluidic systems. Validated impedance-based measurements of concentration (0.1-50 mM), flow rate (2-120 µL min-1) and particle detection (32 µm diameter) are demonstrated. The prevailing, lossless, wave audio file format is found to be suitable for data transmission to and from external sources, such as a centralized lab, and the cost of all hardware (in addition to audio devices) is ~10 USD. The utility demonstrated here, in combination with the ubiquitous nature of portable audio electronics, presents new opportunities for impedance-based measurements in portable microfluidic systems.
Class-D audio amplifiers with negative feedback

OpenAIRE

Cox, Stephen M.; Candy, B. H.

2006-01-01

There are many different designs for audio amplifiers. Class-D, or switching, amplifiers generate their output signal in the form of a high-frequency square wave of variable duty cycle (ratio of on time to off time). The square-wave nature of the output allows a particularly efficient output stage, with minimal losses. The output is ultimately filtered to remove components of the spectrum above the audio range. Mathematical models are derived here for a variety of related class-D amplifier de...
A second-order class-D audio amplifier

OpenAIRE

Cox, Stephen M.; Tan, M.T.; Yu, J.

2011-01-01

Class-D audio amplifiers are particularly efficient, and this efficiency has led to their ubiquity in a wide range of modern electronic appliances. Their output takes the form of a high-frequency square wave whose duty cycle (ratio of on-time to off-time) is modulated at low frequency according to the audio signal. A mathematical model is developed here for a second-order class-D amplifier design (i.e., containing one second-order integrator) with negative feedback. We derive exact expression...
Documentary management of the sport audio-visual information in the generalist televisions

OpenAIRE

Jorge Caldera Serrano; Felipe Alonso

2007-01-01

The management of the sport audio-visual documentation of the Information Systems of the state, zonal and local chains is analyzed within the framework. For it it is made makes a route by the documentary chain that makes the sport audio-visual information with the purpose of being analyzing each one of the parameters, showing therefore a series of recommendations and norms for the preparation of the sport audio-visual registry. Evidently the audio-visual sport documentation difference i...
Using Music to Communicate Geoscience in Films, Videos and Interactive Games

Science.gov (United States)

Kerlow, I.

2017-12-01

Music is a powerful storytelling device and an essential component in today's movies and interactive games. Communicating Earth science can be enhanced and focused with the proper use of a musical score, particularly in the context of documentary films, television programs, interactive games and museum installations. This presentation presents five simple professional techniques to integrate music, visuals and voice-over narration into a single cohesive story that is emotionally engaging. It also presents five practical tips to improve the success of a musical collaboration. The concepts in question are illustrated with practical audio and video examples from real science projects.
Multi Carrier Modulation Audio Power Amplifier with Programmable Logic

DEFF Research Database (Denmark)

Christiansen, Theis; Andersen, Toke Meyer; Knott, Arnold

2009-01-01

While switch-mode audio power amplifiers allow compact implementations and high output power levels due to their high power efficiency, they are very well known for creating electromagnetic interference (EMI) with other electronic equipment. To lower the EMI of switch-mode (class D) audio power a...
Advances in audio watermarking based on singular value decomposition

CERN Document Server

Dhar, Pranab Kumar

2015-01-01

This book introduces audio watermarking methods for copyright protection, which has drawn extensive attention for securing digital data from unauthorized copying. The book is divided into two parts. First, an audio watermarking method in discrete wavelet transform (DWT) and discrete cosine transform (DCT) domains using singular value decomposition (SVD) and quantization is introduced. This method is robust against various attacks and provides good imperceptible watermarked sounds. Then, an audio watermarking method in fast Fourier transform (FFT) domain using SVD and Cartesian-polar transformation (CPT) is presented. This method has high imperceptibility and high data payload and it provides good robustness against various attacks. These techniques allow media owners to protect copyright and to show authenticity and ownership of their material in a variety of applications. · Features new methods of audio watermarking for copyright protection and ownership protection · Outl...
APLIKASI MEDIA AUDIO-VISUAL DALAM PEMBELAJARAN SPEAKING SKILL DENGAN PENDEKATAN AUDIOLINGUAL: Studi Kasus di MAN Batang

Directory of Open Access Journals (Sweden)

Slamet Untung

2012-10-01

Full Text Available The research to study the application of audio and visual medium in order to learn speaking skill by audiolingual approach is a good contribution to educational world of senior high school and the Islamic one, particularly, in finding a way to improving the learning component relating directly to the medium and method of learning speaking skill. This research is to find out its significance and relevance. The main variable of this research includes the whole activities of the application of audio and visual medium in learning speaking skill by audio-lingual approach. The data were collected through observation, interview, questionnaire and documentation. This research took place in state Islamic senior high school of Batang in Central Java. The result shows that the application helps the students to speak English correctly and accurately and stresses the message of the speaking skill learning.
An introduction to audio content analysis applications in signal processing and music informatics

CERN Document Server

Lerch, Alexander

2012-01-01

"With the proliferation of digital audio distribution over digital media, audio content analysis is fast becoming a requirement for designers of intelligent signal-adaptive audio processing systems. Written by a well-known expert in the field, this book provides quick access to different analysis algorithms and allows comparison between different approaches to the same task, making it useful for newcomers to audio signal processing and industry experts alike. A review of relevant fundamentals in audio signal processing, psychoacoustics, and music theory, as well as downloadable MATLAB files are also included"--
BAT: An open-source, web-based audio events annotation tool

OpenAIRE

Blai Meléndez-Catalan, Emilio Molina, Emilia Gómez

2017-01-01

In this paper we present BAT (BMAT Annotation Tool), an open-source, web-based tool for the manual annotation of events in audio recordings developed at BMAT (Barcelona Music and Audio Technologies). The main feature of the tool is that it provides an easy way to annotate the salience of simultaneous sound sources. Additionally, it allows to define multiple ontologies to adapt to multiple tasks and offers the possibility to cross-annotate audio data. Moreover, it is easy to install and deploy...
APPLICATION OF CONTROLLED SOURCE AUDIO MAGNETOTELLURIC (CSAMT AT GEOTHERMAL

Directory of Open Access Journals (Sweden)

Susilawati S.

2017-04-01

Full Text Available CSAMT or Controlled Source Audio-Magnetotelluric is one of the Geophysics methods to determine the resistivity of rock under earth surface. CSAMT method utilizes artificial stream and injected into the ground, the frequency of artificial sources ranging from 0.1 Hz to 10 kHz, CSAMT data source effect correction is inverted. From the inversion results showed that there is a layer having resistivity values ranged between 2.5 Ω.m – 15 Ω.m, which is interpreted that the layer is clay.
Automatic Association of Chats and Video Tracks for Activity Learning and Recognition in Aerial Video Surveillance

Directory of Open Access Journals (Sweden)

Riad I. Hammoud

2014-10-01

Full Text Available We describe two advanced video analysis techniques, including video-indexed by voice annotations (VIVA and multi-media indexing and explorer (MINER. VIVA utilizes analyst call-outs (ACOs in the form of chat messages (voice-to-text to associate labels with video target tracks, to designate spatial-temporal activity boundaries and to augment video tracking in challenging scenarios. Challenging scenarios include low-resolution sensors, moving targets and target trajectories obscured by natural and man-made clutter. MINER includes: (1 a fusion of graphical track and text data using probabilistic methods; (2 an activity pattern learning framework to support querying an index of activities of interest (AOIs and targets of interest (TOIs by movement type and geolocation; and (3 a user interface to support streaming multi-intelligence data processing. We also present an activity pattern learning framework that uses the multi-source associated data as training to index a large archive of full-motion videos (FMV. VIVA and MINER examples are demonstrated for wide aerial/overhead imagery over common data sets affording an improvement in tracking from video data alone, leading to 84% detection with modest misdetection/false alarm results due to the complexity of the scenario. The novel use of ACOs and chat Sensors 2014, 14 19844 messages in video tracking paves the way for user interaction, correction and preparation of situation awareness reports.
On the Use of Memory Models in Audio Features

DEFF Research Database (Denmark)

Jensen, Karl Kristoffer

2011-01-01

Audio feature estimation is potentially improved by including higher- level models. One such model is the Short Term Memory (STM) model. A new paradigm of audio feature estimation is obtained by adding the influence of notes in the STM. These notes are identified when the perceptual spectral flux...
Audio Teleconferencing: Low Cost Technology for External Studies Networking.

Science.gov (United States)

Robertson, Bill

1987-01-01

This discussion of the benefits of audio teleconferencing for distance education programs and for business and government applications focuses on the recent experience of Canadian educational users. Four successful operating models and their costs are reviewed, and it is concluded that audio teleconferencing is cost efficient and educationally…
Audio Query by Example Using Similarity Measures between Probability Density Functions of Features

Directory of Open Access Journals (Sweden)

Marko Helén

2010-01-01

Full Text Available This paper proposes a query by example system for generic audio. We estimate the similarity of the example signal and the samples in the queried database by calculating the distance between the probability density functions (pdfs of their frame-wise acoustic features. Since the features are continuous valued, we propose to model them using Gaussian mixture models (GMMs or hidden Markov models (HMMs. The models parametrize each sample efficiently and retain sufficient information for similarity measurement. To measure the distance between the models, we apply a novel Euclidean distance, approximations of Kullback-Leibler divergence, and a cross-likelihood ratio test. The performance of the measures was tested in simulations where audio samples are automatically retrieved from a general audio database, based on the estimated similarity to a user-provided example. The simulations show that the distance between probability density functions is an accurate measure for similarity. Measures based on GMMs or HMMs are shown to produce better results than that of the existing methods based on simpler statistics or histograms of the features. A good performance with low computational cost is obtained with the proposed Euclidean distance.
Towards Structural Analysis of Audio Recordings in the Presence of Musical Variations

Directory of Open Access Journals (Sweden)

Müller Meinard

2007-01-01

Full Text Available One major goal of structural analysis of an audio recording is to automatically extract the repetitive structure or, more generally, the musical form of the underlying piece of music. Recent approaches to this problem work well for music, where the repetitions largely agree with respect to instrumentation and tempo, as is typically the case for popular music. For other classes of music such as Western classical music, however, musically similar audio segments may exhibit significant variations in parameters such as dynamics, timbre, execution of note groups, modulation, articulation, and tempo progression. In this paper, we propose a robust and efficient algorithm for audio structure analysis, which allows to identify musically similar segments even in the presence of large variations in these parameters. To account for such variations, our main idea is to incorporate invariance at various levels simultaneously: we design a new type of statistical features to absorb microvariations, introduce an enhanced local distance measure to account for local variations, and describe a new strategy for structure extraction that can cope with the global variations. Our experimental results with classical and popular music show that our algorithm performs successfully even in the presence of significant musical variations.
The Build-Up Course of Visuo-Motor and Audio-Motor Temporal Recalibration

Directory of Open Access Journals (Sweden)

Yoshimori Sugano

2011-10-01

Full Text Available The sensorimotor timing is recalibrated after a brief exposure to a delayed feedback of voluntary actions (temporal recalibration effect: TRE (Heron et al., 2009; Stetson et al., 2006; Sugano et al., 2010. We introduce a new paradigm, namely ‘synchronous tapping’ (ST which allows us to investigate how the TRE builds up during adaptation. In each experimental trial, participants were repeatedly exposed to a constant lag (∼150 ms between their voluntary action (pressing a mouse and a feedback stimulus (a visual flash / an auditory click 10 times. Immediately after that, they performed a ST task with the same stimulus as a pace signal (7 flashes / clicks. A subjective ‘no-delay condition’ (∼50 ms served as control. The TRE manifested itself as a change in the tap-stimulus asynchrony that compensated the exposed lag (eg, after lag adaptation, the tap preceded the stimulus more than in control and built up quickly (∼3–6 trials, ∼23–45 sec in both the visuo- and audio-motor domain. The audio-motor TRE was bigger and built-up faster than the visuo-motor one. To conclude, the TRE is comparable between visuo- and audio-motor domain, though they are slightly different in size and build-up rate.
Automatic Organisation and Quality Analysis of User-Generated Content with Audio Fingerprinting

OpenAIRE

Cavaco, Sofia; Magalhaes, Joao; Mordido, Gonçalo

2018-01-01

The increase of the quantity of user-generated content experienced in social media has boosted the importance of analysing and organising the content by its quality. Here, we propose a method that uses audio fingerprinting to organise and infer the quality of user-generated audio content. The proposed method detects the overlapping segments between different audio clips to organise and cluster the data according to events, and to infer the audio quality of the samples. A test setup with conce...
Text messaging as a strategy to address the limits of audio-based communication during mass-gathering events with high ambient noise.

Science.gov (United States)

Lund, Adam; Wong, Daniel; Lewis, Kerrie; Turris, Sheila A; Vaisler, Sean; Gutman, Samuel

2013-02-01

The provision of medical care in environments with high levels of ambient noise (HLAN), such as concerts or sporting events, presents unique communication challenges. Audio transmissions can be incomprehensible to the receivers. Text-based communications may be a valuable primary and/or secondary means of communication in this type of setting. To evaluate the usability of text-based communications in parallel with standard two-way radio communications during mass-gathering (MG) events in the context of HLAN. This Canadian study used outcome survey methods to evaluate the performance of communication devices during MG events. Ten standard commercially available handheld smart phones loaded with basic voice and data plans were assigned to health care providers (HCPs) for use as an adjunct to the medical team's typical radio-based communication. Common text messaging and chat platforms were trialed. Both efficacy and provider satisfaction were evaluated. During a 23-month period, the smart phones were deployed at 17 events with HLAN for a total of 40 event days or approximately 460 hours of active use. Survey responses from health care providers (177) and dispatchers (26) were analyzed. The response rate was unknown due to the method of recruitment. Of the 155 HCP responses to the question measuring difficulty of communication in environments with HLAN, 68.4% agreed that they "occasionally" or "frequently" found it difficult to clearly understand voice communications via two-way radio. Similarly, of the 23 dispatcher responses to the same item, 65.2% of the responses indicated that "occasionally" or "frequently" HLAN negatively affected the ability to communicate clearly with team members. Of the 168 HCP responses to the item assessing whether text-based communication improved the ability to understand and respond to calls when compared to radio alone, 86.3% "agreed" or "strongly agreed" that this was the case. The dispatcher responses (n = 21) to the same item also
Real-Time Audio Processing on the T-CREST Multicore Platform

DEFF Research Database (Denmark)

Ausin, Daniel Sanz; Pezzarossa, Luca; Schoeberl, Martin

2017-01-01

of the audio signal. This paper presents a real-time multicore audio processing system based on the T-CREST platform. T-CREST is a time-predictable multicore processor for real-time embedded systems. Multiple audio effect tasks have been implemented, which can be connected together in different configurations...... forming sequential and parallel effect chains, and using a network-onchip for intercommunication between processors. The evaluation of the system shows that real-time processing of multiple effect configurations is possible, and that the estimation and control of latency ensures real-time behavior.......Multicore platforms are nowadays widely used for audio processing applications, due to the improvement of computational power that they provide. However, some of these systems are not optimized for temporally constrained environments, which often leads to an undesired increase in the latency...

Haptic and Audio-visual Stimuli: Enhancing Experiences and Interaction

NARCIS (Netherlands)

Nijholt, Antinus; Dijk, Esko O.; Lemmens, Paul M.C.; Luitjens, S.B.

2010-01-01

The intention of the symposium on Haptic and Audio-visual stimuli at the EuroHaptics 2010 conference is to deepen the understanding of the effect of combined Haptic and Audio-visual stimuli. The knowledge gained will be used to enhance experiences and interactions in daily life. To this end, a
Selected Audio-Visual Materials for Consumer Education. [New Version.

Science.gov (United States)

Johnston, William L.

Ninety-two films, filmstrips, multi-media kits, slides, and audio cassettes, produced between 1964 and 1974, are listed in this selective annotated bibliography on consumer education. The major portion of the bibliography is devoted to films and filmstrips. The main topics of the audio-visual materials include purchasing, advertising, money…
Effect of Audio Coaching on Correlation of Abdominal Displacement With Lung Tumor Motion

International Nuclear Information System (INIS)

Nakamura, Mitsuhiro; Narita, Yuichiro; Matsuo, Yukinori; Narabayashi, Masaru; Nakata, Manabu; Sawada, Akira; Mizowaki, Takashi; Nagata, Yasushi; Hiraoka, Masahiro

2009-01-01

Purpose: To assess the effect of audio coaching on the time-dependent behavior of the correlation between abdominal motion and lung tumor motion and the corresponding lung tumor position mismatches. Methods and Materials: Six patients who had a lung tumor with a motion range >8 mm were enrolled in the present study. Breathing-synchronized fluoroscopy was performed initially without audio coaching, followed by fluoroscopy with recorded audio coaching for multiple days. Two different measurements, anteroposterior abdominal displacement using the real-time positioning management system and superoinferior (SI) lung tumor motion by X-ray fluoroscopy, were performed simultaneously. Their sequential images were recorded using one display system. The lung tumor position was automatically detected with a template matching technique. The relationship between the abdominal and lung tumor motion was analyzed with and without audio coaching. Results: The mean SI tumor displacement was 10.4 mm without audio coaching and increased to 23.0 mm with audio coaching (p < .01). The correlation coefficients ranged from 0.89 to 0.97 with free breathing. Applying audio coaching, the correlation coefficients improved significantly (range, 0.93-0.99; p < .01), and the SI lung tumor position mismatches became larger in 75% of all sessions. Conclusion: Audio coaching served to increase the degree of correlation and make it more reproducible. In addition, the phase shifts between tumor motion and abdominal displacement were improved; however, all patients breathed more deeply, and the SI lung tumor position mismatches became slightly larger with audio coaching than without audio coaching.
Four-quadrant flyback converter for direct audio power amplification

OpenAIRE

Ljusev, Petar; Andersen, Michael Andreas E.

2005-01-01

This paper presents a bidirectional, four-quadrant flyback converter for use in direct audio power amplification. When compared to the standard Class-D switching audio power amplifier with a separate power supply, the proposed four-quadrant flyback converter provides simple solution with better efficiency, higher level of integration and lower component count.
Extraction Of Audio Features For Emotion Recognition System Based On Music

Directory of Open Access Journals (Sweden)

Kee Moe Han

2015-08-01

Full Text Available Music is the combination of melody linguistic information and the vocalists emotion. Since music is a work of art analyzing emotion in music by computer is a difficult task. Many approaches have been developed to detect the emotions included in music but the results are not satisfactory because emotion is very complex. In this paper the evaluations of audio features from the music files are presented. The extracted features are used to classify the different emotion classes of the vocalists. Musical features extraction is done by using Music Information Retrieval MIR tool box in this paper. The database of 100 music clips are used to classify the emotions perceived in music clips. Music may contain many emotions according to the vocalists mood such as happy sad nervous bored peace etc. In this paper the audio features related to the emotions of the vocalists are extracted to use in emotion recognition system based on music.
Selective attention modulates the direction of audio-visual temporal recalibration.

Science.gov (United States)

Ikumi, Nara; Soto-Faraco, Salvador

2014-01-01

Temporal recalibration of cross-modal synchrony has been proposed as a mechanism to compensate for timing differences between sensory modalities. However, far from the rich complexity of everyday life sensory environments, most studies to date have examined recalibration on isolated cross-modal pairings. Here, we hypothesize that selective attention might provide an effective filter to help resolve which stimuli are selected when multiple events compete for recalibration. We addressed this question by testing audio-visual recalibration following an adaptation phase where two opposing audio-visual asynchronies were present. The direction of voluntary visual attention, and therefore to one of the two possible asynchronies (flash leading or flash lagging), was manipulated using colour as a selection criterion. We found a shift in the point of subjective audio-visual simultaneity as a function of whether the observer had focused attention to audio-then-flash or to flash-then-audio groupings during the adaptation phase. A baseline adaptation condition revealed that this effect of endogenous attention was only effective toward the lagging flash. This hints at the role of exogenous capture and/or additional endogenous effects producing an asymmetry toward the leading flash. We conclude that selective attention helps promote selected audio-visual pairings to be combined and subsequently adjusted in time but, stimulus organization exerts a strong impact on recalibration. We tentatively hypothesize that the resolution of recalibration in complex scenarios involves the orchestration of top-down selection mechanisms and stimulus-driven processes.
Contemplation, Subcreation, and Video Games

Directory of Open Access Journals (Sweden)

Mark J. P. Wolf

2018-04-01

Full Text Available This essay asks how religion and theological ideas might be made manifest in video games, and particularly the creation of video games as a religious activity, looking at contemplative experiences in video games, and the creation and world-building of game worlds as a form of Tolkienian subcreation, which itself leads to contemplation regarding the creation of worlds.
Spectacular Attractions: Museums, Audio-Visuals and the Ghosts of Memory

Directory of Open Access Journals (Sweden)

Mandelli Elisa

2015-12-01

Full Text Available In the last decades, moving images have become a common feature not only in art museums, but also in a wide range of institutions devoted to the conservation and transmission of memory. This paper focuses on the role of audio-visuals in the exhibition design of history and memory museums, arguing that they are privileged means to achieve the spectacular effects and the visitors’ emotional and “experiential” engagement that constitute the main objective of contemporary museums. I will discuss this topic through the concept of “cinematic attraction,” claiming that when embedded in displays, films and moving images often produce spectacular mises en scène with immersive effects, creating wonder and astonishment, and involving visitors on an emotional, visceral and physical level. Moreover, I will consider the diffusion of audio-visual witnesses of real or imaginary historical characters, presented in Phantasmagoria-like displays that simulate ghostly and uncanny apparitions, creating an ambiguous and often problematic coexistence of truth and illusion, subjectivity and objectivity, facts and imagination.
Anthropomorphic Coding of Speech and Audio: A Model Inversion Approach

Directory of Open Access Journals (Sweden)

W. Bastiaan Kleijn

2005-06-01

Full Text Available Auditory modeling is a well-established methodology that provides insight into human perception and that facilitates the extraction of signal features that are most relevant to the listener. The aim of this paper is to provide a tutorial on perceptual speech and audio coding using an invertible auditory model. In this approach, the audio signal is converted into an auditory representation using an invertible auditory model. The auditory representation is quantized and coded. Upon decoding, it is then transformed back into the acoustic domain. This transformation converts a complex distortion criterion into a simple one, thus facilitating quantization with low complexity. We briefly review past work on auditory models and describe in more detail the components of our invertible model and its inversion procedure, that is, the method to reconstruct the signal from the output of the auditory model. We summarize attempts to use the auditory representation for low-bit-rate coding. Our approach also allows the exploitation of the inherent redundancy of the human auditory system for the purpose of multiple description (joint source-channel coding.
Self-oscillating modulators for direct energy conversion audio power amplifiers

Energy Technology Data Exchange (ETDEWEB)

Ljusev, P.; Andersen, Michael A.E.

2005-07-01

Direct energy conversion audio power amplifier represents total integration of switching-mode power supply and Class D audio power amplifier into one compact stage, achieving high efficiency, high level of integration, low component count and eventually low cost. This paper presents how self-oscillating modulators can be used with the direct switching-mode audio power amplifier to improve its performance by providing fast hysteretic control with high power supply rejection ratio, open-loop stability and high bandwidth. Its operation is thoroughly analyzed and simulated waveforms of a prototype amplifier are presented. (au)
Rehabilitation of balance-impaired stroke patients through audio-visual biofeedback

DEFF Research Database (Denmark)

Gheorghe, Cristina; Nissen, Thomas; Juul Rosengreen Christensen, Daniel

2015-01-01

This study explored how audio-visual biofeedback influences physical balance of seven balance-impaired stroke patients, between 33–70 years-of-age. The setup included a bespoke balance board and a music rhythm game. The procedure was designed as follows: (1) a control group who performed a balance...... training exercise without any technological input, (2) a visual biofeedback group, performing via visual input, and (3) an audio-visual biofeedback group, performing via audio and visual input. Results retrieved from comparisons between the data sets (2) and (3) suggested superior postural stability...
NEI You Tube Videos: Amblyopia

Medline Plus

Full Text Available ... Eye Disease Dilated Eye Exam Dry Eye For Kids Glaucoma Healthy Vision Tips Leber Congenital Amaurosis Low Vision Refractive Errors Retinopathy of Prematurity Science Spanish Videos Webinars NEI YouTube Videos: Amblyopia Embedded ...
Balancing Audio

DEFF Research Database (Denmark)

Walther-Hansen, Mads

2016-01-01

is not thoroughly understood. In this paper I treat balance as a metaphor that we use to reason about several different actions in music production, such as adjusting levels, editing the frequency spectrum or the spatiality of the recording. This study is based on an exploration of a linguistic corpus of sound......This paper explores the concept of balance in music production and examines the role of conceptual metaphors in reasoning about audio editing. Balance may be the most central concept in record production, however, the way we cognitively understand and respond meaningfully to a mix requiring balance...
Four-quadrant flyback converter for direct audio power amplification

Energy Technology Data Exchange (ETDEWEB)

Ljusev, P.; Andersen, Michael A.E.

2005-07-01

This paper presents a bidirectional, four-quadrant yback converter for use in direct audio power amplication. When compared to the standard Class-D switching-mode audio power amplier with separate power supply, the proposed four-quadrant flyback converter provides simple and compact solution with high efciency, higher level of integration, lower component count, less board space and eventually lower cost. Both peak and average current-mode control for use with 4Q flyback power converters are described and compared. Integrated magnetics is presented which simplies the construction of the auxiliary power supplies for control biasing and isolated gate drives. The feasibility of the approach is proven on audio power amplier prototype for subwoofer applications. (au)
SCORE DIGITAL TECHNOLOGY: THE CONVERGENCE

Directory of Open Access Journals (Sweden)

Chernyshov Alexander V.

2013-12-01

Full Text Available Explores the role of digital scorewriters in today's culture, education, and music industry and media environment. The main principle of the development of software is not only publishing innovation (relating to the sheet music, and integration into the area of composition, arrangement, education, creative process for works based on digital technology (films, television and radio broadcasting, Internet, audio and video art. Therefore the own convergence of musically-computer technology is a total phenomenon: notation program combined with means MIDI-sequencer, audio and video editor. The article contains the unique interview with the creator of music notation processors.
The audiovisual mounting narrative as a basis for the documentary film interactive: news studies

Directory of Open Access Journals (Sweden)

Mgs. Denis Porto Renó

2008-01-01

Full Text Available This paper presents a literature review and experiment results from pilot-doctoral research "assembly language visual narrative for the documentary film interactive," which defend the thesis that there are features interactive audio and video editing of the movie, even as causing agent of interactivity. The search for interactive audio-visual formats are present in international investigations, but sob glances technology. He believes that this paper is to propose possible formats for interactive audiovisual production film, video, television, computer and cell phone from the postmodern society. Key words: Audiovisual, language, interactivity, cinema interactive, documentary, communication.
JINGLE: THE SOUNDING SYMBOL

Directory of Open Access Journals (Sweden)

Bysko Maxim V.

2013-12-01

Full Text Available The article considers the role of jingles in the industrial era, from the occurrence of the regular radio broadcasting, sound films and television up of modern video games, audio and video podcasts, online broadcasts, and mobile communications. Jingles are researched from the point of view of the theory of symbols: the forward motion is detected in the process of development of jingles from the social symbols (radio callsigns to the individual signs-images (ringtones. The role of technical progress in the formation of jingles as important cultural audio elements of modern digital civilization.
Surgery on Fetus Reduces Complications of Spina Bifida

Medline Plus

Full Text Available ... Forms Clinical Research Newsroom News Digital Media Infographics Videos Get to Know NICHD Podcasts and Audio Social Media Join NICHD Listservs About NICHD Organization Office of ...
Rheumatoid Arthritis Educational Video Series

Medline Plus

Full Text Available ... will allow you to take a more active role in your care. The information in these videos ... Stategies to Increase your Level of Physical Activity Role of Body Weight in Osteoarthritis Educational Videos for ...
Rheumatoid Arthritis Educational Video Series

Medline Plus

Full Text Available ... of Body Weight in Osteoarthritis Educational Videos for Patients Rheumatoid Arthritis Educational Video Series Psoriatic Arthritis 101 ... Patient to an Adult Rheumatologist Drug Information for Patients Arthritis Drug Information Sheets Benefits and Risks of ...

Some links on this page may take you to non-federal websites. Their policies may differ from this site.