Tian, Shu; Yin, Xu-Cheng; Su, Ya; Hao, Hong-Wei
Video text extraction plays an important role in multimedia understanding and retrieval. Most previous research efforts are conducted within individual frames. A few recent methods, which pay attention to text tracking using multiple frames, however, do not effectively mine the relations among text detection, tracking and recognition. In this paper, we propose a generic Bayesian-based framework of Tracking based Text Detection And Recognition (T2DAR) from web videos for embedded captions, which is composed of three major components, i.e., text tracking, tracking based text detection, and tracking based text recognition. In this unified framework, text tracking is first conducted by tracking-by-detection. Tracking trajectories are then revised and refined with detection or recognition results. Text detection or recognition is finally improved with multi-frame integration. Moreover, a challenging video text (embedded caption text) database (USTB-VidTEXT) is constructed and made publicly available. A variety of experiments on this dataset verify that our proposed approach largely improves the performance of text detection and recognition from web videos.
Ramesh Mahadev Kagalkar
Full Text Available Sign language recognition has emerged as a vital area of research in computer vision. A problem faced by researchers is that instances of signs vary in both motion and appearance. In this paper a novel approach for recognizing various alphabets of Kannada sign language is proposed, where continuous video sequences of the signs are considered. The system consists of three stages: preprocessing, feature extraction and classification. The preprocessing stage includes skin filtering and histogram matching. Eigenvalues and eigenvectors are used in the feature extraction stage, and finally an eigenvalue-weighted Euclidean distance is employed to recognize the sign. The system deals with bare hands, allowing the user to interact with it in a natural manner. We considered different alphabets in the video sequences and achieved a success rate of 95.25%.
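The final classification step described in this abstract, an eigenvalue-weighted Euclidean distance over eigen-features, can be sketched as follows (a minimal illustration under assumed inputs, not the authors' implementation; `eigen_features`, `classify` and the toy templates are hypothetical names):

```python
import numpy as np

def eigen_features(frames, k=3):
    """Project flattened frames onto the top-k eigenvectors of their covariance."""
    X = np.asarray(frames, dtype=float)
    X = X - X.mean(axis=0)
    cov = X.T @ X / max(len(X) - 1, 1)
    vals, vecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]      # keep the top-k
    return vals[order], X @ vecs[:, order]

def weighted_euclidean(a, b, weights):
    """Eigenvalue-weighted Euclidean distance between two feature vectors."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum(weights * d * d)))

def classify(query, templates, weights):
    """Return the label of the template with the smallest weighted distance."""
    return min(templates, key=lambda lbl: weighted_euclidean(query, templates[lbl], weights))
```

With unit weights the measure reduces to the ordinary Euclidean distance; weighting by eigenvalues emphasizes the feature directions that carry the most variance.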
Satish S Hiremath
Full Text Available An important task in content-based video indexing is to extract text information from videos. The challenges involved in text extraction and recognition are the variation of illumination on each video frame with text, text present on complex backgrounds, and different font sizes of the text. Using various image processing algorithms such as morphological operations, blob detection and histograms of oriented gradients, character recognition of video subtitles is implemented. Segmentation, feature extraction and classification are the major steps of character recognition. Several experimental results are shown to demonstrate the performance of the proposed algorithm.
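A crude histogram-of-oriented-gradients cell, one of the features mentioned above, might be computed like this (a simplified sketch; real HOG adds block normalization and bilinear vote interpolation, which are omitted here):

```python
import numpy as np

def orientation_histogram(patch, bins=9):
    """Histogram of gradient orientations for one image patch (a crude HOG cell).

    Gradients are taken with simple central differences; each pixel votes
    into an orientation bin with a weight equal to its gradient magnitude.
    """
    p = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(p)                       # derivatives along rows, cols
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # unsigned orientation in [0, pi)
    idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    hist = np.zeros(bins)
    np.add.at(hist, idx.ravel(), mag.ravel())     # magnitude-weighted votes
    return hist
```

A vertical edge produces a purely horizontal gradient, so its votes land in the first bin; a horizontal edge lands in the middle bin.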
Quehl, Bernhard; Yang, Haojin; Sack, Harald
Video texts are closely related to the content of a video. They provide a valuable source for indexing and interpretation of video data. Text detection and recognition tasks in images or videos typically distinguish between overlay and scene text. Overlay text is artificially superimposed on the image at the time of editing, while scene text is text captured by the recording system. Typically, OCR systems are specialized on one kind of text type. However, in video images both types of text can be found. In this paper, we propose a method to automatically distinguish between overlay and scene text to dynamically control and optimize post-processing steps following text detection. Based on a feature combination, a Support Vector Machine (SVM) is trained to classify scene and overlay text. We show how this distinction between overlay and scene text improves the word recognition rate. The accuracy of the proposed methods has been evaluated using publicly available test data sets.
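The overlay-versus-scene classification step could be sketched with a plain linear SVM trained by subgradient descent on hinge loss (a toy stand-in for the authors' feature combination and SVM; the 2-D features below are made up for illustration):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty) by batch subgradient descent.

    X: (n, d) feature matrix; y: labels in {-1, +1}. Returns weights w and bias b.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = rng.normal(scale=0.01, size=d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                       # points violating the margin
        gw = lam * w - (y[active, None] * X[active]).mean(axis=0) if active.any() else lam * w
        gb = -y[active].mean() if active.any() else 0.0
        w, b = w - lr * gw, b - lr * gb
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)
```

In practice one would use an established SVM library; this toy version just makes the decision rule concrete.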
Swapnil Vitthal Tathe
Full Text Available Advancement in computer vision technology and the availability of video capturing devices such as surveillance cameras have evoked new video processing applications. Research in video face recognition is mostly biased towards law enforcement applications. Applications involve human recognition based on face and iris, human-computer interaction, behavior analysis, video surveillance, etc. This paper presents a face tracking framework that is capable of face detection using Haar features, recognition using Gabor feature extraction, matching using correlation scores and tracking using a Kalman filter. The method has a good recognition rate for real-life videos and robust performance under changes due to illumination, environmental factors, scale, pose and orientation.
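The Kalman-filter tracking component mentioned above can be illustrated with a 1-D constant-velocity filter (a minimal sketch; the paper's tracker and its noise settings are not specified, so `q` and `r` here are assumed values):

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=0.25):
    """1-D constant-velocity Kalman filter: state = [position, velocity].

    measurements: sequence of noisy positions; returns filtered positions.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])    # state transition
    H = np.array([[1.0, 0.0]])               # we observe position only
    Q = q * np.eye(2)                        # process noise covariance
    R = np.array([[r]])                      # measurement noise covariance
    x = np.array([[measurements[0]], [0.0]])
    P = np.eye(2)
    out = []
    for z in measurements:
        x = F @ x                            # predict
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + R                  # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        x = x + K @ (np.array([[z]]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        out.append(float(x[0, 0]))
    return out
```

For face tracking the same recurrence is applied per coordinate of the bounding box centre.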
Karaoglu, S.; van Gemert, J.C.; Gevers, T.
We propose to use text recognition to aid in visual object class recognition. To this end we first propose a new algorithm for text detection in natural images. The proposed text detection is based on saliency cues and a context fusion step. The algorithm does not need any parameter tuning and can …
Most biometric systems employed for human recognition require physical contact with, or close proximity to, a cooperative subject. Far more challenging is the ability to reliably recognize individuals at a distance, when viewed from an arbitrary angle under real-world environmental conditions. Gait and face data are the two biometrics that can be most easily captured from a distance using a video camera. This comprehensive and logically organized text/reference addresses the fundamental problems associated with gait and face-based human recognition, from color and infrared video data that are
Zhou, Shaohua; Krüger, Volker; Chellappa, Rama
Recognition of human faces using a gallery of still or video images and a probe set of videos is systematically investigated using a probabilistic framework. In still-to-video recognition, where the gallery consists of still images, a time series state space model is proposed to fuse temporal … demonstrate that, due to the propagation of the identity variable over time, a degeneracy in the posterior probability of the identity variable is achieved to give improved recognition. The gallery is generalized to videos in order to realize video-to-video recognition. An exemplar-based learning strategy … of the identity variable produces the recognition result. The model formulation is very general and allows a variety of image representations and transformations. Experimental results using videos collected by NIST/USF and CMU illustrate the effectiveness of this approach for both still-to-video and video …
Full Text Available Automated video object recognition is a topic of emerging importance in both defense and civilian applications. This work describes an accurate and low-power neuromorphic architecture and system for real-time automated video object recognition. Our system, Neuromorphic Visual Understanding of Scenes (NEOVUS), is inspired by recent findings in computational neuroscience on feed-forward object detection and classification pipelines for processing and extracting relevant information from visual data. The NEOVUS architecture is inspired by the ventral (what) and dorsal (where) streams of the mammalian visual pathway and combines retinal processing, form-based and motion-based object detection, and convolutional neural network based object classification. Our system was evaluated by the Defense Advanced Research Projects Agency (DARPA) under the NEOVISION2 program on a variety of urban area video datasets collected from both stationary and moving platforms. The datasets are challenging as they include a large number of targets in cluttered scenes with varying illumination and occlusion conditions. The NEOVUS system was also mapped to commercially available off-the-shelf hardware. The dynamic power requirement for the system, which includes a 5.6-Mpixel retinal camera processed by object detection and classification algorithms at 30 frames per second, was measured at 21.7 Watts (W), for an effective energy consumption of 5.4 nanoJoules (nJ) per bit of incoming video. In a systematic evaluation of five different teams by DARPA on three aerial datasets, the NEOVUS demonstrated the best performance, with the highest recognition accuracy and at least three orders of magnitude lower energy consumption than two independent state-of-the-art computer vision systems. These unprecedented results show that the NEOVUS has the potential to revolutionize automated video object recognition towards enabling practical low-power and mobile video processing applications.
Soleymani, Mohammad; Pantic, Maja; Pun, Thierry
This paper presents a user-independent emotion recognition method with the goal of recovering affective tags for videos using electroencephalogram (EEG), pupillary response and gaze distance. We first selected 20 video clips with extrinsic emotional content from movies and online resources. Then,
Craciunescu, Razvan; Mihovska, Albena Dimitrova; Kyriazakos, Sofoklis
Gesture recognition is a topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms. Gestures can originate from any bodily motion or state but commonly originate from the face or hand. Current research focuses on emotion recognition from the face and on hand gesture recognition. Gesture recognition enables humans to communicate with the machine and interact naturally without any mechanical devices. This paper investigates the possibility to use non-audio/video sensors in order to design a low-cost gesture recognition device …
Full Text Available This paper describes a mobile device which tries to give the blind or visually impaired access to text information. Three key technologies are required for this system: text detection, optical character recognition, and speech synthesis. Blind users and the mobile environment imply two strong constraints. First, pictures will be taken without control over camera settings and without a priori information on the text (font or size) and background. The second issue is to link several techniques together with an optimal compromise between computational constraints and recognition efficiency. We will present the overall description of the system from text detection to OCR error correction.
Bouaziz, Baseem; Zlitni, Tarek; Mahdi, Walid
In this paper, we propose a spatial temporal video-text detection technique which proceeds in two principal steps: potential text region detection and a filtering process. In the first step we dynamically divide each pair of consecutive video frames into sub-blocks in order to detect change. A significant difference between homologous blocks implies the appearance of an important object which may be a text region. The temporal redundancy is then used to filter these regions and forms an effective …
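The first step, flagging sub-blocks whose content changes sharply between consecutive frames, might look like this (a simplified sketch; the block size and threshold are assumed, and the paper divides frames dynamically rather than on a fixed grid):

```python
import numpy as np

def candidate_blocks(frame_a, frame_b, block=4, thresh=10.0):
    """Split two consecutive frames into block x block tiles and flag tiles
    whose mean absolute difference exceeds a threshold (possible text appearance).

    Returns the (row, col) top-left corners of the flagged tiles.
    """
    a = np.asarray(frame_a, dtype=float)
    b = np.asarray(frame_b, dtype=float)
    h, w = a.shape
    flags = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            diff = np.abs(a[i:i + block, j:j + block] - b[i:i + block, j:j + block]).mean()
            if diff > thresh:
                flags.append((i, j))
    return flags
```

Flagged tiles would then go to the filtering stage, where temporal redundancy separates stable text regions from transient motion.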
Full Text Available A novel video shot boundary recognition method is proposed, which includes two stages: video feature extraction and shot boundary recognition. Firstly, we use adaptive locality preserving projections (ALPP) to extract video features. Unlike locality preserving projections, we define the discriminating similarity with mode prior probabilities and an adaptive neighborhood selection strategy, which makes ALPP more suitable for preserving the local structure and label information of the original data. Secondly, we use an optimized multiple kernel support vector machine to classify video frames into boundary and nonboundary frames, in which the weights of different types of kernels are optimized with an ant colony optimization method. Experimental results show the effectiveness of our method.
Li, Shuohao; Han, Anqi; Chen, Xu; Yin, Xiaoqing; Zhang, Jun
Recognizing text in images captured in the wild is a fundamental preprocessing task for many computer vision and machine learning applications and has gained significant attention in recent years. This paper proposes an end-to-end trainable deep review neural network for scene text recognition, which is a combination of feature extraction, feature reviewing, feature attention, and sequence recognition. Our model can generate the predicted text without any segmentation or grouping algorithm. Because the attention model in the feature attention stage lacks global modeling ability, a review network is applied to extract the global context of sequence data in the feature reviewing stage. We perform rigorous experiments across a number of standard benchmarks, including IIIT5K, SVT, ICDAR03, and ICDAR13 datasets. Experimental results show that our model is comparable to or outperforms state-of-the-art techniques.
U.S. Geological Survey, Department of the Interior — The ViTexOCR script presents a new method for extracting navigation data from videos with text overlays using optical character recognition (OCR) software. Over the...
Rashmi B Hiremath
Full Text Available Sign language is a way of expressing oneself with body language, where expressions, intentions, or sentiments are conveyed by physical behaviours, for example facial expressions, body posture, gestures, eye movements, touch and the use of space. Non-verbal communication exists in both animals and humans, but this article concentrates on the interpretation of human non-verbal or sign language into Hindi textual expression. The proposed implementation uses image processing methods and artificial intelligence strategies to achieve sign video recognition. To carry out the proposed task it applies image processing methods such as frame-analysis-based tracking, edge detection, wavelet transform, erosion, dilation, blur elimination and noise elimination on training videos. It also uses elliptical Fourier descriptors (referred to as SIFT) for shape feature extraction and principal component analysis for feature set optimization and reduction. For result analysis, this paper uses videos from different categories, such as signs for weeks, months, relations, etc. A database of extracted outcomes is compared with the signer's input video by a trained fuzzy inference system.
Full Text Available The paper describes a system of hand gesture recognition by image processing for human-robot interaction. The recognition and interpretation of the hand postures acquired through a video camera allow the control of the robotic arm activity: motion (translation and rotation in 3D) and tightening/releasing the clamp. A gesture dictionary was defined, and heuristic algorithms for recognition were developed and tested. The system can be used for academic and industrial purposes, especially for activities where the movements of the robotic arm are not previously scheduled, to train the robot more easily than with a remote control. Besides the gesture dictionary, the novelty of the paper consists in a new technique for detecting the relative positions of the fingers in order to recognize the various hand postures, and in the achievement of a robust system for controlling robots by postures of the hands.
Mosleh, Ali; Bouguila, Nizar; Ben Hamza, Abdessamad
We present a two stage framework for automatic video text removal to detect and remove embedded video texts and fill-in their remaining regions by appropriate data. In the video text detection stage, text locations in each frame are found via an unsupervised clustering performed on the connected components produced by the stroke width transform (SWT). Since SWT needs an accurate edge map, we develop a novel edge detector which benefits from the geometric features revealed by the bandlet transform. Next, the motion patterns of the text objects of each frame are analyzed to localize video texts. The detected video text regions are removed, then the video is restored by an inpainting scheme. The proposed video inpainting approach applies spatio-temporal geometric flows extracted by bandlets to reconstruct the missing data. A 3D volume regularization algorithm, which takes advantage of bandlet bases in exploiting the anisotropic regularities, is introduced to carry out the inpainting task. The method does not need extra processes to satisfy visual consistency. The experimental results demonstrate the effectiveness of both our proposed video text detection approach and the video completion technique, and consequently the entire automatic video text removal and restoration process.
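The connected-component grouping that follows the stroke width transform can be illustrated with a basic 4-connectivity flood fill (a generic sketch, not the authors' SWT pipeline):

```python
def connected_components(mask):
    """Label 4-connected components of a binary mask (list of lists of 0/1).

    Returns a dict mapping component id -> list of (row, col) pixels.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    comps, next_id = {}, 1
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                # Iterative flood fill from this unvisited foreground pixel.
                stack, pixels = [(r, c)], []
                labels[r][c] = next_id
                while stack:
                    y, x = stack.pop()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_id
                            stack.append((ny, nx))
                comps[next_id] = pixels
                next_id += 1
    return comps
```

In an SWT-based detector the mask would hold pixels with consistent stroke width, and each component becomes a candidate character for the subsequent clustering step.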
Yu, Litao; Yang, Yang; Huang, Zi; Wang, Peng; Song, Jingkuan; Shen, Heng Tao
In recent years, the task of event recognition from videos has attracted increasing interest in the multimedia area. While most of the existing research was mainly focused on exploring visual cues to handle relatively small-granular events, it is difficult to directly analyze video content without any prior knowledge. Therefore, synthesizing both the visual and semantic analysis is a natural way for video event understanding. In this paper, we study the problem of Web video event recognition, where Web videos often describe large-granular events and carry limited textual information. Key challenges include how to accurately represent event semantics from incomplete textual information and how to effectively explore the correlation between visual and textual cues for video event understanding. We propose a novel framework to perform complex event recognition from Web videos. In order to compensate for the insufficient expressive power of visual cues, we construct an event knowledge base by deeply mining semantic information from ubiquitous Web documents. This event knowledge base is capable of describing each event with comprehensive semantics. By utilizing this base, the textual cues for a video can be significantly enriched. Furthermore, we introduce a two-view adaptive regression model, which explores the intrinsic correlation between the visual and textual cues of the videos to learn reliable classifiers. Extensive experiments on two real-world video data sets show the effectiveness of our proposed framework and prove that the event knowledge base indeed helps improve the performance of Web video event recognition.
Zhang, Jianguang; Han, Yahong; Tang, Jinhui; Hu, Qinghua; Jiang, Jianmin
Human action recognition has been well explored in applications of computer vision. Many successful action recognition methods have shown that action knowledge can be effectively learned from motion videos or still images. For the same action, the appropriate action knowledge learned from different types of media, e.g., videos or images, may be related. However, less effort has been made to improve the performance of action recognition in videos by adapting the action knowledge conveyed from images to videos. Most of the existing video action recognition methods suffer from the problem of lacking sufficient labeled training videos. In such cases, over-fitting would be a potential problem and the performance of action recognition is restrained. In this paper, we propose an adaptation method to enhance action recognition in videos by adapting knowledge from images. The adapted knowledge is utilized to learn the correlated action semantics by exploring the common components of both labeled videos and images. Meanwhile, we extend the adaptation method to a semi-supervised framework which can leverage both labeled and unlabeled videos. Thus, the over-fitting can be alleviated and the performance of action recognition is improved. Experiments on public benchmark datasets and real-world datasets show that our method outperforms several other state-of-the-art action recognition methods.
Bao, Tianlong; Ding, Chunhui; Karmoshi, Saleem; Zhu, Ming
Face recognition has been widely studied recently, while video-based face recognition still remains a challenging task because of the low quality and large intra-class variation of video captured face images. In this paper, we focus on two scenarios of video-based face recognition: 1) Still-to-Video (S2V) face recognition, i.e., querying a still face image against a gallery of video sequences; 2) Video-to-Still (V2S) face recognition, in contrast to the S2V scenario. A novel method is proposed in this paper to transfer still and video face images to a Euclidean space by a carefully designed convolutional neural network, and Euclidean metrics are then used to measure the distance between still and video images. Identities of still and video images that group as pairs are used as supervision. In the training stage, a joint loss function that measures the Euclidean distance between the predicted features of training pairs and expanding vectors of still images is optimized to minimize the intra-class variation, while the inter-class variation is guaranteed due to the large margin of still images. Transferred features are finally learned via the designed convolutional neural network. Experiments are performed on the COX face dataset. Experimental results show that our method achieves reliable performance compared with other state-of-the-art methods.
Hong, Tao; Srikantan, Geetha; Zandy, V. C.; Fang, Chi; Srihari, Sargur N.
Cherry Blossom is a machine-printed Japanese document recognition system developed at CEDAR in past years. This paper focuses on the character recognition part of the system. For Japanese character classification, two feature sets are used in the system: one is the local stroke direction feature; the other is the gradient, structural and concavity feature. Based on each of these features, two different classifiers are designed: one is the so-called minimum error (ME) subspace classifier; the other is the fast nearest-neighbor (FNN) classifier. Although the original version of the FNN classifier uses Euclidean distance measurement, its new version uses both Euclidean distance and the distance calculation defined in the ME subspace method. This integration improved performance significantly. The number of character classes handled by these classifiers is about 3,300 (including alphanumeric, kana and level-1 JIS Kanji). The classifiers were trained and tested on 200-ppi character images from the CEDAR Japanese character image CD-ROM.
Han, Yahong; Yang, Yi; Yan, Yan; Ma, Zhigang; Sebe, Nicu; Zhou, Xiaofang
To improve both the efficiency and accuracy of video semantic recognition, we can perform feature selection on the extracted video features to select a subset of features from the high-dimensional feature set for a compact and accurate video data representation. Provided the number of labeled videos is small, supervised feature selection could fail to identify the relevant features that are discriminative to target classes. In many applications, abundant unlabeled videos are easily accessible. This motivates us to develop semisupervised feature selection algorithms to better identify the relevant video features, which are discriminative to target classes, by effectively exploiting the information underlying the huge amount of unlabeled video data. In this paper, we propose a framework of video semantic recognition by semisupervised feature selection via spline regression (S²FS²R). Two scatter matrices are combined to capture both the discriminative information and the local geometry structure of labeled and unlabeled training videos: a within-class scatter matrix encoding discriminative information of labeled training videos, and a spline scatter output from a local spline regression encoding data distribution. An l2,1-norm is imposed as a regularization term on the transformation matrix to ensure it is sparse in rows, making it particularly suitable for feature selection. To efficiently solve S²FS²R, we develop an iterative algorithm and prove its convergence. In the experiments, three typical tasks of video semantic recognition, namely video concept detection, video classification, and human action recognition, are used to demonstrate that the proposed S²FS²R achieves better performance compared with the state-of-the-art methods.
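The row-sparsity regularizer at the heart of this formulation, the l2,1-norm, is easy to state concretely (a minimal sketch; `selected_features` is a hypothetical helper for reading off the non-zero rows):

```python
import numpy as np

def l21_norm(W):
    """l2,1-norm: the sum of the Euclidean norms of the rows of W.

    Penalizing this drives entire rows of W to zero, so the surviving
    rows pick out the selected features.
    """
    W = np.asarray(W, dtype=float)
    return float(np.sqrt((W * W).sum(axis=1)).sum())

def selected_features(W, tol=1e-8):
    """Indices of rows with non-negligible norm (the selected features)."""
    W = np.asarray(W, dtype=float)
    row_norms = np.sqrt((W * W).sum(axis=1))
    return [i for i, n in enumerate(row_norms) if n > tol]
```

Unlike an entrywise l1 penalty, the l2,1-norm couples all entries in a row, so a feature is either kept for every output dimension or dropped entirely.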
Full Text Available In recent years, video surveillance and monitoring have gained importance because of security and safety concerns. Banks, borders, airports, stores, and parking areas are the important application areas. There are two main parts in scenario recognition: low-level processing, including moving object detection, object tracking, and feature extraction; and high-level processing, including event start-end point detection, activity detection for each frame, and scenario recognition for sequences of images. Through this work we have developed new features, namely RUD (relative upper density), RMD (relative middle density) and RLD (relative lower density), and we have used other features such as aspect ratio, width, height, and color of the object. The high-level part is the focus of our research, where different pattern recognition and classification methods are implemented and experimental results are analyzed. We looked into several methods of classification: decision tree, frequency domain classification, neural network-based classification, Bayes classifier, and pattern recognition methods such as control charts and hidden Markov models. The control chart approach, which is a decision methodology, gives more promising results than the other methodologies. Overlapping between events is one of the problems, hence we applied a fuzzy logic technique to solve it. After using this method the total accuracy increased from 95.6% to 97.2%.
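The control-chart decision rule that performed best above can be sketched as a Shewhart-style mean plus/minus k-sigma test (a generic illustration; the paper's actual chart parameters are not given, so `k=3` is an assumption):

```python
def control_limits(samples, k=3.0):
    """Mean +/- k*sigma control limits computed from in-control samples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    sigma = var ** 0.5
    return mean - k * sigma, mean + k * sigma

def out_of_control(samples, value, k=3.0):
    """True when a new observation falls outside the control limits,
    i.e. the frame is flagged as part of an unusual event."""
    lo, hi = control_limits(samples, k)
    return not (lo <= value <= hi)
```

Here the in-control samples would be feature values (e.g. RUD/RMD/RLD) from frames of normal activity, and out-of-limit frames mark candidate event boundaries.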
Haque, Mohammad Ahsanul; Nasrollahi, Kamal; Moeslund, Thomas B.
Different biometric traits such as face appearance and heartbeat signals from Electrocardiogram (ECG)/Phonocardiogram (PCG) are widely used in human identity recognition. Recent advances in facial video based measurement of cardio-physiological parameters such as heartbeat rate, respiratory rate, and blood volume pressure provide the possibility of extracting the heartbeat signal from facial video instead of using obtrusive ECG or PCG sensors on the body. This paper proposes the Heartbeat Signal from Facial Video (HSFV) as a new biometric trait for human identity recognition, for the first time …
Ukhanova, Ann; Støttrup-Andersen, Jesper; Forchhammer, Søren
Definition of video quality requirements for video surveillance poses new questions in the area of quality assessment. This paper presents a quality assessment experiment for an automatic license plate recognition scenario. We explore the influence of compression by the H.264/AVC and H.265/HEVC standards on the recognition performance. We compare logarithmic and logistic functions for quality modeling. Our results show that a logistic function can better describe the dependence of recognition performance on quality for both compression standards. We observe that automatic license plate recognition …
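The logistic quality model compared above has the familiar sigmoid form (a generic sketch; the fitted parameter values from the experiment are not reproduced here, so `a`, `b` and `c` below are placeholders):

```python
import math

def logistic_model(q, a, b, c=1.0):
    """Logistic mapping from a quality score q to a recognition rate in (0, c).

    a controls the slope and b is the quality value at which the rate is c/2.
    """
    return c / (1.0 + math.exp(-a * (q - b)))
```

Unlike a logarithmic model, this curve saturates at both ends, which matches the observation that recognition performance plateaus at very low and very high quality.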
Krüger, Volker; Zhou, Shaohua; Chellappa, Rama
… -temporal relations: this allows the system to use dynamics as well as to generate warnings when 'implausible' situations occur, or to circumvent these altogether. We have studied the effectiveness of temporal integration for recognition purposes using face recognition as an example problem. Face recognition … is a prominent problem and has been studied more extensively than almost any other recognition problem. An observation is that face recognition works well in ideal conditions; if those conditions are not met, however, all present algorithms break down disgracefully. This problem appears to be general … will use the face recognition problem as a study example. Probabilistic methods are attractive in this context as they allow a systematic handling of uncertainty and an elegant way of fusing temporal information.
Full Text Available This paper presents an approach and an implementation of a named entity extractor for the Slovene language, based on a machine learning approach. It is designed as a supervised algorithm based on Conditional Random Fields and is trained on the ssj500k annotated corpus of Slovene. The corpus, which is available under a Creative Commons CC-BY-NC-SA licence, is annotated with morphosyntactic tags, as well as named entities for people, locations, organisations, and miscellaneous names. The paper discusses the influence of morphosyntactic tags, lexicons and conjunctions of features of neighbouring words. An important contribution of this investigation is that morphosyntactic tags benefit named entity extraction. Using all the best-performing features, the recognizer reaches a precision of 74% and a recall of 72%, with stronger performance on personal and geographical named entities, followed by organizations, and poor performance on the miscellaneous entities, since this class is very diverse and consequently difficult to predict. A major contribution of the paper is also showing the benefits of splitting the class of miscellaneous entities into organizations and other entities, which in turn improves performance even on personal and organizational names. The software developed in this research is freely available under the Apache 2.0 licence at http://ailab.ijs.si/~tadej/slner.zip, while development versions are available at https://github.com/tadejs/slner.
MS received 6 November 2000; revised 16 December 2002.
Abstract. Recognition of text recorded in Pitman shorthand language (PSL) is an interesting research problem. Automatic reading of PSL and generating equivalent English text is very challenging. The most important task involved here is the accurate recognition of Pitman stroke patterns, which constitute "text" in PSL. The paper …
Full Text Available In this paper we propose a new approach for facial micro-expression recognition. For this purpose the Eulerian Video Magnification (EVM) method is used to retrieve the subtle motions of the face. The result of this method is a magnified image sequence. In this study the numerical tests are performed on two databases: Spontaneous Micro-expression (SMIC) and Chinese Academy of Sciences Micro-Expression (CASME). We evaluate our proposed method in two phases using the eigenface method. In phase 1 we recognize the type of a micro-expression, for example emotional versus unemotional, in the SMIC database. Phase 2 classifies the recognized micro-expression as negative versus positive in the SMIC database, and as happiness versus disgust in the CASME database. The results show that using the eigenface method with the EVM method for the retrieval of subtle motions of the face increases the performance of micro-expression recognition. Moreover, the proposed approach is more accurate and promising than previous works in micro-expression recognition.
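The core idea of Eulerian magnification, amplifying small temporal variations of each pixel, can be caricatured in a few lines (a heavily simplified sketch: the real method applies a spatial pyramid and a temporal bandpass filter, both replaced here by simple mean removal):

```python
import numpy as np

def magnify_temporal(signal, alpha=10.0):
    """Amplify deviations of a per-pixel temporal signal from its mean.

    A crude stand-in for Eulerian magnification: the temporal bandpass is
    replaced by mean removal; alpha is the amplification factor.
    """
    s = np.asarray(signal, dtype=float)
    mean = s.mean(axis=0)             # per-pixel temporal mean
    return mean + alpha * (s - mean)  # amplified deviations added back
```

Applied frame-by-frame to a face video, such amplification makes subtle micro-expression motion large enough for a downstream eigenface classifier to pick up.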
Sudhaker Samuel RD
Full Text Available The first step in an automatic face recognition system is to localize the face region in a cluttered background and carefully segment the face from each frame of a video sequence. In this paper, we propose a fast and efficient algorithm for segmenting a face suitable for recognition from a video sequence. The cluttered background is first subtracted from each frame; then, in the foreground regions, a coarse face region is found using skin colour. Using a dynamic template matching approach, the face is then efficiently segmented. The proposed algorithm is fast and suitable for real-time video sequences, and is invariant to large scale and pose variations. The segmented face is then handed over to a recognition algorithm based on principal component analysis and linear discriminant analysis. The online face detection, segmentation, and recognition algorithms take an average of 0.06 seconds on a 3.2 GHz P4 machine.
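The principal component analysis stage of the recognition step can be sketched as an eigenface-style projection with nearest-neighbour matching (a minimal illustration; the linear discriminant analysis stage is omitted and the toy gallery below is made up):

```python
import numpy as np

def fit_pca(X, k):
    """PCA via SVD on mean-centered data; returns (mean, top-k components)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]               # components are rows of vt

def project(x, mean, components):
    """Coordinates of a sample in the PCA subspace."""
    return components @ (np.asarray(x, dtype=float) - mean)

def nearest(query, gallery, labels, mean, components):
    """Nearest-neighbour identity in the PCA subspace."""
    q = project(query, mean, components)
    dists = [np.linalg.norm(q - project(g, mean, components)) for g in gallery]
    return labels[int(np.argmin(dists))]
```

In a real eigenface system each gallery entry is a flattened, normalized face image; here tiny 4-D vectors stand in for them.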
Kimura, Marcia L. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Erikson, Rebecca L. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Lombardo, Nicholas J. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
The Pacific Northwest National Laboratory (PNNL) will produce a non-cooperative (i.e., not posing for the camera) facial recognition video data set for research purposes to evaluate and enhance facial recognition systems technology. The aggregate data set consists of 1) videos capturing PNNL role players and public volunteers in three key operational settings, 2) photographs of the role players for enrolling in an evaluation database, and 3) ground truth data that documents when the role player is within various camera fields of view. PNNL will deliver the aggregate data set to DHS, who may then choose to make it available to other government agencies interested in evaluating and enhancing facial recognition systems. The three operational settings that will be the focus of the video collection effort include: 1) unidirectional crowd flow, 2) bi-directional crowd flow, and 3) linear and/or serpentine queues.
Ruolin, Zhu; Jianbo, Liu; Yuan, Zhang; Xiaoyu, Wu
The technology of computer vision is used in the training of military shooting. In order to overcome the limitations of bullet-hole recognition based on video image analysis, which suffers from over-detection or missed detection, this paper adopts the support vector machine algorithm and a convolutional neural network to extract and recognize bullet holes in digital video, and compares their performance. It extracts HOG features of bullet holes and trains an SVM classifier quickly, even though the target is in an outdoor environment. Experiments show that the support vector machine algorithm used in this paper realizes fast and efficient extraction and recognition of bullet holes, improving the efficiency of shooting training.
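The HOG-plus-SVM pipeline described above can be sketched roughly as follows. This is a deliberately simplified stand-in, not the paper's implementation: a single global orientation histogram replaces the full block-normalized HOG descriptor, and the synthetic "target patch" generator, function names and parameters are all assumptions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def patch_feature(patch, bins=9):
    """Much-simplified HOG stand-in: one global histogram of gradient
    orientations weighted by gradient magnitude, plus the mean gradient
    magnitude as a crude 'edge energy' cue."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientations
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    n = np.linalg.norm(hist)
    hist = hist / n if n > 0 else hist
    return np.concatenate([hist, [mag.mean()]])

def make_patch(has_hole, rng, size=24):
    """Hypothetical synthetic target patch: bright paper with noise,
    optionally with a dark disc standing in for a bullet hole."""
    img = 0.9 + 0.02 * rng.standard_normal((size, size))
    if has_hole:
        yy, xx = np.mgrid[:size, :size]
        img[(yy - size // 2) ** 2 + (xx - size // 2) ** 2 < (size // 5) ** 2] = 0.1
    return img

def make_classifier():
    """Feature scaling followed by a linear SVM."""
    return make_pipeline(StandardScaler(), SVC(kernel="linear"))
```

A real system would compute cell/block-normalized HOG over a sliding window; the classifier interface, however, is the same.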
Full Text Available Text-independent speaker recognition systems such as those based on Gaussian mixture models (GMMs) do not include time sequence information (TSI) within the model itself. The level of importance of TSI in speaker recognition is an interesting question and one addressed in this paper. Recent work has shown that the utilisation of higher-level information such as idiolect, pronunciation, and prosodics can be useful in reducing speaker recognition error rates. In accordance with these developments, the aim of this paper is to show that as more data becomes available, the basic GMM can be enhanced by utilising TSI, even in a text-independent mode. This paper presents experimental work incorporating TSI into the conventional GMM. The resulting system, known as the segmental mixture model (SMM), embeds dynamic time warping (DTW) into a GMM framework. Results are presented on the 2000-speaker SpeechDat Welsh database which show improved speaker recognition performance with the SMM.
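The DTW component that the SMM embeds can be sketched with the classic dynamic-programming recurrence. This is only the textbook DTW distance, not the SMM itself; how the paper combines it with GMM likelihoods is not reproduced here.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping between two feature sequences
    (rows are frames); returns the cumulative alignment cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor paths
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because the alignment path can stretch and compress time, a sequence and its time-warped copy score much closer than two genuinely different sequences.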
Full Text Available The use of video sequences for face recognition has been relatively less studied compared to image-based approaches. In this paper, we present an analysis-by-synthesis framework for face recognition from video sequences that is robust to large changes in facial pose and lighting conditions. This requires tracking the video sequence, as well as recognition algorithms that are able to integrate information over the entire video; we address both these problems. Our method is based on a recently obtained theoretical result that can integrate the effects of motion, lighting, and shape in generating an image using a perspective camera. This result can be used to estimate the pose and structure of the face and the illumination conditions for each frame in a video sequence in the presence of multiple point and extended light sources. We propose a new inverse compositional estimation approach for this purpose. We then synthesize images using the face model estimated from the training data corresponding to the conditions in the probe sequences. Similarity between the synthesized and the probe images is computed using suitable distance measurements. The method can handle situations where the pose and lighting conditions in the training and testing data are completely disjoint. We show detailed performance analysis results and recognition scores on a large video dataset.
Hunt, Tamerah N.
Context: Concussion management is potentially complicated by the lack of reporting due to poor educational intervention in youth athletics. Objective: Determine if a concussion-education video developed for high school athletes will increase the reporting of concussive injuries and symptom recognition in this group. Design: Cross-sectional,…
Boon, Josua; Hoenes, Frank; Ben Hadj Ali, Majdi
This paper presents an alternative method for typed character recognition by way of the textual context. The approach here is word-oriented and uses no a priori knowledge about the typical appearance of characters. It leads back to an approach suggested by R. G. Casey where text recognition is considered as solving a substitution cipher, or cryptogram. Character images are considered only in order to distinguish or group (cluster) them. The recognition information used is provided by dictionaries. The overall procedure can be divided into three principal steps: (1) a ciphertext-like symbolic representation of the text is generated; (2) in an initialization phase, only a few but reliable word recognitions are striven for; (3) the resulting partial symbol-character assignments are sufficient to initiate the subsequent relaxation of the recognition process. Whereas Casey uses several ambiguous alternatives for word recognition, the approach here is based on acquiring a few, but reliable, recognition alternatives. Thus, instead of a spell-check program, a dictionary with a heuristic-driven look-up control combined with an appropriate access mechanism is used.
Ryoo, M. S.; Matthies, Larry
In this evaluation paper, we discuss convolutional neural network (CNN)-based approaches for human activity recognition. In particular, we investigate CNN architectures designed to capture temporal information in videos and their applications to the human activity recognition problem. There have been multiple previous works using CNN features for videos. These include CNNs using 3-D XYT convolutional filters, CNNs using pooling operations on top of per-frame image-based CNN descriptors, and recurrent neural networks that learn temporal changes in per-frame CNN descriptors. We experimentally compare some of these representative CNNs on first-person human activity videos. We especially focus on videos from a robot's viewpoint, captured during its operations and human-robot interactions.
Giménez, Adrià; Juan, Alfons
Hidden Markov Models (HMMs) are now widely used in off-line handwritten text recognition. As in speech recognition, they are usually built from shared, embedded HMMs at symbol level, in which state-conditional probability density functions are modelled with Gaussian mixtures. In contrast to speech recognition, however, it is unclear which kind of real-valued features should be used and, indeed, very different feature sets are in use today. In this paper, we propose to bypass feature extraction and directly feed columns of raw, binary image pixels into embedded Bernoulli mixture HMMs, that is, embedded HMMs in which the emission probabilities are modelled with Bernoulli mixtures. The idea is to ensure that no discriminative information is filtered out during feature extraction, which in some sense is integrated into the recognition model. Good empirical results are reported on the well-known IAM database.
Kirsh, Steven J; Mounts, Jeffrey R W
This study assessed the speed of recognition of facial emotional expressions (happy and angry) as a function of violent video game play. Color photos of calm facial expressions were morphed into either an angry or a happy facial expression. Participants were asked to make a speeded identification of the emotion (happiness or anger) during the morph. Typically, happy faces are identified faster than angry faces (the happy-face advantage). Results indicated that playing a violent video game led to a reduction in the happy-face advantage. Implications of these findings are discussed with respect to current models of aggressive behavior. (c) 2007 Wiley-Liss, Inc.
Huo, Hongwen; Feng, Jufu
We present a novel online face recognition approach for video stream in this paper. Our method includes two stages: pre-training and online training. In the pre-training phase, our method observes interactions, collects batches of input data, and attempts to estimate their distributions (Box-Cox transformation is adopted here to normalize rough estimates). In the online training phase, our method incrementally improves classifiers' knowledge of the face space and updates it continuously with incremental eigenspace analysis. The performance achieved by our method shows its great potential in video stream processing.
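The Box-Cox normalization step mentioned above can be written down directly from its standard definition. This is the generic transform only, a hedged sketch: how the paper chooses the exponent λ for its distribution estimates is not specified here.

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox power transform for positive data.

    (x**lam - 1) / lam for lam != 0; the lam -> 0 limit is log(x).
    Used to make rough, skewed estimates look more Gaussian."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam
```

In practice λ is usually chosen by maximum likelihood over the observed data (e.g., `scipy.stats.boxcox` does this automatically).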
Guan, Yu; Li, Chang-Tsun; Choudhury, Sruti Das
In this paper, we propose a gait recognition method for extremely low frame-rate videos. Unlike the popular temporal reconstruction-based methods, the proposed method uses the average gait over the whole sequence as the input feature template. Assuming the effects caused by an extremely low frame rate or large gait fluctuations are intra-class variations that the gallery data fails to capture, we build a general model based on the random subspace method. More specifically, a number of weak classi...
Bourennane, Salah; Fossati, Caroline; Ketchantang, William
Among existing biometrics, iris recognition systems are among the most accurate personal biometric identification systems. However, the acquisition of a workable iris image requires strict cooperation of the user; otherwise, the image will be rejected by a verification module because of its poor quality, inducing a high false reject rate (FRR). The FRR may also increase when iris localization fails or when the pupil is too dilated. To improve the existing methods, we propose to use video sequences acquired in real time by a camera. In order to keep the same computational load to identify the iris, we propose a new method to estimate the iris characteristics. First, we propose a new iris texture characterization based on the Fourier-Mellin transform, which is less sensitive to pupil dilations than previous methods. Then, we develop a new iris localization algorithm that is robust to variations of quality (partial occlusions due to eyelids and eyelashes, light reflections, etc.), and finally, we introduce a fast new criterion for selecting suitable images from an iris video sequence for accurate recognition. The accuracy of each step of the algorithm in the whole proposed recognition process is tested and evaluated using our own iris video database and several public image databases, such as CASIA, UBIRIS, and BATH.
Full Text Available Event recognition is the most fundamental and critical task in event-based natural language processing systems. Existing event recognition methods based on rules and shallow neural networks have certain limitations. For example, extracting features using methods based on rules is difficult, and methods based on shallow neural networks converge too quickly to a local minimum, resulting in low recognition precision. To address these problems, we propose the Chinese emergency event recognition model based on deep learning (CEERM). Firstly, we use a word segmentation system to segment sentences. According to event elements labeled in the CEC 2.0 corpus, we classify words into five categories: trigger words, participants, objects, time and location. Each word is vectorized according to the following six feature layers: part of speech, dependency grammar, length, location, distance between trigger word and core word, and trigger word frequency. We obtain deep semantic features of words by training a feature vector set using a deep belief network (DBN), then analyze those features in order to identify trigger words by means of a back-propagation neural network. Extensive testing shows that the CEERM achieves excellent recognition performance, with a maximum F-measure value of 85.17%. Moreover, we propose the dynamic-supervised DBN, which adds supervised fine-tuning to a restricted Boltzmann machine layer by monitoring its training performance. Test analysis reveals that the new DBN improves recognition performance and effectively controls the training time. Although the F-measure increases to 88.11%, the training time increases by only 25.35%.
Many papers have been concerned with the recognition of Latin, Chinese and Japanese characters. However, although almost a third of a billion people worldwide, in several different languages, use Arabic characters for writing, little research progress has been achieved towards the automatic recognition of Arabic characters, in both on-line and off-line settings. This is a result of the lack of adequate support in terms of funding and other utilities such as Arabic text databases, dictionaries, etc., and of course of the cursive nature of its writing rules. The main theme of this paper is the automatic recognition of Arabic printed text using the machine learning algorithm C4.5. Symbolic machine learning algorithms are designed to accept example descriptions in the form of feature vectors which include a label that identifies the class to which an example belongs. The output of the algorithm is a set of rules that classifies unseen examples based on generalization from the training set. This ability to generalize is the main attraction of machine learning for handwriting recognition. Samples of a character can be preprocessed into a feature vector representation for presentation to a machine learning algorithm that creates rules for recognizing characters of the same class. Symbolic machine learning has several advantages over other learning methods: it is fast in training and in recognition, generalizes well, is noise tolerant, and the symbolic representation is easy to understand. The technique can be divided into three major steps. The first step is pre-processing, in which the original image is transformed into a binary image utilizing a 300 dpi scanner and the connected components are then formed. Second, global features of the input Arabic word are extracted, such as the number of subwords, the number of peaks within each subword, and the number and position of the complementary characters. Finally, machine learning C4.5 is used for character classification to generate a decision tree.
Full Text Available This paper presents a new approach for a text-based video content retrieval system. The proposed scheme consists of three main processes: key frame extraction, text localization and keyword matching. For the key frame extraction, we propose a Maximally Stable Extremal Region (MSER)-based feature which is oriented to segmenting shots of the video with different text contents. In the text localization process, in order to form the text lines, the MSERs in each key frame are clustered based on their similarity in position, size, color, and stroke width. Then, the Tesseract OCR engine is used for recognizing the text regions. In this work, to improve the recognition results, we input four images obtained from different pre-processing methods to the Tesseract engine. Finally, the target keyword for querying is matched against the OCR results based on an approximate string search scheme. The experiments show that, by using the MSER feature, the videos can be segmented with an efficient number of shots and provide better precision and recall in comparison with sum-of-absolute-difference and edge-based methods.
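The approximate string search used for keyword matching is typically built on edit distance, so that noisy OCR output still matches the query. The sketch below shows that idea with the standard Levenshtein distance; the tolerance threshold and helper names are assumptions, not the paper's exact scheme.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic DP, kept to two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            # deletion, insertion, or substitution (free if chars match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def approx_contains(ocr_text, keyword, max_dist=1):
    """True if some OCR word is within max_dist edits of the query keyword."""
    return any(edit_distance(w.lower(), keyword.lower()) <= max_dist
               for w in ocr_text.split())
```

With `max_dist=1`, a one-character OCR error such as "BREAKLNG" still matches the query "breaking".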
Whitman, Lucy S.; Lewis, Colin; Oakley, John P.
Atmospheric scattering causes significant degradation in the quality of video images, particularly when imaging over long distances. The principal problem is the reduction in contrast due to scattered light. It is known that when the scattering particles are not too large compared with the imaging wavelength (i.e., Mie scattering), then high spatial resolution information may be contained within a low-contrast image. Unfortunately this information is not easily perceived by a human observer, particularly when using a standard video monitor. A secondary problem is the difficulty of achieving a sharp focus, since automatic focus techniques tend to fail in such conditions. Recently several commercial colour video processing systems have become available. These systems use various techniques to improve image quality in low-contrast conditions whilst retaining colour content. They produce improvements in subjective image quality in some situations, particularly in conditions of haze and light fog. There is also some evidence that video enhancement leads to improved ATR performance when used as a pre-processing stage. Psychological literature indicates that low contrast levels generally lead to a reduction in the performance of human observers in carrying out simple visual tasks. The aim of this paper is to present the results of an empirical study on object recognition in adverse viewing conditions. The chosen visual task was vehicle number plate recognition at long ranges (500 m and beyond). Two different commercial video enhancement systems are evaluated using the same protocol. The results show an increase in effective range, with some differences between the different enhancement systems.
Bowman, Elizabeth K.; Turek, Matt; Tunison, Paul; Porter, Reed; Thomas, Steve; Gintautas, Vadas; Shargo, Peter; Lin, Jessica; Li, Qingzhe; Gao, Yifeng; Li, Xiaosheng; Mittu, Ranjeev; Rosé, Carolyn Penstein; Maki, Keith; Bogart, Chris; Choudhari, Samrihdi Shree
Today's warfighters operate in a highly dynamic and uncertain world, and face many competing demands. Asymmetric warfare and the new focus on small, agile forces has altered the framework by which time critical information is digested and acted upon by decision makers. Finding and integrating decision-relevant information is increasingly difficult in data-dense environments. In this new information environment, agile data algorithms, machine learning software, and threat alert mechanisms must be developed to automatically create alerts and drive quick response. Yet these advanced technologies must be balanced with awareness of the underlying context to accurately interpret machine-processed indicators and warnings and recommendations. One promising approach to this challenge brings together information retrieval strategies from text, video, and imagery. In this paper, we describe a technology demonstration that represents two years of tri-service research seeking to meld text and video for enhanced content awareness. The demonstration used multisource data to find an intelligence solution to a problem using a common dataset. Three technology highlights from this effort include 1) Incorporation of external sources of context into imagery normalcy modeling and anomaly detection capabilities, 2) Automated discovery and monitoring of targeted users from social media text, regardless of language, and 3) The concurrent use of text and imagery to characterize behaviour using the concept of kinematic and text motifs to detect novel and anomalous patterns. Our demonstration provided a technology baseline for exploiting heterogeneous data sources to deliver timely and accurate synopses of data that contribute to a dynamic and comprehensive worldview.
Full Text Available In this paper, a novel approach for identifying normal and obscene videos is proposed. In order to classify different episodes of a video independently and discard the need to process all frames, first, key frames are extracted and skin regions are detected for groups of video frames starting with key frames. In the second step, three different features are extracted for each episode of the video: 1) structural features based on single-frame information, 2) features based on the spatiotemporal volume, and 3) motion-based features. The PCA-LDA method is then applied to reduce the size of the structural features and select the more distinctive ones. For the final step, we use a fuzzy or a Weighted Support Vector Machine (WSVM) classifier to identify video episodes. We also employ a multilayer Kohonen network as an initial clustering algorithm to increase the ability to discriminate the extracted features into the two classes of videos. Features based on motion and periodicity characteristics increase the efficiency of the proposed algorithm in videos with bad illumination and skin colour variation. The proposed method is evaluated using 1100 videos in different environmental and illumination conditions. The experimental results show a correct recognition rate of 94.2% for the proposed algorithm.
Singha, Joyeeta; Das, Karen
Sign Language Recognition has emerged as one of the important areas of research in Computer Vision. The difficulty faced by researchers is that the instances of signs vary with both motion and appearance. Thus, in this paper a novel approach for recognizing various alphabets of Indian Sign Language is proposed where continuous video sequences of the signs have been considered. The proposed system comprises three stages: a preprocessing stage, feature extraction and classification. The preprocessing stage includes skin filtering and histogram matching. Eigenvalues and eigenvectors were considered for the feature extraction stage, and finally an eigenvalue-weighted Euclidean distance is used to recognize the sign. It deals with bare hands, thus allowing the user to interact with the system in a natural way. We have considered 24 different alphabets in the video sequences and attained a success rate of 96.25%.
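The eigenvalue-weighted Euclidean distance used in the final matching stage is a plain weighted metric: components associated with larger eigenvalues contribute more to the distance. A minimal sketch, with the weighting convention assumed (the paper's exact normalization may differ):

```python
import numpy as np

def eig_weighted_distance(f1, f2, eigvals):
    """Euclidean distance between two feature vectors, with each
    squared component difference weighted by its eigenvalue."""
    return float(np.sqrt(np.sum(eigvals * (f1 - f2) ** 2)))
```

A probe sign is then assigned the label of the training template with the smallest weighted distance.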
Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve
With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed. It is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, the object's contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, the Bayes theory is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those by either of the individual classifiers, and our framework can be adapted to different categories of Web pages.
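The discrete-text classifier described above applies the naive Bayes rule to word counts. The sketch below shows that general technique on a hypothetical two-class example; the training data, class names and Laplace smoothing details are illustrative assumptions, not the paper's corpus or exact estimator.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Word-count naive Bayes with Laplace (add-one) smoothing."""
    counts = {c: Counter() for c in set(labels)}
    priors = Counter(labels)
    for doc, c in zip(docs, labels):
        counts[c].update(doc.lower().split())
    vocab = {w for cnt in counts.values() for w in cnt}
    return counts, priors, vocab, len(docs)

def classify_nb(doc, counts, priors, vocab, n_docs):
    """Pick the class maximizing log P(class) + sum log P(word | class)."""
    best, best_lp = None, -math.inf
    for c, cnt in counts.items():
        total = sum(cnt.values())
        lp = math.log(priors[c] / n_docs)
        for w in doc.lower().split():
            lp += math.log((cnt[w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

The same machinery yields the class-membership probability the paper feeds into its text/image fusion step.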
The aim of this study was to determine whether video or text was more effective at knowledge transfer and retention. In this study, knowledge transfer with video and text was similar, and text consumed fewer resources to create.
Full Text Available Biomedical Text Mining (BioTM) targets the extraction of significant information from biomedical archives. BioTM encompasses Information Retrieval (IR) and Information Extraction (IE). Information Retrieval retrieves the relevant biomedical literature documents from various repositories such as PubMed, MedLine, etc., based on a search query. The IR process ends with the generation of a corpus of the relevant documents retrieved from the publication databases. The IE task includes preprocessing of the documents, Named Entity Recognition (NER) and relationship extraction. This process draws on natural language processing, data mining techniques and machine learning algorithms. The preprocessing task includes tokenization, stop-word removal, shallow parsing, and part-of-speech tagging. The NER phase involves recognition of well-defined objects such as genes, proteins or cell lines. This process leads to the next phase, the extraction of relationships (IE). The work was based on the machine learning algorithm Conditional Random Fields (CRF).
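The first preprocessing steps named above (tokenization and stop-word removal) can be sketched in a few lines; shallow parsing and POS tagging are omitted. The stop-word list and function name here are illustrative assumptions.

```python
def preprocess(sentence,
               stop_words=frozenset({"the", "of", "and", "in", "a", "is"})):
    """Tokenize, lower-case and drop stop words -- the first IE steps
    described above (shallow parsing and POS tagging are omitted)."""
    tokens = [t.strip(".,;:()").lower() for t in sentence.split()]
    return [t for t in tokens if t and t not in stop_words]
```

The surviving content words are what an NER component (e.g., a CRF tagger) would then label as genes, proteins, cell lines, and so on.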
Riad I. Hammoud
Full Text Available We describe two advanced video analysis techniques, including video indexed by voice annotations (VIVA) and multi-media indexing and explorer (MINER). VIVA utilizes analyst call-outs (ACOs) in the form of chat messages (voice-to-text) to associate labels with video target tracks, to designate spatial-temporal activity boundaries and to augment video tracking in challenging scenarios. Challenging scenarios include low-resolution sensors, moving targets and target trajectories obscured by natural and man-made clutter. MINER includes: (1) a fusion of graphical track and text data using probabilistic methods; (2) an activity pattern learning framework to support querying an index of activities of interest (AOIs) and targets of interest (TOIs) by movement type and geolocation; and (3) a user interface to support streaming multi-intelligence data processing. We also present an activity pattern learning framework that uses the multi-source associated data as training to index a large archive of full-motion videos (FMV). VIVA and MINER examples are demonstrated for wide aerial/overhead imagery over common data sets, affording an improvement in tracking from video data alone, leading to 84% detection with modest misdetection/false alarm results due to the complexity of the scenario. The novel use of ACOs and chat messages in video tracking paves the way for user interaction, correction and preparation of situation awareness reports.
Rurainsky, J.; Eisert, P.
We present a complete system for the automatic creation of talking head video sequences from text messages. Our system converts the text into MPEG-4 Facial Animation Parameters and synthetic voice. A user selected 3D character will perform lip movements synchronized to the speech data. The 3D models created from a single image vary from realistic people to cartoon characters. A voice selection for different languages and gender as well as a pitch shift component enables a personalization of the animation. The animation can be shown on different displays and devices ranging from 3GPP players on mobile phones to real-time 3D render engines. Therefore, our system can be used in mobile communication for the conversion of regular SMS messages to MMS animations.
Ge, Zhenhao; Sharma, Sudhendu R.; Smith, Mark J. T.
Various algorithms for text-independent speaker recognition have been developed through the decades, aiming to improve both accuracy and efficiency. This paper presents a novel PCA/LDA-based approach that is faster than traditional statistical model-based methods and achieves competitive results. First, the performance based on only PCA and only LDA is measured; then a mixed model, taking advantage of both methods, is introduced. A subset of the TIMIT corpus composed of 200 male speakers is used for enrollment, validation and testing. The best results achieve 100%, 96% and 95% classification rates at population levels of 50, 100 and 200, using 39-dimensional MFCC features with delta and double delta. These results are based on 12 seconds of text-independent speech for training and 4 seconds of data for testing. They are comparable to conventional MFCC-GMM methods, but require significantly less time to train and operate.
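A PCA-then-LDA chain of the kind described can be sketched as a two-stage pipeline: PCA compresses and decorrelates the MFCC feature vectors, and LDA then finds the directions that best separate speakers. This is a generic sketch on synthetic data, not the paper's mixed model; the component count and data generator are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

def build_recognizer(n_pca=10):
    """PCA for compression/decorrelation, then LDA for class separation.
    LDA's predict() acts as the closed-set speaker classifier."""
    return make_pipeline(PCA(n_components=n_pca),
                         LinearDiscriminantAnalysis())
```

In the paper the inputs would be 39-dimensional MFCC+delta+double-delta frames pooled per utterance rather than the synthetic vectors used in this illustration.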
Heutte, L.; Paquet, T.; Nosary, A.; Hernoux, C.
This communication investigates the automatic reading of unconstrained omniwriter handwritten texts. It shows how to endow the reading system with the learning faculties necessary to adapt recognition to each writer's handwriting. In the first part of this communication, we explain how the
Full Text Available We present results of a study into the performance of a variety of different image transform-based feature types for speaker-independent visual speech recognition of isolated digits. This includes the first reported use of features extracted using a discrete curvelet transform. The study will show a comparison of some methods for selecting features of each feature type and show the relative benefits of both static and dynamic visual features. The performance of the features will be tested on both clean video data and also video data corrupted in a variety of ways to assess each feature type's robustness to potential real-world conditions. One of the test conditions involves a novel form of video corruption we call jitter which simulates camera and/or head movement during recording.
Dhakal, Shanti; Rahnemoonfar, Maryam
Measuring water quality of bays, estuaries, and gulfs is a complicated and time-consuming process. The YSI Sonde is an instrument used to measure water quality parameters such as pH, temperature, salinity, and dissolved oxygen. This instrument is taken to water bodies on a boat trip, and researchers note down the different parameters shown on the instrument's display monitor. In this project, a mobile application is developed for the Android platform that allows a user to take a picture of the YSI Sonde monitor, extract text from the image and store it in a file on the phone. The image captured by the application is first processed to remove perspective distortion. The probabilistic Hough line transform is used to identify lines in the image, and the corners of the image are then obtained by determining the intersections of the detected horizontal and vertical lines. The image is warped using the perspective transformation matrix obtained from the corner points of the source and destination images, hence removing the perspective distortion. The mathematical morphology black-hat operation is used to correct the shading of the image. The image is binarized using Otsu's binarization technique and is then passed to Optical Character Recognition (OCR) software for character recognition. The extracted information is stored in a file on the phone and can be retrieved later for analysis. The algorithm was tested on 60 different images of the YSI Sonde with different perspective features and shading. Experimental results, in comparison to ground-truth results, demonstrate the effectiveness of the proposed method.
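The Otsu binarization step used before OCR picks the gray-level threshold that maximizes the between-class variance of the resulting foreground/background split. A minimal NumPy sketch of the standard algorithm (not the app's code, which presumably uses a library routine):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: return the threshold maximizing between-class variance.

    `gray` holds intensities in [0, 255]; pixels <= threshold form class 0."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    omega = np.cumsum(p)                       # class-0 probability
    mu = np.cumsum(p * np.arange(256))         # cumulative mean
    mu_t = mu[-1]                              # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0       # guard the omega = 0 or 1 ends
    return int(np.argmax(sigma_b))
```

For a bimodal display image (dark digits on a bright LCD, or vice versa), the returned threshold lands between the two intensity modes.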
Klare, Brendan; Burge, Mark
We assess the impact of the H.264 video codec on the match performance of automated face recognition in surveillance and mobile video applications. A set of two hundred access control (90-pixel inter-pupillary distance) and distance surveillance (45-pixel inter-pupillary distance) videos taken under non-ideal imaging and facial recognition (e.g., pose, illumination, and expression) conditions were matched using two commercial face recognition engines. The first study evaluated automated face recognition performance on access control and distance surveillance videos at CIF and VGA resolutions using the H.264 baseline profile at nine bitrates ranging from 8 kbps to 2048 kbps. In our experiments, video signals could be compressed down to 128 kbps before a significant drop in face recognition performance occurred. The second study evaluated automated face recognition on mobile devices at QCIF, iPhone, and Android resolutions for each of the H.264 PDA profiles. Rank-one match performance, cumulative match scores, and failure-to-enroll rates are reported.
Oh, Sangmin; Hoogs, Anthony; Perera, Amitha; Cuntoor, Naresh; Chen, Chia-Chih
A Large-scale Benchmark Dataset for Event Recognition in Surveillance Video. In addition to the stationary dataset, the VIRAT Video Dataset includes downsampled versions obtained by downsampling the original HD videos to lower frame rates and pixel resolutions, targeting the relatively unexplored setting in which video frame rates and pixel resolutions are low.
This dissertation develops a novel system for object recognition in videos. The input of the system is a set of unconstrained videos containing a known set of objects. The output is the locations and categories for each object in each frame across all videos. Initially, a shot boundary detection algorithm is applied to the videos to divide them into multiple sequences separated by the identified shot boundaries. Since each of these sequences still contains moderate content variations, we furt...
Gait is a unique biometric feature perceptible at larger distances, and the gait representation approach plays a key role in a video sensor-based gait recognition system. The Class Energy Image is one of the most important appearance-based gait representation methods and has received much attention. In this paper, we review the expressions and meanings of various Class Energy Image approaches and analyze the information contained in Class Energy Images. Furthermore, the effectiveness and robustness of these approaches are compared on benchmark gait databases. We outline the research challenges and provide promising future directions for the field. To the best of our knowledge, this is the first review that focuses on the Class Energy Image. It can serve as a useful reference in the literature on video sensor-based gait representation.
Huang, Zhiwu; Shan, Shiguang; Wang, Ruiping; Zhang, Haihong; Lao, Shihong; Kuerban, Alifu; Chen, Xilin
Face recognition with still face images has been widely studied, while research on video-based face recognition is relatively inadequate, especially in terms of benchmark datasets and comparisons. Real-world video-based face recognition applications require techniques for three distinct scenarios: 1) Video-to-Still (V2S); 2) Still-to-Video (S2V); and 3) Video-to-Video (V2V), respectively taking video or still images as query or target. To the best of our knowledge, few datasets and evaluation protocols have been established to benchmark all three scenarios. In order to facilitate the study of this specific topic, this paper contributes a benchmarking and comparative study based on a newly collected still/video face database, named COX Face DB. Specifically, we make three contributions. First, we collect and release a large-scale still/video face database to simulate video surveillance with three different video-based face recognition scenarios (i.e., V2S, S2V, and V2V). Second, for benchmarking the three scenarios designed on our database, we review and experimentally compare a number of existing set-based methods. Third, we further propose a novel Point-to-Set Correlation Learning (PSCL) method and experimentally show that it can serve as a promising baseline for V2S/S2V face recognition on COX Face DB. Extensive experimental results clearly demonstrate that video-based face recognition needs more effort, and that COX Face DB is a good benchmark database for evaluation.
Wang, Chao; Wang, Yunhong; Zhang, Zhaoxiang
This paper addresses the problem of tracking and recognizing faces via incremental local sparse representation. First, a robust face tracking algorithm is proposed that employs local sparse appearance and a covariance pooling method. In the subsequent face recognition stage, using a novel template update strategy that combines incremental subspace learning, our recognition algorithm adapts the template to appearance changes and reduces the influence of occlusion and illumination variation. This leads to robust video-based face tracking and recognition with desirable performance. In the experiments, we test the quality of face recognition on real-world noisy videos from the YouTube database, which includes 47 celebrities. Our proposed method achieves a high face recognition rate of 95% across all videos. The proposed face tracking and recognition algorithms are also tested on a set of noisy videos under heavy occlusion and illumination variation. The tracking results on challenging benchmark videos demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods. On the challenging dataset in which faces undergo occlusion and illumination variation, and in tracking and recognition experiments under significant pose variation on the University of California, San Diego (Honda/UCSD) database, our proposed method also consistently demonstrates a high recognition rate.
Vidakis, Nikolaos; Kavallakis, George; Triantafyllidis, Georgios
This paper presents a scheme for creating an emotion index of cover song music video clips by recognizing and classifying facial expressions of the artist in the video. More specifically, it fuses effective and robust algorithms employed for expression recognition, along with the use of a neural network system using the features extracted by the SIFT algorithm. We also support the need for this fusion of different expression recognition algorithms because of the way emotions are linked to facial expressions in music video clips.
Abnormal running behavior frequently occurs in robbery and other criminal cases. In order to identify these behaviors, a method to detect and recognize abnormal running behavior is presented based on spatiotemporal parameters. Meanwhile, to obtain more accurate spatiotemporal parameters and improve the real-time performance of the algorithm, a multi-target tracking algorithm is presented based on the intersection area among the minimum enclosing rectangles of the moving objects. The algorithm can effectively judge and exclude the intersection of multiple targets and interference, which makes the tracking algorithm more accurate and more robust. Experimental results show that the combination of these two algorithms can effectively detect and recognize abnormal running behavior in surveillance videos.
Divakaran, Ajay; Radhakrishnan, Regunathan; Xiong, Ziyou; Casey, Michael
In earlier work, Casey describes a generalized sound recognition framework based on reduced-rank spectra and minimum-entropy priors. This approach enables successful recognition of a wide variety of sounds, such as male speech, female speech, music, and animal sounds. In this work, we apply this recognition framework to news video to enable quick video browsing. We identify speaker change positions in broadcast news using the sound recognition framework. We combine the speaker change positions with color and motion cues from the video and are able to locate the beginning of each of the topics covered by the news video. We can thus skim the video by merely playing a small portion starting from each of the locations where one of the principal cast begins to speak. In combination with our motion-based video browsing approach, our technique provides simple automatic news video browsing. While similar work has been done before, our approach is simpler and faster than competing techniques, and provides a rich framework for further analysis and description of content.
Diaz, Ruth L; Wong, Ulric; Hodgins, David C; Chiu, Carina G; Goghari, Vina M
Violent video game playing has been associated with both positive and negative effects on cognition. We examined whether playing two or more hours of violent video games a day, compared to not playing video games, was associated with a different pattern of recognition of five facial emotions, while controlling for general perceptual and cognitive differences that might also occur. Undergraduate students were categorized as violent video game players (n = 83) or non-gamers (n = 69) and completed a facial recognition task, consisting of an emotion recognition condition and a control condition of gender recognition. Additionally, participants completed questionnaires assessing their video game and media consumption, aggression, and mood. Violent video game players recognized fearful faces both more accurately and quickly and disgusted faces less accurately than non-gamers. Desensitization to violence, constant exposure to fear and anxiety during game playing, and the habituation to unpleasant stimuli, are possible mechanisms that could explain these results. Future research should evaluate the effects of violent video game playing on emotion processing and social cognition more broadly. © 2015 Wiley Periodicals, Inc.
This study tests the effects of tutorial format (i.e. video vs. text) on student attitudes and performance in online computing education. A one-factor within-subjects experiment was conducted in an undergraduate Computer Information Systems course. Subjects were randomly assigned to complete two Excel exercises online: one with a video tutorial…
Gelfuso, Andrea; Dennis, Danielle V.
In this article, we theoretically explore how the deliberate use of video during literacy field experiences creates a text that can be read by triad members and can ameliorate the problem of relying on memory to engage in reflective conversations about literacy teaching and learning. The use of video, tools, and interactions with knowledgeable…
Fragmentary record: describes a multi-biometric recognition system combining face recognition (Advanced SDK) and iris recognition (SIRIS SDK), supported by a server-side ABIS system, in a border-security context (facilitating the flow of legitimate goods and travellers across borders and aligning/coordinating security systems for goods, cargo, and baggage); includes citation fragments on evaluating commercial off-the-shelf face recognition systems on the Chokepoint dataset.
An automatic recognition framework for human facial expressions from monocular video with an uncalibrated camera is proposed. The expression characteristics are first acquired from a deformable template similar to a facial muscle distribution. After regularization, the time sequences of the trait changes in space-time under a complete expression production are arranged line by line in a matrix. Next, the matrix dimensionality is reduced by neighborhood-preserving embedding, a manifold learning method. Finally, the refined matrix containing the expression trait information is recognized by a classifier that integrates a hidden conditional random field (HCRF) and a support vector machine (SVM). In an experiment on the Cohn-Kanade database, the proposed method showed a comparatively higher recognition rate than the individual HCRF or SVM methods in direct recognition from two-dimensional face traits. Moreover, the proposed method was more robust than the typical Kotsia method, because the former captures more structural characteristics of the data to be classified in space-time.
Kinnunen, Tomi; Sahidullah, Md; Kukanov, Ivan
Text-dependent automatic speaker verification naturally calls for the simultaneous verification of speaker identity and spoken content. These two tasks can be achieved with automatic speaker verification (ASV) and utterance verification (UV) technologies. While both have been addressed previously in the literature, a treatment of simultaneous speaker and utterance verification with a modern, standard database is so far lacking. This is despite the burgeoning demand for voice biometrics in a plethora of practical security applications. With the goal of improving overall verification performance, this paper reports different strategies for simultaneous ASV and UV in the context of short-duration, text-dependent speaker verification. Experiments performed on the recently released RedDots corpus are reported for three different ASV systems and four different UV systems. Results show that the combination…
Fragmentary record: a comparative performance analysis of face recognition algorithms for video available in the OpenCV library, in particular Fisherfaces, Eigenfaces, and LBPH; one approach used the generic FeatureDetector class in OpenCV (version 2.4.1), which allowed automatic extraction of features, and a graph presents the performance comparison among the implemented algorithms.
Fragmentary record: discusses fast quantization via random forests (collections of binary decision trees) and the semantic texton forests proposed by Shotton et al.; includes citation fragments on trademark detection in sports video and on randomized clustering forests for image classification (Moosmann F, Nowak E, Jurie F, 2008, IEEE Trans Pattern Anal).
Nasrollahi, Kamal; Moeslund, Thomas B.
Face recognition systems are very sensitive to the quality and resolution of their input face images. This makes such systems unreliable when working with long surveillance video sequences unless selection and enhancement algorithms are employed. On the other hand, processing all the frames of such video sequences with any enhancement or face recognition algorithm is demanding. Thus, there is a need for a mechanism to summarize the input video sequence to a set of key-frames and then apply an enhancement algorithm to this subset. This paper presents a system doing exactly that. The system uses face quality assessment to select the key-frames and a hybrid super-resolution to enhance the face image quality. The suggested system, which employs a linear associator face recognizer to evaluate the enhanced results, has been tested on real surveillance video sequences, and the experimental results…
Goutsu, Yusuke; Kobayashi, Takaki; Obara, Junya; Kusajima, Ikuo; Takeichi, Kazunari; Takano, Wataru; Nakamura, Yoshihiko
Gesture recognition is used in many practical applications such as human-robot interaction, medical rehabilitation, and sign language. With ongoing motion sensor development, multiple data sources have become available, leading to the rise of multi-modal gesture recognition. Since our previous approach to gesture recognition depended on a unimodal system, it was difficult to classify similar motion patterns. To solve this problem, a novel approach that integrates motion, audio, and video models is proposed, using a dataset captured with Kinect. The proposed system recognizes observed gestures using the three models; their recognition results are integrated by the proposed framework, and the fused output becomes the final result. The motion and audio models are learned using hidden Markov models; a random forest classifier is used to learn the video model. In experiments testing the performance of the proposed system, the motion and audio models most suitable for gesture recognition are chosen by varying the feature vectors and learning methods. Additionally, the unimodal and multi-modal models are compared with respect to recognition accuracy. All experiments are conducted on the dataset provided by the organizer of MMGRC, a workshop for the Multi-Modal Gesture Recognition Challenge. The comparison shows that the multi-modal model composed of the three models achieves the highest recognition rate, indicating that the complementary relationship among the three models improves the accuracy of gesture recognition. The proposed system provides application technology for understanding human actions of daily life more precisely.
Talavera Martínez, Estefanía
Nowadays, there is an upsurge of interest in using lifelogging devices. Such devices generate huge amounts of image data; consequently, the need for automatic methods for analyzing and summarizing these data is drastically increasing. We present a new method for familiar scene recognition in…
The task of human hand trajectory tracking and gesture trajectory recognition based on synchronized color and depth video is considered. To this end, for hand tracking, a joint observation model with hand cues of skin saliency, motion, and depth is integrated into a particle filter in order to move particles to local peaks in the likelihood. The proposed hand tracking method, namely the salient skin, motion, and depth based particle filter (SSMD-PF), improves tracking accuracy considerably when the signer performs gestures toward the camera in front of moving, cluttered backgrounds. For gesture recognition, a shape-order context descriptor based on shape context is introduced, which can describe the gesture in the spatiotemporal domain. The efficient shape-order context descriptor reveals the shape relationship and embeds gesture sequence order information into the descriptor. Moreover, the shape-order context leads to a robust score for gesture invariance. Our approach is complemented with experimental results on the challenging hand-signed digits datasets and an American Sign Language dataset, which corroborate the performance of the proposed techniques.
This study investigated whether video-based materials can facilitate second language learners' text comprehension at the levels of macrostructure and microstructure. Three classes inclusive of 98 Chinese-speaking university students joined this study. The three classes were randomly assigned to three treatment groups: on-screen text (T Group),…
Dissanayake, Cheryl; Shembrey, Joh; Suddendorf, Thomas
Two studies are reported which investigate delayed video self-recognition (DSR) in children with autistic disorder and Asperger's disorder relative to one another and to their typically developing peers. A secondary aim was to establish whether DSR ability is dependent on metarepresentational ability. Children's verbal and affective responses to…
Digital text formats that allow a close interaction between writing and video represent new possibilities and challenges for the communication of educational content. What are the premises for functional and appropriate communication through web-based, multimedia text formats? This article explores the digital writing-video format from a structural, theoretical perspective. To begin with, the two media's respective characteristics are discussed and compared as carriers of complex signs. Thereafter, the focus is on how writing and video elements can be accommodated to web media. Finally, the article discusses the conditions for optimal coordination and interaction between the two media types within the framework of an integrated design. A design example is presented.
The purpose of this study is to develop a plausible method to code and compile Buddhist texts from original Tibetan scripts into Romanized form. Using a GUI based on object-oriented design, a dictionary of Tibetan characters can easily be made for Buddhist literature researchers. It is hoped that a computer system capable of highly accurate character recognition will be actively used by scholars engaged in Buddhist literature research. In the present study, an efficient automatic recognition method for Tibetan characters is established. In the experiments performed, the recognition rate achieved is 99.4% over 28,954 characters.
Lai, Wei-Sheng; Huang, Yujia; Joshi, Neel; Buehler, Christopher; Yang, Ming-Hsuan; Kang, Sing Bing
We present a system for converting a fully panoramic (360°) video into a normal field-of-view (NFOV) hyperlapse for an optimal viewing experience. Our system exploits visual saliency and semantics to non-uniformly sample in space and time for generating hyperlapses. In addition, users can optionally choose objects of interest for customizing the hyperlapses. We first stabilize an input 360° video by smoothing the rotation between adjacent frames and then compute regions of interest and saliency scores. An initial hyperlapse is generated by optimizing the saliency and motion smoothness, followed by saliency-aware frame selection. We further smooth the result using an efficient 2D video stabilization approach that adaptively selects the motion model to generate the final hyperlapse. We validate the design of our system by showing results for a variety of scenes and comparing against the state-of-the-art method through a large-scale user study.
Wingenbach, Tanja S H; Ashwin, Chris; Brosnan, Mark
There has been much research on sex differences in the ability to recognise facial expressions of emotions, with results generally showing a female advantage in reading emotional expressions from the face. However, most of the research to date has used static images and/or 'extreme' examples of facial expressions. Therefore, little is known about how expression intensity and dynamic stimuli might affect the commonly reported female advantage in facial emotion recognition. The current study investigated sex differences in accuracy of response (Hu; unbiased hit rates) and response latencies for emotion recognition using short video stimuli (1sec) of 10 different facial emotion expressions (anger, disgust, fear, sadness, surprise, happiness, contempt, pride, embarrassment, neutral) across three variations in the intensity of the emotional expression (low, intermediate, high) in an adolescent and adult sample (N = 111; 51 male, 60 female) aged between 16 and 45 (M = 22.2, SD = 5.7). Overall, females showed more accurate facial emotion recognition compared to males and were faster in correctly recognising facial emotions. The female advantage in reading expressions from the faces of others was unaffected by expression intensity levels and emotion categories used in the study. The effects were specific to recognition of emotions, as males and females did not differ in the recognition of neutral faces. Together, the results showed a robust sex difference favouring females in facial emotion recognition using video stimuli of a wide range of emotions and expression intensity variations.
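The accuracy measure reported above, the unbiased hit rate (Hu), corrects raw hit rates for indiscriminate overuse of a response category. A minimal sketch of Wagner's (1993) formula computed from a confusion matrix (the function name is ours; this is an illustration, not the study's analysis code):

```python
def unbiased_hit_rates(confusion):
    """Wagner's unbiased hit rate Hu per category.

    confusion[i][j] = number of stimuli of category i labelled as category j.
    Hu_i = hits_i^2 / (stimuli presented of i * total uses of response i),
    so a participant who answers "fear" to everything gets no credit for
    a perfect fear hit rate.
    """
    n = len(confusion)
    row = [sum(confusion[i]) for i in range(n)]                      # stimuli per category
    col = [sum(confusion[i][j] for i in range(n)) for j in range(n)]  # responses per label
    return [
        (confusion[i][i] ** 2) / (row[i] * col[i]) if row[i] and col[i] else 0.0
        for i in range(n)
    ]
```

For a perfectly accurate responder the Hu for every category is 1.0; any bias toward one response label pulls that label's Hu below its raw hit rate.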
Freed, Erin; Long, Debra; Rodriguez, Tonantzin; Franks, Peter; Kravitz, Richard L; Jerant, Anthony
To compare the effects of two health information texts on patient recognition memory, a key aspect of comprehension. Randomized controlled trial (N=60), comparing the effects of experimental and control colorectal cancer (CRC) screening texts on recognition memory, measured using a statement recognition test, accounting for response bias (score range -0.91 to 5.34). The experimental text had a lower Flesch-Kincaid reading grade level (7.4 versus 9.6), was more focused on addressing screening barriers, and employed more comparative tables than the control text. Recognition memory was higher in the experimental group (2.54 versus 1.09, t=-3.63, P=0.001), including after adjustment for age, education, and health literacy (β=0.42, 95% CI: 0.17, 0.68, P=0.001), and in analyses limited to persons with college degrees (β=0.52, 95% CI: 0.18, 0.86, P=0.004) or no self-reported health literacy problems (β=0.39, 95% CI: 0.07, 0.71, P=0.02). An experimental CRC screening text improved recognition memory, including among patients with high education and self-assessed health literacy. CRC screening texts comparable to our experimental text may be warranted for all screening-eligible patients, if such texts improve screening uptake. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
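The abstract above does not state which bias-corrected recognition score was used, only that response bias was accounted for. As an illustration of the general idea, one standard signal-detection measure is d', which separates sensitivity from the tendency to answer "old" (the function and correction shown are a generic sketch, not the study's scoring method):

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity: d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each cell) keeps both rates away
    from 0 and 1, so the inverse-normal transform stays finite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)
```

A participant who says "old" to everything has a high hit rate but an equally high false-alarm rate, so d' stays near zero; genuine memory pushes d' above zero.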
Borup, Jered; West, Richard E.; Thomas, Rebecca
In this study we examined student and instructor perceptions of text and video feedback in technology integration courses that combined face-to-face with online instruction for teacher candidates. Items from the Feedback Environment Scale (Steelman et al. 2004) were used to measure student perceptions of feedback quality and delivery. Independent…
Abdous, M'hammed; He, Wu
Because of their capacity to sift through large amounts of data, text mining and data mining are enabling higher education institutions to reveal valuable patterns in students' learning behaviours without having to resort to traditional survey methods. In an effort to uncover live video streaming (LVS) students' technology-related problems and to…
Pedersen, Kamilla; Moeller, Martin Holdgaard; Paltved, Charlotte
OBJECTIVES: The aim of this study was to explore medical students' learning experiences with didactic teaching formats using either text-based or video-based patient cases with similar content. The authors explored how the two different patient case formats influenced students… Students taught with video-based patient cases, in contrast, often referred to the patient cases when highlighting new insights, including the importance of patient perspectives when communicating with patients. CONCLUSION: The format of patient cases included in teaching may have a substantial impact…
Monsoriu, Juan A; Gimenez, Marcos H; Riera, Jaime; Vidaurre, Ana [Departamento de Fisica Aplicada, Universidad Politecnica de Valencia, E-46022 Valencia (Spain)
The applications of the digital video image to the investigation of physical phenomena have increased enormously in recent years. The advances in computer technology and image recognition techniques allow the analysis of more complex problems. In this work, we study the movement of a damped coupled oscillation system. The motion is considered as a linear combination of two normal modes, i.e. the symmetric and antisymmetric modes. The image of the experiment is recorded with a video camera and analysed by means of software developed in our laboratory. The results show a very good agreement with the theory.
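The decomposition into symmetric and antisymmetric normal modes described above can be sketched numerically: each mass's displacement is a linear combination of the two damped modes, and the modes are recovered as the sum and difference of the tracked positions. The parameters below are illustrative, not those of the experiment:

```python
import numpy as np

def mode_coordinates(x1, x2):
    """Decompose two-oscillator displacements into normal-mode coordinates.

    Symmetric mode: both masses move together; antisymmetric: in opposition.
    """
    return (x1 + x2) / 2.0, (x1 - x2) / 2.0

# Synthesize a damped two-mode motion (illustrative parameters).
t = np.linspace(0.0, 10.0, 2000)
w_s, w_a, gamma = 2.0, 3.0, 0.1                      # mode frequencies, damping
q_s = 1.0 * np.exp(-gamma * t) * np.cos(w_s * t)     # symmetric mode
q_a = 0.5 * np.exp(-gamma * t) * np.cos(w_a * t)     # antisymmetric mode
x1, x2 = q_s + q_a, q_s - q_a                        # each mass combines both modes
```

Applying `mode_coordinates` to the video-tracked positions of the two masses recovers each mode's damped cosine, whose frequency and decay rate can then be fitted against theory.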
Medical entity recognition, a basic task in the language processing of clinical data, has been extensively studied for admission notes written in alphabetic languages such as English. However, much less work has been done on unstructured texts written in Chinese, or on differentiating Chinese drug names between traditional Chinese medicine and Western medicine. Here, we propose a novel cascade-type Chinese medication entity recognition approach that integrates a sentence category classifier based on a support vector machine with conditional random field-based medication entity recognition. We hypothesized that this approach could avoid the side effects of abundant negative samples and improve the performance of named entity recognition on admission notes written in Chinese. We applied this approach to a test set of 324 Chinese admission notes manually annotated by medical experts. Our data demonstrate that this approach achieved 94.2% precision, 92.8% recall, and 93.5% F-measure for the recognition of traditional Chinese medicine drug names, and 91.2% precision, 92.6% recall, and 91.7% F-measure for the recognition of Western medicine drug names. The differences in F-measure were significant compared with the baseline systems.
The paper discusses Optical Character Recognition (OCR) of historical texts of the 18th-20th centuries in the Romanian language using the Cyrillic script. We distinguish three epochs (approximately the 18th, 19th, and 20th centuries) with different usage of the Cyrillic alphabet in Romanian and, correspondingly, different approaches to OCR. We developed historical alphabets and sets of glyph recognition templates specific to each epoch. Dictionaries in the proper alphabets and orthographies were also created. In addition, virtual keyboards, fonts, transliteration utilities, etc. were developed. The resulting technology and toolset permit successful recognition of historical Romanian texts in the Cyrillic script. After transliteration to the modern Latin script, we obtain barrier-free access to historical documents.
Bu, Jiang; Lao, Song-Yan; Bai, Liang
Nowadays, applications such as automatic video indexing, keyword-based video search, and TV commercial detection can be developed by detecting and recognizing billboard trademarks. We propose a hierarchical solution for real-time billboard trademark recognition in various sports videos. Billboard frames are detected in the first level, where a fuzzy decision tree with easily computed features is employed to accelerate the process. In the second level, color and regional SIFT features are combined for the first time to describe the appearance of trademarks, and shared nearest neighbor (SNN) clustering with the χ² distance is utilized instead of traditional K-means clustering to construct the SIFT vocabulary; finally, Latent Semantic Analysis (LSA) based SIFT vocabulary matching is performed on the template trademark and the candidate regions in the billboard frame. Preliminary experiments demonstrate the effectiveness of the hierarchical solution, and real-time constraints are also met by our solution.
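The χ² distance used here for clustering SIFT descriptors can be written in a few lines. A minimal sketch (the function name is ours, and plain histograms stand in for the paper's SIFT bag-of-words vectors):

```python
def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two non-negative histograms.

    d(h1, h2) = 0.5 * sum_i (h1_i - h2_i)^2 / (h1_i + h2_i); the small
    eps guards against division by zero in empty bins. Compared with
    Euclidean distance, differences in sparsely populated bins weigh
    relatively more, which often suits visual-word histograms.
    """
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))
```

In an SNN clustering step, this distance would replace the Euclidean metric when ranking each descriptor's nearest neighbours.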
In this paper, we describe how information obtained from multiple views using a network of cameras can be effectively combined to yield a reliable and fast human activity recognition system. First, we present a score-based fusion technique for combining information from multiple cameras that can handle the arbitrary orientation of the subject with respect to the cameras and that does not rely on a symmetric deployment of the cameras. Second, we describe how longer, variable-duration, interleaved action sequences can be recognized in real time based on multi-camera data that is continuously streaming in. Our framework does not depend on any particular feature extraction technique, and as a result, the proposed system can easily be integrated on top of existing implementations for view-specific classifiers and feature descriptors. For implementation and testing of the proposed system, we have used computationally simple locality-specific motion information extracted from the spatio-temporal shape of a human silhouette as our feature descriptor. This lends itself to an efficient distributed implementation, while maintaining a high frame capture rate. We demonstrate the robustness of our algorithms by implementing them on a portable multi-camera video sensor network testbed and evaluating system performance under different camera network configurations.
Ohyama, Wataru; Suzuki, Koushi; Wakabayashi, Tetsushi
An algorithm for recognition and defect detection of dot-matrix text printed on products is proposed. Extraction and recognition of dot-matrix text involves several difficulties not present in standard camera-based OCR: the appearance of dot-matrix characters is corrupted and broken by illumination, complex background texture, and other standard characters printed on product packages. We propose a dot-matrix text extraction and recognition method that does not require any user interaction. The method employs the detected locations of corner points and classification scores. An evaluation experiment using 250 images shows that the recall and precision of extraction are 78.60% and 76.03%, respectively. The recognition accuracy for correctly extracted characters is 94.43%. Detecting printing defects in dot-matrix text is also important in production settings to avoid shipping defective products. We therefore also propose a detection method for printing defects in dot-matrix characters. The method constructs a feature vector whose elements are the classification scores of each character class and employs a support vector machine to classify four types of printing defect. The detection accuracy of the proposed method is 96.68%.
Celli, Fabio; Poesio, Massimo
We present PR2, a personality recognition system available online that performs instance-based classification of Big5 personality types from unstructured text using language-independent features. It has been tested on English and Italian, achieving an f-measure of up to .68.
Lin, Wutao; Ji, Donghong; Lu, Yanan
Information extraction from clinical texts enables medical workers to identify patients' problems faster and makes intelligent diagnosis possible in the future. There has been considerable work on disorder mention recognition in clinical narratives, but recognition of more complicated disorder mentions, such as overlapping ones, remains an open issue. This paper proposes a multi-label structured Support Vector Machine (SVM) based method for disorder mention recognition. We present a multi-label scheme that can be used in complicated entity recognition tasks. We performed three sets of experiments to evaluate our model. Our best F1-score on the 2013 Conference and Labs of the Evaluation Forum data set is 0.7343. There are six types of labels in our multi-label scheme, all of which are represented by 24-bit binary numbers; the binary digits of each label encode information about different disorder mentions. Our multi-label method can recognize not only disorder mentions formed by contiguous or discontiguous words but also mentions whose spans overlap with each other. The experiments indicate that our multi-label structured SVM model outperforms a conditional random field (CRF) model on this disorder mention recognition task, and that our multi-label scheme surpasses the baseline. In particular, for overlapping disorder mentions, the F1-score of our multi-label scheme is 0.1428 higher than that of the baseline BIOHD1234 scheme. This multi-label structured SVM based approach is demonstrated to work well on this disorder recognition task. The novel multi-label scheme we presented is superior to the baseline, and it can be used in other models to solve various types of complicated entity recognition tasks as well.
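The abstract does not spell out its 24-bit encoding, but the underlying idea of multi-label tagging for overlapping mentions can be sketched as follows: each token carries a bit vector with one bit per concurrent mention "slot", so a token shared by two overlapping mentions simply has two bits set. The slot assignment and vector width here are illustrative assumptions, not the paper's scheme.

```python
# Illustrative multi-label token tagging for overlapping entity mentions.
# Tokens inside several mentions accumulate several set bits.

def encode_overlapping(tokens, mentions, n_slots=4):
    """mentions: list of (start, end) token spans, end exclusive."""
    labels = [0] * len(tokens)
    for slot, (start, end) in enumerate(mentions):
        bit = 1 << (slot % n_slots)          # one bit per mention slot
        for i in range(start, end):
            labels[i] |= bit                 # shared tokens get multiple bits
    return labels

tokens = ["left", "atrium", "dilated", "and", "ventricle", "enlarged"]
# two mentions sharing the token "left"
mentions = [(0, 3), (0, 1)]
print(encode_overlapping(tokens, mentions))  # → [3, 1, 1, 0, 0, 0]
```

Decoding reverses the process: each bit plane, read independently, yields one contiguous or discontiguous mention.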
Khosla, Deepak; Moore, Christopher K.; Chelian, Suhas
This paper presents a bio-inspired method for spatio-temporal recognition in static and video imagery. It builds upon and extends our previous work on a bio-inspired Visual Attention and object Recognition System (VARS). The VARS approach locates and recognizes objects in a single frame. This work presents two extensions of VARS. The first is a Scene Recognition Engine (SCE) that learns to recognize spatial relationships between objects that compose a particular scene category in static imagery. This could be used to recognize the category of a scene, e.g., office vs. kitchen. The second is an Event Recognition Engine (ERE) that recognizes spatio-temporal sequences or events in video sequences. This extension uses a working memory model to recognize events and behaviors in video imagery by maintaining and recognizing ordered spatio-temporal sequences. The working memory model is based on an ARTSTORE neural network that combines an ART-based neural network with a cascade of sustained temporal order recurrent (STORE) neural networks. A series of Default ARTMAP classifiers ascribes event labels to these sequences. Our preliminary studies have shown that this extension is robust to variations in an object's motion profile. We evaluated the performance of the SCE and ERE on real datasets. The SCE module was tested on a visual scene classification task using the LabelMe dataset. The ERE was tested on real-world video footage of vehicles and pedestrians in a street scene. Our system is able to recognize the events in this footage involving vehicles and pedestrians.
Dissanayake, Cheryl; Shembrey, Joh; Suddendorf, Thomas
Two studies are reported which investigate delayed video self-recognition (DSR) in children with autistic disorder and Asperger's disorder relative to one another and to their typically developing peers. A secondary aim was to establish whether DSR ability is dependent on metarepresentational ability. Children's verbal and affective responses to their image were also measured. Three groups of male children between 5 and 9 years, comprising 15 with high-functioning autistic disorder (HFA), 12 with Asperger's disorder (AspD), and 15 typically developing (TD) children, participated in Study 1. Study 2 included two groups of younger children (18 HFA; 18 TD) aged 4 to 7 years. Participant groups in each study were equally able to recognize themselves using delayed video feedback, and responded to their marked image with positive affect. This was so even amongst children with HFA who were impaired in their performance on false belief tasks, casting doubt on a metarepresentational basis of DSR.
Lv, Zhuowen; Xing, Xianglei; Wang, Kejun; Guan, Donghai
Gait is a unique biometric feature perceptible at larger distances, and the gait representation approach plays a key role in a video sensor-based gait recognition system. The Class Energy Image is one of the most important appearance-based gait representation methods and has received considerable attention. In this paper, we review the expressions and meanings of various Class Energy Image approaches and analyze the information contained in Class Energy Images. Furthermore, the effectiveness and robustness of these approaches are compared on benchmark gait databases. We outline the research challenges and provide promising future directions for the field. To the best of our knowledge, this is the first review that focuses on the Class Energy Image, and it can serve as a useful reference in the literature on video sensor-based gait representation approaches. PMID:25574935
Wang, Haochang; Li, Yu
As a new branch of data mining and knowledge discovery, biomedical text mining is currently progressing rapidly. Biomedical named entity (BNE) recognition is a basic technique in biomedical knowledge discovery, and its performance directly affects further discovery and processing in biomedical texts. In this paper, we present an improved method based on a co-decision matrix framework for Biomedical Named Entity Recognition (BNER). The relativity between classifiers is exploited by using the co-decision matrix to exchange decision information among classifiers. Experiments were carried out on the GENIA corpus, with a best result of 75.9% F-score. The experimental results show that the proposed co-decision matrix framework can yield promising performance.
Nona Heydari Esfahani
Full Text Available In this paper, robust text-independent speaker recognition is considered. The proposed method operates on manually silence-removed utterances that are segmented into smaller speech units containing a few phones and at least one vowel. The segments are the basic units for long-term feature extraction. Sub-band entropy is extracted directly in each segment. A robust vowel detection method is then applied to each segment to separate a high-energy vowel that is used as the unit for pitch frequency and formant extraction. By applying a clustering technique, the extracted short-term features, namely MFCC coefficients, are combined with the long-term features. Experiments using an MLP classifier show that the average speaker recognition accuracy is 97.33% for clean speech and 61.33% in a noisy environment at −2 dB SNR, which shows an improvement over other conventional methods.
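As a rough illustration of the sub-band entropy feature mentioned above, the following sketch computes the spectral entropy of equal-width sub-bands of one segment's power spectrum. The band layout, sample rate, and normalization are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def subband_entropy(frame, n_bands=8):
    """Spectral sub-band entropy of one speech segment (illustrative sketch)."""
    spec = np.abs(np.fft.rfft(frame)) ** 2           # power spectrum
    bands = np.array_split(spec, n_bands)            # equal-width sub-bands
    entropies = []
    for band in bands:
        p = band / (band.sum() + 1e-12)              # normalize band to a pmf
        entropies.append(-np.sum(p * np.log2(p + 1e-12)))
    return np.array(entropies)

rng = np.random.default_rng(0)
frame = rng.standard_normal(512)                     # stand-in speech segment
print(subband_entropy(frame).shape)                  # (8,)
```

Flat (noise-like) bands yield high entropy, while bands dominated by a few harmonics yield low entropy, which is what makes the feature informative for speech.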
Naz, Saeeda; Umar, Arif Iqbal; Ahmed, Riaz; Razzak, Muhammad Imran; Rashid, Sheikh Faisal; Shafait, Faisal
The recognition of Arabic script and its derivatives such as Urdu, Persian, and Pashto is a difficult task due to the complexity of the script. Urdu text recognition is particularly difficult due to its Nasta'liq writing style, whose complex calligraphic nature presents major issues for recognition owing to diagonal writing, high cursiveness, context sensitivity, and overlapping of characters. Therefore, work done on recognition of Arabic script cannot be directly applied to Urdu recognition. We present Multi-dimensional Long Short-Term Memory (MDLSTM) Recurrent Neural Networks with an output layer designed for sequence labeling to recognize printed Urdu text-lines written in the Nasta'liq style. Experiments show that MDLSTM attained a recognition accuracy of 98% on unconstrained Urdu Nasta'liq printed text, which significantly outperforms state-of-the-art techniques.
Dolbin, A. V.; Rozaliev, V. L.; Orlova, Y. A.
This work is devoted to the semantic analysis of texts written in natural language. The main goal of the research was to compare latent Dirichlet allocation and latent semantic analysis for identifying elements of human appearance in text. The completeness of information retrieval was chosen as the efficiency criterion for comparing the methods. However, choosing only one method was insufficient for achieving high recognition rates, so additional methods were used to find references to the personality in the text. All of these methods are based on the created information model, which represents a person's appearance.
Niu, Li; Xu, Xinxing; Chen, Lin; Duan, Lixin; Xu, Dong
In this paper, we propose new approaches for action and event recognition by leveraging a large number of freely available Web videos (e.g., from Flickr video search engine) and Web images (e.g., from Bing and Google image search engines). We address this problem by formulating it as a new multi-domain adaptation problem, in which heterogeneous Web sources are provided. Specifically, we are given different types of visual features (e.g., the DeCAF features from Bing/Google images and the trajectory-based features from Flickr videos) from heterogeneous source domains and all types of visual features from the target domain. Considering the target domain is more relevant to some source domains, we propose a new approach named multi-domain adaptation with heterogeneous sources (MDA-HS) to effectively make use of the heterogeneous sources. In MDA-HS, we simultaneously seek for the optimal weights of multiple source domains, infer the labels of target domain samples, and learn an optimal target classifier. Moreover, as textual descriptions are often available for both Web videos and images, we propose a novel approach called MDA-HS using privileged information (MDA-HS+) to effectively incorporate the valuable textual information into our MDA-HS method, based on the recent learning using privileged information paradigm. MDA-HS+ can be further extended by using a new elastic-net-like regularization. We solve our MDA-HS and MDA-HS+ methods by using the cutting-plane algorithm, in which a multiple kernel learning problem is derived and solved. Extensive experiments on three benchmark data sets demonstrate that our proposed approaches are effective for action and event recognition without requiring any labeled samples from the target domain.
Mansoor Al-A'ali; Jamil Ahmad
This paper presents a novel technique based on feature extraction and on dynamic cursor sizing for the recognition of Arabic text. The most challenging area in Arabic OCR (AOCR) research is the segmentation of words into their sub-words and individual characters. Several rules are defined that govern the size and movement of the cursor through each segment. The features obtained from each segment are termed strokes, and each segment is defined by a number of strokes, where each stroke...
Schlipsing, Marc; Salmen, Jan; Tschentscher, Marc
Computer-aided sports analysis is demanded by coaches and the media. Image processing and machine learning techniques that allow for "live" recognition and tracking of players exist. But these methods are far from collecting and analyzing event data fully autonomously. To generate accurate results...... collection, annotation, and learning as an offline task. A semi-automatic labeling of training data and robust learning given few examples from unbalanced classes are required. We present a real-time system acquiring and analyzing video sequences from soccer matches. It estimates each player's position...
Sanal Kumar, K. P.; Bhavani, R., Dr.
Egocentric vision is a unique, human-centric perspective in computer vision. The recognition of egocentric actions is a challenging task that can help in assisting elderly people, disabled patients, and so on. In this work, life-logging activity videos are taken as input, organized into two category levels: a top level and a second level. Recognition uses features such as the Histogram of Oriented Gradients (HOG), the Motion Boundary Histogram (MBH), and trajectory features. The features are fused together to act as a single feature vector, which is then reduced using Principal Component Analysis (PCA). The reduced features are provided as input to classifiers: a Support Vector Machine (SVM), k-nearest neighbor (kNN), and a combined SVM and kNN (combined SVM-kNN). These classifiers are evaluated, and the combined SVM-kNN provides better results than other classifiers in the literature.
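The fuse-reduce-classify pipeline described above can be sketched with scikit-learn (assumed available). The random arrays stand in for HOG/MBH/trajectory descriptors, and soft-vote probability averaging is one plausible reading of "combined SVM-kNN"; none of the hyperparameters are the paper's.

```python
# Sketch: early feature fusion -> PCA reduction -> soft-vote SVM+kNN.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
hog  = rng.standard_normal((60, 32))   # stand-in HOG descriptors
mbh  = rng.standard_normal((60, 32))   # stand-in MBH descriptors
traj = rng.standard_normal((60, 16))   # stand-in trajectory descriptors
X = np.hstack([hog, mbh, traj])        # fuse into a single feature vector
y = rng.integers(0, 2, size=60)        # toy binary activity labels

combined = VotingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("knn", KNeighborsClassifier(n_neighbors=5))],
    voting="soft")                     # average the two class-probability maps
model = make_pipeline(PCA(n_components=10), combined)
model.fit(X, y)
print(model.predict(X[:5]).shape)      # (5,)
```

In practice the soft vote lets the margin-based SVM and the local kNN compensate for each other's failure modes, which matches the reported gain of the combined classifier.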
Lebowsky, Fritz; Nicolas, Marina
High-end monitors and TVs based on LCD technology continue to increase their native display resolution to 4k by 2k and beyond. Subsequently, uncompressed pixel amplitude processing becomes costly not only when transmitting over cable or wireless communication channels, but also when processing with array processor architectures. For motion video content, spatial preprocessing from YCbCr 444 to YCbCr 420 is widely accepted. However, due to spatial low-pass filtering in the horizontal and vertical directions, the quality and readability of small text and graphics content is heavily compromised when color contrast is high in the chrominance channels. On the other hand, straightforward YCbCr 444 compression based on mathematical error coding schemes quite often lacks optimal adaptation to visually significant image content. We present a block-based memory compression architecture for text, graphics, and video enabling multidimensional error minimization with context-sensitive control of visually noticeable artifacts. As a result of analyzing image context locally, the number of operations per pixel can be significantly reduced, especially when implemented on array processor architectures. A comparative analysis against some competitive solutions highlights the effectiveness of our approach, identifies its current limitations with regard to high-quality color rendering, and illustrates remaining visual artifacts.
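The chroma subsampling step discussed above (YCbCr 4:4:4 to 4:2:0) can be illustrated with a naive 2x2 box filter, which makes clear why high-contrast chroma edges around small text get blurred; real encoders use longer filter kernels, so this is only a sketch.

```python
import numpy as np

def to_420(chroma):
    """Naive 2x2 box-filter chroma subsampling (YCbCr 4:4:4 -> 4:2:0).
    Each 2x2 block of chroma samples collapses to its mean."""
    h, w = chroma.shape                      # assumes even dimensions
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

cb = np.arange(16, dtype=float).reshape(4, 4)  # toy Cb plane
print(to_420(cb))                              # each 2x2 block averaged
```

A sharp chroma transition inside one 2x2 block is averaged away entirely, which is exactly the readability loss for small colored text that the paper targets.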
Ilgner, Justus; Düwel, Philip; Westhofen, Martin
We conducted a study to evaluate speech recognition software in an otorhinolaryngology unit and to assess its impact on productivity prior to general implementation. Current speech recognition software (IBM ViaVoice, version 10) was implemented on a personal computer with a 2-GHz central processing unit, 256 MB of RAM, and a 30-GB hard disk drive, with and without add-on professional vocabulary for otorhinolaryngology. This vocabulary was added by the automated analysis of an additional 12,257 documents from our department. We compared the word recognition error rates for three different text types and determined their impact on the amount of surgeon's time that was invested in the production of an error-free document. Although error rates without any professional vocabulary database were rather high (operation reports: 38.72%; consultation notes: 27.77%), the patient information was edited with a satisfactory result (10.65%). Best results were obtained with the specialty-related vocabulary database added by the analysis of our own documents (operation reports: 5.45%; consultation notes: 5.21%). An increase in productivity compared with that of conventional transcription was found at an error rate of less than 16%.
McIntosh, Lindsey G; Park, Sohee
Social impairment is a core feature of schizophrenia, present from the pre-morbid stage and predictive of outcome, but the etiology of this deficit remains poorly understood. Successful and adaptive social interactions depend on one's ability to make rapid and accurate judgments about others in real time. Our surprising ability to form accurate first impressions from brief exposures, known as "thin slices" of behavior has been studied very extensively in healthy participants. We sought to examine affect and social trait judgment from thin slices of static or video stimuli in order to investigate the ability of schizophrenic individuals to form reliable social impressions of others. 21 individuals with schizophrenia (SZ) and 20 matched healthy participants (HC) were asked to identify emotions and social traits for actors in standardized face stimuli as well as brief video clips. Sound was removed from videos to remove all verbal cues. Clinical symptoms in SZ and delusional ideation in both groups were measured. Results showed a general impairment in affect recognition for both types of stimuli in SZ. However, the two groups did not differ in the judgments of trustworthiness, approachability, attractiveness, and intelligence. Interestingly, in SZ, the severity of positive symptoms was correlated with higher ratings of attractiveness, trustworthiness, and approachability. Finally, increased delusional ideation in SZ was associated with a tendency to rate others as more trustworthy, while the opposite was true for HC. These findings suggest that complex social judgments in SZ are affected by symptomatology. Copyright © 2014 Elsevier B.V. All rights reserved.
Chiang, Yao-Yi; Leyk, Stefan; Honarvar Nazari, Narges; Moghaddam, Sima; Tan, Tian Xiang
Converting geographic features (e.g., place names) in map images into a vector format is the first step for incorporating cartographic information into a geographic information system (GIS). With the advancement in computational power and algorithm design, map processing systems have been considerably improved over the last decade. However, the fundamental map processing techniques such as color image segmentation, (map) layer separation, and object recognition are sensitive to minor variations in graphical properties of the input image (e.g., scanning resolution). As a result, most map processing results would not meet user expectations if the user does not "properly" scan the map of interest, pre-process the map image (e.g., using compression or not), and train the processing system, accordingly. These issues could slow down the further advancement of map processing techniques as such unsuccessful attempts create a discouraged user community, and less sophisticated tools would be perceived as more viable solutions. Thus, it is important to understand what kinds of maps are suitable for automatic map processing and what types of results and process-related errors can be expected. In this paper, we shed light on these questions by using a typical map processing task, text recognition, to discuss a number of map instances that vary in suitability for automatic processing. We also present an extensive experiment on a diverse set of scanned historical maps to provide measures of baseline performance of a standard text recognition tool under varying map conditions (graphical quality) and text representations (that can vary even within the same map sheet). Our experimental results help the user understand what to expect when a fully or semi-automatic map processing system is used to process a scanned map with certain (varying) graphical properties and complexities in map content.
Koelstra, Sander; Yazdani, Ashkan; Soleymani, Mohammad; Mühl, Christian; Lee, Jong-Seok; Nijholt, Anton; Pun, Thierry; Ebrahimi, Touradj; Patras, Ioannis
Recently, the field of automatic recognition of users' affective states has gained a great deal of attention. Automatic, implicit recognition of affective states has many applications, ranging from personalized content recommendation to automatic tutoring systems. In this work, we present some promising results of our research in classification of emotions induced by watching music videos. We show robust correlations between users' self-assessments of arousal and valence and the frequency pow...
Full Text Available Background: The attention of national and foreign researchers has so far focused on the structural and semantic features of syntactic idioms. Automatic analysis of these peculiar units, which lie on the verge of syntax and phraseology, has not yet been carried out in the scientific literature. This issue requires theoretical understanding and practical implementation. Purpose: To create an algorithm for recognizing syntactic idioms with a one- or two-term core component in a text corpus. Results: Based on the results of previous theoretical studies, we highlighted a number of formal and statistical criteria that make it possible to distinguish syntactic idioms from other language units in a corpus of Ukrainian-language texts. The author developed a block diagram of syntactic idiom recognition, incorporating two branches constructed for sentences with a one-term and sentences with a two-term core component, respectively. The first branch is based on the presence of word repeats (full word concurrence or the presence of other word forms of the word) and the list of core components determined in previous stages of the study (є, це, то, не, так; як; з/із/зі, між, над, серед; а, але, зате, однак, проте). The second branch was created for the other type of syntactic idioms, those with a two-term core component. It takes into account the following properties of the analyzed units: the presence of combinations of service parts of speech, of service parts of speech with a pronoun or adverb, and of a pronoun and adverb; compliance of word combinations with the register of syntactic idiom core components, currently comprising 92 structures; a mutual-information association measure ≥9; etc. Discussion: The offered algorithm enables automatic identification of syntactic idioms in a text corpus and extraction of the contexts of their use; it can be used to improve automatic text processing and automated translation
El Moubtahij Hicham
Full Text Available This paper presents an analytical approach to an offline handwritten Arabic text recognition system. It is based on the Hidden Markov Model Toolkit (HTK) without explicit segmentation. The first phase is preprocessing, where the data is introduced into the system after quality enhancement. Then, a set of characteristics (local density features and statistical features) is extracted using a sliding-window technique. Subsequently, the resulting feature vectors are fed to the Hidden Markov Model Toolkit (HTK). The simple database "Arabic-Numbers" and IFN/ENIT are used to evaluate the performance of this system. Keywords: Hidden Markov Models (HMM), HMM Toolkit (HTK), sliding windows
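The sliding-window front end described above can be sketched as follows: a window slides along the text-line image and emits a small density-feature vector per position, producing the frame sequence an HMM would consume. The window width, step, and the particular density statistics are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def sliding_density_features(img, win=4, step=2):
    """Column-wise sliding-window ink-density features from a binary
    text-line image (sketch of an HMM front end)."""
    h, w = img.shape
    feats = []
    for x in range(0, w - win + 1, step):
        window = img[:, x:x + win]
        feats.append([window.mean(),            # overall ink density
                      window[:h // 2].mean(),   # upper-half density
                      window[h // 2:].mean()])  # lower-half density
    return np.array(feats)

img = np.zeros((8, 12))
img[2:6, 3:9] = 1                               # toy "stroke"
print(sliding_density_features(img).shape)      # (5, 3) frames x features
```

Each row of the result is one observation frame; HTK-style training then fits per-character HMMs over such frame sequences without ever segmenting the line into characters.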
Smith, Theodore S; Isaak, Matthew I; Senette, Christian G; Abadie, Brenton G
This study examined the effects of electronic communication distractions, including cell-phone and texting demands, on true and false recognition, specifically semantically related words presented and not presented on a computer screen. Participants were presented with 24 Deese-Roediger-McDermott (DRM) lists while manipulating the concurrent presence or absence of cell-phone and text-message distractions during study. In the DRM paradigm, participants study lists of semantically related words (e.g., mother, crib, and diaper) linked to a non-presented critical lure (e.g., baby). After studying the lists of words, participants are then requested to recall or recognize previously presented words. Participants often not only demonstrate high remembrance for presented words (true memory: crib), but also recollection for non-presented words (false memory: baby). In the present study, true memory was highest when participants were not presented with any distraction tasks during study of DRM words, but poorer when they were required to complete a cell-phone conversation or text-message task during study. False recognition measures did not statistically vary across distraction conditions. Signal detection analyses showed that participants better discriminated true targets (list items presented during study) from true target controls (items presented during study only) when cell-phone or text-message distractions were absent than when they were present. Response bias did not vary significantly across distraction conditions, as there were no differences in the likelihood that a participant would claim an item as "old" (previously presented) rather than "new" (not previously presented). Results of this study are examined with respect to both activation monitoring and fuzzy trace theories.
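The signal detection analysis mentioned above rests on the sensitivity index d' = z(hit rate) − z(false-alarm rate), where z is the inverse standard normal CDF. A minimal sketch with toy rates (not the study's data):

```python
from statistics import NormalDist

def dprime(hit_rate, fa_rate):
    """Signal-detection sensitivity d' = z(H) - z(FA)."""
    z = NormalDist().inv_cdf        # inverse standard normal CDF
    return z(hit_rate) - z(fa_rate)

# e.g., good discrimination of studied words from non-studied lures:
print(round(dprime(0.85, 0.30), 3))  # → 1.561
```

Higher d' means better discrimination of studied items from lures regardless of response bias; bias itself is captured separately by the criterion c = −(z(H) + z(FA))/2, which is the quantity the authors report as unchanged across distraction conditions.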
Tilley, Carol L.
With the increasing ranks of cell phone ownership is an increase in text messaging, or texting. During 2008, more than 2.5 trillion text messages were sent worldwide--that's an average of more than 400 messages for every person on the planet. Although many of the messages teenagers text each day are perhaps nothing more than "how r u?" or "c u…
Artyukhin, S. G.; Mestetskiy, L. M.
This paper presents an efficient framework for static gesture recognition based on data obtained from web cameras and the Kinect depth sensor (RGB-D data). Each gesture is given by a pair of images: a color image and a depth map. The database stores gestures by their feature descriptions, generated frame by frame for each gesture of the alphabet. The recognition algorithm takes as input a video sequence (a sequence of frames) for labeling and puts each frame in correspondence with a gesture from the database, or decides that no suitable gesture exists in the database. First, each frame of the video sequence is classified separately, without inter-frame information. Then, a run of successfully labeled frames with the same gesture is grouped into a single static gesture. We propose a combined method for frame segmentation using the depth map and the RGB image. The primary segmentation is based on the depth map; it gives information about the position of the hands and a rough hand border. Then, based on the color image, the border is refined and the shape of the hand is analyzed. The continuous skeleton method is used to generate features. We propose a method based on terminal skeleton branches, which makes it possible to determine the positions of the fingers and wrist. The classification feature for a gesture is a description of the positions of the fingers relative to the wrist. Experiments with the developed algorithm were carried out on the example of American Sign Language. An American Sign Language gesture has several components, including the shape of the hand, its orientation in space, and the type of movement. The accuracy of the proposed method is evaluated on a collected gesture database consisting of 2700 frames.
Xie, Zecheng; Sun, Zenghui; Jin, Lianwen; Ni, Hao; Lyons, Terry
Online handwritten Chinese text recognition (OHCTR) is a challenging problem as it involves a large-scale character set, ambiguous segmentation, and variable-length input sequences. In this paper, we exploit the outstanding capability of path signature to translate online pen-tip trajectories into informative signature feature maps, successfully capturing the analytic and geometric properties of pen strokes with strong local invariance and robustness. A multi-spatial-context fully convolutional recurrent network (MC-FCRN) is proposed to exploit the multiple spatial contexts from the signature feature maps and generate a prediction sequence while completely avoiding the difficult segmentation problem. Furthermore, an implicit language model is developed to make predictions based on semantic context within a predicting feature sequence, providing a new perspective for incorporating lexicon constraints and prior knowledge about a certain language in the recognition procedure. Experiments on two standard benchmarks, Dataset-CASIA and Dataset-ICDAR, yielded outstanding results, with correct rates of 97.50% and 96.58%, respectively, which are significantly better than the best result reported thus far in the literature.
Dat Tien Nguyen
Full Text Available Extracting powerful image features plays an important role in computer vision systems. Many methods have previously been proposed to extract image features for various computer vision applications, such as the scale-invariant feature transform (SIFT), speeded-up robust features (SURF), local binary patterns (LBP), the histogram of oriented gradients (HOG), and weighted HOG. Recently, the convolutional neural network (CNN) method for image feature extraction and classification in computer vision has been used in various applications. In this research, we propose a new gender recognition method for recognizing males and females in observation scenes of surveillance systems, based on feature extraction from visible-light and thermal camera videos through a CNN. Experimental results confirm the superiority of our proposed method over state-of-the-art recognition methods for the gender recognition problem using human body images.
M.Sc. (Computer Science) A video conference is an interactive meeting between two or more locations, facilitated by simultaneous two-way video and audio transmissions. People in a video conference, also known as participants, join video conferences for business and recreational purposes. In a typical video conference, every participant should be properly identified and authenticated if the information discussed during the conference is confidential. This preve...
Kortelainen, Jukka; Seppänen, Tapio
Emotions are fundamental to everyday life, affecting our communication, learning, perception, and decision making. Including emotions in human-computer interaction (HCI) can be seen as a significant step forward, offering great potential for developing advanced future technologies. Since the electrical activity of the brain is affected by emotions, the electroencephalogram (EEG) offers an interesting channel for improving HCI. In this paper, the selection of a subject-independent feature set for EEG-based emotion recognition is studied. We investigate the effect of different feature sets in classifying a person's arousal and valence while watching videos with emotional content. The classification performance is optimized by applying a sequential forward floating search algorithm for feature selection. The best classification rate (65.1% for arousal and 63.0% for valence) is obtained with a feature set containing power spectral features from the frequency band of 1-32 Hz. The proposed approach substantially improves on the classification rates reported in the literature. In the future, further analysis of the video-induced EEG changes, including topographical differences in the spectral features, is needed.
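Power spectral features of the kind used above can be sketched as summed FFT power inside conventional EEG bands within 1-32 Hz. The delta/theta/alpha/beta band edges and the sample rate below are standard conventions assumed for illustration, not the paper's exact feature list.

```python
import numpy as np

def band_powers(eeg, sr=128, bands=((1, 4), (4, 8), (8, 13), (13, 32))):
    """Summed spectral power per EEG band (delta/theta/alpha/beta sketch)."""
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / sr)    # frequency of each bin
    psd = np.abs(np.fft.rfft(eeg)) ** 2 / len(eeg)   # crude periodogram
    return np.array([psd[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in bands])

rng = np.random.default_rng(0)
eeg = rng.standard_normal(512)                       # 4 s of toy EEG at 128 Hz
print(band_powers(eeg).shape)                        # (4,)
```

Feature selection such as sequential forward floating search then operates over many such band-channel power values, adding and conditionally removing features to maximize classification accuracy.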
Zhang, Shaodian; Elhadad, Noémie
Named entity recognition is a crucial component of biomedical natural language processing, enabling information extraction and ultimately reasoning over and knowledge discovery from text. Much progress has been made in the design of rule-based and supervised tools, but they are often genre and task dependent. As such, adapting them to different genres of text or identifying new types of entities requires major effort in re-annotation or rule development. In this paper, we propose an unsupervised approach to extracting named entities from biomedical text. We describe a stepwise solution to tackle the challenges of entity boundary detection and entity type classification without relying on any handcrafted rules, heuristics, or annotated data. A noun phrase chunker followed by a filter based on inverse document frequency extracts candidate entities from free text. Classification of candidate entities into categories of interest is carried out by leveraging principles from distributional semantics. Experiments show that our system, especially the entity classification step, yields competitive results on two popular biomedical datasets of clinical notes and biological literature, and outperforms a baseline dictionary match approach. Detailed error analysis provides a road map for future work. PMID:23954592
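The IDF-based filtering step described above can be illustrated in a few lines: candidate phrases that occur in too many documents (low IDF) are dropped as generic, keeping rarer, more entity-like candidates. The tiny corpus, candidate list, and threshold below are invented for demonstration, and simple substring matching stands in for proper noun-phrase chunking:

```python
import math

docs = [
    "the patient was given aspirin for chest pain",
    "the patient denies chest pain",
    "aspirin was discontinued by the patient",
    "the patient reports mild headache",
]
candidates = ["the patient", "aspirin", "mild headache"]

def idf(phrase, docs):
    # inverse document frequency: low for phrases present in most documents
    df = sum(1 for d in docs if phrase in d)
    return math.log(len(docs) / (1 + df))

# keep only candidates with positive IDF; "the patient" appears everywhere
kept = [c for c in candidates if idf(c, docs) > 0.0]
print(kept)
```

The generic phrase "the patient" is filtered out while the entity-like candidates survive.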
Recognizing others’ emotional states is crucial for effective social interaction. While most facial emotion recognition tasks use explicit prompts that trigger consciously controlled processing, emotional faces are almost exclusively processed implicitly in real life. Recent attempts in social cognition suggest a dual-process perspective, whereby explicit and implicit processes largely operate independently. However, due to differences in methodology, the direct comparison of implicit and explicit social cognition has remained a challenge. Here, we introduce a new tool to comparably measure implicit and explicit processing aspects comprising basic and complex emotions in facial expressions. We developed two video-based tasks with similar answer formats to assess performance in the respective facial emotion recognition processes: Face Puzzle, implicit and explicit. To assess the tasks’ sensitivity to atypical social cognition and to infer interrelationship patterns between explicit and implicit processes in typical and atypical development, we included healthy adults (NT, n = 24) and adults with autism spectrum disorder (ASD, n = 24). Item analyses yielded good reliability of the new tasks. Group-specific results indicated sensitivity to subtle social impairments in high-functioning ASD. Correlation analyses with established implicit and explicit socio-cognitive measures were further in favor of the tasks’ external validity. Between-group comparisons provide first hints of differential relations between implicit and explicit aspects of facial emotion recognition processes in healthy compared to ASD participants. In addition, an increased magnitude of between-group differences in the implicit task was found for a speed-accuracy composite measure. The new Face Puzzle tool thus provides two new tasks to separately assess explicit and implicit social functioning, for instance, to measure subtle impairments as well as potential improvements due to social
Aldoory, Linda; Roberts, Erica Blue; Bushar, Jessica; Assini-Meytin, Luciana C
Infant mortality is associated with access to healthcare, knowledge, and health literacy. Text4baby, the largest national texting health initiative, seeks to address these factors. However, no research has examined the program's theoretical framework, an aspect that may impact its success. To address this gap, Text4baby's use of theory was evaluated through a content analysis of Text4baby messages and interviews with Text4baby content developers. We compared the main variables of health behavior theories framing Text4baby messages with the situational theory of publics and its factors of problem recognition and constraint recognition. The situational theory of publics provides an understanding of the types of publics that might emerge from Text4baby's audiences of pregnant women. Aware, latent, and active publics are defined by the situational theory and are created out of problem recognition and constraint recognition along with a level of personal involvement in the issue of prenatal health. We used content analysis and interviewing to explore how Text4baby prenatal messages were constructed using theory and to offer lessons learned for prenatal health campaigns. The multi-methodological approach to understanding meaning construction in the production of these text messages and how meaning played out in the messages is a useful framework for text message campaigns.
Jalal, Ahmad; Kamal, Shaharyar; Kim, Daijin
Recent advancements in depth video sensor technologies have made human activity recognition (HAR) realizable for elderly monitoring applications. Although conventional HAR utilizes RGB video sensors, HAR can be greatly improved with depth video sensors, which produce depth or distance information. In this paper, a depth-based life-logging HAR system is designed to recognize the daily activities of elderly people and turn these environments into an intelligent living space. Initially, a depth imaging sensor is used to capture depth silhouettes. Based on these silhouettes, human skeletons with joint information are produced, which are further used for activity recognition and generating life logs. The life-logging system is divided into two processes. First, the training system includes data collection using a depth camera, feature extraction, and training for each activity via Hidden Markov Models. Second, after training, the recognition engine starts to recognize the learned activities and produces life logs. The system was evaluated using life-logging features against principal component and independent component features and achieved satisfactory recognition rates against conventional approaches. Experiments conducted on the smart indoor activity datasets and the MSRDailyActivity3D dataset show promising results. The proposed system is directly applicable to any elderly monitoring system, such as monitoring healthcare problems for elderly people, or examining the indoor activities of people at home, office or hospital.
Wu, Yonghui; Jiang, Min; Lei, Jianbo; Xu, Hua
Rapid growth in electronic health record (EHR) use has led to an unprecedented expansion of available clinical data in electronic formats. However, much of the important healthcare information is locked in narrative documents. Therefore, Natural Language Processing (NLP) technologies, e.g., Named Entity Recognition, which identifies boundaries and types of entities, have been extensively studied to unlock important clinical information in free text. In this study, we investigated a novel deep learning method to recognize clinical entities in Chinese clinical documents using a minimal feature engineering approach. We developed a deep neural network (DNN) to generate word embeddings from a large unlabeled corpus through unsupervised learning and another DNN for the NER task. The experimental results showed that the DNN with word embeddings trained from the large unlabeled corpus outperformed the state-of-the-art CRF model in the minimal feature engineering setting, achieving the highest F1-score of 0.9280. Further analysis showed that word embeddings derived through unsupervised learning from a large unlabeled corpus remarkably improved the DNN with randomized embeddings, demonstrating the usefulness of unsupervised feature learning. PMID:26262126
Sartini, Emily Claire
The purpose of this study was to investigate the effects of explicit instruction combined with video prompting to teach text comprehension skills to students with autism spectrum disorder. Participants included 4 elementary school students with autism. A multiple probe across participants design was used to evaluate the intervention's…
Tyner, Bryan C.; Fienup, Daniel M.
Graphing is socially significant for behavior analysts; however, graphing can be difficult to learn. Video modeling (VM) may be a useful instructional method but lacks evidence for effective teaching of computer skills. A between-groups design compared the effects of VM, text-based instruction, and no instruction on graphing performance.…
Abstract Radiology reports describe the results of radiography procedures and have the potential of being a useful source of information which can bring benefits to health care systems around the world. One way to automatically extract information from the reports is by using Text Mining tools. The problem is that these tools are mostly developed for English and reports are usually written in the native language of the radiologist, which is not necessarily English. This creates an obstacle to the sharing of Radiology information between different communities. This work explores the solution of translating the reports to English before applying the Text Mining tools, probing the question of what translation approach should be used. We created MRRAD (Multilingual Radiology Research Articles Dataset), a parallel corpus of Portuguese research articles related to Radiology and a number of alternative translations (human, automatic and semi-automatic) to English. This is a novel corpus which can be used to move forward the research on this topic. Using MRRAD we studied which kind of automatic or semi-automatic translation approach is more effective on the Named-entity recognition task of finding RadLex terms in the English version of the articles. Considering the terms extracted from human translations as our gold standard, we calculated how similar to this standard were the terms extracted using other translations. We found that a completely automatic translation approach using Google leads to F-scores (between 0.861 and 0.868, depending on the extraction approach) similar to the ones obtained through a more expensive semi-automatic translation approach using Unbabel (between 0.862 and 0.870). To better understand the results we also performed a qualitative analysis of the type of errors found in the automatic and semi-automatic translations. Database URL: https://github.com/lasigeBioTM/MRRAD PMID:29220455
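The evaluation protocol above (terms extracted from human translations as gold standard, compared against terms extracted from other translations) reduces to set-overlap metrics. The term sets below are invented for illustration:

```python
# precision/recall/F1 between a gold term set (from human translation)
# and a candidate term set (from an automatic translation)
gold = {"pneumothorax", "pleural effusion", "thorax", "radiograph"}
auto = {"pneumothorax", "pleural effusion", "chest", "radiograph"}

tp = len(gold & auto)                 # terms found in both sets
precision = tp / len(auto)
recall = tp / len(gold)
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))
```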
In this paper a novel speaker recognition system is introduced. With advances in computer science, automated speaker recognition has become increasingly popular for aiding crime investigations and authorization processes. Here, a recurrent neural network approach is used to learn to identify ten speakers within a set of 21 audio books. Audio signals are processed via spectral analysis into Mel Frequency Cepstral Coefficients that serve as speaker-specific features, which are input to the ne...
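A rough sketch of the MFCC front end mentioned above: frame the signal, take the power spectrum, apply a triangular mel filterbank, then log and DCT. Parameter choices are typical defaults, not necessarily those of the paper, and the recurrent classifier is omitted:

```python
import numpy as np

def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13):
    # 25 ms frames with 10 ms hop, Hamming-windowed
    win, hop = int(0.025 * sr), int(0.010 * sr)
    frames = np.stack([signal[i:i + win] * np.hamming(win)
                       for i in range(0, len(signal) - win, hop)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # triangular mel filterbank over the FFT bins
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fb.T + 1e-10)
    # DCT-II decorrelates the log-mel energies; keep the first n_ceps
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of a 440 Hz tone
feats = mfcc(sig)
print(feats.shape)
```

One second of 16 kHz audio yields a (frames x coefficients) feature matrix that a sequence model can consume.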
Scherr, Sebastian; Arendt, Florian; Schäfer, Markus
Suicide is a global public health problem. Media impact on suicide is well confirmed and there are several recommendations on how media should and should not report on suicide to minimize the risk of copycat behavior. Those media guidelines have been developed to improve responsible reporting on suicide (RRS). Although such guidelines are used in several countries, we lack empirical evidence on their causal effect on actual journalistic news writing. We conducted an experiment with journalism students (N = 78) in Germany in which we tested whether exposure to awareness material promoting RRS influences news writing. As a supplement to the widely used text-based material, we tested the impact of a video in which a suicide expert presents the guidelines. A video was used as a supplement to text partly due to its potential benefit for prevention efforts over the Internet. We chose a low-budget production process allowing easy reproduction in different countries by local suicide experts. In the experiment, participants were either exposed to written, audio-visual, or no awareness material. Afterwards, participants read numerous facts of an ostensible suicide event and were asked to write a factual suicide news story based on these facts. Analyses indicate that awareness material exposure helped to improve RRS with the awareness video showing the strongest effects. We recommend that suicide prevention should use instructive awareness videos about RRS complementary to text-based awareness material.
Lei, Jianbo; Tang, Buzhou; Lu, Xueqin; Gao, Kaihua; Jiang, Min; Xu, Hua
Named entity recognition (NER) is one of the fundamental tasks in natural language processing. In the medical domain, there have been a number of studies on NER in English clinical notes; however, very limited NER research has been carried out on clinical notes written in Chinese. The goal of this study was to systematically investigate features and machine learning algorithms for NER in Chinese clinical text. We randomly selected 400 admission notes and 400 discharge summaries from Peking Union Medical College Hospital in China. For each note, four types of entity (clinical problems, procedures, laboratory tests, and medications) were annotated according to a predefined guideline. Two-thirds of the 400 notes were used to train the NER systems and one-third for testing. We investigated the effects of different types of feature, including bag-of-characters, word segmentation, part-of-speech, and section information, and of different machine learning algorithms, including conditional random fields (CRF), support vector machines (SVM), maximum entropy (ME), and structural SVM (SSVM), on the Chinese clinical NER task. All classifiers were trained on the training dataset and evaluated on the test set, and micro-averaged precision, recall, and F-measure were reported. Our evaluation on the independent test set showed that most types of feature were beneficial to Chinese NER systems, although the improvements were limited. The system achieved the highest performance by combining word segmentation and section information, indicating that these two types of feature complement each other. When the same types of optimized feature were used, CRF and SSVM outperformed SVM and ME. More specifically, SSVM achieved the highest performance of the four algorithms, with F-measures of 93.51% and 90.01% for admission notes and discharge summaries, respectively. Published by the BMJ Publishing Group Limited.
Wu, Yonghui; Xu, Jun; Jiang, Min; Zhang, Yaoyun; Xu, Hua
Clinical Named Entity Recognition (NER) is a critical task for extracting important patient information from clinical text to support clinical and translational research. This study explored the neural word embeddings derived from a large unlabeled clinical corpus for clinical NER. We systematically compared two neural word embedding algorithms and three different strategies for deriving distributed word representations. Two neural word embeddings were derived from the unlabeled Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) II corpus (403,871 notes). The results from both 2010 i2b2 and 2014 Semantic Evaluation (SemEval) data showed that the binarized word embedding features outperformed other strategies for deriving distributed word representations. The binarized embedding features improved the F1-score of the Conditional Random Fields based clinical NER system by 2.3% on i2b2 data and 2.4% on SemEval data. The combined feature from the binarized embeddings and the Brown clusters improved the F1-score of the clinical NER system by 2.9% on i2b2 data and 2.7% on SemEval data. Our study also showed that the distributed word embedding features derived from a large unlabeled corpus can be better than the widely used Brown clusters. Further analysis found that the neural word embeddings captured a wide range of semantic relations, which could be discretized into distributed word representations to benefit the clinical NER system. The low-cost distributed feature representation can be adapted to any other clinical natural language processing research. PMID:26958273
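One simple way to picture "binarized word embedding features" is thresholding each continuous embedding dimension, e.g. at its mean over the vocabulary, so that a CRF can consume discrete indicators. This particular scheme is an assumption for illustration and may differ from the paper's exact discretization:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["fever", "cough", "aspirin", "discharge"]
emb = rng.normal(size=(len(vocab), 8))        # toy 8-d word embeddings

# binarize: 1 if the value exceeds that dimension's mean over the vocabulary
mean = emb.mean(axis=0)
binary = (emb > mean).astype(int)

# discrete, CRF-friendly indicator features per word
features = {w: [f"dim{j}={b}" for j, b in enumerate(binary[i])]
            for i, w in enumerate(vocab)}
print(features["fever"][:3])
```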
This objective minimizes the quadratic error between the original video descriptions Y and the reconstructed translations obtained from A and S. … For this purpose, we parse the grammatical structure of title captions using a probabilistic … according to Eq. (2). Then the video embedding is learned separately, by minimizing the error of predicting the embedded descriptions from the videos. [Figure 3 residue: Terms from the VideoStory46K dataset occurring in …]
E.M. Van Mulligen (Erik M.); Z. Afzal (Zubair); S.A. Akhondi (Saber); D. Vo (Dang); J.A. Kors (Jan)
We participated in task 2 of the CLEF eHealth 2016 challenge. Two subtasks were addressed: entity recognition and normalization in a corpus of French drug labels and Medline titles, and ICD-10 coding of French death certificates. For both subtasks we used a dictionary-based approach.
Samson, S; Zatorre, R J
The role of the left and right temporal lobes in memory for songs (words sung to a tune) was investigated. Patients who had undergone focal cerebral excision for the relief of intractable epilepsy, along with normal control subjects, were tested in 2 recognition memory tasks. The goal of Experiment 1 was to examine recognition of words and of tunes when they were presented together in an unfamiliar song. In Experiment 2, memory for spoken words and tunes sung without words was independently tested in 2 separate recognition tasks. The results clearly showed (a) a deficit after left temporal lobectomy in recognition of text, whether sung to a tune or spoken without musical accompaniment, (b) impaired melody recognition when the tune was sung with new words following left or right temporal lobectomy, and (c) impaired melody recognition in the absence of lyrics following right but not left temporal lobectomy. The different role of each temporal lobe in memorizing songs provides evidence for the use of dual memory codes. The verbal code is consistently related to left temporal lobe structures, whereas the melodic code may depend on either or both temporal lobe mechanisms, according to the type of encoding involved.
A key element to online learning is the ability to create a sense of presence to improve learning outcomes. This quasi-experimental study evaluated the impact of interactive video communication versus text-based feedback and found a significant difference between the 2 groups related to teaching, social, and cognitive presence. Recommendations to enhance presence should focus on providing timely feedback, interactive learning experiences, and opportunities for students to establish relationships with peers and faculty.
Pedersen, Kamilla; Holdgaard, Martin Møller; Paltved, Charlotte
… on students' patient-centeredness. Video-based patient cases are probably more effective than text-based patient cases in fostering patient-centered perspectives in medical students. Teachers sharing stories from their own clinical experiences stimulate both engagement and excitement, but may also provoke … perceptions of psychiatric patients and students' reflections on meeting and communicating with psychiatric patients. METHODS: The authors conducted group interviews with 30 medical students who volunteered to participate in interviews and applied inductive thematic content analysis to the transcribed … Students taught with video-based patient cases, in contrast, often referred to the patient cases when highlighting new insights, including the importance of patient perspectives when communicating with patients. CONCLUSION: The format of patient cases included in teaching may have a substantial impact …
The increase in the number of elderly people living independently calls for special care in the form of healthcare monitoring systems. Recent advancements in depth video technologies have made human activity recognition (HAR) realizable for elderly healthcare applications. In this paper, a depth video-based novel method for HAR is presented using robust multi-features and embedded Hidden Markov Models (HMMs) to recognize the daily life activities of elderly people living alone in indoor environments such as smart homes. In the proposed HAR framework, depth maps are first analyzed by a temporal motion identification method to segment human silhouettes from the noisy background and to compute the depth silhouette area for each activity, so that human movements can be tracked in a scene. Several representative features, including invariant, multi-view differentiation and spatiotemporal body-joint features, are fused together to capture gradient orientation change, intensity differentiation, temporal variation and local motion of specific body parts. These features are then processed by the dynamics of their respective class and learned, modeled, trained and recognized with a specific embedded HMM having active feature values. Furthermore, we construct a new online human activity dataset with a depth sensor to evaluate the proposed features. Our experiments on three depth datasets demonstrate that the proposed multi-features are efficient and robust over state-of-the-art features for human action and activity recognition.
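The recognition step, one HMM per activity with the most likely model winning, can be sketched with a discrete-observation forward algorithm. States, symbols, and parameters below are hand-set toy values, not quantities learned from depth silhouettes:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    # log-likelihood of a discrete observation sequence under one HMM,
    # with per-step normalization for numerical stability
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        ll += np.log(s)
        alpha = alpha / s
    return ll

# toy models: (initial probs, transition matrix, emission matrix)
models = {
    "walking": (np.array([0.5, 0.5]),
                np.array([[0.1, 0.9], [0.9, 0.1]]),
                np.array([[0.90, 0.05, 0.05], [0.05, 0.90, 0.05]])),
    "sitting": (np.array([0.9, 0.1]),
                np.array([[0.9, 0.1], [0.1, 0.9]]),
                np.array([[0.05, 0.05, 0.90], [0.10, 0.10, 0.80]])),
}

obs = [0, 1, 0, 1, 0, 1]   # alternating silhouette symbols suggest walking
best = max(models, key=lambda name: forward_loglik(obs, *models[name]))
print(best)
```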
Bajpai, Anvita; Pathangay, Vinod
In this paper, presence of the speaker-specific suprasegmental information in the Linear Prediction (LP) residual signal is demonstrated. The LP residual signal is obtained after removing the predictable part of the speech signal. This information, if added to existing speaker recognition systems based on segmental and subsegmental features, can result in better performing combined system. The speaker-specific suprasegmental information can not only be perceived by listening to the residual, but can also be seen in the form of excitation peaks in the residual waveform. However, the challenge lies in capturing this information from the residual signal. Higher order correlations among samples of the residual are not known to be captured using standard signal processing and statistical techniques. The Hilbert envelope of residual is shown to further enhance the excitation peaks present in the residual signal. A speaker-specific pattern is also observed in the autocorrelation sequence of the Hilbert envelope, and further in the statistics of this autocorrelation sequence. This indicates the presence of the speaker-specific suprasegmental information in the residual signal. In this work, no distinction between voiced and unvoiced sounds is done for extracting these features. Support Vector Machine (SVM) is used to classify the patterns in the variance of the autocorrelation sequence for the speaker recognition task.
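The Hilbert-envelope-plus-autocorrelation representation described above can be sketched with SciPy; the "residual" below is a synthetic impulse train plus noise standing in for a real LP residual:

```python
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(0)
residual = 0.05 * rng.normal(size=800)
residual[::100] += 1.0            # excitation peaks every 100 samples

# envelope of the analytic signal enhances the excitation peaks
envelope = np.abs(hilbert(residual))
env = envelope - envelope.mean()
autocorr = np.correlate(env, env, mode="full")[len(env) - 1:]
autocorr /= autocorr[0]

# the strongest off-zero autocorrelation peak sits near the 100-sample
# spacing of the excitation peaks
lag = 50 + int(np.argmax(autocorr[50:150]))
print(lag)
```

Statistics of this autocorrelation sequence (as the abstract suggests) can then feed an SVM.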
Imran, Ali Shariq; Moreno Celleri, Alejandro Manuel; Cheikh, Faouzi Alaya
The usage of non-scripted lecture videos as a part of learning material is becoming an everyday activity in most of higher education institutions due to the growing interest in flexible and blended education. Generally these videos are delivered as part of Learning Objects (LO) through various
Holte, Michael Boelstoft; Tran, Cuong; Trivedi, Mohan
… human-computer interaction (HCI), assisted living, gesture-based interactive games, intelligent driver assistance systems, movies, 3D TV and animation, physical therapy, autonomous mental development, smart environments, sport motion analysis, video surveillance, and video annotation. Next, we review and categorize recent...
Chen, S C; Shao, C L; Liang, C K; Lin, S W; Huang, T H; Hsieh, M C; Yang, C H; Luo, C H; Wuo, C M
In this paper, we present a text input system for the seriously disabled using lip image recognition based on LabVIEW. The system can be divided into a software subsystem and a hardware subsystem. In the software subsystem, we adopt image processing techniques to recognize whether the mouth is open or closed, depending on the relative distance between the upper lip and the lower lip. In the hardware subsystem, the parallel port built into the PC is used to transmit the recognized mouth status to the Morse-code text input system. Integrating the software subsystem with the hardware subsystem, we implement a text input system using lip image recognition programmed in the LabVIEW language. We hope the system can help the seriously disabled communicate with normal people more easily.
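The mouth-state-to-Morse idea can be sketched as follows; the lip distances, thresholds, and timing rules are invented for illustration (the original system implements this with LabVIEW image processing):

```python
def mouth_states(lip_distances, open_threshold=12.0):
    # per-frame mouth state from the upper/lower lip distance (pixels)
    return ["open" if d > open_threshold else "closed" for d in lip_distances]

def to_morse(states, dash_frames=5):
    # a run of open frames shorter than dash_frames is a dot, else a dash
    symbols, run = [], 0
    for s in states + ["closed"]:           # sentinel flushes the last run
        if s == "open":
            run += 1
        elif run:
            symbols.append("-" if run >= dash_frames else ".")
            run = 0
    return "".join(symbols)

# simulated lip distances per frame: short open, long open, short open
frames = [5, 5, 15, 15, 5, 14, 15, 16, 15, 14, 5, 15, 15, 5]
code = to_morse(mouth_states(frames))
print(code)
```

The simulated sequence encodes dot-dash-dot, i.e. Morse "R".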
Textual information embedded in multimedia can provide a vital tool for indexing and retrieval. A lot of work has been done in the field of text localization and detection because of its fundamental importance. One of the biggest challenges of text detection is dealing with variation in font sizes and image resolution. This problem is aggravated by undersegmentation or oversegmentation of the regions in an image. The paper addresses this problem by proposing a solution using a novel fuzzy-based method. This paper advocates a postprocessing segmentation method that can solve the problem of variation in text sizes and image resolution. The methodology is tested on the ICDAR 2011 Robust Reading Challenge dataset, which amply proves the strength of the recommended method.
… microtext) or a document (e.g., using Sphinx or Apache NLP) as an automated approach. Previous work in natural language full-text searching … natural language processing (NLP) based module. The heart of the structured text processing module includes the following seven key word banks … [acronym glossary residue: … Features Tracker, MHT = Multiple Hypothesis Tracking, MIL = Multiple Instance Learning, NLP = Natural Language Processing, OAB = Online AdaBoost, OF = Optic Flow]
… in the realm of academic research in the Type 3 environment. 13) Face Recognition to Improve Voice/Iris Biometrics: here, the system uses face recognition as a supplementary biometric to increase confidence in a match made using a different biometric (for example iris, voice, or fingerprints) … 14) Soft biometrics to improve face recognition … Estimated readiness: the e-Gate environment was not evaluated in …
In this study, traffic signs are recognized and identified from video images taken through a video camera. To accomplish this aim, a traffic sign recognition program has been developed in the MATLAB/Simulink environment. The target traffic signs are recognized in the video image with the developed program.
Micro-expressions play an essential part in understanding non-verbal communication and deceit detection. They are involuntary, brief facial movements that are shown when a person is trying to conceal something. Automatic analysis of micro-expressions is challenging due to their low amplitude and short duration (they occur as fast as 1/15 to 1/25 of a second). We propose a full micro-expression analysis system consisting of a high-speed image acquisition setup and a software framework which can detect the frames in which micro-expressions occurred as well as determine the type of the emerged expression. The detection and classification methods use fast and simple motion descriptors based on absolute image differences. The recognition module only involves the computation of several 2D Gaussian probabilities. The software framework was tested on two publicly available high-speed micro-expression databases, and the whole system was used to acquire new data. The experiments we performed show that our solution outperforms state-of-the-art works which use more complex and computationally intensive descriptors.
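The absolute-image-difference descriptor mentioned above can be sketched in a few lines of NumPy: the mean absolute difference between consecutive frames peaks on the frames where a brief movement occurs. The synthetic frames below stand in for high-speed video:

```python
import numpy as np

rng = np.random.default_rng(0)
# 10 identical 32x32 frames with a brief local change on frame 5 only
frames = np.tile(rng.integers(0, 255, size=(32, 32)), (10, 1, 1)).astype(float)
frames[5, 10:14, 10:14] += 40.0

# per-pair mean absolute difference between consecutive frames
diffs = np.mean(np.abs(np.diff(frames, axis=0)), axis=(1, 2))
onset = int(np.argmax(diffs)) + 1   # +1: diff i is between frames i and i+1
print(onset)
```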
Lalys, Florent; Riffaud, Laurent; Bouget, David; Jannin, Pierre
The need for better integration of the new generation of Computer-Assisted Surgical (CAS) systems has recently been emphasized. One necessity for achieving this objective is to retrieve data from the Operating Room (OR) with different sensors, then to derive models from these data. Recently, the use of videos from cameras in the OR has demonstrated its efficiency. In this paper, we propose a framework to assist in the development of systems for the automatic recognition of high-level surgical tasks using microscope video analysis. We validated its use on cataract procedures. The idea is to combine state-of-the-art computer vision techniques with time series analysis. The first step of the framework consisted in the definition of several visual cues for extracting semantic information, thereby characterizing each frame of the video. Five different image-based classifiers were therefore implemented. A pupil segmentation step was also applied for dedicated visual cue detection. Time series classification algorithms were then applied to model time-varying data. Dynamic Time Warping (DTW) and Hidden Markov Models (HMM) were tested. This association combined the advantages of all methods for a better understanding of the problem. The framework was finally validated through various studies. Six binary visual cues were chosen along with 12 phases to detect, obtaining accuracies of 94%. PMID:22203700
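DTW, one of the two time-series classifiers tested above, can be sketched with the standard dynamic-programming recurrence. The 1-D cue sequences below are toy stand-ins for real per-frame visual cues:

```python
import numpy as np

def dtw(a, b):
    # classic DTW distance with absolute-difference local cost
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

template = [0, 0, 1, 1, 0]           # cue profile of a known phase
same = [0, 0, 0, 1, 1, 1, 0]         # same shape, different speed
other = [1, 1, 0, 0, 1]              # different phase profile
print(dtw(template, same), dtw(template, other))
```

DTW absorbs the speed difference, so the stretched sequence matches the template perfectly while the different profile does not.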
Chen, Yukun; Lasko, Thomas A; Mei, Qiaozhu; Denny, Joshua C; Xu, Hua
Named entity recognition (NER), a sequential labeling task, is one of the fundamental tasks for building clinical natural language processing (NLP) systems. Machine learning (ML) based approaches can achieve good performance, but they often require large amounts of annotated samples, which are expensive to build due to the requirement of domain experts in annotation. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize the annotation cost while maximizing the performance of ML-based models. In this study, our goal was to develop and evaluate both existing and new AL methods for a clinical NER task to identify concepts of medical problems, treatments, and lab tests from the clinical notes. Using the annotated NER corpus from the 2010 i2b2/VA NLP challenge that contained 349 clinical documents with 20,423 unique sentences, we simulated AL experiments using a number of existing and novel algorithms in three different categories including uncertainty-based, diversity-based, and baseline sampling strategies. They were compared with the passive learning that uses random sampling. Learning curves that plot performance of the NER model against the estimated annotation cost (based on number of sentences or words in the training set) were generated to evaluate different active learning and the passive learning methods and the area under the learning curve (ALC) score was computed. Based on the learning curves of F-measure vs. number of sentences, uncertainty sampling algorithms outperformed all other methods in ALC. Most diversity-based methods also performed better than random sampling in ALC. To achieve an F-measure of 0.80, the best method based on uncertainty sampling could save 66% annotations in sentences, as compared to random sampling. For the learning curves of F-measure vs. number of words, uncertainty sampling methods again outperformed all other methods in ALC. To achieve 0.80 in F-measure, in comparison to random
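Least-confidence uncertainty sampling, the family of strategies that performed best above, can be sketched with a simple pool-based loop. The 2-D toy data and classifier below stand in for the clinical NER setting (which queries whole sentences, not points):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# seed with one labeled example per class
labeled = [int(np.flatnonzero(y == 0)[0]), int(np.flatnonzero(y == 1)[0])]
pool = [i for i in range(len(X)) if i not in labeled]

clf = LogisticRegression()
for _ in range(10):
    clf.fit(X[labeled], y[labeled])
    # query the pool sample the model is least confident about
    confidence = clf.predict_proba(X[pool]).max(axis=1)
    pick = pool.pop(int(np.argmin(confidence)))
    labeled.append(pick)

acc = clf.fit(X[labeled], y[labeled]).score(X, y)
print(len(labeled), round(acc, 2))
```

With only 12 labels, the queried boundary points already give high accuracy on this separable toy problem, mirroring the annotation savings reported above.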
Vision is only a part of a system that converts visual information into knowledge structures. These structures drive the vision process, resolving ambiguity and uncertainty via feedback, and provide image understanding, which is an interpretation of visual information in terms of these knowledge models. These mechanisms provide reliable recognition if the object is occluded or cannot be recognized as a whole. It is hard to split the entire system apart, and reliable solutions to the target recognition problems are possible only within the solution of a more generic Image Understanding Problem. The brain reduces informational and computational complexity, using implicit symbolic coding of features, hierarchical compression, and selective processing of visual information. Biologically inspired Network-Symbolic representation, where both systematic structural/logical methods and neural/statistical methods are parts of a single mechanism, is the most feasible basis for such models. It converts visual information into relational Network-Symbolic structures, avoiding artificial precise computations of 3-dimensional models. Network-Symbolic Transformations derive abstract structures, which allow for invariant recognition of an object as an exemplar of a class. Active vision helps create consistent models. Attention, separation of figure from ground, and perceptual grouping are special kinds of network-symbolic transformations. Such Image/Video Understanding Systems will reliably recognize targets.
Vision is only a part of a larger system that converts visual information into knowledge structures. These structures drive the vision process, resolving ambiguity and uncertainty via feedback, and provide image understanding, which is an interpretation of visual information in terms of these knowledge models. This mechanism provides reliable recognition when the target is occluded or cannot be recognized. It is hard to split the entire system apart, and reliable solutions to target recognition problems are possible only within the solution of a more generic Image Understanding Problem. The brain reduces informational and computational complexities by using implicit symbolic coding of features, hierarchical compression, and selective processing of visual information. A biologically inspired Network-Symbolic representation, where both systematic structural/logical methods and neural/statistical methods are parts of a single mechanism, converts visual information into relational Network-Symbolic structures, avoiding artificial precise computations of 3-dimensional models. The logic of visual scenes can be captured in Network-Symbolic models and used for disambiguation of visual information. Network-Symbolic Transformations derive abstract structures, which allow for invariant recognition of an object as an exemplar of a class. Active vision helps build consistent, unambiguous models. Such Image/Video Understanding Systems will be able to reliably recognize targets in real-world conditions.
Uddin, Md. Zia
In this paper, a novel spatiotemporal feature-based method is proposed to recognize facial expressions from depth video. Independent Component Analysis (ICA) spatial features of the depth faces of facial expressions are first augmented with optical flow motion features. Then, the augmented features are enhanced by Fisher Linear Discriminant Analysis (FLDA) to make them robust. The features are then used to train Hidden Markov Models (HMMs) that model different facial expressions, which are later used to recognize the appropriate expression from a test depth video. The experimental results show superior performance of the proposed approach over conventional methods.
Full Text Available This paper proposes a supervised classification approach for the real-time pattern recognition of sows in an animal supervision system (asup). Our approach offers the possibility of foreground subtraction in an asup’s image processing module where there is a lack of statistical information regarding the background. A set of 7 farrowing sessions of sows, during day and night, was captured (approximately 7 days/sow) and used for this study. The frames of these recordings were grabbed with a time shift of 20 s. A collection of 215 frames of 7 different sows with the same lighting condition was marked and used as the training set. Based on small neighborhoods around a point, a number of image local features are defined, and their separability and performance metrics are compared. For the classification task, a feed-forward neural network (NN) is studied and a realistic configuration in terms of an acceptable level of accuracy and computation time is chosen. The results show that the dense neighborhood feature (d.3 × 3) is the smallest local set of features with an acceptable level of separability, while it has no negative effect on the complexity of the NN. The results also confirm that a significant amount of the desired pattern is accurately detected, even in situations where a portion of the body of a sow is covered by the crate’s elements. The performance of the proposed feature set coupled with our chosen configuration reached the rate of 8.5 fps. The true positive rate (TPR) of the classifier is 84.6%, while the false negative rate (FNR) is only about 3%. A comparison between linear logistic regression and the NN shows the highly non-linear nature of our proposed set of features.
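The d.3 × 3 feature described above amounts to using the raw values of a pixel's 3 × 3 neighborhood as the classifier input; a minimal sketch (the image and coordinates are illustrative, not from the study):

```python
# Sketch of the dense 3x3 neighborhood feature (d.3x3): the per-pixel
# classifier input is simply the 9 intensity values around that pixel.
# The toy image below is illustrative.

def dense_3x3_feature(image, row, col):
    """Flatten the 3x3 neighborhood around (row, col) into a 9-value vector."""
    return [image[r][c]
            for r in range(row - 1, row + 2)
            for c in range(col - 1, col + 2)]

image = [
    [10, 10, 10, 10],
    [10, 200, 210, 10],
    [10, 205, 215, 10],
    [10, 10, 10, 10],
]
feature = dense_3x3_feature(image, 1, 1)
```

The feed-forward NN would then take such 9-value vectors as its input layer.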
Humes, Larry E; Burk, Matthew H; Coughlin, Maureen P; Busey, Thomas A; Strauser, Lauren E
To examine age-related differences in auditory speech recognition and visual text recognition performance for parallel sets of stimulus materials in the auditory and visual modalities. In addition, the effects of variation in the rate of presentation of stimuli in each modality were investigated in each age group. A mixed-model design was used in which 3 independent groups (13 young adults with normal hearing, 10 elderly adults with normal hearing, and 16 elderly hearing-impaired adults) listened to auditory speech tests (a sentence-in-noise task, time-compressed monosyllables, and a speeded-spelling task) and viewed visual text-based analogs of the auditory tests. All auditory speech materials were presented so that the amplitude of the speech signal was at least 15 dB above threshold through 4000 Hz. Analyses of the group data revealed that, when baseline levels of performance were used as covariates in the group analyses, the only significant group difference was that both elderly groups performed worse than the young group on the auditory speeded-speech tasks. Analysis of individual data, using correlations, factor analysis, and linear regression, was generally consistent with the group data and revealed significant, moderate correlations of performance for similar tasks across modalities, but stronger correlations across tasks within a modality. This suggests that performance on these tasks was mediated both by a common underlying factor, such as cognitive processing, and by modality-specific processing. Performance on the measures of auditory processing of speech examined here was closely associated with performance on parallel measures of the visual processing of text obtained from the same participants. Young and older adults demonstrated comparable abilities in the use of contextual information in each modality, but older adults, regardless of hearing status, had more difficulty with fast presentation of auditory speech stimuli than young adults. There were no
Full Text Available We previously developed an intelligent agent to engage with users in virtual drama improvisation. The intelligent agent was able to perform sentence-level affect detection from user inputs with strong emotional indicators. However, we noticed that many inputs with weak or no affect indicators also contain emotional implication but were regarded as neutral expressions by the previous interpretation. In this paper, we employ latent semantic analysis to perform topic theme detection and identify target audiences for such inputs. We also discuss how such semantic interpretation of the dialog contexts is used to interpret affect more appropriately during virtual improvisation. In addition, in order to build a reliable affect analyser, it is important to detect and combine weak affect indicators from other channels such as body language. Such emotional body language detection also provides a nonintrusive channel for detecting users’ experience without interfering with the primary task. Thus, we also make an initial exploration of affect detection from several universally accepted emotional gestures.
Huysmans, Elke; Bolk, Elske; Zekveld, Adriana A; Festen, Joost M; de Groot, Annette M B; Goverts, S Theo
The authors first examined the influence of moderate to severe congenital hearing impairment (CHI) on the correctness of samples of elicited spoken language. Then, the authors used this measure as an indicator of linguistic proficiency and examined its effect on performance in language reception, independent of bottom-up auditory processing. In groups of adults with normal hearing (NH, n = 22), acquired hearing impairment (AHI, n = 22), and moderate to severe CHI (n = 21), the authors assessed linguistic proficiency by analyzing the morphosyntactic correctness of their spoken language production. Language reception skills were examined with a task for masked sentence recognition in the visual domain (text), at a readability level of 50%, using grammatically correct sentences and sentences with distorted morphosyntactic cues. The actual performance on the tasks was compared between groups. Adults with CHI made more morphosyntactic errors in spoken language production than adults with NH, while no differences were observed between the AHI and NH group. This outcome pattern sustained when comparisons were restricted to subgroups of AHI and CHI adults, matched for current auditory speech reception abilities. The data yielded no differences between groups in performance in masked text recognition of grammatically correct sentences in a test condition in which subjects could fully take advantage of their linguistic knowledge. Also, no difference between groups was found in the sensitivity to morphosyntactic distortions when processing short masked sentences, presented visually. These data showed that problems with the correct use of specific morphosyntactic knowledge in spoken language production are a long-term effect of moderate to severe CHI, independent of current auditory processing abilities. However, moderate to severe CHI generally does not impede performance in masked language reception in the visual modality, as measured in this study with short, degraded
Kroll, Christine; von der Werth, Monika; Leuck, Holger; Stahl, Christoph; Schertler, Klaus
For Intelligence, Surveillance, Reconnaissance (ISR) missions of manned and unmanned air systems, typical electro-optical payloads provide high-definition video data which has to be exploited with respect to relevant ground targets in real-time by automatic/assisted target recognition software. Airbus Defence and Space has been developing the required technologies for real-time sensor exploitation for years and has combined the latest advances of Deep Convolutional Neural Networks (CNN) with a proprietary high-speed Support Vector Machine (SVM) learning method into a powerful object recognition system, with impressive results on relevant high-definition video scenes compared to conventional target recognition approaches. This paper describes the principal requirements for real-time target recognition in high-definition video for ISR missions and the Airbus approach of combining an invariant feature extraction using pre-trained CNNs with the high-speed training and classification ability of a novel frequency-domain SVM training method. The frequency-domain approach allows for a highly optimized implementation for General Purpose Computation on a Graphics Processing Unit (GPGPU) and also for efficient training on large training samples. The selected CNN, which is pre-trained only once on domain-extrinsic data, reveals a highly invariant feature extraction. This allows for significantly reduced adaptation and training of the target recognition method for new target classes and mission scenarios. A comprehensive training and test dataset was defined and prepared using relevant high-definition airborne video sequences. The assessment concept is explained and performance results are given using the established precision-recall diagrams, average precision and runtime figures on representative test data. A comparison to legacy target recognition approaches shows the impressive performance increase achieved by the proposed CNN+SVM machine-learning approach and the capability of real-time high
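The average-precision figure used in such precision-recall assessments can be computed from a ranked detection list as sketched here (the ranking is illustrative, not from the paper's test data):

```python
# Sketch of average precision (AP) over a ranked list of detections, as used
# in precision-recall evaluation. 1 = true target, 0 = false alarm.

def average_precision(ranked_labels):
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)  # precision at each hit
    return sum(precisions) / max(hits, 1)

ap = average_precision([1, 0, 1, 1, 0])  # illustrative ranked detections
```

AP is the mean of the precision values measured at each correctly retrieved target, so it rewards rankings that place true targets early.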
to a particular observation O and choose the Ai with maximum a priori probability. While the LTL-based framework in Section 3 provides a deterministic plan recognition technique that is not flexible enough to incorporate probability distributions of the various a priori events, in most
Srilatha, V.; Venkatesh, Veeramuthu
Trustworthy contextual data for human action recognition of a remotely monitored person who requires medical care should be generated to avoid hazardous situations and also to provide ubiquitous services in home-based care. This is difficult for numerous reasons.
Li, Wen; Chen, Lin; Xu, Dong; Van Gool, Luc
In this work, we propose a new framework for recognizing RGB images or videos by leveraging a set of labeled RGB-D data, in which the depth features can be additionally extracted from the depth images or videos. We formulate this task as a new unsupervised domain adaptation (UDA) problem, in which we aim to take advantage of the additional depth features in the source domain and also cope with the data distribution mismatch between the source and target domains. To handle the domain distribution mismatch, we propose to learn an optimal projection matrix to map the samples from both domains into a common subspace such that the domain distribution mismatch can be reduced. Moreover, we also propose different strategies to effectively utilize the additional depth features. To simultaneously cope with the above two issues, we formulate a unified learning framework called domain adaptation from multi-view to single-view (DAM2S). By defining various forms of regularizers in our DAM2S framework, different strategies can be readily incorporated to learn robust SVM classifiers for classifying the target samples. We conduct comprehensive experiments, which demonstrate the effectiveness of our proposed methods for recognizing RGB images and videos by learning from RGB-D data.
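The DAM2S projection itself is learned jointly with the classifiers, but the underlying idea of reducing the distribution mismatch between domains can be illustrated with a crude per-dimension mean-alignment step; this is purely a sketch of the concept, not the paper's formulation:

```python
# Crude illustration of reducing domain distribution mismatch: shift the
# source-domain features so their per-dimension mean matches the target
# mean. This is NOT the learned projection of DAM2S, just the core idea.

def column_means(data):
    n, d = len(data), len(data[0])
    return [sum(row[i] for row in data) / n for i in range(d)]

def mean_align(source, target):
    """Translate source samples so both domains share the same mean."""
    sm, tm = column_means(source), column_means(target)
    return [[x + (tm[i] - sm[i]) for i, x in enumerate(row)] for row in source]

source = [[1.0, 2.0], [3.0, 4.0]]    # labeled RGB-D domain (toy)
target = [[10.0, 0.0], [12.0, 2.0]]  # unlabeled RGB domain (toy)
aligned = mean_align(source, target)
```

A learned projection generalizes this by mapping both domains into a common subspace rather than merely translating one of them.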
Woodham, Luke A; Ellaway, Rachel H; Round, Jonathan; Vaughan, Sophie; Poulton, Terry; Zary, Nabil
The impact of the use of video resources in primarily paper-based problem-based learning (PBL) settings has been widely explored. Although it can provide many benefits, the use of video can also hamper the critical thinking of learners in contexts where learners are developing clinical reasoning. However, the use of video has not been explored in the context of interactive virtual patients for PBL. A pilot study was conducted to explore how undergraduate medical students interpreted and evaluated information from video- and text-based materials presented in the context of a branched interactive online virtual patient designed for PBL. The goal was to inform the development and use of virtual patients for PBL and to inform future research in this area. An existing virtual patient for PBL was adapted for use in video and provided as an intervention to students in the transition year of the undergraduate medicine course at St George's, University of London. Survey instruments were used to capture student and PBL tutor experiences and perceptions of the intervention, and a formative review meeting was run with PBL tutors. Descriptive statistics were generated for the structured responses and a thematic analysis was used to identify emergent themes in the unstructured responses. Analysis of student responses (n=119) and tutor comments (n=18) yielded 8 distinct themes relating to the perceived educational efficacy of information presented in video and text formats in a PBL context. Although some students found some characteristics of the videos beneficial, when asked to express a preference for video or text the majority of those that responded to the question (65%, 65/100) expressed a preference for text. Student responses indicated that the use of video slowed the pace of PBL and impeded students' ability to review and critically appraise the presented information. Our findings suggest that text was perceived to be a better source of information than video in virtual
Ellaway, Rachel H; Round, Jonathan; Vaughan, Sophie; Poulton, Terry; Zary, Nabil
Background The impact of the use of video resources in primarily paper-based problem-based learning (PBL) settings has been widely explored. Although it can provide many benefits, the use of video can also hamper the critical thinking of learners in contexts where learners are developing clinical reasoning. However, the use of video has not been explored in the context of interactive virtual patients for PBL. Objective A pilot study was conducted to explore how undergraduate medical students interpreted and evaluated information from video- and text-based materials presented in the context of a branched interactive online virtual patient designed for PBL. The goal was to inform the development and use of virtual patients for PBL and to inform future research in this area. Methods An existing virtual patient for PBL was adapted for use in video and provided as an intervention to students in the transition year of the undergraduate medicine course at St George’s, University of London. Survey instruments were used to capture student and PBL tutor experiences and perceptions of the intervention, and a formative review meeting was run with PBL tutors. Descriptive statistics were generated for the structured responses and a thematic analysis was used to identify emergent themes in the unstructured responses. Results Analysis of student responses (n=119) and tutor comments (n=18) yielded 8 distinct themes relating to the perceived educational efficacy of information presented in video and text formats in a PBL context. Although some students found some characteristics of the videos beneficial, when asked to express a preference for video or text the majority of those that responded to the question (65%, 65/100) expressed a preference for text. Student responses indicated that the use of video slowed the pace of PBL and impeded students’ ability to review and critically appraise the presented information. Conclusions Our findings suggest that text was perceived to be a
Full Text Available The article examines the texts of political advertising video clips issued by the candidates for presidency in France during the campaign before the first round of elections in 2017. The mentioned examples of media texts are analysed from the compositional point of view as well as from that of the content particularities which are directly connected to the text structure. In general, the majority of the studied clips have a similar structure and consist of three parts: introduction, main part and conclusion. However, as a result of the research, a range of advantages marking well-structured videos was revealed. These include: addressing the voters and stating the speech topic clearly at the beginning of the clip, a relevant attention-grabbing opening phrase, consistency and clarity of the information presentation, appropriate use of additional video plots, conclusion at the end of the clip.
Verbal methods of realisation of addresser-addressee relations in French political media texts (through the example of the texts of political videos issued by the candidates for the French 2017 presidential election
Dmitrieva Anastasia Valerievna
Full Text Available The article deals with the addresser-addressee relations in the texts of French political advertising video clips from the verbal, textual point of view. The texts of video clips issued by the candidates for the French 2017 presidential election during the first round of the campaign serve as the material for this article. The aim of the article is to determine how the candidates (i.e., the addressers) effectuate their relations with the voters (i.e., the addressees) in the texts of their videos. As a result, a range of rhetorical methods used by the candidates was identified, allowing them to attract maximum attention of the target audience, make the addressees trust the addresser, and produce the desired perlocutionary effect.
Kei Long Cheung
Full Text Available Computer-tailored programs may help to prevent overweight and obesity, which are worldwide public health problems. This study investigated (1) the 12-month effectiveness of a video- and text-based computer-tailored intervention on energy intake, physical activity, and body mass index (BMI), and (2) the role of educational level in intervention effects. A randomized controlled trial in The Netherlands was conducted, in which adults were allocated to a video-based condition, text-based condition, or control condition, with baseline, 6-month, and 12-month follow-up. Outcome variables were self-reported BMI, physical activity, and energy intake. Mixed-effects modelling was used to investigate intervention effects and potential interaction effects. Compared to the control group, the video intervention group was effective regarding energy intake after 6 months (least squares means (LSM) difference = −205.40, p = 0.00) and 12 months (LSM difference = −128.14, p = 0.03). Only the video intervention resulted in a lower average daily energy intake after one year (d = 0.12). Educational level and BMI did not seem to interact with this effect. No intervention effects on BMI and physical activity were found. The video computer-tailored intervention was effective on energy intake after one year. This effect was not dependent on educational level or BMI category, suggesting that video tailoring can be effective for a broad range of risk groups and may be preferred over text tailoring.
Thomas, N. Luke; Du, Yingzi; Muttineni, Sriharsha; Mang, Shing; Sran, Dylan
This paper presents a low-cost method for providing biometric verification for applications that do not require large database sizes. Existing portable iris recognition systems are typically self-contained and expensive. For some applications, low cost is more important than extremely discerning matching ability. In these instances, the proposed system could be implemented at low cost, with adequate matching performance for verification. Additionally, the proposed system could be used in conjunction with any image based biometric identification system. A prototype system was developed and tested on a small database, with promising preliminary results.
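A verification-mode matcher of the kind described can be as simple as thresholding the fractional Hamming distance between two binary iris codes; the codes and the 0.32 threshold below are illustrative, not the system's actual parameters:

```python
# Sketch of low-cost biometric verification by fractional Hamming distance
# between binary iris codes. Codes and threshold are illustrative.

def hamming_distance(code_a, code_b):
    """Fraction of bit positions at which the two codes disagree."""
    assert len(code_a) == len(code_b)
    return sum(a != b for a, b in zip(code_a, code_b)) / len(code_a)

def verify(code_a, code_b, threshold=0.32):
    """Accept the claimed identity if the codes are close enough."""
    return hamming_distance(code_a, code_b) < threshold

enrolled = "1011001110100101"
probe_same = "1011001110100100"   # 1 of 16 bits differs
probe_other = "0100110001011010"  # all 16 bits differ
```

Verification (one-to-one comparison against a claimed identity) avoids the large-database search that drives up the cost of identification systems.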
Full Text Available Various studies have discussed the pedagogical potential of video game play in the classroom, but resistance to such texts remains high. The study presented here discusses the case of one young boy who, having failed to learn to read in the public school system, was able to learn in a private Sudbury-model school where video games were not only allowed but considered important learning tools. Findings suggest that the incorporation of such new texts in today’s public schools has the potential to motivate and enhance the learning of children.
Aghdam, Mehran Alizadeh; Ogawa, Makoto; Iwahashi, Toshihiko; Hosokawa, Kiyohito; Kato, Chieri; Inohara, Hidenori
The purpose of this study was to assess whether or not high frame rate (HFR) videos recorded using high-speed digital imaging (HSDI) improve the visual recognition of the motions of the laryngopharyngeal structures during pharyngeal swallow in fiberoptic endoscopic evaluation of swallowing (FEES). Five healthy subjects were asked to swallow 0.5 ml water under fiberoptic nasolaryngoscopy. The endoscope was connected to a high-speed camera, which recorded the laryngopharyngeal view throughout the swallowing process at 4000 frames/s (fps). Each HFR video was then copied and downsampled into a standard frame rate (SFR) video version (30 fps). Fifteen otorhinolaryngologists observed all of the HFR/SFR videos in random order and rated them on a four-point ordinal scale reflecting the degree of visual recognition of the rapid laryngopharyngeal structure motions just before the 'white-out' phenomenon. Significantly higher scores, reflecting better visibility, were seen for the HFR videos compared with the SFR videos for the following laryngopharyngeal structures: the posterior pharyngeal wall (p = 0.001), left pharyngeal wall (p = 0.015), right lateral pharyngeal wall (p = 0.035), tongue base (p = 0.005), and epiglottis tilting (p = 0.005). However, whether visualized with HFR or SFR, 'certainly clear observation' of the laryngeal structures was achieved in <50% of cases, because not all the motions were necessarily captured in each video. These results demonstrate that the use of HSDI in FEES makes perception of the laryngopharyngeal motions during pharyngeal swallow easier in comparison to SFR videos with equivalent image quality, due to the ability of HSDI to depict the laryngopharyngeal motions in a continuous manner.
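Downsampling a 4000 fps clip to 30 fps amounts to keeping roughly every 133rd frame; a sketch of the index selection (the exact resampling scheme the authors used is not stated, so this is one plausible implementation):

```python
# Sketch of downsampling a high-frame-rate clip to a standard frame rate by
# keeping evenly spaced frames. One plausible scheme, not necessarily the
# authors' exact method.

def downsample_indices(n_frames, src_fps=4000, dst_fps=30):
    """Indices of the source frames kept in the downsampled clip."""
    step = src_fps / dst_fps                   # ~133.3 source frames per kept frame
    kept = int(n_frames * dst_fps / src_fps)   # frames in the output clip
    return [round(i * step) for i in range(kept)]

indices = downsample_indices(4000)  # one second of 4000 fps video
```

The discarded frames are exactly what the SFR version loses: any laryngopharyngeal motion shorter than about 33 ms can fall entirely between two kept frames.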
This book collects the papers presented at two workshops during the 23rd International Conference on Pattern Recognition (ICPR): the Third Workshop on Video Analytics for Audience Measurement (VAAM) and the Second International Workshop on Face and Facial Expression Recognition (FFER) from Real... Topics include: re-identification, consumer behavior analysis, utilizing pupillary response for task difficulty measurement, logo detection, saliency prediction, classification of facial expressions, face recognition, face verification, age estimation, super-resolution, pose estimation, and pain recognition...
Roy, Partha Pratim
Resource description: 13 September 2011. With the advent of research in Document Image Analysis and Recognition (DIAR), an important line of research has been explored on the indexing and retrieval of graphics-rich documents. It aims at finding relevant documents relying on segmentation and recognition of text and graphics components in non-standard layouts, where commercial OCRs cannot be applied due to complexity. This thesis is focused on text information extraction approaches in ...
Anderson-Inman, Lynne; Terrazas-Arellanes, Fatima E.
Expanded captions are designed to enhance educational value by linking unfamiliar words to one of three types of information: vocabulary definitions, labeled illustrations, or concept maps. This study investigated the effects of expanded captions versus standard captions on the comprehension of educational video materials on DVD by secondary…
Hung, Yu-Wan; Higgins, Steve
This study investigates the different learning opportunities enabled by text-based and video-based synchronous computer-mediated communication (SCMC) from an interactionist perspective. Six Chinese-speaking learners of English and six English-speaking learners of Chinese were paired up as tandem (reciprocal) learning dyads. Each dyad participated…
Walsh-Buhi, Eric R.; Helmy, Hannah; Harsch, Kristin; Rella, Natalie; Godcharles, Cheryl; Ogunrunde, Adejoke; Lopez Castillo, Humberto
Objective: This paper reports on a pilot study evaluating the feasibility and acceptability of a text- and mobile video-based intervention to educate women and men attending college about non-daily contraception, with a particular focus on long-acting reversible contraception (LARC). A secondary objective is to describe the process of intervention…
fifin naili rizkiyah
Full Text Available Abstract: This research is aimed at finding out how the Process-Genre Based Approach strategy with YouTube videos as the media is employed to improve the students' ability in writing hortatory exposition texts. This study uses a collaborative classroom action research design following the procedures of planning, implementing, observing, and reflecting. The procedures of carrying out the strategy are: (1) relating several issues/cases to the students' background knowledge and introducing the generic structures and linguistic features of hortatory exposition text as the BKoF stage, (2) analyzing the generic structure and the language features used in the text and getting a model of how to write a hortatory exposition text by using the YouTube video as the MoT stage, (3) writing a hortatory exposition text collaboratively in a small group and in pairs through process writing as the JCoT stage, and (4) writing a hortatory exposition text individually as the ICoT stage. The result shows that the use of the Process-Genre Based Approach and YouTube videos can improve the students' ability in writing hortatory exposition texts. The percentage of students achieving a score above the minimum passing grade (70) improved from only 15.8% (3 out of 19 students) in the preliminary study to 100% (22 students) in Cycle 1. Besides, the scores for each aspect (content, organization, vocabulary, grammar, and mechanics) also improved. Key Words: writing ability, hortatory exposition text, process-genre based approach, youtube video
Nguyen, Dat Tien; Kim, Ki Wan; Hong, Hyung Gil; Koo, Ja Hyung; Kim, Min Cheol; Park, Kang Ryoung
Extracting powerful image features plays an important role in computer vision systems. Many methods have previously been proposed to extract image features for various computer vision applications, such as the scale-invariant feature transform (SIFT), speed-up robust feature (SURF), local binary patterns (LBP), histogram of oriented gradients (HOG), and weighted HOG. Recently, the convolutional neural network (CNN) method for image feature extraction and classification in computer vision has been used in various applications. In this research, we propose a new gender recognition method for recognizing males and females in observation scenes of surveillance systems based on feature extraction from visible-light and thermal camera videos through CNN. Experimental results confirm the superiority of our proposed method over state-of-the-art recognition methods for the gender recognition problem using human body images. PMID:28335510
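One simple way to combine the visible-light and thermal channels is score-level fusion of per-channel classifier outputs; the weight and threshold below are illustrative, and this is not necessarily the fusion rule the cited system uses:

```python
# Sketch of score-level fusion of visible-light and thermal camera channels
# for gender recognition. The weight and threshold are illustrative, and
# this is not necessarily the fusion rule of the cited system.

def fuse_scores(visible_score, thermal_score, w_visible=0.6):
    """Weighted sum of the two channels' 'male' probabilities."""
    return w_visible * visible_score + (1 - w_visible) * thermal_score

def predict_gender(visible_score, thermal_score, threshold=0.5):
    fused = fuse_scores(visible_score, thermal_score)
    return "male" if fused >= threshold else "female"
```

In practice each per-channel score would come from a CNN applied to that camera's human body image; fusing at the score level lets either channel compensate when the other is unreliable (e.g., visible light at night).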
Gur, Michal; Nir, Vered; Teleshov, Anna; Bar-Yoseph, Ronen; Manor, Eynav; Diab, Gizelle; Bentur, Lea
Background Poor communications between cystic fibrosis (CF) patients and health-care providers may result in gaps in knowledge and misconceptions about medication usage, and can lead to poor adherence. We aimed to assess the feasibility of using WhatsApp and Skype to improve communications. Methods This single-centre pilot study included CF patients who were older than eight years of age, assigned to two groups: one without intervention (control group), and one with intervention. Each patient from the intervention group received Skype-based online video chats and WhatsApp messages from members of the multidisciplinary CF team. Cystic Fibrosis Questionnaire-Revised (CFQ-R) scores, knowledge and adherence based on CF My Way, and patient satisfaction were evaluated before and after three months. Feasibility was assessed by session attendance, acceptability, and a satisfaction survey. Descriptive analysis and paired and non-paired t-tests were used as applicable. Results Eighteen patients were recruited to this feasibility study (nine in each group). Each intervention group participant had between four and six Skype video chats and received 22-45 WhatsApp messages. In this small study, CFQ-R scores, knowledge, adherence and patient satisfaction were similar in both groups before and after the three-month intervention. Conclusions A telehealth-based approach, using Skype video chats and WhatsApp messages, was feasible and acceptable in this pilot study. A larger and longer multi-centre study is warranted to examine the efficacy of these interventions to improve knowledge, adherence and communication.
Full Text Available Abstract Action recognition from video is a problem that has many important applications to human motion analysis. In real-world settings, the viewpoint of the camera cannot always be fixed relative to the subject, so view-invariant action recognition methods are needed. Previous view-invariant methods use multiple cameras in both the training and testing phases of action recognition or require storing many examples of a single action from multiple viewpoints. In this paper, we present a framework for learning a compact representation of primitive actions (e.g., walk, punch, kick, sit) that can be used for video obtained from a single camera for simultaneous action recognition and viewpoint estimation. Using our method, which models the low-dimensional structure of these actions relative to viewpoint, we show recognition rates on a publicly available dataset previously only achieved using multiple simultaneous views.
Full Text Available This paper proposes a novel framework for facial expression analysis using dynamic and static information in video sequences. First, based on an incremental formulation, a discriminative deformable face alignment method is adapted to locate facial points, correct in-plane head rotation, and separate the facial region from the background. Then, a spatial-temporal motion local binary pattern (LBP) feature is extracted and integrated with a Gabor multiorientation fusion histogram to give descriptors which reflect the static and dynamic texture information of facial expressions. Finally, a one-versus-one strategy based multiclass support vector machine (SVM) classifier is applied to classify facial expressions. Experiments on the Cohn-Kanade (CK+) facial expression dataset illustrate that the integrated framework outperforms methods using single descriptors. Compared with other state-of-the-art methods on the CK+, MMI, and Oulu-CASIA VIS datasets, our proposed framework performs better.
Eka Bayu Pramanca
Full Text Available This research examines how two different techniques affect students' ability to write descriptive texts at SMP N 2 Metro. The objectives of this research are (1) to determine whether students' descriptive-text writing ability differs when taught with YouTube Downloaded Video media versus Serial Pictures media, and (2) to determine which of the two media is more effective for descriptive-text writing instruction. The implemented method is a quantitative, true experimental research design, with pre-tests and post-tests conducted in both an experimental and a control class. The study was carried out in the first grade of SMP N 2 Metro in the 2012/2013 academic year. The population in this research is 7 classes with a total of 224 students, from which 2 classes were taken as samples by cluster random sampling: class VII.1 as the experimental class and class VII.2 as the control class. The instruments of the research are tests, treatment and a post-test. The data were analyzed with a t-test, yielding t_count = 3.96 against t_table = 2.06; since the criterion is that Ha is accepted if t_count > t_table, the alternative hypothesis is accepted. There is therefore a difference in students' writing ability between the YouTube Downloaded Video and Serial Pictures media, and the YouTube Downloaded Video media is the more effective of the two for students' writing ability. This result is consistent with previous studies, and the technique is therefore recommended for writing instruction, especially for descriptive text, so that students may feel engaged and enjoy the learning process.
A. I. Logvin
Full Text Available The article discusses an algorithm for automatic runway detection in video sequences. The main stages of the algorithm are presented, and some methods to increase the reliability of recognition are described.
Full Text Available We present a user-based method that detects regions of interest within a video in order to provide video skims and video summaries. Previous research in video retrieval has focused on content-based techniques, such as pattern recognition algorithms that attempt to understand the low-level features of a video. We are proposing a pulse modeling method, which makes sense of a web video by analyzing users' Replay interactions with the video player. In particular, we have modeled the user information seeking behavior as a time series and the semantic regions as a discrete pulse of fixed width. Then, we have calculated the correlation coefficient between the dynamically detected pulses at the local maximums of the user activity signal and the pulse of reference. We have found that users' Replay activity significantly matches the important segments in information-rich and visually complex videos, such as lecture, how-to, and documentary. The proposed signal processing of user activity is complementary to previous work in content-based video retrieval and provides an additional user-based dimension for modeling the semantics of a social video on the web.
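The pulse-modeling idea can be sketched as follows; the window sizes, the local-maximum test, and the scoring scheme here are illustrative assumptions rather than the authors' exact formulation:

```python
import numpy as np

def replay_pulse_scores(activity, pulse_width=5, margin=5):
    """Score candidate video segments by correlating windows of a user
    Replay-activity time series with a rectangular reference pulse of
    fixed width. Returns (time, correlation) pairs at local maxima."""
    win = pulse_width + 2 * margin
    pulse = np.zeros(win)
    pulse[margin:margin + pulse_width] = 1.0   # discrete pulse of fixed width
    half = win // 2
    a = np.asarray(activity, dtype=float)
    scores = []
    for t in range(half, len(a) - half):
        w = a[t - half:t - half + win]
        # keep only local maxima of the activity signal, skip flat windows
        if a[t] < w.max() or w.std() == 0:
            continue
        r = np.corrcoef(w, pulse)[0, 1]        # correlation coefficient
        scores.append((t, r))
    return scores
```

A burst of Replay activity that matches the pulse shape scores close to 1, which is the cue for marking that segment as a region of interest.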
Koelstra, S.; Yazdani, A.; Soleymani, M.; Mühl, C.; Lee, J.-L.; Nijholt, Antinus; Pun, T.; Ebrahimi, T.; Patras, I.; Yao, Y.; Sun, R.; Poggio, T.; Liu, J.; Zhong, N.; Huang, J.
Recently, the field of automatic recognition of users' affective states has gained a great deal of attention. Automatic, implicit recognition of affective states has many applications, ranging from personalized content recommendation to automatic tutoring systems. In this work, we present some
Yu, Yiqing; Liu, Huayong; Wang, Hongbin; Zhou, Dongru
In this paper, we propose content-based video retrieval, a kind of retrieval by semantic content. Because video data is composed of multimodal information streams, such as visual, auditory and textual streams, we describe a strategy that uses multimodal analysis to automatically parse sports video. The paper first defines the basic structure of a sports video database system, and then introduces a new approach that integrates visual stream analysis, speech recognition, speech signal processing and text extraction to realize video retrieval. The experimental results for TV sports video of football games indicate that the multimodal analysis is effective for video retrieval, whether by quickly browsing tree-like video clips or by inputting keywords within a predefined domain.
This book collects the papers presented at two workshops during the 23rd International Conference on Pattern Recognition (ICPR): the Third Workshop on Video Analytics for Audience Measurement (VAAM) and the Second International Workshop on Face and Facial Expression Recognition (FFER) from Real World Videos. The workshops were run on December 4, 2016, in Cancun in Mexico. The two workshops together received 13 papers. Each paper was then reviewed by at least two expert reviewers in the field. In all, 11 papers were accepted to be presented at the workshops. The topics covered in the papers…
Full Text Available Rheumatoid Arthritis Educational Video Series: a series of five videos designed to help patients learn more about rheumatoid arthritis (RA).
Tuna, Tayfun; Subhlok, Jaspal; Barker, Lecia; Shah, Shishir; Johnson, Olin; Hovey, Christopher
Videos of classroom lectures have proven to be a popular and versatile learning resource. A key shortcoming of the lecture video format is accessing the content of interest hidden in a video. This work meets this challenge with an advanced video framework featuring topical indexing, search, and captioning (ICS videos). Standard optical character recognition (OCR) technology was enhanced with image transformations for extraction of text from video frames to support indexing and search. The images and text on video frames are analyzed to divide lecture videos into topical segments. The ICS video player integrates indexing, search, and captioning in video playback, providing instant access to the content of interest. This video framework has been used by more than 70 courses in a variety of STEM disciplines and assessed by more than 4000 students. Results presented from the surveys demonstrate the value of the videos as a learning resource and the role played by videos in a student's learning process. Survey results also establish the value of indexing and search features in a video platform for education. This paper reports on the development and evaluation of the ICS videos framework and over 5 years of usage experience in several STEM courses.
Full Text Available NEI YouTube Videos: Amblyopia, an embedded video in the NEI YouTube video series.
Shadiev, Rustam; Wu, Ting-Ting; Huang, Yueh-Min
In this study, we provide STR-texts to non-native English speaking students during English lectures to facilitate learning, attention, and meditation. We carry out an experiment to test the feasibility of our approach. Our results show that the participants in the experimental group both outperform those in the control group on the post-tests and…
Nakhmani, Arie; Tannenbaum, Allen
We propose two novel distance measures, normalized between 0 and 1, and based on normalized cross-correlation for image matching. These distance measures explicitly utilize the fact that for natural images there is a high correlation between spatially close pixels. Image matching is used in various computer vision tasks, and the requirements on the distance measure are application-dependent. Image recognition applications require measures that are more robust to shift and rotation. In contrast, registration and tracking applications require better localization and noise tolerance. In this paper, we explore different advantages of our distance measures, and compare them to other popular measures, including Normalized Cross-Correlation (NCC) and Image Euclidean Distance (IMED). We show which of the proposed measures is more appropriate for tracking, and which is appropriate for image recognition tasks.
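A generic sketch of turning normalized cross-correlation into a distance normalized between 0 and 1; the mapping d = (1 - NCC)/2 is illustrative, not the paper's two specific measures:

```python
import numpy as np

def ncc_distance(a, b):
    """Distance in [0, 1] derived from normalized cross-correlation:
    identical patches give 0, anti-correlated patches give 1."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()                         # remove mean so NCC measures
    b = b - b.mean()                         # correlation, not brightness
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0 if np.allclose(a, b) else 1.0
    ncc = float(np.dot(a, b) / denom)        # NCC in [-1, 1]
    return (1.0 - ncc) / 2.0                 # mapped to [0, 1]
```

Because the patches are mean-subtracted and norm-normalized, the measure is invariant to brightness offset and contrast scaling, which is why NCC-style measures are popular for matching under illumination changes.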
Kardava, Irakli; Tadyszak, Krzysztof; Gulua, Nana; Jurga, Stefan
For greater flexibility in environmental perception by artificial intelligence, supporting software modules are needed that can automate the creation of language-specific syntax and perform further analysis for relevant decisions based on semantic functions. With the proposed approach, pairs of formal rules can be created for given sentences (in the case of natural languages) or statements (in the case of special languages) with the help of computer vision, speech recognition or an editable-text conversion system, for further automatic improvement. In other words, we have developed an approach that can significantly improve the automation of the training process of an artificial intelligence, which as a result gives it a higher level of self-development skills independent of us (the users). Based on this approach, we have developed a software demo version, which includes the algorithm and software code implementing all of the above-mentioned components (computer vision, speech recognition and an editable-text conversion system). The program has the ability to work in multi-stream mode and simultaneously create a syntax based on information received from several sources.
Mukhtar, Omar; Setlur, Srirangaraj; Govindaraju, Venu
Urdu is a language spoken in the Indian subcontinent by an estimated 130-270 million speakers. At the spoken level, Urdu and Hindi are considered dialects of a single language because of shared vocabulary and the similarity in grammar. At the written level, however, Urdu is much closer to Arabic because it is written in Nastaliq, the calligraphic style of the Persian-Arabic script. Therefore, a speaker of Hindi can understand spoken Urdu but may not be able to read written Urdu because Hindi is written in Devanagari script, whereas an Arabic writer can read the written words but may not understand the spoken Urdu. In this chapter we present an overview of written Urdu. Prior research in handwritten Urdu OCR is very limited. We present (perhaps) the first system for recognizing handwritten Urdu words. On a data set of about 1300 handwritten words, we achieved an accuracy of 70% for the top choice, and 82% for the top three choices.
Elbouz, Marwa; Alfalou, Ayman; Brosseau, Christian
Home automation is being implemented in more and more homes of the elderly and disabled in order to maintain their independence and safety. For that purpose, we propose and validate a surveillance video system which detects various posture-based events. One of the novel points of this system is the use of adapted VanderLugt correlator (VLC) and joint transform correlator (JTC) techniques to make decisions on the identity of a patient and his three-dimensional (3-D) position in order to overcome the problem of crowded environments. We propose a fuzzy logic technique to reach decisions on the subject's behavior. Our system is focused on the goals of accuracy, convenience, and cost, and in addition does not require any devices attached to the subject. The system permits one to study and model subject responses to behavioral change intervention, because several levels of alarm can be incorporated according to the different situations considered. Our algorithm performs a fast 3-D recovery of the subject's head position by locating the eyes within the face image, and involves model-based prediction and optical correlation techniques to guide the tracking procedure. The object detection is based on the (hue, saturation, value) color space. The system also involves an adapted fuzzy logic control algorithm to make decisions based on the information given to the system. Furthermore, the principles described here are applicable to a very wide range of situations and robust enough to be implementable in ongoing experiments.
Ankit Kumar Agrawal
Full Text Available Abstract The amount of images and videos being shared by users is increasing exponentially, but applications that perform video analytics are severely lacking or work on limited sets of data. It is also challenging to perform analytics with low time complexity. Object recognition is the primary step in video analytics. We implement a robust method to extract objects from data which is in an unstructured format and cannot be processed directly by relational databases. In this study we present our report with results after performance evaluation and compare them with results from MATLAB.
Full Text Available Video has become an interactive medium of communication in everyday life. The sheer volume of video makes it extremely difficult to browse through and find the required data. Hence extraction of key frames from the video, which represent an abstract of the entire video, becomes necessary. The aim of video shot detection is to find the positions of the shot boundaries, so that key frames can be selected from each shot for subsequent processing such as video summarization, indexing, etc. For most surveillance applications, like video summarization, face recognition, etc., a hardware (real-time) implementation of these algorithms becomes necessary. Here in this paper we present an architecture for simultaneous access of consecutive frames, which are then used for the implementation of various video shot detection algorithms. We also present the real-time implementation of three video shot detection algorithms using the above-mentioned architecture on FPGAs (Field Programmable Gate Arrays).
Vandelanotte, Corneel; Duncan, Mitch J; Plotnikoff, Ronald C; Mummery, W Kerry
In randomized controlled trials, participants cannot choose their preferred intervention delivery mode and thus might refuse to participate or not engage fully if assigned to a nonpreferred group. This might underestimate the true effectiveness of behavior-change interventions. To examine whether receiving interventions either matched or mismatched with participants' preferred delivery mode would influence the effectiveness of a Web-based physical activity intervention. Adults (n = 863), recruited via email, were randomly assigned to one of three intervention delivery modes (text based, video based, or combined) and received fully automated, Internet-delivered personal advice about physical activity. Personalized intervention content, based on the theory of planned behavior and the stages of change concept, was identical across groups. Online, self-assessed questionnaires measuring physical activity were completed at baseline, 1 week, and 1 month. Physical activity advice acceptability and website usability were assessed at 1 week. Before randomization, participants were asked which delivery mode they preferred, to categorize them as matched or mismatched. Time spent on the website was measured throughout the intervention. We applied intention-to-treat, repeated-measures analyses of covariance to assess group differences. Attrition was high (575/863, 66.6%), though equal between groups (t(863) = 1.31, P = .19). At 1-month follow-up, 93 participants were categorized as matched and 195 as mismatched. They preferred text mode (493/803, 61.4%) over combined (216/803, 26.9%) and video modes (94/803, 11.7%). After the intervention, 20% (26/132) of matched-group participants and 34% (96/282) in the mismatched group changed their delivery mode preference. Time effects were significant for all physical activity outcomes (total physical activity: F(2,801) = 5.07, P = .009; number of activity sessions: F(2,801) = 7.52, P < .001; walking: F(2,801) = 8.32, P < .001; moderate physical…
Full Text Available Recently face recognition has been attracting much attention in the society of network multimedia information access. Areas such as network security, content indexing and retrieval, and video compression benefit from face recognition technology because "people" are the center of attention in a lot of video. Network access control via face recognition not only makes it virtually impossible for hackers to steal one's "password", but also increases the user-friendliness of human-computer interaction. Indexing and/or retrieving video data based on the appearances of particular persons will be useful for users such as news reporters, political scientists, and moviegoers. For the applications of videophone and teleconferencing, the assistance of face recognition also provides a more efficient coding scheme. In this paper, we give an introductory course on this new information processing technology. The paper shows the readers the generic framework for the face recognition system, and the variants that are frequently encountered by the face recognizer. Several famous face recognition algorithms, such as eigenfaces and neural networks, will also be explained.
Full Text Available The analysis of video acquired with a wearable camera is a challenge that the multimedia community is facing with the proliferation of such sensors in various applications. In this paper, we focus on the problem of automatic visual place recognition in a weakly constrained environment, targeting the indexing of video streams by topological place recognition. We propose to combine several machine learning approaches in a time-regularized framework for image-based place recognition indoors. The framework combines the power of multiple visual cues and integrates the temporal continuity information of video. We extend it with a computationally efficient semi-supervised method leveraging unlabeled video sequences for an improved indexing performance. The proposed approach was applied to challenging video corpora. Experiments on a public and a real-world video sequence database show the gain brought by the different stages of the method.
Full Text Available In video surveillance applications, trained operators watch a number of screens simultaneously to detect potential security threats. Looking for such events in real time, in multiple videos simultaneously, is cognitively challenging for human operators. This study suggests that there is a significant need to use an automated video analysis system to aid human perception of security events in video surveillance applications. In this paper the performance of humans in observing a simulated environment is studied and quantified. Furthermore, this paper proposes an automated mechanism to detect events before they occur by means of an automated intent recognition system. Upon the detection of a potential event the proposed mechanism communicates the location of such potential threat to the human operator to redirect attention to the areas of interest within the video. Studying the improvements achieved by applying the intent recognition into the simulated video surveillance application in a two phase trial supports the need for an automated event detection approach in improving human video surveillance performance. Moreover, this paper presents a comparison of the performance in video surveillance with and without the aid of the intent recognition mechanism.
Full Text Available The understanding of ecosystem dynamics in deep-sea areas is to date limited by technical constraints on sampling repetition. We have elaborated a morphometry-based protocol for automated video-image analysis in which animal movement tracking (by frame subtraction) is accompanied by species identification from animals' outlines using Fourier Descriptors and standard K-Nearest Neighbours methods. One week of footage from a permanent video station located at 1,100 m depth in Sagami Bay (Central Japan) was analysed. Out of 150,000 frames (1 per 4 s), a subset of 10,000 was analyzed by a trained operator to increase the efficiency of the automated procedure. Error estimation for the automated and the trained-operator procedures was computed as a measure of protocol performance. Three displacing species were identified as the most recurrent: Zoarcid fishes (eelpouts), red crabs (Paralomis multispina), and snails (Buccinum soyomaruae). Species identification with KNN thresholding produced better results in automated motion detection. Results were discussed under the assumption that this technological bottleneck currently constrains the exploration of the deep sea.
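The frame-subtraction step of such a motion-tracking protocol can be sketched as follows; the threshold value and helper names are illustrative assumptions:

```python
import numpy as np

def moving_pixels(prev_frame, frame, thresh=25):
    """Detect moving pixels by frame subtraction: absolute difference
    between consecutive grayscale frames, then thresholding."""
    diff = np.abs(frame.astype(int) - prev_frame.astype(int))
    return diff > thresh                     # boolean motion mask

def motion_bbox(mask):
    """Bounding box (top, left, bottom, right) of the motion mask,
    or None if nothing moved; the box outlines the moving animal."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return ys.min(), xs.min(), ys.max(), xs.max()
```

The region inside the bounding box is what would then be handed to the outline-based species identification stage.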
Full Text Available Random numbers are very useful in simulation, chaos theory, game theory, information theory, pattern recognition, probability theory, quantum mechanics, statistics, and statistical mechanics. Random numbers are especially helpful in cryptography. In this work, the proposed random number generators draw on white noise from audio and video (A/V) sources, extracted from a high-resolution IPCAM, a WEBCAM, and MPEG-1 video files. The proposed generator acts as a true random number generator when applied to live video sources from the IPCAM and the WEBCAM with microphone, and as a pseudorandom number generator when applied to video sources from an MPEG-1 video file. In addition, when the 15 statistical tests of NIST SP 800-22 Rev. 1a are applied to the random numbers generated by the proposed generator, around 98% of them pass the 15 statistical tests. Furthermore, audio and video sources are easy to find; hence, the proposed generator is a qualified, convenient, and efficient random number generator.
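One common way to extract candidate random bits from noisy A/V samples is sketched below as an illustration (least-significant bits followed by von Neumann debiasing); this is an assumed extraction scheme, not necessarily the one used in the paper:

```python
import numpy as np

def debiased_bits(samples):
    """Extract candidate random bits from noisy A/V samples: take each
    sample's least-significant bit, then apply von Neumann debiasing
    (keep the first bit of each unequal pair, drop 00 and 11 pairs)."""
    lsb = np.asarray(samples, dtype=np.uint64) & 1
    pairs = lsb[: len(lsb) // 2 * 2].reshape(-1, 2)
    keep = pairs[:, 0] != pairs[:, 1]        # discard equal pairs
    return pairs[keep, 0].tolist()
```

Von Neumann debiasing removes constant bias from the raw bit stream at the cost of throwing away at least half of the bits, which is why noise-based generators need a high sample rate.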
Full Text Available This paper presents a method of speech recognition using pattern recognition techniques. Learning consists of determining the unique characteristics of a word (its cepstral coefficients) by eliminating those characteristics that differ from one utterance of the word to another. For learning and recognition, the system builds a dictionary of words by determining the characteristics of each word to be used in recognition. Determining the characteristics of an audio signal consists of the following steps: removing noise, sampling the signal, applying a Hamming window, switching to the frequency domain through the Fourier transform, calculating the magnitude spectrum, filtering the data, and determining the cepstral coefficients.
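The core of the feature-extraction steps listed above can be sketched as follows; this computes the real cepstrum of a single frame and is a minimal illustration, not the paper's exact implementation (which also removes noise and filters the data):

```python
import numpy as np

def real_cepstrum(frame, n_coeffs=13):
    """Cepstral coefficients of one audio frame: Hamming window ->
    FFT -> magnitude spectrum -> log -> inverse FFT (real cepstrum)."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    mag = np.abs(np.fft.fft(x))              # magnitude spectrum
    log_mag = np.log(mag + 1e-10)            # small offset avoids log(0)
    ceps = np.fft.ifft(log_mag).real         # real cepstrum
    return ceps[:n_coeffs]                   # keep the low-order coefficients
```

The low-order coefficients describe the spectral envelope of the frame, which is what gets stored in the word dictionary and compared during recognition.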
Is video becoming "the new black" in academia, and if so, what are the challenges? The integration of video in research methodology (for collection, analysis) is well known, but the use of "academic video" for dissemination is relatively new (Eriksson and Sørensen). The focus of this paper is academic video, or short video essays produced for the explicit purpose of communicating research processes, topics, and research-based knowledge (see the journal of academic videos: www.audiovisualthinking.org). Video is increasingly used in popular showcases for video online, such as YouTube and Vimeo, as well… This raises questions of our media literacy pertaining to authoring multimodal texts (visual, verbal, audial, etc.) in research practice and the status of multimodal texts in academia. The implications of academic video extend to wider issues of how researchers harness opportunities to author different types of texts…
Full Text Available Purpose: The research results presented here aim to improve the theoretical basis of computer vision and artificial intelligence in dynamical systems. The proposed approach to object detection and recognition is based on probabilistic fundamentals to ensure the required level of correct object recognition. Methods: The presented approach is grounded in probabilistic methods, statistical methods of probability density estimation, and computer-based simulation at the verification stage of development. Results: The proposed approach to object detection and recognition for video stream data processing has shown several advantages in comparison with existing methods, owing to its simple realization and short data processing time. The presented results of experimental verification look plausible for object detection and recognition in video streams. Discussion: The approach can be implemented in dynamical systems within changeable environments, such as remotely piloted aircraft systems, and can be a part of the artificial intelligence in navigation and control systems.
Wang, Jiang; Wu, Ying
Action recognition technology has many real-world applications in human-computer interaction, surveillance, video retrieval, retirement home monitoring, and robotics. The commoditization of depth sensors has also opened up further applications that were not feasible before. This text focuses on feature representation and machine learning algorithms for action recognition from depth sensors. After presenting a comprehensive overview of the state of the art, the authors then provide in-depth descriptions of their recently developed feature representations and machine learning techniques, includi
Stanczyk, Nicola E; Smit, Eline S; Schulz, Daniela N; de Vries, Hein; Bolman, Catherine; Muris, Jean W M; Evers, Silvia M A A
Although evidence exists for the effectiveness of web-based smoking cessation interventions, information about the cost-effectiveness of these interventions is limited. The study investigated the cost-effectiveness and cost-utility of two web-based computer-tailored (CT) smoking cessation interventions (video- vs. text-based CT) compared to a control condition that received general text-based advice. In a randomized controlled trial, respondents were allocated to the video-based condition (N = 670), the text-based condition (N = 708) or the control condition (N = 721). Societal costs, smoking status, and quality-adjusted life years (QALYs; EQ-5D-3L) were assessed at baseline and at six- and twelve-month follow-up. The incremental costs per abstinent respondent and per QALY gained were calculated. To account for uncertainty, bootstrapping techniques and sensitivity analyses were carried out. No significant differences were found between the three conditions regarding demographics, baseline values of outcomes and societal costs over the three months prior to baseline. Analyses using prolonged abstinence as the outcome measure indicated that from a willingness to pay of €1,500, the video-based intervention was likely to be the most cost-effective treatment, whereas from a willingness to pay of €50,400, the text-based intervention was likely to be the most cost-effective. With regard to cost-utilities, when quality of life was used as the outcome measure, the control condition had the highest probability of being the most preferable treatment. Sensitivity analyses yielded comparable results. The video-based CT smoking cessation intervention was the most cost-effective treatment for smoking abstinence after twelve months, varying the willingness to pay per abstinent respondent from €0 up to €80,000. With regard to cost-utility, the control condition seemed to be the most preferable treatment. Probably, more time will be required to assess changes in quality of life.
Full Text Available We present a global overview of image- and video-processing-based methods to help the communication of hearing-impaired people. Two directions of communication have to be considered: from a hearing person to a hearing-impaired person and vice versa. In this paper, firstly, we describe sign language (SL) and the cued speech (CS) language, which are two different languages used by the deaf community. Secondly, we present existing tools which employ SL and CS video processing and recognition for the automatic communication between deaf people and hearing people. Thirdly, we present the existing tools for reverse communication, from hearing people to deaf people, that involve SL and CS video synthesis.
Full Text Available In this work, we address the use of object recognition techniques to annotate what is shown where in online video collections. These annotations make it possible to retrieve specific video scenes for object-related text queries, which is not possible with the manually generated metadata used by current portals. We are not the first to present object annotations generated with content-based analysis methods. However, the proposed framework possesses some outstanding features that offer good prospects for its application in real video portals. Firstly, it can easily be used as a background module in any video environment. Secondly, it is based not on a fixed analysis chain but on an extensive recognition infrastructure that can be used with all kinds of visual features, matching, and machine learning techniques. New recognition approaches can be integrated into this infrastructure with low development costs, and the recognition approaches used can be reconfigured even on a running system. Thus, this framework might also benefit from future advances in computer vision. Thirdly, we present an automatic selection approach to support the use of different recognition strategies for different objects. Last but not least, visual analysis can be performed efficiently on distributed, multi-processor environments, and a database schema is presented to store the resulting video annotations, as well as the off-line generated low-level features, in a compact form. We achieve promising results in an annotation case study and in the instance search task of the TRECVID 2011 challenge.
Poh, N.; Chan, C.H.; Kittler, J.; Marcel, S.; Mc Cool, C.; Argones Rúa, E.; Alba Castro, J.L.; Villegas, M.; Paredes, R.; Štruc, V.; Pavešić, N.; Salah, A.A.; Fang, H.; Costen, N.
Person recognition using facial features, e.g., mug-shot images, has long been used in identity documents. However, due to the widespread use of web-cams and mobile devices embedded with a camera, it is now possible to realize facial video recognition, rather than resorting to just still images. In
Full Text Available ... search for current job openings visit HHS USAJobs Home > NEI YouTube Videos > NEI YouTube Videos: Amblyopia NEI YouTube Videos YouTube Videos Home Age-Related Macular Degeneration Amblyopia Animations Blindness Cataract ...
Full Text Available ... Amaurosis Low Vision Refractive Errors Retinopathy of Prematurity Science Spanish Videos Webinars NEI YouTube Videos: Amblyopia Embedded video for NEI YouTube Videos: Amblyopia NEI Home Contact Us A-Z Site Map NEI on Social Media Information in Spanish (Información en español) Website, ...
This comprehensive and accessible text/reference presents an overview of the state of the art in video coding technology. Specifically, the book introduces the tools of the AVS2 standard, describing how AVS2 can help to achieve a significant improvement in coding efficiency for future video networks and applications by incorporating smarter coding tools such as scene video coding. Topics and features: introduces the basic concepts in video coding, and presents a short history of video coding technology and standards; reviews the coding framework, main coding tools, and syntax structure of AVS2.
Amir, Arnon; Srinivasan, Savitha; Efrat, Alon
The amount of digital video being shot, captured, and stored is growing at a rate faster than ever before. The large amount of stored video is not penetrable without efficient video indexing, retrieval, and browsing technology. Most prior work in the field can be roughly categorized into two classes. One class is based on image processing techniques, often called content-based image and video retrieval, in which video frames are indexed and searched for visual content. The other class is based on spoken document retrieval, which relies on automatic speech recognition and text queries. Both approaches have major limitations. In the first approach, semantic queries pose a great challenge, while the second, speech-based approach, does not support efficient video browsing. This paper describes a system where speech is used for efficient searching and visual data for efficient browsing, a combination that takes advantage of both approaches. A fully automatic indexing and retrieval system has been developed and tested. Automated speech recognition and phonetic speech indexing support text-to-speech queries. New browsable views are generated from the original video. A special synchronized browser allows instantaneous, context-preserving switching from one view to another. The system was successfully used to produce searchable-browsable video proceedings for three local conferences.
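The speech-indexing idea described above, in which text queries are resolved against time-coded recognition output, can be sketched minimally as an inverted index from words to timestamps. This is an illustrative sketch only, not the authors' system; the words and timings below are invented stand-ins for ASR output.

```python
# Minimal inverted index over a time-coded speech transcript: text queries
# return the moments in the video where the words were spoken.
from collections import defaultdict

def build_index(transcript):
    """transcript: list of (word, start_seconds) pairs from speech recognition."""
    index = defaultdict(list)
    for word, start in transcript:
        index[word.lower()].append(start)
    return index

def search(index, query):
    """Return sorted timestamps where any query word was spoken."""
    hits = []
    for word in query.lower().split():
        hits.extend(index.get(word, []))
    return sorted(hits)

transcript = [("video", 1.2), ("indexing", 1.8), ("and", 2.1),
              ("retrieval", 2.4), ("video", 7.5)]
idx = build_index(transcript)
print(search(idx, "video retrieval"))  # [1.2, 2.4, 7.5]
```

A real system of this kind would index phonetic units rather than whole words to tolerate recognition errors, but the lookup structure is the same.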
Full Text Available Person identification plays an important role in the semantic analysis of video content. This paper presents a novel method to automatically label persons in video sequences captured by a fixed camera. Instead of leveraging traditional face recognition approaches, we deal with the task of person identification by fusing information extracted from camera video with information from motion sensor platforms, such as smart phones, carried on human bodies. More specifically, a sequence of motion features extracted from the camera video is compared with each of those collected from the accelerometers of smart phones. When strong correlation is detected, identity information transmitted from the corresponding smart phone is used to identify the phone's wearer. To test the feasibility and efficiency of the proposed method, extensive experiments were conducted, achieving impressive performance.
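The correlation-based matching step described above can be illustrated with a short sketch: a motion-feature sequence derived from video is compared against each phone's accelerometer trace, and the best-correlated phone supplies the identity. The data, names, and the use of plain Pearson correlation are assumptions for illustration, not the paper's implementation.

```python
# Match a video-derived motion sequence against candidate accelerometer
# traces; the phone whose trace correlates most strongly labels the person.
import numpy as np

def best_match(video_motion, phone_traces):
    """phone_traces: dict of phone_id -> 1-D motion sequence (equal length)."""
    v = (video_motion - video_motion.mean()) / video_motion.std()
    best_id, best_r = None, -np.inf
    for phone_id, trace in phone_traces.items():
        t = (trace - trace.mean()) / trace.std()
        r = float(np.mean(v * t))  # Pearson correlation for equal-length sequences
        if r > best_r:
            best_id, best_r = phone_id, r
    return best_id, best_r

rng = np.random.default_rng(0)
walk = rng.standard_normal(200)                    # motion seen in video
traces = {"alice": walk + 0.1 * rng.standard_normal(200),  # same motion + noise
          "bob": rng.standard_normal(200)}                 # unrelated motion
print(best_match(walk, traces)[0])  # alice
```

In practice the sequences would first be resampled to a common rate and aligned in time before correlating.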
Full Text Available The discourse of crisis in the humanities is juxtaposed with an analysis of remix video practices to suggest that the cognitive and cultural engagement feared lost in the former appears with frequency and enthusiasm in the latter. Whether humanists focus on the deleterious effects of the digital or celebrate the digital humanities but resist a turn to computation, their anxieties turn to the disappearance of textual analysis, aesthetics, critique, and self-reflection. Remix video, as exemplified by mashups, trailer remixes, and vids, depends on these same competencies for the creation and circulation of its works. Remix video is not the answer to the crises of the humanities; rather, the recognition of a common set of practices, skills, and values underpinning scholars' and video practitioners' work provides the basis for a coalitional approach: identification of shared opportunities to promote and engage potential participants in the modes of thinking and production that contend with complex cultural ideas.
Mohan M. Trivedi
Full Text Available We present a multilevel system architecture for intelligent environments equipped with omnivideo arrays. In order to gain unobtrusive human awareness, real-time 3D human tracking as well as robust video-based face detection and tracking and face recognition algorithms are needed. We first propose a multiprimitive face detection and tracking loop to crop face videos as the front end of our face recognition algorithm. Both skin-tone and elliptical detections are used for robust face searching, and view-based face classification is applied to the candidates before updating the Kalman filters for face tracking. For video-based face recognition, we propose three decision rules on the facial video segments. The majority rule and discrete HMM (DHMM rule accumulate single-frame face recognition results, while continuous density HMM (CDHMM works directly with the PCA facial features of the video segment for accumulated maximum likelihood (ML decision. The experiments demonstrate the robustness of the proposed face detection and tracking scheme and the three streaming face recognition schemes with 99% accuracy of the CDHMM rule. We then experiment on the system interactions with single person and group people by the integrated layers of activity awareness. We also discuss the speech-aided incremental learning of new faces.
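Of the three decision rules described above, the majority rule is the simplest: frame-level recognition labels are accumulated over a video segment and the most frequent identity wins. A minimal sketch (the frame labels are invented; the HMM-based rules are more involved and not shown):

```python
# Majority-rule decision over per-frame face recognition results.
from collections import Counter

def majority_rule(frame_labels):
    """Return the identity predicted most often across the segment's frames."""
    return Counter(frame_labels).most_common(1)[0][0]

segment = ["anna", "anna", "ben", "anna", "ben", "anna"]
print(majority_rule(segment))  # anna
```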
Full Text Available A fast pedestrian recognition algorithm based on multisensor fusion is presented in this paper. Firstly, potential pedestrian locations are estimated by laser radar scanning in world coordinates, and their corresponding candidate regions in the image are then located via camera calibration and a perspective mapping model. To avoid the time consumed in training and recognition by large numbers of feature vector dimensions, a region-of-interest-based integral histogram of oriented gradients (ROI-IHOG) feature extraction method is then proposed. A support vector machine (SVM) classifier is trained on a novel pedestrian sample dataset adapted to the urban road environment for online recognition. Finally, we test the validity of the proposed approach with several video sequences from realistic urban road scenarios. Reliable and timely performance is demonstrated by our multisensor fusion method.
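The efficiency gain behind an integral histogram of oriented gradients can be sketched briefly: a cumulative per-orientation histogram is computed once over the image, after which the gradient histogram of any rectangular ROI costs four array lookups instead of a rescan. This is an illustrative sketch of the general integral-histogram idea, not the paper's ROI-IHOG code; the bin count and toy image are assumptions.

```python
# Integral orientation histogram: precompute once, then query any ROI cheaply.
import numpy as np

def integral_orientation_histogram(img, n_bins=8):
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)             # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    h, w = img.shape
    hist = np.zeros((h + 1, w + 1, n_bins))             # zero-padded integral image
    for b in range(n_bins):
        layer = np.where(bins == b, mag, 0.0)           # magnitude falling in bin b
        hist[1:, 1:, b] = layer.cumsum(0).cumsum(1)
    return hist

def roi_histogram(ih, top, left, bottom, right):
    """Orientation histogram of img[top:bottom, left:right] via four lookups."""
    return (ih[bottom, right] - ih[top, right]
            - ih[bottom, left] + ih[top, left])

img = np.outer(np.arange(16), np.ones(16))              # toy vertical ramp image
ih = integral_orientation_histogram(img)
full = roi_histogram(ih, 0, 0, 16, 16)
patch = roi_histogram(ih, 2, 3, 10, 12)
```

The payoff is that classifying many candidate ROIs (as the laser radar proposes) reuses one precomputation rather than recomputing gradients per region.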
Besser, Jana; Zekveld, Adriana A; Kramer, Sophia E; Rönnberg, Jerker; Festen, Joost M
In this research, the authors aimed to increase the analogy between Text Reception Threshold (TRT; Zekveld, George, Kramer, Goverts, & Houtgast, 2007) and Speech Reception Threshold (SRT; Plomp & Mimpen, 1979) and to examine the TRT's value in estimating cognitive abilities that are important for speech comprehension in noise. The authors administered 5 TRT versions, SRT tests in stationary (SRT(STAT)) and modulated (SRT(MOD)) noise, and 2 cognitive tests: a reading span (RSpan) test for working memory capacity and a letter-digit substitution test for information-processing speed. Fifty-five adults with normal hearing (18-78 years, M = 44 years) participated. The authors examined mutual associations of the tests and their predictive value for the SRTs with correlation and linear regression analyses. SRTs and TRTs were well associated, also when controlling for age. Correlations for the SRT(STAT) were generally lower than for the SRT(MOD). The cognitive tests were correlated with the SRTs only when age was not controlled for. Age and the TRTs were the only significant predictors of SRT(MOD). SRT(STAT) was predicted by level of education and some of the TRT versions. TRTs and SRTs are robustly associated, nearly independent of age. The association between SRTs and RSpan is largely age dependent. The TRT test and the RSpan test measure different nonauditory components of linguistic processing relevant for speech perception in noise.
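The linear regression analysis described above (predicting SRT(MOD) from TRT and age) can be illustrated with ordinary least squares on synthetic numbers. The data below are invented for illustration only; no coefficient here reflects the study's findings.

```python
# Ordinary least squares: predict SRT(MOD) from TRT and age (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
n = 55                                        # the study tested 55 adults
age = rng.uniform(18, 78, n)
trt = -2.0 + 0.02 * age + 0.3 * rng.standard_normal(n)       # invented TRT scores
srt_mod = -8.0 + 1.5 * trt + 0.01 * age + 0.2 * rng.standard_normal(n)

X = np.column_stack([np.ones(n), trt, age])   # intercept, TRT, age predictors
coef, *_ = np.linalg.lstsq(X, srt_mod, rcond=None)
pred = X @ coef
r2 = 1 - np.sum((srt_mod - pred) ** 2) / np.sum((srt_mod - srt_mod.mean()) ** 2)
```

Comparing such a fit with and without the age column is one way to check, as the authors did, whether an association survives controlling for age.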
Chernyshov Alexander V.
Full Text Available The article focuses on the origins of the song video as a TV and Internet genre. In addition, it considers problems of screen-image creation depending on the musical form and the text of a song, in connection with relevant principles of accent and phraseological video editing and filming techniques, as well as with additional frames and sound elements.
This book provides a unique view of human activity recognition, especially fine-grained human activity structure learning, human-interaction recognition, RGB-D data based action recognition, temporal decomposition, and causality learning in unconstrained human activity videos. The techniques discussed give readers tools that provide a significant improvement over existing methodologies of video content understanding by taking advantage of activity recognition. It links multiple popular research fields in computer vision, machine learning, human-centered computing, human-computer interaction, image classification, and pattern recognition. In addition, the book includes several key chapters covering multiple emerging topics in the field. Contributed by top experts and practitioners, the chapters present key topics from different angles and blend both methodology and application, composing a solid overview of the human activity recognition techniques.
Full Text Available As academics we study, research and teach audiovisual media, yet rarely disseminate and mediate through it. Today, developments in production technologies have enabled academic researchers to create videos and mediate audiovisually. In academia it is taken for granted that everyone can write a text. Is it now time to assume that everyone can make a video essay? Using the online journal of academic videos Audiovisual Thinking and the videos published in it as a case study, this article seeks to reflect on the emergence and legacy of academic audiovisual dissemination. Anchoring academic video and audiovisual dissemination of knowledge in two critical traditions, documentary theory and semiotics, we will argue that academic video is in fact already present in a variety of academic disciplines, and that academic audiovisual essays are bringing trends and developments that have long been part of academic discourse to their logical conclusion.
Bornoe, Nis; Barkhuus, Louise
Microblogging is a recently popular phenomenon and, with the increasing trend for video cameras to be built into mobile phones, a new type of microblogging has entered the arena of electronic communication: video microblogging. In this study we examine video microblogging, which is the broadcasting of short videos. A series of semi-structured interviews offers an understanding of why and how video microblogging is used and what the users post and broadcast.
Habibian, A.; Snoek, C.G.M.
Representing videos using vocabularies composed of concept detectors appears promising for generic event recognition. While many have recently shown the benefits of concept vocabularies for recognition, studying the characteristics of a universal concept vocabulary suited for representing events is
This international bestseller and essential reference is the "bible" for digital video engineers and programmers worldwide. By far the most informative analog and digital video reference available, it includes the hottest new trends and cutting-edge developments in the field. Video Demystified, Fourth Edition is a "one stop" reference guide for the various digital video technologies. The fourth edition is completely updated with all-new chapters on MPEG-4, H.264, SDTV/HDTV, ATSC/DVB, and streaming video (video over DSL, Ethernet, etc.), as well as discussions of the latest standards throughout. The accompanying CD-ROM is updated to include a unique set of video test files in the newest formats. *This essential reference is the "bible" for digital video engineers and programmers worldwide *Contains all-new chapters on MPEG-4, H.264, SDTV/HDTV, ATSC/DVB, and Streaming Video *Completely revised with all the latest and most up-to-date industry standards.
Full Text Available Image recognition is a technology which can be used in various applications such as medical image recognition systems, security, defense video tracking, and factory automation. In this paper we present a novel pipelined architecture of an adaptive integrated Artificial Neural Network (ANN) for image recognition. In our proposed work we combine the spiking-neuron concept with an ANN to achieve an efficient architecture for image recognition. The ANN is trained on a set of training images and the target outputs are identified. Real-time video is captured and then converted into frames for testing, and the images are recognized. The machine can operate at up to 40 frames/sec using images acquired from the camera. The system has been implemented on an XC3S400 SPARTAN-3 Field-Programmable Gate Array.
Finnemann, Niels Ole
the print medium, rather than written text or speech. In the late 20th century, the notion of text was subject to increasing criticism, as in the question raised within literary text theory: is there a text in this class? At the same time, the notion was expanded to include extra-linguistic sign modalities (images, videos). Thus, a basic question is this: should electronic text be included in the expanded notion of text as a new digital sign modality added to the repertoire of modalities, or should it be included as a sign modality which is both an independent modality and a container in which other…
Full Text Available A new noncooperative iris recognition method is proposed. In this method, the iris features are extracted using a Gabor descriptor. The feature extraction and comparison are scale, deformation, rotation, and contrast-invariant. It works with off-angle and low-resolution iris images. The Gabor wavelet is incorporated with scale-invariant feature transformation (SIFT for feature extraction to better extract the iris features. Both the phase and magnitude of the Gabor wavelet outputs were used in a novel way for local feature point description. Two feature region maps were designed to locally and globally register the feature points and each subregion in the map is locally adjusted to the dilation/contraction/deformation. We also developed a video-based non-cooperative iris recognition system by integrating video-based non-cooperative segmentation, segmentation evaluation, and score fusion units. The proposed method shows good performance for frontal and off-angle iris matching. Video-based recognition methods can improve non-cooperative iris recognition accuracy.
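The score fusion unit mentioned above can be illustrated with one common recipe: min-max normalize each frame's matching scores, then average across frames (the sum rule) before taking the final decision. This is a generic sketch with invented scores, not the paper's specific fusion scheme.

```python
# Min-max normalized sum-rule fusion of per-frame iris matching scores.
import numpy as np

def fuse_scores(frame_scores):
    """frame_scores: (n_frames, n_gallery) matrix of matching scores.
    Returns one fused score per gallery identity."""
    s = np.asarray(frame_scores, float)
    lo = s.min(axis=1, keepdims=True)
    hi = s.max(axis=1, keepdims=True)
    s = (s - lo) / np.where(hi > lo, hi - lo, 1.0)   # per-frame min-max norm
    return s.mean(axis=0)                            # sum rule (averaged)

scores = [[0.9, 0.2, 0.1],   # frame 1: scores against gallery identities A, B, C
          [0.7, 0.4, 0.2],   # frame 2
          [0.8, 0.1, 0.3]]   # frame 3
fused = fuse_scores(scores)
print(int(np.argmax(fused)))  # 0 -> identity A
```

Fusing after per-frame normalization keeps one badly segmented frame from dominating the segment-level decision.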
Noel F. Peden
Full Text Available Capturing video at a conference is easy; doing it so the product is useful is another matter. Many subtle problems come into play before the video and audio obtained can be used to create a final product. This article discusses what the author learned over two years of shooting and editing video for the Code4Lib conference.
Today's youth are situated in a complex information ecology that includes video games and print texts. At the basic level, video game play itself is a form of digital literacy practice. If we widen our focus from the "individual player + technology" to the online communities that play them, we find that video games also lie at the nexus of a…
Porter, Guy; Starcevic, Vladan; Berle, David; Fenech, Pauline
It has been increasingly recognized that some people develop problem video game use, defined here as excessive use of video games resulting in various negative psychosocial and/or physical consequences. The main objectives of the present study were to identify individuals with problem video game use and compare them with those without problem video game use on several variables. An international, anonymous online survey was conducted, using a questionnaire with provisional criteria for problem video game use, which the authors have developed. These criteria reflect the crucial features of problem video game use: preoccupation with and loss of control over playing video games and multiple adverse consequences of this activity. A total of 1945 participants completed the survey. Respondents who were identified as problem video game users (n = 156, 8.0%) differed significantly from others (n = 1789) on variables that provided independent, preliminary validation of the provisional criteria for problem video game use. They played longer than planned and with greater frequency, and more often played even though they did not want to and despite believing that they should not do it. Problem video game users were more likely to play certain online role-playing games, found it easier to meet people online, had fewer friends in real life, and more often reported excessive caffeine consumption. People with problem video game use can be identified by means of a questionnaire and on the basis of the present provisional criteria, which require further validation. These findings have implications for recognition of problem video game users among individuals, especially adolescents, who present to mental health services. Mental health professionals need to acknowledge the public health significance of the multiple negative consequences of problem video game use.
Johnson, Don; Johnson, Mike
The process of digital capture, editing, and archiving video has become an important aspect of documenting arthroscopic surgery. Recording the arthroscopic findings before and after surgery is an essential part of the patient's medical record. The hardware and software have become more affordable to purchase, but the learning curve to master the software is steep. Digital video is captured at the time of arthroscopy to a hard disk, and written to a CD at the end of the operative procedure. The process of obtaining video of open procedures is more complex. Outside video of the procedure is recorded on digital tape with a digital video camera. The camera must be plugged into a computer to capture the video on the hard disk. Adobe Premiere software is used to edit the video and render the finished video to the hard drive. This finished video is burned onto a CD. We outline the choice of computer hardware and software for the manipulation of digital video. The techniques of backup and archiving the completed projects and files also are outlined. The uses of digital video for education and the formats that can be used in PowerPoint presentations are discussed.
Full Text Available During their lifetime, people learn to recognize thousands of faces that they interact with. Face perception refers to an individual's understanding and interpretation of the face, particularly the human face, especially in relation to the associated information processing in the brain. The proportions and expressions of the human face are important for identifying origin, emotional tendencies, health qualities, and some social information. From birth, faces are important in an individual's social interaction. Face perception is very complex, as the recognition of facial expressions involves extensive and diverse areas of the brain. Our main goal is to present specialized studies of human faces and to highlight the importance of attractiveness in their retention. We will see that there are many factors that influence face recognition.
Mounîm A. El-Yacoubi
Full Text Available We present an autonomous assistive robotic system for human activity recognition from video sequences. Due to the large variability inherent to video capture from a non-fixed robot (as opposed to a fixed camera), as well as the robot's limited computing resources, the implementation has been guided by robustness to this variability and by memory and computing speed efficiency. To accommodate motion speed variability across users, we encode motion using dense interest point trajectories. Our recognition model harnesses the dense interest point bag-of-words representation through an intersection kernel-based SVM that better accommodates the large intra-class variability stemming from a robot operating in different locations and conditions. To contextually assess the engine as implemented in the robot, we compare it with the most recent approaches to human action recognition performed on public datasets (non-robot-based), including a novel approach of our own that is based on a two-layer SVM-hidden conditional random field sequential recognition model. The latter's performance is among the best within the recent state of the art. We show that our robot-based recognition engine, while less accurate than the sequential model, nonetheless shows good performance, especially given the adverse test conditions of the robot relative to those of a fixed camera.
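The histogram intersection kernel behind the SVM in this abstract is simple to state; the sketch below computes it over toy bag-of-words histograms (the vocabulary size and counts are invented for illustration, not taken from the paper).

```python
# Histogram intersection kernel over bag-of-words descriptors,
# as consumed by a precomputed-kernel SVM. Toy data only.

def intersection_kernel(h1, h2):
    """K(h1, h2) = sum_i min(h1[i], h2[i])."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def gram_matrix(histograms):
    """Full kernel (Gram) matrix between all pairs of histograms."""
    return [[intersection_kernel(x, y) for y in histograms]
            for x in histograms]

# Toy bag-of-words histograms over a 4-word trajectory vocabulary.
bows = [[3, 0, 1, 2], [2, 1, 1, 2], [0, 4, 0, 1]]
K = gram_matrix(bows)
```

The Gram matrix `K` can then be passed to any SVM implementation that accepts a precomputed kernel.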
include: re-identification, consumer behavior analysis, utilizing pupillary response for task difficulty measurement, logo detection, saliency prediction, classification of facial expressions, face recognition, face verification, age estimation, super-resolution, pose estimation, and pain recognition...
Moezzi, Saied; Katkere, Arun L.; Jain, Ramesh C.
Interactive video and television viewers should have the power to control their viewing position. To make this a reality, we introduce the concept of Immersive Video, which employs computer vision and computer graphics technologies to provide remote users a sense of complete immersion when viewing an event. Immersive Video uses multiple videos of an event, captured from different perspectives, to generate a full 3D digital video of that event. That is accomplished by assimilating important information from each video stream into a comprehensive, dynamic, 3D model of the environment. Using this 3D digital video, interactive viewers can then move around the remote environment and observe the events taking place from any desired perspective. Our Immersive Video System currently provides interactive viewing and `walkthrus' of staged karate demonstrations, basketball games, dance performances, and typical campus scenes. In its full realization, Immersive Video will be a paradigm shift in visual communication which will revolutionize television and video media, and become an integral part of future telepresence and virtual reality systems.
Habash, Nizar; Olive, Joseph; Christianson, Caitlin; McCary, John
Machine translation (MT) from text, the topic of this chapter, is perhaps the heart of the GALE project. Beyond being a well defined application that stands on its own, MT from text is the link between the automatic speech recognition component and the distillation component. The focus of MT in GALE is on translating from Arabic or Chinese to English. The three languages represent a wide range of linguistic diversity and make the GALE MT task rather challenging and exciting.
This thesis is based on a detailed analysis of various topics related to the question of whether video games can be art. First, it analyzes the current academic discussion on this subject and confronts the differing opinions of both supporters and objectors of the idea that video games can be a full-fledged art form. The second aim of this paper is to analyze the properties that are inherent to video games, in order to find the reason why the cultural elite considers video games as i...
Trybula, Walter J.
Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…
This is the first text to provide a unified and self-contained introduction to visual pattern recognition and machine learning. It is useful as a general introduction to artificial intelligence and knowledge engineering, and no previous knowledge of pattern recognition or machine learning is necessary. It covers the basics of various pattern recognition and machine learning methods. Translated from Japanese, the book also features chapter exercises, keywords, and summaries.
Full Text Available This article deals with a recognition system using an algorithm based on the Principal Component Analysis (PCA) technique. The recognition system consists only of a PC and an integrated video camera. The algorithm is developed in MATLAB language and calculates the eigenfaces, considered as features of the face. The PCA technique matches the facial test image against the training prototype vectors; the matching score is calculated between their coefficient vectors. If the matching score is high, we have the best recognition. The results of the algorithm based on the PCA technique are very good, even if the person looks at the video camera from one side.
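The matching step this abstract describes can be sketched as follows: project both the test image and the training prototypes onto an eigenface basis and compare coefficient vectors by Euclidean distance. The basis and image vectors below are hand-made toy values; in the article the eigenfaces come out of the PCA itself.

```python
# Eigenface matching step on toy 4-pixel "images". The orthonormal
# basis is assumed here, not computed, purely for illustration.
import math

def project(vec, basis, mean):
    """Coefficient vector of a mean-centered image in the eigenface basis."""
    centered = [v - m for v, m in zip(vec, mean)]
    return [sum(c * b for c, b in zip(centered, e)) for e in basis]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

mean = [2.0, 2.0, 2.0, 2.0]
basis = [[0.5, 0.5, 0.5, 0.5],        # assumed "eigenfaces"
         [0.5, 0.5, -0.5, -0.5]]
train = [[4, 4, 0, 0], [0, 0, 4, 4]]  # two prototype images
test = [3, 5, 0, 0]                   # resembles the first prototype

coeffs = [project(t, basis, mean) for t in train]
q = project(test, basis, mean)
best = min(range(len(train)), key=lambda i: euclidean(q, coeffs[i]))
```

Here `best` indexes the closest training prototype in coefficient space, which is the recognition decision.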
Full Text Available Face recognition systems are now being used in many applications such as border crossings, banks, and mobile payments. The wide-scale deployment of facial recognition systems has attracted intensive attention to the reliability of face biometrics against spoof attacks, where a photo, a video, or a 3D mask of a genuine user’s face can be used to gain illegitimate access to facilities or services. Though several face antispoofing or liveness detection methods (which determine at the time of capture whether a face is live or spoof) have been proposed, the issue is still unsolved due to the difficulty of finding discriminative and computationally inexpensive features and methods for spoof attacks. In addition, existing techniques use the whole face image or complete video for liveness detection. However, certain face regions (video frames) are often redundant or correspond to clutter in the image (video), generally leading to low performance. Therefore, we propose seven novel methods to find discriminative image patches, which we define as regions that are salient, instrumental, and class-specific. Four well-known classifiers, namely support vector machine (SVM), Naive Bayes, Quadratic Discriminant Analysis (QDA), and Ensemble, are then used to distinguish between genuine and spoof faces using a voting-based scheme. Experimental analysis on two publicly available databases (Idiap REPLAY-ATTACK and CASIA-FASD) shows promising results compared to existing works.
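The voting-based fusion the abstract mentions reduces to a majority vote over per-patch predictions. The sketch below shows only that final step; the patch labels are invented, and the upstream classifiers are assumed to exist.

```python
# Majority-vote fusion over per-patch live/spoof predictions.
# Patch labels here are made up for illustration.
from collections import Counter

def vote(patch_labels):
    """Return the label predicted for the most patches."""
    return Counter(patch_labels).most_common(1)[0][0]

labels = ["live", "spoof", "live", "live", "spoof"]
decision = vote(labels)
```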
Maria J. Santofimia
Full Text Available Smart Spaces, Ambient Intelligence, and Ambient Assisted Living are environmental paradigms that strongly depend on their capability to recognize human actions. While most solutions rest on sensor value interpretations and video analysis applications, few have realized the importance of incorporating common-sense capabilities to support the recognition process. Unfortunately, human action recognition cannot be successfully accomplished by only analyzing body postures. On the contrary, this task should be supported by profound knowledge of human agency nature and its tight connection to the reasons and motivations that explain it. The combination of this knowledge and the knowledge about how the world works is essential for recognizing and understanding human actions without committing common-senseless mistakes. This work demonstrates the impact that episodic reasoning has in improving the accuracy of a computer vision system for human action recognition. This work also presents formalization, implementation, and evaluation details of the knowledge model that supports the episodic reasoning.
Pattern recognition is a scientific discipline that is becoming increasingly important in the age of automation and information handling and retrieval. Pattern Recognition, 2e covers the entire spectrum of pattern recognition applications, from image analysis to speech recognition and communications. This book presents cutting-edge material on neural networks (a set of linked microprocessors that can form associations and use pattern recognition to "learn") and enhances student motivation by approaching pattern recognition from the designer's point of view. A direct result of more than 10
conclude and give a peek at our future work in Section 7. II. RELATED WORK An early attempt at extracting spatio-temporal features was Laptev and... extract HoGs and HoFs along with motion boundary histograms (MBH), making the approach robust to camera motion. Noguchi and Yanai  proposed... extracted features remains a problem. The next step was the introduction of relationships between large, stable trajectory clusters. Sun et al. [9
Battiato, Sebastiano; Farinella, Giovanni
Computer vision is the science and technology of making machines that see. It is concerned with the theory, design and implementation of algorithms that can automatically process visual data to recognize objects, track and recover their shape and spatial layout. The International Computer Vision Summer School - ICVSS was established in 2007 to provide both an objective and clear overview and an in-depth analysis of the state-of-the-art research in Computer Vision. The courses are delivered by world renowned experts in the field, from both academia and industry, and cover both theoretical and practical aspects of real Computer Vision problems. The school is organized every year by University of Cambridge (Computer Vision and Robotics Group) and University of Catania (Image Processing Lab). Different topics are covered each year.This edited volume contains a selection of articles covering some of the talks and tutorials held during the last editions of the school. The chapters provide an in-depth overview o...
Nortvig, Anne Mette; Sørensen, Birgitte Holm
This project’s aim was to support and facilitate master’s students’ preparation and collaboration by making video podcasts of short lectures available on YouTube prior to students’ first face-to-face seminar. The empirical material stems from group interviews, from statistical data created through YouTube analytics and from surveys answered by students after the seminar. The project sought to explore how video podcasts support learning and reflection online and how students use and reflect on the integration of online activities in the videos. Findings showed that students engaged actively…
Yang, Su; Zhu, Qing
The goal of sign language recognition (SLR) is to translate sign language into text and provide a convenient communication tool between deaf and hearing people. In this paper, we formulate an appropriate model based on a convolutional neural network (CNN) combined with a Long Short-Term Memory (LSTM) network in order to accomplish continuous recognition. With the strong ability of the CNN, the information in pictures captured from Chinese sign language (CSL) videos can be learned and transformed into vectors. Since a video can be regarded as an ordered sequence of frames, an LSTM model is employed to connect with the fully-connected layer of the CNN. As a recurrent neural network (RNN), it is suitable for sequence learning tasks, with the capability of recognizing patterns defined by temporal distance. Compared with a traditional RNN, LSTM performs better at storing and accessing information. We evaluate this method on our self-built dataset including 40 daily vocabularies. The experimental results show that the recognition method with CNN-LSTM can achieve a high recognition rate with small training sets, which will meet the needs of a real-time SLR system.
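To make the LSTM's gating concrete, the sketch below runs one scalar LSTM cell over a short sequence in plain Python. The weights are tiny made-up values; in the paper's architecture, the CNN's per-frame feature vector would play the role of `x` at each step.

```python
# One-dimensional LSTM cell stepped over a toy sequence.
# Weights are invented scalars, purely to show the gate arithmetic.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """w maps each gate to (input weight, recurrent weight, bias)."""
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])   # input gate
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])   # forget gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])   # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2]) # candidate
    c = f * c_prev + i * g          # cell state carries long-range memory
    h = o * math.tanh(c)            # hidden state exposed to the next layer
    return h, c

w = {k: (0.5, 0.1, 0.0) for k in "ifog"}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.5]:          # stand-in for per-frame CNN features
    h, c = lstm_step(x, h, c, w)
```

The gating is why the cell state `c` can persist information across many frames, which is what makes LSTM suitable for the temporal side of continuous SLR.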
Full Text Available As the public education system in Northern Ontario continues to take a downward spiral, a plethora of secondary school students are being placed in an alternative educational environment. Juxtaposing the two educational settings reveals very similar methods and characteristics of educating our youth, as opposed to using a truly alternative approach to education. This video reviews the relationship between public education and alternative education in a remote Northern Ontario setting. It is my belief that the traditional methods of teaching are not appropriate for educating at-risk students in alternative schools. Paper-and-pencil worksheets do not motivate these students to learn and succeed. Alternative education should emphasize experiential learning, a just-in-time curriculum based on every unique individual, and the student's true passion for everyday life. Cameron Culbert was born on February 3rd, 1977 in North Bay, Ontario. His teenage years were split between attending public school and his willed curriculum on the ski hill. Culbert spent 10 years (1996-2002 & 2006-2010) competing for Canada as an alpine ski racer. His passion for teaching and coaching began as an athlete and has now transferred into the classroom and the community. A graduate of Nipissing University (BA, BEd, MEd), Cameron's research interests are alternative education, physical education and technology in the classroom. Currently Cameron is an active educator and coach in Northern Ontario.
Full Text Available Video tracking is one of the processes in digital video and motion picture post-production. Video tracking helps realize the visual concept during production and is an integral part of visual effects making. This paper presents how the tracking process works and its benefits for visual needs, especially for video and motion picture production. Aspects of the tracking process, including common failure cases, are also made clear in this discussion.
Full Text Available In this paper we join a growing body of studies that learn from vernacular video analysts quite what video analysis as an intelligible course of action might be. Rather than pursuing epistemic questions regarding video as a number of other studies of video analysis have done, our concern here is with the crafts of producing the filmic. As such we examine how audio and video clips are indexed and brought to hand during the logging process, how a first assembly of the film is built at the editing bench and how logics of shot sequencing relate to wider concerns of plotting, genre and so on. In its conclusion we make a number of suggestions about the future directions of studying video and film editors at work. URN: urn:nbn:de:0114-fqs0803378
Full Text Available This paper presents a method for recognizing human actions from a single query action video. We propose an action recognition scheme based on the ordinal measure of accumulated motion, which is robust to variations in appearance. To this end, we first define the accumulated motion image (AMI) using image differences. Then the AMI of the query action video is resized to a subimage by intensity averaging, and a rank matrix is generated by ordering the sample values in the subimage. By computing the distances from the rank matrix of the query action video to the rank matrices of all local windows in the target video, local windows close to the query action are detected as candidates. To find the best match among the candidates, their energy histograms, which are obtained by projecting AMI values in the horizontal and vertical directions, respectively, are compared with those of the query action video. The proposed method does not require any preprocessing task such as learning or segmentation. To justify the efficiency and robustness of our approach, experiments are conducted on various datasets.
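The AMI and rank-matrix steps this abstract describes can be sketched in a few lines: accumulate absolute frame differences, then replace intensities by their ordinal ranks. The frames below are toy 4x4 grids, and the resizing step is omitted for brevity.

```python
# Accumulated motion image and ordinal rank matrix on toy frames.
# Real use would operate on full video frames and a resized subimage.

def ami(frames):
    """Accumulated motion image: sum of absolute frame differences."""
    h, w = len(frames[0]), len(frames[0][0])
    out = [[0.0] * w for _ in range(h)]
    for a, b in zip(frames, frames[1:]):
        for y in range(h):
            for x in range(w):
                out[y][x] += abs(a[y][x] - b[y][x])
    return out

def rank_matrix(img):
    """Rank of each cell when all values are sorted ascending."""
    flat = [(v, i) for i, v in enumerate(v for row in img for v in row)]
    ranks = [0] * len(flat)
    for r, (_, i) in enumerate(sorted(flat)):
        ranks[i] = r
    return ranks

f0 = [[0] * 4 for _ in range(4)]
f1 = [[0] * 4 for _ in range(4)]
f1[1][1] = 9                      # motion concentrated at one cell
motion = ami([f0, f1])
r = rank_matrix(motion)
```

Comparing rank matrices instead of raw intensities is what gives the ordinal measure its robustness to appearance changes: any monotone intensity change leaves the ranks untouched.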
Full Text Available Surveillance videos contain a considerable amount of data, wherein the information interesting to the user is sparsely distributed. Researchers construct video synopses that contain key information extracted from a surveillance video for efficient browsing and analysis. The geospatial–temporal information of a surveillance video plays an important role in the efficient description of video content, yet current approaches to video synopsis lack the introduction and analysis of geospatial–temporal information. Owing to the problems mentioned above, this paper proposes an approach called “surveillance video synopsis in GIS”. Based on an integration model of video moving objects and GIS, the virtual visual field and the expression model of the moving object are constructed by spatially locating and clustering the trajectory of the moving object. The subgraphs of the moving object are reconstructed frame by frame in a virtual scene. Results show that the approach described in this paper comprehensively analyzed and created fusion expression patterns between video dynamic information and geospatial–temporal information in GIS and reduced the playback time of video content.
Denikin Anton A.
Full Text Available The article considers the aesthetic and practical possibilities of sound (sound design) in video games and interactive applications. It outlines the key features of game sound, such as simulation, representativeness, interactivity, immersion, randomization, and audio-visuality. The author defines the basic terminology in the study of game audio and identifies significant aesthetic differences between film sound and sound in video game projects. It is an attempt to determine techniques of art analysis for approaches to the study of video games, including the aesthetics of their sounds. The article offers a range of research methods, considering video game scoring as a contemporary creative practice.
Full Text Available In the past few years there has been an explosion in the use of digital video data. Many people have personal computers at home, and with the help of the Internet users can easily share video files on their computers. This makes possible the unauthorized use of digital media, and without adequate protection systems the authors and distributors have no means to prevent it. Digital watermarking techniques can help these systems to be more effective by embedding secret data right into the video stream. This makes minor changes in the frames of the video, but these changes are almost imperceptible to the human visual system. The embedded information can include copyright data, access control, etc. A robust watermark is resistant to various distortions of the video, so it cannot be removed without affecting the quality of the host medium. In this paper I propose a video watermarking scheme that fulfills the requirements of a robust watermark.
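To illustrate the embed/extract roundtrip, the sketch below writes watermark bits into the least significant bits of frame pixels. Note this LSB scheme is deliberately naive and is not the robust, transform-domain watermark the paper proposes; it only shows why the changes are imperceptible (each pixel moves by at most 1).

```python
# Naive LSB watermarking of a grayscale frame (toy pixel values).
# A robust watermark would embed in a transform domain instead.

def embed(pixels, bits):
    """Overwrite the LSB of each pixel with one watermark bit."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

def extract(pixels, n):
    """Read the first n watermark bits back out of the LSBs."""
    return [p & 1 for p in pixels[:n]]

frame = [120, 121, 200, 55, 17, 86]
mark = [1, 0, 1, 1, 0, 0]
marked = embed(frame, mark)
recovered = extract(marked, len(mark))
```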
.... Experiments have been completed comparing the effects of several types of facial motion on face recognition, the effects of face familiarity on recognition from video clips taken at a distance...
Wetzel, C. Douglas; And Others
This volume is a blend of media research, cognitive science research, and tradecraft knowledge regarding video production techniques. The research covers: visual learning; verbal-auditory information; news broadcasts; the value of motion and animation in film and video; simulation (including realism and fidelity); the relationship of text and…
This chapter focuses on methodological issues that arise when using (digital) video for research communication, not least online. Video has long been used in research for data collection and research communication. With digitization and the internet, however, new opportunities and challenges have arisen for communicating and distributing research results to different target groups via video. At the same time, classic methodological issues, such as the researcher's positioning in relation to what is being studied, remain relevant. Both classic and new issues are discussed in the chapter, which frames the discussion around different possible positionings: communicator, storyteller, or dialogist. These positions relate to genres within 'academic video'. Finally, a methodological toolbox is presented with tools for planning…
Full Text Available Previous research has been inconsistent on whether violent video games exert positive and/or negative effects on cognition. In particular, attentional bias in facial affect processing after violent video game exposure continues to be controversial. The aim of the present study was to investigate attentional bias in facial recognition after short-term exposure to violent video games and to characterize the neural correlates of this effect. To accomplish this, participants were exposed to either neutral or violent video games for 25 min, and then event-related potentials (ERPs) were recorded during two emotional search tasks. The first search task assessed attentional facilitation, in which participants were required to identify an emotional face from a crowd of neutral faces. In contrast, the second task measured disengagement, in which participants were required to identify a neutral face from a crowd of emotional faces. Our results found a significant presence of the ERP component N2pc during the facilitation task; however, no differences were observed between the two video game groups. This finding does not support a link between attentional facilitation and violent video game exposure. Comparatively, during the disengagement task, N2pc responses were not observed when participants viewed happy faces following violent video game exposure; however, a weak N2pc response was observed after neutral video game exposure. These results provide only inconsistent support for the disengagement hypothesis, suggesting that participants found it difficult to separate a neutral face from a crowd of emotional faces.
Karpenko, Alexandre; Aarabi, Parham
In this paper, we present a large database of over 50,000 user-labeled videos collected from YouTube. We develop a compact representation called "tiny videos" that achieves high video compression rates while retaining the overall visual appearance of the video as it varies over time. We show that frame sampling using affinity propagation (an exemplar-based clustering algorithm) achieves the best trade-off between compression and video recall. We use this large collection of user-labeled videos in conjunction with simple data mining techniques to perform related video retrieval, as well as classification of images and video frames. The classification results achieved by tiny videos are compared with the tiny images framework for a variety of recognition tasks. The tiny images data set consists of 80 million images collected from the Internet. These are the largest labeled research data sets of videos and images available to date. We show that tiny videos are better suited for classifying scenery and sports activities, while tiny images perform better at recognizing objects. Furthermore, we demonstrate that combining the tiny images and tiny videos data sets improves classification precision in a wider range of categories.
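A crude version of the "tiny video" idea can be sketched as sampling a subset of frames and shrinking each by block averaging. The paper selects frames with affinity propagation; the uniform sampling and 2x2 averaging below are deliberate simplifications on invented toy frames.

```python
# Simplified "tiny video": uniform frame sampling + 2x2 block averaging.
# The paper uses affinity propagation for frame selection instead.

def shrink(frame):
    """Average non-overlapping 2x2 blocks of a grayscale frame."""
    h, w = len(frame), len(frame[0])
    return [[(frame[y][x] + frame[y][x + 1] +
              frame[y + 1][x] + frame[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def tiny_video(frames, step=2):
    """Keep every step-th frame, shrunk to a quarter of its pixels."""
    return [shrink(f) for f in frames[::step]]

# Four toy 2x2 frames with slowly increasing brightness.
frames = [[[i * 10 + 0, i * 10 + 2], [i * 10 + 4, i * 10 + 6]]
          for i in range(4)]
tv = tiny_video(frames)
```

Even this toy version preserves the coarse appearance over time (the brightness trend survives) while discarding most of the pixels, which is the trade-off the representation is built around.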
Bolle, R. M.; Yeo, B.-L.; Yeung, M.
Digital video databases are becoming more and more pervasive, and finding video of interest in large databases is rapidly becoming a problem. Intelligent means of quick content-based video retrieval and rapid content-based video viewing are, therefore, an important topic of research. Video is a rich source of data: it contains visual and audio information, and in many cases there is text associated with the video. Content-based video retrieval should use all this information in an efficient and effective way. From a human perspective, a video query can be viewed as an iterated sequence of navigating, searching, browsing, and viewing. This paper addresses video search in terms of these phases.
Full Text Available Video processing source code for algorithms and tools used in software media pipelines (e.g. image scalers, colour converters, etc.). The currently available source code is written in C++, together with its associated libraries and DirectShow filters.
Simon, Michael; Fischer, Amber; Petrov, Plamen
Unmanned aerial vehicles (UAVs) capture real-time video data of military targets while keeping the warfighter at a safe distance. This keeps soldiers out of harm's way while they perform intelligence, surveillance and reconnaissance (ISR) and close-air support troops in contact (CAS-TIC) situations. The military also wants to use UAV video to achieve force multiplication. One method of achieving effective force multiplication involves fielding numerous UAVs with cameras and having multiple videos processed simultaneously by a single operator. However, monitoring multiple video streams is difficult for operators when the videos are of low quality. To address this challenge, we researched several promising video enhancement algorithms that focus on improving video quality. In this paper, we discuss our video enhancement suite and provide examples of video enhancement capabilities, focusing on stabilization, dehazing, and denoising. We provide results that show the effects of our enhancement algorithms on target detection and tracking algorithms. These results indicate that there is potential to assist the operator in identifying and tracking relevant targets with aided target recognition even on difficult video, increasing the force multiplier effect of UAVs. This work also forms the basis for human factors research into the effects of enhancement algorithms on ISR missions.
Mu, Meiru; Spreeuwers, Lieuwe Jan; Veldhuis, Raymond N.J.
It is still challenging to recognize faces reliably in videos from mobile camera, although mature automatic face recognition technology for still images has been available for quite some time. Suppose we want to be alerted when suspects appear in the recording of a police Body-Cam, even a good face matcher on still images would give many false alarms due to the uncontrolled conditions. This paper presents an approach to identify faces in videos from mobile cameras. A commercial face matcher F...
Full Text Available Video streaming over the Internet has gained significant popularity during the last years, and academia and industry have made a great research effort in this direction. In this scenario, scalable video coding (SVC) has emerged as an important video standard to provide more functionality to video transmission and storage applications. This paper proposes and evaluates two strategies based on scalable video coding for P2P video streaming services. In the first strategy, SVC is used to offer differentiated video quality to peers with heterogeneous capacities. The second strategy uses SVC to reach a homogeneous video quality between different videos from different sources. The obtained results show that our proposed strategies enable a system to improve its performance and introduce benefits such as differentiated video quality for clients with heterogeneous capacities and variable network conditions.
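As an illustration of the first strategy, differentiated quality can be achieved by letting each peer subscribe to as many SVC layers as its download capacity allows. The sketch below is not from the paper; the layer bitrates and the greedy selection rule are illustrative assumptions.

```python
# Toy sketch (not the paper's code): choosing how many SVC layers a peer
# can subscribe to, given cumulative layer bitrates and peer capacity.
# The layer bitrates (kbps) are illustrative values, not measured data.

def select_svc_layers(layer_bitrates, capacity_kbps):
    """Return the number of layers (base + enhancements) whose cumulative
    bitrate fits within the peer's download capacity."""
    total = 0
    layers = 0
    for rate in layer_bitrates:
        if total + rate > capacity_kbps:
            break
        total += rate
        layers += 1
    return layers

LAYERS = [400, 300, 500]  # base layer + two enhancement layers (kbps)

print(select_svc_layers(LAYERS, 350))   # -> 0: cannot even fetch the base layer
print(select_svc_layers(LAYERS, 800))   # -> 2: base + first enhancement
print(select_svc_layers(LAYERS, 1500))  # -> 3: full quality
```

Because enhancement layers depend on the base layer, selection stops at the first layer that does not fit.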
Full Text Available The first rural video experiences took place in Peru and Mexico. The Peruvian project is known as CESPAC (Centro de Servicios de Pedagogía Audiovisual para la Capacitación). With external financing from the FAO, it was started in the 1970s. The Mexican project was named PRODERITH (Programa de Desarrollo Rural Integrado del Trópico Húmedo). Its rural video component was particularly successful at the grassroots level. The evaluation concluded that rural video, as a social communication system for development, is excellent and low-cost.
Full Text Available This paper presents a new approach for fall detection from partially-observed depth-map video sequences. The proposed approach utilizes the 3D skeletal joint positions obtained from the Microsoft Kinect sensor to build a view-invariant descriptor for human activity representation, called the motion-pose geometric descriptor (MPGD). Furthermore, we have developed a histogram-based representation (HBR) based on the MPGD to construct a length-independent representation of the observed video subsequences. Using the constructed HBR, we formulate the fall detection problem as a posterior-maximization problem in which the posterior probability for each observed video subsequence is estimated using a multi-class SVM (support vector machine) classifier. Then, we combine the computed posterior probabilities from all of the observed subsequences to obtain an overall class posterior probability of the entire partially-observed depth-map video sequence. To evaluate the performance of the proposed approach, we have utilized the Kinect sensor to record a dataset of depth-map video sequences that simulates four fall-related activities of elderly people: walking, sitting, falling from standing and falling from sitting. Then, using the collected dataset, we have developed three evaluation scenarios based on the number of unobserved video subsequences in the testing videos: a fully-observed video sequence scenario, a single unobserved video subsequence of random length scenario, and a two unobserved video subsequences of random lengths scenario. Experimental results show that the proposed approach achieved an average recognition accuracy of 93.6%, 77.6% and 65.1% in recognizing the activities during the first, second and third evaluation scenarios, respectively. These results demonstrate the feasibility of the proposed approach to detect falls from partially-observed videos.
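The combination step can be sketched as follows. Averaging the per-subsequence posteriors is one plausible combination rule; the paper's exact formulation may differ, and the class names and probabilities below are hypothetical, not the paper's data.

```python
# Illustrative sketch: combining per-subsequence class posteriors into an
# overall posterior for a partially-observed sequence.

def combine_posteriors(subseq_posteriors):
    """subseq_posteriors: list of dicts mapping class -> P(class | subsequence).
    Returns a normalized overall posterior over the classes."""
    classes = subseq_posteriors[0].keys()
    combined = {c: sum(p[c] for p in subseq_posteriors) for c in classes}
    total = sum(combined.values())
    return {c: v / total for c, v in combined.items()}

# Hypothetical SVM outputs for three observed subsequences of one video:
posteriors = [
    {"walking": 0.6, "sitting": 0.2, "falling": 0.2},
    {"walking": 0.5, "sitting": 0.1, "falling": 0.4},
    {"walking": 0.7, "sitting": 0.2, "falling": 0.1},
]
overall = combine_posteriors(posteriors)
print(max(overall, key=overall.get))  # -> walking
```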
Botella, Guillermo; García, Carlos; Meyer-Bäse, Uwe
This contribution focuses on different topics covered by the special issue titled 'Hardware Implementation of Machine Vision Systems', including FPGAs, GPUs, embedded systems, and multicore implementations for image analysis such as edge detection, segmentation, pattern recognition and object recognition/interpretation, image enhancement/restoration, image/video compression, image similarity and retrieval, satellite image processing, medical image processing, motion estimation, neuromorphic and bioinspired vision systems, video processing, image formation and physics-based vision, 3D processing/coding, scene understanding, and multimedia.
Full Text Available This paper studies a video image processing system for vehicle detection and counting, comprising video-based vehicle detection, image processing of vehicle targets, and vehicle counting. Vehicle detection uses the inter-frame difference method together with vehicle shadow segmentation. The image processing stage applies color-to-grayscale conversion, image segmentation, mathematical morphology analysis and image filling to the detected targets before extracting the target vehicle. The counting stage counts the detected vehicles. The system uses inter-frame video differencing to detect vehicles and completes the counting by adding a frame around each vehicle and comparing it with the boundary, achieving a high recognition rate, fast operation and ease of use. The purpose of this paper is to raise the level of modernization and automation in traffic management, and the study can serve as a reference for the future development of related applications.
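The inter-frame difference step can be sketched as follows; this is an illustrative toy example, not the system's code, and the threshold and pixel values are assumptions.

```python
# Minimal sketch of inter-frame difference detection on two synthetic
# grayscale "frames" (lists of pixel rows).

def frame_difference(prev, curr, threshold=30):
    """Return a binary mask: 1 where |curr - prev| exceeds the threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

prev_frame = [[10, 10, 10],
              [10, 10, 10],
              [10, 10, 10]]
curr_frame = [[10, 10, 10],
              [10, 200, 200],   # a bright "vehicle" enters the scene
              [10, 10, 10]]

mask = frame_difference(prev_frame, curr_frame)
moving_pixels = sum(sum(row) for row in mask)
print(mask)           # -> [[0, 0, 0], [0, 1, 1], [0, 0, 0]]
print(moving_pixels)  # -> 2
```

In a real pipeline the mask would then be cleaned by morphology and shadow segmentation before counting.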
Full Text Available Object tracking is an important and fundamental task in computer vision and its high-level applications, e.g., intelligent surveillance, motion-based recognition, video indexing, traffic monitoring and vehicle navigation. However, the recent widespread use of wireless consumer cameras often produces low-quality videos with frame-skipping, which makes object tracking difficult. Previous tracking methods, for example, generally depend heavily on object appearance or motion continuity and cannot be directly applied to frame-skipping videos. In this paper, we propose an improved particle filter for object tracking to overcome the frame-skipping difficulties. The novelty of our particle filter lies in using the detection result of erratic motion to ameliorate the transition model for a better trial distribution. Experimental results show that the proposed approach improves the tracking accuracy in comparison with the state-of-the-art methods, even when both the object and the camera are in motion.
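A minimal sketch of the idea, assuming a 1D state and Gaussian noise (the paper's actual transition model and parameter values are not reproduced here): when erratic motion is detected, the transition noise is widened so the trial distribution can still cover the object's jump across skipped frames.

```python
import math
import random

# Toy 1D particle-filter step with a detection-informed transition model.
# All noise parameters are illustrative assumptions.

def step(particles, observation, erratic, base_noise=1.0, obs_noise=2.0):
    # Transition: widen the spread when erratic motion is detected.
    noise = base_noise * (5.0 if erratic else 1.0)
    moved = [p + random.gauss(0.0, noise) for p in particles]
    # Weight each particle by an (unnormalized) Gaussian observation likelihood.
    weights = [math.exp(-((m - observation) ** 2) / (2 * obs_noise ** 2))
               for m in moved]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Resample with replacement proportionally to the weights.
    return random.choices(moved, weights=weights, k=len(particles))

random.seed(0)
particles = [0.0] * 200
# The object jumps from ~0 to ~10 because of skipped frames:
particles = step(particles, observation=10.0, erratic=True)
estimate = sum(particles) / len(particles)
print(round(estimate, 1))  # pulled strongly toward the observation at 10
```

With a narrow transition (erratic=False) the particles could not reach the jumped observation, and the weights would collapse onto the tail of the cloud.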
Indian language; Oriya script; character segmentation; handwriting recognition. 1. Introduction. Segmentation of handwritten text into lines, words and characters is one of the important steps in the handwritten script recognition process. The task of individual text-line segmentation from unconstrained handwritten documents ...
Full Text Available Smoke detection is a key part of fire recognition in forest fire surveillance video, since the smoke produced by forest fires is visible well before the flames. The performance of smoke video detection algorithms is often affected by smoke-like objects such as heavy fog. This paper presents a novel forest fire smoke video detection method based on spatiotemporal features and dynamic texture features. First, Kalman filtering is used to segment candidate smoke regions. Then, each candidate smoke region is divided into small blocks. The spatiotemporal energy feature of each block is extracted by computing the energy features of its 8-neighboring blocks in the current frame and its two adjacent frames. The flutter direction angle is computed by analyzing the centroid motion of the segmented regions in one candidate smoke video clip. The Local Binary Motion Pattern (LBMP) is used to define dynamic texture features of smoke videos. Finally, smoke video is recognized by the Adaboost algorithm. The experimental results show that the proposed method can effectively detect smoke images recorded from different scenes.
Lecca, Michela; Smolka, Bogdan
This text covers state-of-the-art color image and video enhancement techniques. The book examines the multivariate nature of color image/video data as it pertains to contrast enhancement, color correction (equalization, harmonization, normalization, balancing, constancy, etc.), noise removal and smoothing. This book also discusses color and contrast enhancement in vision sensors and applications of image and video enhancement. · Focuses on enhancement of color images/video · Addresses algorithms for enhancing color images and video · Presents coverage on super resolution, restoration, inpainting, and colorization.
Full Text Available This paper presents the transmission of a Digital Video Broadcasting system with streaming video at 640x480 resolution over different IQ rates and modulations. In video transmission, distortion often occurs, so the received video has bad quality. Key-frame selection algorithms are flexible to changes in a video, but with these methods the temporal information of a video sequence is omitted. To minimize distortion between the original and received video, we added a methodology using a sequential distortion minimization algorithm. Its aim was to create a new video, better than the original, without significant loss of content between the original and received video, corrected sequentially. The reliability of video transmission was observed based on a constellation diagram, with the best result at an IQ rate of 2 MHz and 8 QAM modulation. Video transmission was also investigated with and without SEDIM (Sequential Distortion Minimization Method). The experimental results showed that the average PSNR (Peak Signal to Noise Ratio) of video transmission using SEDIM increased from 19.855 dB to 48.386 dB, and the average SSIM (Structural Similarity) increased by 10.49%. The experimental results and comparison of the proposed method showed good performance. A USRP board was used as the RF front-end at 2.2 GHz.
Full Text Available Conventional video traces (which characterize the video encoding frame sizes in bits and frame quality in PSNR) are limited to evaluating loss-free video transmission. To evaluate robust video transmission schemes for lossy network transport, experiments with actual video are generally required. To circumvent the need for experiments with actual videos, we propose in this paper an advanced video trace framework. The two main components of this framework are (i) advanced video traces, which combine the conventional video traces with a parsimonious set of visual content descriptors, and (ii) quality prediction schemes that, based on the visual content descriptors, provide an accurate prediction of the quality of the reconstructed video after lossy network transport. We conduct extensive evaluations using a perceptual video quality metric as well as the PSNR, in which we compare the visual quality predicted based on the advanced video traces with the visual quality determined from experiments with actual video. We find that the advanced video trace methodology accurately predicts the quality of the reconstructed video after frame losses.
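For reference, the PSNR figure used throughout trace-based evaluation is computed from the mean squared error between original and reconstructed frames; the tiny 2x2 frames below are illustrative, not trace data.

```python
import math

# PSNR of a reconstructed frame against the original, for 8-bit samples
# (peak value 255). Frames are lists of pixel rows.

def psnr(original, reconstructed, max_value=255.0):
    diffs = [(o - r) ** 2 for orow, rrow in zip(original, reconstructed)
             for o, r in zip(orow, rrow)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)

orig = [[52, 55], [61, 59]]
recon = [[52, 54], [60, 59]]  # small reconstruction error (MSE = 0.5)
print(round(psnr(orig, recon), 2))  # -> 51.14
```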
Mihalache Sergiu; Stoica Mihaela-Zoica
From birth, faces are important in the individual's social interaction. Face perceptions are very complex, as the recognition of facial expressions involves extensive and diverse areas in the brain...
Full Text Available A video surveillance system senses and tracks threatening activity in a real-time environment. It guards against security threats with the help of visual devices which gather video information, such as CCTVs and IP (Internet Protocol) cameras. Video surveillance has become a key tool for addressing problems in public security. Such systems are mostly deployed on IP-based networks, so all the security threats that exist in IP-based applications may also threaten video surveillance applications. As a result, cybercrime, illegal video access, video mishandling and other abuses may increase. Hence, this paper proposes an intelligent security model for video surveillance systems which ensures safety and provides secured access to video.
Ren, Huamin; Liu, Weifeng; Olsen, Søren Ingvor
Understanding behaviors is the core of video content analysis, which is highly related to two important applications: abnormal event detection and action recognition. Dictionary learning, as one of the mid-level representations, is an important step to process a video. It has achieved state...
Tavormina, Maurilio Giuseppe Maria; Tavormina, Romina
The frequent and protracted use of video games, with its serious personal, family and social consequences, is no longer just a pleasant pastime and could lead to mental and physical health problems. Although there is no official recognition of Internet video game addiction as a mild mental health disorder, further scientific research is needed.
Zhou, Xiang; Ross, Lars; Lehn-Schiøler, Tue
BACKGROUND: It is well known that under noisy conditions, viewing a speaker's articulatory movement aids the recognition of spoken words. Conventionally it is thought that the visual input disambiguates otherwise confusing auditory input. HYPOTHESIS: In contrast, we hypothesize that it is the temporal synchronicity of the visual input that aids parsing of the auditory stream. More specifically, we expected that purely temporal information, which does not convey information such as place of articulation, may facilitate word recognition. METHODS: To test this prediction we used temporal features of audio to generate an artificial talking-face video and measured word recognition performance on simple monosyllabic words. RESULTS: When presenting words together with the artificial video we find that word recognition is improved over purely auditory presentation. The effect is significant (p...
Tan, Zheng-Hua; Lindberg, Børge
The enthusiasm of deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR in the mobile context, covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within command and control, text entry and search are presented, with an emphasis on mobile text entry.
Full Text Available The aim of this paper is to present video quality prediction models for objective, non-intrusive prediction of H.264 encoded video for all content types, combining parameters in both the physical and application layers over Universal Mobile Telecommunication System (UMTS) networks. In order to characterize the Quality of Service (QoS) level, a learning model based on an Adaptive Neural Fuzzy Inference System (ANFIS) and a second model based on non-linear regression analysis are proposed to predict the video quality in terms of the Mean Opinion Score (MOS). The objective of the paper is two-fold: first, to find the impact of QoS parameters on end-to-end video quality for H.264 encoded video; second, to develop learning models based on ANFIS and non-linear regression analysis to predict video quality over UMTS networks by considering the impact of radio link loss models. The loss models considered are 2-state Markov models. Both models are trained with a combination of physical and application layer parameters and validated with an unseen dataset. Preliminary results show that good prediction accuracy was obtained from both models. The work should help in the development of a reference-free video prediction model and QoS control methods for video over UMTS networks.
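The 2-state Markov (Gilbert-style) loss model mentioned above can be sketched as follows; the transition probabilities are illustrative assumptions, not values from the paper.

```python
import random

# 2-state Markov packet-loss model: state G (good, no loss) and state B
# (bad, loss), with transition probabilities p (G->B) and q (B->G).

def simulate_losses(n, p=0.05, q=0.5, seed=42):
    rng = random.Random(seed)
    state = "G"
    losses = []
    for _ in range(n):
        losses.append(1 if state == "B" else 0)
        if state == "G":
            state = "B" if rng.random() < p else "G"
        else:
            state = "G" if rng.random() < q else "B"
    return losses

trace = simulate_losses(10000)
loss_rate = sum(trace) / len(trace)
# The stationary loss probability is p / (p + q) ~= 0.091 for these values;
# the simulated rate should be close to it.
print(round(loss_rate, 3))
```

Unlike independent (Bernoulli) losses, this model produces bursty loss runs, which is what makes it useful for evaluating video over radio links.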
Maslow, Katie; Mezey, Mathy
Many hospital patients with dementia have no documented dementia diagnosis. In some cases, this is because they have never been diagnosed. Recognition of Dementia in Hospitalized Older Adults proposes several approaches that hospital nurses can use to increase recognition of dementia. This article describes the Try This approaches, how to implement them, and how to incorporate them into a hospital's current admission procedures. For a free online video demonstrating the use of these approaches, go to http://links.lww.com/A216.
de Jong, Franciska M.G.; Gauvain, Jean-Luc; den Hartog, Jurgen; den Hartog, Jeremy; Netter, Klaus
This paper describes the Olive project which aims to support automated indexing of video material by use of human language technologies. Olive is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which serve as the
VanDeventer, Stephanie S.; White, James A.
Investigates the display of expert behavior by seven outstanding video game-playing children ages 10 and 11. Analyzes observation and debriefing transcripts for evidence of self-monitoring, pattern recognition, principled decision making, qualitative thinking, and superior memory, and discusses implications for educators regarding the development…
Full Text Available The highly efficient and robust stitching of aerial video captured by unmanned aerial vehicles (UAVs) is a challenging problem in the field of robot vision. Existing commercial image stitching systems have seen success with offline stitching tasks, but they cannot guarantee high-speed performance when dealing with online aerial video sequences. In this paper, we present a novel system which has the unique ability to stitch high-frame-rate aerial video at a speed of 150 frames per second (FPS). In addition, rather than using a high-speed vision platform such as FPGA or CUDA, our system runs on a normal personal computer. To achieve this, after careful comparison of the existing invariant features, we choose the FAST corner and a binary descriptor for efficient feature extraction and representation, and present a spatially and temporally coherent filter to fuse the UAV motion information into the feature matching. The proposed filter can remove the majority of feature correspondence outliers and significantly increase the speed of robust feature matching by up to 20 times. To achieve a balance between robustness and efficiency, a dynamic key-frame-based stitching framework is used to reduce the accumulation of errors. Extensive experiments on challenging UAV datasets demonstrate that our approach can break through the speed limitation and generate an accurate stitched image for aerial video stitching tasks.
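A simplified sketch of the coherence-filtering idea (not the paper's implementation): estimate the dominant inter-frame displacement and reject matches that deviate from it. The tolerance and match coordinates below are assumptions.

```python
# Coherence filter for feature matches: the global displacement is estimated
# as the median match displacement, and matches deviating from it beyond a
# tolerance are discarded as outliers.

def coherence_filter(matches, tol=5.0):
    """matches: list of ((x1, y1), (x2, y2)) correspondences between frames."""
    dxs = sorted(x2 - x1 for (x1, _), (x2, _) in matches)
    dys = sorted(y2 - y1 for (_, y1), (_, y2) in matches)
    mdx, mdy = dxs[len(dxs) // 2], dys[len(dys) // 2]
    return [((x1, y1), (x2, y2)) for (x1, y1), (x2, y2) in matches
            if abs((x2 - x1) - mdx) <= tol and abs((y2 - y1) - mdy) <= tol]

matches = [((0, 0), (10, 2)), ((5, 5), (15, 7)), ((3, 1), (13, 3)),
           ((2, 2), (40, 30))]  # the last match is an outlier
inliers = coherence_filter(matches)
print(len(inliers))  # -> 3
```

In the paper's setting the expected displacement would come from UAV motion information rather than the match median, but the filtering principle is the same.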
Full Text Available in conjunction with the technical aspects of video display in browsers, when varying media formats are used. The <video> tag used in this work renders videos from two sources with different MIME types. Feeds from the video sources, namely YouTube and UCT...
Clintin P. Davis-Stober
Full Text Available The Recognition Heuristic (Gigerenzer and Goldstein, 1996; Goldstein and Gigerenzer, 2002) makes the counter-intuitive prediction that a decision maker utilizing less information may do as well as, or outperform, an idealized decision maker utilizing more information. We lay a theoretical foundation for the use of single-variable heuristics such as the Recognition Heuristic as an optimal decision strategy within a linear modeling framework. We identify conditions under which over-weighting a single predictor is a minimax strategy among a class of a priori chosen weights based on decision heuristics, with respect to a measure of statistical lack of fit we call "risk". These strategies, in turn, outperform standard multiple regression as long as the amount of data available is limited. We also show that, under related conditions, weighting only one variable and ignoring all others produces the same risk as ignoring the single variable and weighting all others. This approach has the advantage of generalizing beyond the original environment of the Recognition Heuristic to situations with more than two choice options, binary or continuous representations of recognition, and other single-variable heuristics. We analyze the structure of data used in some prior recognition tasks and find that it matches the sufficient conditions for optimality in our results. Rather than being a poor or adequate substitute for a compensatory model, the Recognition Heuristic closely approximates an optimal strategy when a decision maker has finite data about the world.
Full Text Available Video is a popular and motivating medium in schools. Using video in the language classroom helps language teachers in many different ways. Video, for instance, brings the outside world into the language classroom, providing the class with many different topics and reasons to talk. It can provide comprehensible input to the learners through contextualised models of language use. It also offers good opportunities to introduce native English speech into the language classroom. In this article I will try to show what the benefits of using video are and, at the end, I present an instrument to select and classify video materials.
Helen Gail Prosser
Full Text Available Northern Lakes College in north-central Alberta is the first post-secondary institution in Canada to use the Media on Demand digital video system to stream large video files between dispersed locations (Karlsen). Staff and students at distant locations of Northern Lakes College are now viewing more than 350 videos using video streaming technology. This has been made possible by SuperNet, a high-capacity broadband network that connects schools, hospitals, libraries and government offices throughout the province of Alberta (Alberta SuperNet). This article describes the technical process of implementing video streaming at Northern Lakes College from March 2005 until March 2006.
I. S. Rubina
Full Text Available The paper deals with image interpolation methods and their applicability to eliminating artifacts related both to the dynamic properties of objects in video sequences and to the algorithms used in the encoding steps. The main drawback of existing methods is their high computational complexity, which is unacceptable in video processing. Interpolation of signal samples for blocking-effect elimination at the output of the conversion encoding is proposed as part of the study. It was necessary to develop methods for improving the compression ratio and the quality of the reconstructed video data by eliminating the blocking effect on the borders of segments through intraframe interpolation of video sequence segments. The core of the developed methods is the application of an adaptive recursive algorithm with an adaptively-sized interpolation kernel, both with and without consideration of the brightness gradient at the boundaries of objects and video sequence blocks. In the theoretical part of the research, methods of information theory (RD-theory and data redundancy elimination), pattern recognition and digital signal processing, as well as probability theory, are used. In the experimental part of the research, the compression algorithms were implemented in software and compared with existing algorithms. The proposed methods were compared with the simple averaging algorithm and the adaptive algorithm of central sample interpolation. The advantage of the algorithm based on adaptive kernel-size selection is a 30% increase in compression ratio, and the advantage of the modified algorithm is a 35% increase in compression ratio compared with existing interpolation algorithms, with the quality of the reconstructed video sequence improved by 3% compared to compression without interpolation. The findings will be
Jain, Anil K.; Namboodiri, Anoop M.; Jung, Keechul
Many document images contain both text and non-text (images, line drawings, etc.) regions. An automatic segmentation of such an image into text and non-text regions is extremely useful in a variety of applications. Identification of text regions helps in text recognition applications, while the classification of an image into text and non-text regions helps in processing the individual regions differently in applications like page reproduction and printing. One of the main approaches to text detection is based on modeling the text as a texture. We present a method based on a combination of neural networks (texture-based) and connected component analysis to detect text in color documents with busy foreground and background. The proposed method achieves an accuracy of 96% (by area) on a test set of 40 documents.
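The connected component analysis stage can be sketched as a standard 4-connectivity labelling pass over a binary mask of candidate text pixels; the mask below is synthetic, and this is a generic illustration rather than the authors' code.

```python
from collections import deque

# Minimal connected-component labelling (4-connectivity) of the kind used to
# group candidate text pixels into regions.

def label_components(grid):
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and labels[r][c] == 0:
                current += 1                     # start a new component
                queue = deque([(r, c)])
                labels[r][c] = current
                while queue:                     # breadth-first flood fill
                    y, x = queue.popleft()
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return current, labels

mask = [[1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1]]
count, _ = label_components(mask)
print(count)  # -> 2 candidate regions
```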
158 pages This study aims to construct a systematic approach to the classification of narrative usage in video games. The most recent dominant approaches to reading a video game text – narratology and ludology – are discussed. By inquiring into the place of interactivity and autonomy inside the discourse of video game narrative, a classification is proposed. Consequently, six groups of video games are determined, depending on the levels of combination of narration and ludic context. These Six Degr...
Juan José Rodríguez Soler
Full Text Available Online channels in financial institutions allow customers with disabilities to access services in a way that is convenient for them. However, one of the current challenges of this sector is to improve web accessibility and to incorporate technological resources that provide access to multimedia and video content, which has become a new form of internet communication. The present work shows in detail the strategy followed in designing and developing the new video player used by Bankinter for these purposes.
Full Text Available A novel motion-adaptive deinterlacing algorithm with edge-pattern recognition and hybrid motion detection is introduced. The great variety of video contents makes the processing of assorted motion, edges, textures, and the combination of them very difficult with a single algorithm. The edge-pattern recognition algorithm introduced in this paper exhibits the flexibility in processing both textures and edges which need to be separately accomplished by line average and edge-based line average before. Moreover, predicting the neighboring pixels for pattern analysis and interpolation further enhances the adaptability of the edge-pattern recognition unit when motion detection is incorporated. Our hybrid motion detection features accurate detection of fast and slow motion in interlaced video and also the motion with edges. Using only three fields for detection also renders higher temporal correlation for interpolation. The better performance of our deinterlacing algorithm with higher content-adaptability and less memory cost than the state-of-the-art 4-field motion detection algorithms can be seen from the subjective and objective experimental results of the CIF and PAL video sequences.
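The line-average baseline that the abstract's edge-pattern method is compared against can be sketched as follows: the missing lines of one field are filled with the average of the lines above and below. Rows marked None stand in for the absent field; this is the simple intra-field baseline, not the paper's algorithm.

```python
def line_average(frame):
    """Fill missing rows (None) of an interlaced frame by averaging
    the neighbouring rows -- the classic line-average baseline."""
    out = []
    n = len(frame)
    for i, row in enumerate(frame):
        if row is not None:
            out.append(list(row))
            continue
        above = frame[i - 1] if i > 0 else frame[i + 1]
        below = frame[i + 1] if i + 1 < n else frame[i - 1]
        out.append([(a + b) // 2 for a, b in zip(above, below)])
    return out

frame = [[10, 20], None, [30, 40]]
full = line_average(frame)  # middle row becomes [20, 30]
```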
Full Text Available The quality of the smartphone camera enables us to capture high-quality pictures at high resolution, so we can perform different types of recognition on these images. Face detection is one of these types of recognition that is very common in our society. We use it every day on Facebook to tag friends in our pictures. It is also used in video games alongside the Kinect, or in security to allow access to private places only to authorized persons. These are just some examples of the use of facial recognition, because in modern society, detection and facial recognition tend to surround us everywhere. The aim of this article is to create an application for smartphones that can recognize human faces. The main goal of this application is to grant access to certain areas or rooms only to authorized persons. For example, we can speak here of hospitals or educational institutions where there are rooms that only certain employees can enter. Of course, this type of application can cover a wide range of uses, such as helping people suffering from Alzheimer's to recognize the people they love, helping persons who cannot remember the names of their relatives, or automatically capturing the faces of our own children when they smile.
Since their first inception, automatic reading systems have evolved substantially, yet the recognition of handwriting remains an open research problem due to its substantial variation in appearance. With the introduction of Markovian models to the field, a promising modeling and recognition paradigm was established for automatic handwriting recognition. However, no standard procedures for building Markov model-based recognizers have yet been established. This text provides a comprehensive overview of the application of Markov models in the field of handwriting recognition, covering both hidden
Gordon, Shayna L; Porto, Dennis A; Ozog, David M; Council, M Laurin
The use of video can enhance the learning experience by demonstrating procedural techniques that are difficult to relay in writing. Several peer-reviewed journals allow publication of videos alongside articles to complement the written text. The purpose of this article is to instruct the dermatologic surgeon on how to create and edit a video using a smartphone to accompany an article. The authors describe simple tips to optimize surgical videography. The video that accompanies this article further demonstrates the techniques described. Creating a surgical video requires little experience or equipment and can be completed in a modest amount of time. Making and editing a video to accompany an article can be accomplished by following the simple recommendations in this article. In addition, the increased use of video in dermatologic surgery education can enhance the learning opportunity.
Full Text Available Video game accessibility may not seem of significance to some, and it may sound trivial to anyone who does not play video games. This assumption is false. With the digitalization of our culture, video games are an ever increasing part of our life. They contribute to peer to peer interactions, education, music and the arts. A video game can be created by hundreds of musicians and artists, and they can have production budgets that exceed modern blockbuster films. Inaccessible video games are analogous to movie theaters without closed captioning or accessible facilities. The movement to have accessible video games is small, unorganized and misdirected. Just like the other battles to make society accessible were accomplished through legislation and law, the battle for video game accessibility must be focused toward the law and not the market.
Octavio José Salcedo Parra
Full Text Available The motivation for characterizing voice and video traffic lies in the need of service-provider companies to maintain information transport networks with capacities that match user requirements, and to determine in a timely manner how the technical elements of those networks affect their performance, given that each type of service is affected to a greater or lesser degree by elements such as jitter, delay, and packet loss. The present work shows several cases of traffic characterization for both voice and video, in which a variety of techniques are applied to different types of service.
Baktashmotlagh, Mahsa; Harandi, Mehrtash; Lovell, Brian C; Salzmann, Mathieu
Low-dimensional representations are key to the success of many video classification algorithms. However, the commonly-used dimensionality reduction techniques fail to account for the fact that only part of the signal is shared across all the videos in one class. As a consequence, the resulting representations contain instance-specific information, which introduces noise in the classification process. In this paper, we introduce non-linear stationary subspace analysis: a method that overcomes this issue by explicitly separating the stationary parts of the video signal (i.e., the parts shared across all videos in one class), from its non-stationary parts (i.e., the parts specific to individual videos). Our method also encourages the new representation to be discriminative, thus accounting for the underlying classification problem. We demonstrate the effectiveness of our approach on dynamic texture recognition, scene classification and action recognition.
Full Text Available The explosive growth of information technology in the last decade has made a considerable impact on the design and construction of systems for human-machine communication, which is becoming increasingly important in many aspects of life. Amongst other speech processing tasks, a great deal of attention has been devoted to developing procedures that identify people from their voices, and the design and construction of speaker recognition systems has been a fascinating enterprise pursued over many decades. This paper introduces speaker recognition in general and discusses its relevant parameters in relation to system performance.
Full Text Available Much of the attention paid to video in foreign language teaching is focused upon a relatively small amount of commercially produced and distributed material. This paper briefly describes the development of this material in the EFL/ESL field, looks at some current issues and concerns, and considers future possibilities with particular reference to computer-assisted interactive video.
Full Text Available Background: Psychogenic tremor is the most common psychogenic movement disorder. It has characteristic clinical features that can help distinguish it from other tremor disorders. There is no diagnostic gold standard and the diagnosis is based primarily on clinical history and examination. Despite proposed diagnostic criteria, the diagnosis of psychogenic tremor can be challenging. While there are numerous studies evaluating psychogenic tremor in the literature, there are no publications that provide a video/visual guide that demonstrate the clinical characteristics of psychogenic tremor. Educating clinicians about psychogenic tremor will hopefully lead to earlier diagnosis and treatment. Methods: We selected videos from the database at the Parkinson's Disease Center and Movement Disorders Clinic at Baylor College of Medicine that illustrate classic findings supporting the diagnosis of psychogenic tremor.Results: We include 10 clinical vignettes with accompanying videos that highlight characteristic clinical signs of psychogenic tremor including distractibility, variability, entrainability, suggestibility, and coherence.Discussion: Psychogenic tremor should be considered in the differential diagnosis of patients presenting with tremor, particularly if it is of abrupt onset, intermittent, variable and not congruous with organic tremor. The diagnosis of psychogenic tremor, however, should not be simply based on exclusion of organic tremor, such as essential, parkinsonian, or cerebellar tremor, but on positive criteria demonstrating characteristic features. Early recognition and management are critical for good long-term outcome.
Full Text Available Multi-view action recognition has gained great interest in video surveillance, human-computer interaction, and multimedia retrieval, where multiple cameras of different types are deployed to provide complementary fields of view. Fusion of multiple camera views evidently leads to more robust decisions on both tracking multiple targets and analysing complex human activities, especially where there are occlusions. In this paper, we incorporate the marginalised stacked denoising autoencoder (mSDA) algorithm to further improve the bag-of-words (BoW) representation in terms of robustness and usefulness for multi-view action recognition. The resulting representations are fed into three simple fusion strategies, as well as a multiple kernel learning algorithm, at the classification stage. Based on the internal evaluation, the codebook size of the BoW representation and the number of layers of mSDA may not significantly affect recognition performance. According to results on three multi-view benchmark datasets, the proposed framework improves recognition performance across all three datasets and sets record recognition performance, beating the state-of-the-art algorithms in the literature. It is also capable of performing real-time action recognition at a frame rate ranging from 33 to 45 frames per second, which could be further improved by using more powerful machines in future applications.
Full Text Available The lip movement of a speaker is very informative for many applications of speech signal processing, such as multi-modal speech recognition and password authentication without a speech signal. However, collecting multi-modal speech information requires a video camera, a large amount of memory, a video interface, and a high-speed processor to extract lip movement in real time. Such a system tends to be expensive and large, which is one of the reasons preventing the use of multi-modal speech processing. In this study, we have developed a simple infrared lip movement sensor mounted on a headset, making it possible to acquire lip movement with a PDA, mobile phone, or notebook PC. The sensor consists of an infrared LED and an infrared phototransistor, and measures lip movement by the light reflected from the mouth region. In experiments, we achieved a 66% word recognition rate using lip movement features alone. This result shows that our sensor can be utilized as a tool for multi-modal speech processing when combined with a microphone mounted on the headset.
Valletta, Clement, Ed.; And Others
The document contains scripts, study guides, and discussion questions for two ethnic dramas suitable for ethnic studies at the secondary school level. The first, "A Glass Rose," an adaptation of the novel by Richard Bankowsky, depicts the hopes, dreams, and problems of a Polish immigrant family who reside in an ethnic neighborhood in an…
Full Text Available Abstract The count of malware attacks exploiting the internet is increasing day by day and has become a serious threat. The latest malware spreads through media players, embedded in video clips of a funny nature to lure end users. Once it is executed and installed, the behavior of the malware is in the malware author's hands. The malware spreads through the Internet, USB drives, and shared files and folders, keeping its presence concealed. The funny video, named after a film celebrity, was the malware variant collected from the laptop of a terror outfit organization. It runs in the background and contains malicious code that steals sensitive user information, such as banking credentials (username and password), and sends it to a remote host called command and control. The stolen data is directed to an email address encapsulated in the malicious code. The malware can also spread through USB and other devices. In summary, the analysis reveals the presence of malicious code in an executable video file and describes its behavior.
Smith, Rachel Charlotte; Christensen, Kasper Skov; Iversen, Ole Sejer
We introduce Video Design Games to train educators in teaching design. The Video Design Game is a workshop format consisting of three rounds in which participants observe, reflect and generalize based on video snippets from their own practice. The paper reports on a Video Design Game workshop...
Loktev Alexey Alexeevich
Full Text Available Comprehensive distributed safety, control, and monitoring systems applied by companies and organizations of different ownership structure play a substantial role in the present-day society. Video surveillance elements that ensure image processing and decision making in automated or automatic modes are the essential components of new systems. This paper covers the modeling of video surveillance systems installed in buildings, and the algorithm, or pattern, of video camera placement with due account for nearly all characteristics of buildings, detection and recognition facilities, and cameras themselves. This algorithm will be subsequently implemented as a user application. The project contemplates a comprehensive approach to the automatic placement of cameras that take account of their mutual positioning and compatibility of tasks. The project objective is to develop the principal elements of the algorithm of recognition of a moving object to be detected by several cameras. The image obtained by different cameras will be processed. Parameters of motion are to be identified to develop a table of possible options of routes. The implementation of the recognition algorithm represents an independent research project to be covered by a different article. This project consists in the assessment of the degree of complexity of an algorithm of camera placement designated for identification of cases of inaccurate algorithm implementation, as well as in the formulation of supplementary requirements and input data by means of intercrossing sectors covered by neighbouring cameras. The project also contemplates identification of potential problems in the course of development of a physical security and monitoring system at the stage of the project design development and testing. The camera placement algorithm has been implemented as a software application that has already been pilot tested on buildings and inside premises that have irregular dimensions. The
Ostrowski, Jeffrey R.; Sarhan, Nabil J.
The popularity of social media has grown dramatically over the World Wide Web. In this paper, we analyze the video popularity distribution of well-known social video websites (YouTube, Google Video, and the AOL Truveo Video Search engine) and characterize their workload. We identify trends in the categories, lengths, and formats of those videos, as well as characterize the evolution of those videos over time. We further provide an extensive analysis and comparison of video content amongst the main regions of the world.
Segmentation of handwritten text into lines, words and characters is one of the important steps in the handwritten text recognition process. In this paper we propose a water reservoir concept-based scheme for segmentation of unconstrained Oriya handwritten text into individual characters. Here, at first, the text image is ...
Full Text Available Phacoemulsification is one of the most advanced surgeries to treat cataract. However, conventional surgeries suffer from a low level of automation and over-reliance on the surgeon's skill. Alternatively, one promising scenario is to use video processing and pattern recognition technologies to automatically detect the cataract grade and intelligently control the release of ultrasonic energy during the operation. Unlike cataract grading in diagnosis systems with static images, dynamic surgical videos introduce complicated backgrounds, unexpected noise, and varied information. Here we develop a VidEo-Based Intelligent Recognition and Decision (VEBIRD) system, which breaks new ground by providing a generic framework for automatically tracking the operation process and classifying the cataract grade in microscope videos of phacoemulsification cataract surgery. VEBIRD comprises a robust eye (iris) detector with a randomized Hough transform to precisely locate the eye against a noisy background, an effective probe tracker with Tracking-Learning-Detection to track the operation probe through the dynamic process, and an intelligent decider with discriminative learning to recognize the cataract grade in the complicated video. Experiments with a variety of real microscope videos of phacoemulsification verify VEBIRD's effectiveness.
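The randomized Hough transform used for the eye (iris) detector can be sketched as follows: repeatedly sample three edge points, compute their circumcircle, and vote in an accumulator; the most-voted circle wins. The edge points here are synthetic; a real detector would obtain them from an edge map of the microscope frame.

```python
import math
import random
from collections import Counter

def circumcircle(p1, p2, p3):
    """Center and radius of the circle through three points, rounded to
    integers for accumulator bucketing; None if the points are collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-9:
        return None
    ux = ((x1**2 + y1**2) * (y2 - y3) + (x2**2 + y2**2) * (y3 - y1)
          + (x3**2 + y3**2) * (y1 - y2)) / d
    uy = ((x1**2 + y1**2) * (x3 - x2) + (x2**2 + y2**2) * (x1 - x3)
          + (x3**2 + y3**2) * (x2 - x1)) / d
    r = math.hypot(x1 - ux, y1 - uy)
    return round(ux), round(uy), round(r)

def randomized_hough_circle(edge_points, trials=500, seed=0):
    """Vote for (cx, cy, r) triples from random 3-point samples."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(trials):
        c = circumcircle(*rng.sample(edge_points, 3))
        if c is not None:
            votes[c] += 1
    return votes.most_common(1)[0][0]

# Synthetic edge points on a circle of radius 5 centred at (10, 10):
pts = [(10 + 5 * math.cos(i * math.pi / 8), 10 + 5 * math.sin(i * math.pi / 8))
       for i in range(16)]
found = randomized_hough_circle(pts)  # -> (10, 10, 5)
```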
Yang, Jie Chi; Huang, Yi Ting; Tsai, Chi Cheng; Chung, Ching I.; Wu, Yu Chieh
In recent years, using video as a learning resource has received a lot of attention and has been successfully applied to many learning activities. In comparison with text-based learning, video learning integrates more multimedia resources, which usually motivate learners more than texts. However, one of the major limitations of video learning is…
Höferlin, Markus Johannes
The amount of video data recorded world-wide is tremendously growing and has already reached hardly manageable dimensions. It originates from a wide range of application areas, such as surveillance, sports analysis, scientific video analysis, surgery documentation, and entertainment, and its analysis represents one of the challenges in computer science. The vast amount of video data renders manual analysis by watching the video data impractical. However, automatic evaluation of video material...
Robert C. Lorenz
Full Text Available Video games contain elaborate reinforcement and reward schedules that have the potential to maximize motivation. Neuroimaging studies suggest that video games might have an influence on the reward system. However, it is not clear whether reward-related properties represent a precondition that biases an individual towards playing video games, or whether these changes are the result of playing video games. Therefore, we conducted a longitudinal study to explore reward-related functional predictors in relation to video gaming experience, as well as functional changes in the brain in response to video game training. Fifty healthy participants were randomly assigned to a video game training group (TG) or a control group (CG). Before and after the training/control period, functional magnetic resonance imaging (fMRI) was conducted using a non-video-game-related reward task. At pretest, both groups showed the strongest activation in the ventral striatum (VS) during reward anticipation. At posttest, the TG showed very similar VS activity compared to pretest; in the CG, the VS activity was significantly attenuated. This longitudinal study revealed that video game training may preserve reward responsiveness in the ventral striatum in a retest situation over time. We suggest that video games are able to keep striatal responses to reward flexible, a mechanism which might be of critical value for applications such as therapeutic cognitive training.
Mølgaard, Lasse Lohilahti; Jørgensen, Kasper Winther
Speaker recognition is basically divided into speaker identification and speaker verification. Verification is the task of automatically determining if a person really is the person he or she claims to be. This technology can be used as a biometric feature for verifying the identity of a person in applications like banking by telephone and voice mail. The focus of this project is speaker identification, which consists of mapping a speech signal from an unknown speaker to a database of known speakers, i.e. the system has been trained with a number of speakers which it can recognize.
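The identification step described above (mapping an unknown speaker to the closest enrolled speaker) can be sketched as a nearest-neighbour search over feature vectors. The two-dimensional features and speaker names here are toy assumptions; a real system would compare longer vectors produced by a feature-extraction front end such as MFCCs.

```python
def identify_speaker(unknown, database):
    """Return the name of the enrolled speaker whose feature vector is
    closest (Euclidean distance) to the unknown speaker's vector."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(database, key=lambda name: dist(unknown, database[name]))

enrolled = {"alice": [1.0, 0.2], "bob": [-0.5, 1.1]}  # hypothetical features
who = identify_speaker([0.9, 0.1], enrolled)  # -> "alice"
```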
Gjødsbøl, Iben Mundbjerg; Svendsen, Mette Nordahl
This article investigates how a person with dementia is made up through intersubjective acts of recognition. Based on ethnographic fieldwork in a Danish memory clinic, we show that identification of disease requires patients to be substituted by their relatives in constructing believable medical ... to misrecognize and humiliate the person under examination. The article ends by proposing that dementia be the condition that forces us to rethink our ways of recognizing persons more generally. Thus, dementia diagnostics provide insights into different enactments of the person that invite us to explore practices...
It has recently been found that during recognition memory tests participants' pupils dilate more when they view old items compared to novel items. This thesis sought to replicate this novel "Pupil Old/New Effect" (PONE) and to determine its relationship to implicit and explicit mnemonic processes, the veracity of participants' responses, and the analogous Event-Related Potential (ERP) old/new effect. Across 9 experiments, pupil-size was measured with a video-based eye-tracker during a varie...
Full Text Available Recognizing the actions of others from visual stimuli is a crucial aspect of human perception that allows individuals to respond to social cues. Humans are able to discriminate between similar actions despite transformations, like changes in viewpoint or actor, that substantially alter the visual appearance of a scene. This ability to generalize across complex transformations is a hallmark of human visual intelligence. Advances in understanding action recognition at the neural level have not always translated into precise accounts of the computational principles underlying what representations of action sequences are constructed by human visual cortex. Here we test the hypothesis that invariant action discrimination might fill this gap. Recently, the study of artificial systems for static object perception has produced models, Convolutional Neural Networks (CNNs), that achieve human-level performance in complex discriminative tasks. Within this class, architectures that better support invariant object recognition also produce image representations that better match those implied by human and primate neural data. However, whether these models produce representations of action sequences that support recognition across complex transformations and closely follow neural representations of actions remains unknown. Here we show that spatiotemporal CNNs accurately categorize video stimuli into action classes, and that deliberate model modifications that improve performance on an invariant action recognition task lead to data representations that better match human neural recordings. Our results support our hypothesis that performance on invariant discrimination dictates the neural representations of actions computed in the brain. These results broaden the scope of the invariant recognition framework for understanding visual intelligence from perception of inanimate objects and faces in static images to the study of human perception of action sequences.
Full Text Available This paper focuses on modifications to an institutional repository system using the open source DSpace software to support playback of digital videos embedded within item pages. The changes were made in response to the formation and quick startup of an event capture group within the library that was charged with creating and editing video recordings of library events and speakers. This paper specifically discusses the selection of video formats, changes to the visual theme of the repository to allow embedded playback and captioning support, and modifications and bug fixes to the file downloading subsystem to enable skip-ahead playback of videos via byte-range requests. This paper also describes workflows for transcoding videos in the required formats, creating captions, and depositing videos into the repository.
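Skip-ahead playback of the kind described above relies on the server honouring byte-range requests. A minimal sketch of the single-range case of the `Range: bytes=start-end` header (RFC 7233), ignoring multipart ranges, which a repository's download subsystem would need to parse:

```python
def parse_range(header, file_size):
    """Parse a single-range `Range: bytes=...` header into inclusive byte
    offsets (start, end), or None if the header is absent or unsatisfiable."""
    if not header or not header.startswith("bytes="):
        return None
    spec = header[len("bytes="):]
    start_s, _, end_s = spec.partition("-")
    if start_s == "":                      # suffix form: last N bytes
        length = int(end_s)
        return (max(0, file_size - length), file_size - 1)
    start = int(start_s)
    end = int(end_s) if end_s else file_size - 1
    if start >= file_size:
        return None                        # 416 Range Not Satisfiable
    return (start, min(end, file_size - 1))

parse_range("bytes=500-999", 10000)   # -> (500, 999)
parse_range("bytes=9500-", 10000)     # -> (9500, 9999)
parse_range("bytes=-500", 10000)      # -> (9500, 9999)
```

The server then replies with status 206 and a matching `Content-Range` header; players use exactly these requests when the user seeks ahead.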
Full Text Available Modern trends in crime control include a variety of technological innovations, including video surveillance systems. The aim of this paper is to review the implementation of video surveillance in contemporary context, considering fundamental theoretical aspects, the legislation and the effectiveness in controlling crime. While considering the theoretical source of ideas on the implementation of video surveillance, priority was given to the concept of situational prevention that focuses on the contextual factors of crime. Capacities for the implementation of video surveillance in Serbia are discussed based on the analysis of the relevant international and domestic legislation, the shortcomings in regulation of this area and possible solutions. Special attention was paid to the effectiveness of video surveillance in public places, in schools and prisons. Starting from the results of studies of video surveillance effectiveness, strengths and weaknesses of these measures and recommendations for improving practice were discussed.
We present a novel approach to lexical error recovery on textual input. An advanced robust tokenizer has been implemented that can not only correct spelling mistakes, but also recover from segmentation errors. Apart from the orthographic considerations taken, the tokenizer also makes use of linguistic expectations extracted from a training corpus. The idea is to arrange Hidden Markov Models (HMM) in multiple layers where the HMMs in each layer are responsible for different aspects of the processing of the input. We report on experimental evaluations with alternative probabilistic language models to guide the lexical error recovery process.
Full Text Available Designing an effective and high-performance network requires accurate characterization and modeling of network traffic. The modeling of video frame sizes is normally applied in simulation studies and mathematical analysis, and in generating streams for testing and compliance purposes. Moreover, video traffic is assumed to be a major source of multimedia traffic in future heterogeneous networks. Therefore, the statistical distribution of video data can be used as input for performance modeling of networks. The findings of this paper comprise the theoretical definition of the distribution that appears most relevant to the video trace in terms of its statistical properties, and the identification of the best-fitting distribution using both a graphical method and a hypothesis test. The data set used in this article consists of layered video traces generated with the Scalable Video Codec (SVC) video compression technique from three different movies.
Asif Ali Laghari
Full Text Available Video sharing on social clouds is popular among users around the world. High-Definition (HD) videos have large file sizes, so storing them in cloud storage and streaming them with high quality from the cloud to the client are significant problems for service providers. Social clouds compress the videos to save storage and to stream over slow networks while providing quality of service (QoS). Compression decreases video quality relative to the original, and parameters are changed during online play as well as after download. Degradation of video quality due to compression decreases the quality of experience (QoE) of end users. To assess the QoE of video compression, we conducted subjective QoE experiments by uploading, sharing, and playing videos from social clouds. Three popular social clouds, Facebook, Tumblr, and Twitter, were selected to upload and play videos online for users. The QoE was recorded using a questionnaire in which users reported their experience of the video quality they perceived. Results show that Facebook and Twitter compressed HD videos more than the other clouds; however, Facebook delivered better quality in its compressed videos than Twitter. Users therefore assigned low ratings to Twitter for online video quality compared to Tumblr, which provided high-quality online playback with less compression.
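Subjective QoE questionnaires of the kind described above are conventionally summarized as a Mean Opinion Score (MOS) per platform. A minimal sketch, with entirely hypothetical ratings (the paper does not publish its raw questionnaire data):

```python
def mean_opinion_score(ratings):
    """Average 1-5 questionnaire ratings into a Mean Opinion Score (MOS)."""
    return sum(ratings) / len(ratings)

# Hypothetical per-platform ratings from a QoE questionnaire (1 = bad, 5 = excellent).
ratings = {
    "Tumblr":   [5, 4, 5, 4, 5],
    "Facebook": [4, 4, 3, 4, 4],
    "Twitter":  [2, 3, 2, 2, 3],
}
mos = {platform: mean_opinion_score(r) for platform, r in ratings.items()}
best = max(mos, key=mos.get)
```

With these made-up numbers, Tumblr ranks highest, mirroring the direction of the reported findings.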
Full Text Available This paper focuses on the text categorization of Slovak text corpora using latent Dirichlet allocation. Our goal is to build text subcorpora that contain similar text documents. We want to use these better organized text subcorpora to build more robust language models that can be used in the area of speech recognition systems. Our previous research in the area of text categorization showed that we can achieve better results with categorized text corpora. In this paper we used latent Dirichlet allocation for text categorization. We divided initial text corpus into 2, 5, 10, 20 or 100 subcorpora with various iterations and save steps. Language models were built on these subcorpora and adapted with linear interpolation to judicial domain. The experiment results showed that text categorization using latent Dirichlet allocation can improve the system for automatic speech recognition by creating the language models from organized text corpora.
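The abstract's adaptation step, "adapted with linear interpolation to judicial domain", can be sketched as mixing a subcorpus unigram model with a background model. The probabilities and vocabulary below are hypothetical, and real systems interpolate full n-gram models rather than this toy unigram case:

```python
def interpolate_lm(p_domain, p_background, lam=0.7):
    """Linear interpolation of two unigram language models:
    P(w) = lam * P_domain(w) + (1 - lam) * P_background(w)."""
    vocab = set(p_domain) | set(p_background)
    return {w: lam * p_domain.get(w, 0.0) + (1.0 - lam) * p_background.get(w, 0.0)
            for w in vocab}

# Hypothetical unigram probabilities: an LDA-selected judicial subcorpus vs. the full corpus.
p_judicial = {"court": 0.020, "ruling": 0.010}
p_full     = {"court": 0.002, "video": 0.005}
p_mixed = interpolate_lm(p_judicial, p_full, lam=0.7)
```

Words absent from one model fall back to the other, weighted accordingly, which is what makes the interpolated model more robust than either component alone.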
Full Text Available ... Program Growth and Nutrition Program Celiac Disease Program | Videos Contact the Celiac Disease Program 1-617-355-6058 Visit the Celiac ... live happy and productive lives. Each of our video segments provides practical information about celiac disease from real-life families, as well as health ...
Full Text Available ... Lessons? Visit KidsHealth in the Classroom What Other Parents Are Reading Folic Acid and Pregnancy Medical Care ... Special Needs: Planning for Adulthood (Video) KidsHealth > For Parents > Special Needs: Planning for Adulthood (Video) Print A ...
Full Text Available ... ease and allow children with celiac disease to live happy and productive lives. Each of our video segments provides practical information ... Hospital About Us Giving to Boston Children's Newsroom Quality and Patient Safety Research + Innovation Videos Contact Us ...
Full Text Available Deficits in social cognition including facial affect recognition and their detrimental effects on functional outcome are well established in schizophrenia. Structured training can have substantial effects on social cognitive measures including facial affect recognition. Elucidating training effects on cortical mechanisms involved in facial affect recognition may identify causes of dysfunctional facial affect recognition in schizophrenia and foster remediation strategies. In the present study, 57 schizophrenia patients were randomly assigned to (a) computer-based facial affect training that focused on affect discrimination and working memory in 20 daily 1-hour sessions, (b) similarly intense, targeted cognitive training on auditory-verbal discrimination and working memory, or (c) treatment as usual. Neuromagnetic activity was measured before and after training during a dynamic facial affect recognition task (5 s videos showing human faces gradually changing from neutral to fear or to happy expressions). Effects on 10–13 Hz (alpha) power during the transition from neutral to emotional expressions were assessed via MEG based on previous findings that alpha power increase is related to facial affect recognition and is smaller in schizophrenia than in healthy subjects. Targeted affect training improved overt performance on the training tasks. Moreover, alpha power increase during the dynamic facial affect recognition task was larger after affect training than after treatment-as-usual, though similar to that after targeted perceptual–cognitive training, indicating somewhat nonspecific benefits. Alpha power modulation was unrelated to general neuropsychological test performance, which improved in all groups. Results suggest that specific neural processes supporting facial affect recognition, evident in oscillatory phenomena, are modifiable. This should be considered when developing remediation strategies targeting social cognition in schizophrenia.
Full Text Available Based on video recordings of the movement of patients with epilepsy, this paper proposed a human action recognition scheme to detect distinct motion patterns and to distinguish the normal status from the abnormal status of epileptic patients. The scheme first extracts local features and holistic features, which are complementary to each other. Afterwards, a support vector machine is applied for classification. Based on the experimental results, this scheme obtains a satisfactory classification result and provides a fundamental analysis towards human-robot interaction with socially assistive robots in caring for patients with epilepsy (or other patients with brain disorders) in order to protect them from injury.
Hahn, U.; Romacker, M.
We consider the role of textual structures in medical texts. In particular, we examine the impact that the lacking recognition of text phenomena has on the validity of medical knowledge bases fed by a natural language understanding front-end. First, we review the results from an empirical study on a sample of medical texts, considering various forms of local coherence phenomena (anaphora and textual ellipses). We then discuss the representation bias emerging in the text knowledge base that is likely to occur when these phenomena are not dealt with--mainly the emergence of referentially incoherent and invalid representations. We then turn to a medical text understanding system designed to account for local text coherence. PMID:9357739
Michail N. Giannakos
Full Text Available Online video lectures have been considered an instructional media for various pedagogic approaches, such as the flipped classroom and open online courses. In comparison to other instructional media, online video affords the opportunity for recording student clickstream patterns within a video lecture. Video analytics within lecture videos may provide insights into student learning performance and inform the improvement of video-assisted teaching tactics. Nevertheless, video analytics are not accessible to learning stakeholders, such as researchers and educators, mainly because online video platforms do not broadly share the interactions of the users with their systems. For this purpose, we have designed an open-access video analytics system for use in a video-assisted course. In this paper, we present a longitudinal study, which provides valuable insights through the lens of the collected video analytics. In particular, we found that there is a relationship between video navigation (repeated views) and the level of cognition/thinking required for a specific video segment. Our results indicated that learning performance progress was slightly improved and stabilized after the third week of the video-assisted course. We also found that attitudes regarding easiness, usability, usefulness, and acceptance of this type of course remained at the same levels throughout the course. Finally, we triangulate analytics from diverse sources, discuss them, and provide the lessons learned for further development and refinement of video-assisted courses and practices.
Fluck, Juliane; Hofmann-Apitius, Martin
Scientific communication in biomedicine is, by and large, still text based. Text mining technologies for the automated extraction of useful biomedical information from unstructured text that can be directly used for systems biology modelling have been substantially improved over the past few years. In this review, we underline the importance of named entity recognition and relationship extraction as fundamental approaches that are relevant to systems biology. Furthermore, we emphasize the role of publicly organized scientific benchmarking challenges that reflect the current status of text-mining technology and are important in moving the entire field forward. Given further interdisciplinary development of systems biology-orientated ontologies and training corpora, we expect a steadily increasing impact of text-mining technology on systems biology in the future. Copyright © 2013 Elsevier Ltd. All rights reserved.
Full Text Available A road sign recognition system based on adaptive image pre-processing models using two fuzzy inference schemes has been proposed. The first fuzzy inference scheme checks the changes of the light illumination and rich red color of a frame image by the checking areas. The other checks the variance of the vehicle's speed and the angle of the steering wheel to select an adaptive size and position of the detection area. The Adaboost classifier was employed to detect the road sign candidates from an image and the support vector machine technique was employed to recognize the content of the road sign candidates. The prohibitory and warning road traffic signs are the processing targets in this research. The detection rate in the detection phase is 97.42%. In the recognition phase, the recognition rate is 93.04%. The total accuracy rate of the system is 92.47%. For video sequences, the best accuracy rate is 90.54%, and the average accuracy rate is 80.17%. The average computing time is 51.86 milliseconds per frame. The proposed system not only overcomes the problems of low illumination and rich red colors around road signs but also offers high detection rates and high computing performance.
Full Text Available It has been shown that integration of acoustic and visual information especially in noisy conditions yields improved speech recognition results. This raises the question of how to weight the two modalities in different noise conditions. Throughout this paper we develop a weighting process adaptive to various background noise situations. In the presented recognition system, audio and video data are combined following a Separate Integration (SI) architecture. A hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) system is used for the experiments. The neural networks were in all cases trained on clean data. Firstly, we evaluate the performance of different weighting schemes in a manually controlled recognition task with different types of noise. Next, we compare different criteria to estimate the reliability of the audio stream. Based on this, a mapping between the measurements and the free parameter of the fusion process is derived and its applicability is demonstrated. Finally, the possibilities and limitations of adaptive weighting are compared and discussed.
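The fusion step described above — a single free parameter weighting the audio stream against the video stream according to audio reliability — can be sketched as a convex combination of per-class scores. The scores, class count, and reliability values here are hypothetical:

```python
def fuse_streams(audio_scores, video_scores, audio_reliability):
    """Separate Integration fusion: combine per-class scores from the audio
    and video classifiers. `audio_reliability` in [0, 1] (e.g. derived from
    an SNR estimate) becomes the audio weight lambda."""
    lam = min(1.0, max(0.0, audio_reliability))
    return [lam * a + (1.0 - lam) * v for a, v in zip(audio_scores, video_scores)]

# Hypothetical per-class scores for three candidate words from each modality.
audio = [0.70, 0.20, 0.10]   # audio classifier favours class 0
video = [0.10, 0.30, 0.60]   # video classifier favours class 2

clean = fuse_streams(audio, video, audio_reliability=0.9)  # quiet room: trust audio
noisy = fuse_streams(audio, video, audio_reliability=0.2)  # heavy noise: trust video
```

Under clean conditions the fused decision follows the audio stream; under noise it shifts to the visual stream, which is the behaviour the adaptive weighting is meant to produce.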
Full Text Available Nowadays, editing technology has entered the digital age. Converting analog data to digital has become simpler as editing technology has been integrated into all aspects of society. Understanding the technique of converting analog data to digital is important in producing a video. To utilize this technology, an introduction to the equipment is fundamental for understanding its features. The next phase is the capturing process, which supports the preparation for editing from scene to scene; the result is a watchable video.
Full Text Available Video shot boundary detection is a fundamental problem in computer vision and is important for video analysis and video understanding. Existing boundary detection methods are typically effective only for certain types of video data and have relatively low generalization ability. We present a novel shot boundary detection algorithm based on video dynamic texture. First, two adjacent frames are read from a given video and normalized to the same size. Second, the frames are divided into sub-domains on the same grid, the average gradient direction of each sub-domain is calculated, and these directions form the dynamic texture. Finally, the dynamic textures of adjacent frames are compared. Experiments on different types of video data show that our method has high generalization ability and achieves higher average precision and average recall than comparable algorithms.
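The three steps above — per-block average gradient direction as a "dynamic texture", then a distance between the textures of adjacent frames — can be sketched as follows. The block size, threshold, and toy 4x4 frames are all hypothetical; the paper does not specify its exact parameters:

```python
import math

def dynamic_texture(frame, block=2):
    """Average gradient direction per block x block sub-domain of a 2D
    intensity frame (list of rows of numbers)."""
    h, w = len(frame), len(frame[0])
    texture = []
    for by in range(0, h - 1, block):
        for bx in range(0, w - 1, block):
            dirs = []
            for y in range(by, min(by + block, h - 1)):
                for x in range(bx, min(bx + block, w - 1)):
                    gx = frame[y][x + 1] - frame[y][x]   # horizontal gradient
                    gy = frame[y + 1][x] - frame[y][x]   # vertical gradient
                    dirs.append(math.atan2(gy, gx))      # gradient direction
            texture.append(sum(dirs) / len(dirs))
    return texture

def is_shot_boundary(f1, f2, threshold=0.5):
    """Compare the dynamic textures of two adjacent frames; a large mean
    angular difference suggests a shot boundary (threshold is hypothetical)."""
    t1, t2 = dynamic_texture(f1), dynamic_texture(f2)
    dist = sum(abs(a - b) for a, b in zip(t1, t2)) / len(t1)
    return dist > threshold

f_a = [[0, 10, 20, 30] for _ in range(4)]   # horizontal luminance ramp
f_b = [[10 * r] * 4 for r in range(4)]      # vertical luminance ramp
```

Two identical frames give zero texture distance, while the two orthogonal ramps differ by roughly pi/2 per block and trip the boundary test.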
Dewi Yunita Sari
Full Text Available Video is one form of digital evidence, including footage from handycams. In criminal cases, a video is usually manipulated to remove the evidence it contains, so forensic analysis is needed to detect the video's authenticity. In this study, videos were manipulated with cropping, zooming, rotation, and grayscale attacks in order to compare the original video recording with the tampered recording. The recordings were analyzed using the localization tampering method, a detection method that identifies the manipulated parts of a video by analyzing frames, computing histograms, and plotting histogram graphs. With localization tampering, the frame positions and durations of the tampered segments of the video can be determined.
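The histogram-comparison core of tampering localization can be sketched as follows. Frames are reduced to flat lists of grey-level pixels, and the chi-square distance and threshold are hypothetical choices, not taken from the paper:

```python
def histogram(frame, bins=8):
    """Grey-level histogram of a frame given as a flat list of 0-255 pixels."""
    h = [0] * bins
    for px in frame:
        h[min(px * bins // 256, bins - 1)] += 1
    return h

def chi_square(h1, h2):
    """Chi-square distance between two histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b)

def localize_tampering(original, suspect, threshold=1.0):
    """Flag frame indices whose histograms differ strongly between the
    original and the suspect recording (threshold is a hypothetical value)."""
    return [i for i, (f1, f2) in enumerate(zip(original, suspect))
            if chi_square(histogram(f1), histogram(f2)) > threshold]

# Hypothetical recordings: frame 1 of the suspect copy has been brightened.
orig = [[10] * 16, [10] * 16, [10] * 16]
tamp = [[10] * 16, [200] * 16, [10] * 16]
flagged = localize_tampering(orig, tamp)
```

Only the altered frame index is returned, which is exactly the "where and for how long" information the method aims to recover.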
Full Text Available With the development of heterogeneous networks and video coding standards, multiresolution video applications over networks become important. It is critical to ensure the service quality of the network for time-sensitive video services. Worldwide Interoperability for Microwave Access (WIMAX) is a good candidate for delivering video signals because through WIMAX the delivery quality based on the quality-of-service (QoS) setting can be guaranteed. The selection of suitable QoS parameters is, however, not trivial for service users. Instead, what a video service user really cares about is the video quality of presentation (QoP), which includes the video resolution, the fidelity, and the frame rate. In this paper, we present a quality control mechanism in multiresolution video coding structures over WIMAX networks and also investigate the relationship between QoP and QoS in end-to-end connections. Consequently, the video presentation quality can be simply mapped to the network requirements by a mapping table, and then the end-to-end QoS is achieved. We performed experiments with multiresolution MPEG coding over WIMAX networks. In addition to the QoP parameters, the video characteristics, such as the picture activity and the video mobility, also affect the QoS significantly.
Washburn, D. A.; Gulledge, J. P.; Rumbaugh, D. M.
Four rhesus monkeys (Macaca mulatta) were tested on joystick-based computer tasks in which they could choose to be reinforced either with pellets-only or with pellets + video. A variety of videotapes were used to reinforce task performance. The monkeys significantly preferred to be rewarded with a pellet and 10 s of a blank screen than a pellet plus 10 s of videotape. When they did choose to see videotaped images, however, they were significantly more likely to view video of themselves than video of their roommate or of unfamiliar conspecifics. These data support earlier findings of individual differences in preference for video reinforcement, and have clear implications for the study of face-recognition and self-recognition by nonhuman primates.
Olasimbo Ayodeji Arigbabu
Full Text Available Soft biometrics can be used as a prescreening filter, either by using a single trait or by combining several traits to aid the performance of recognition systems in an unobtrusive way. In many practical visual surveillance scenarios, facial information is difficult to capture effectively due to several varying challenges. However, from a distance the visual appearance of an object can be efficiently inferred, thereby providing the possibility of estimating body-related information. This paper presents an approach for estimating body-related soft biometrics; specifically, we propose a new approach based on body measurement and an artificial neural network for predicting the body weight of subjects, and we incorporate the existing single-view metrology technique for height estimation in videos with low frame rates. Our evaluation on 1120 frame sets of 80 subjects from a newly compiled dataset shows that the mentioned soft biometric information of human subjects can be adequately predicted from a set of frames.
Full Text Available ... of five videos was designed to help you learn more about Rheumatoid Arthritis (RA). You will learn how the diagnosis of RA is made, what ... and what other conditions are associated with RA. Learning more about your condition will allow you to ...
Full Text Available ... Johns Hopkins Stategies to Increase your Level of Physical Activity Role of Body Weight in Osteoarthritis Educational Videos ... Drug Information for Patients Arthritis Drug Information Sheets Benefits and Risks of Opioids in Arthritis ... website is intended for educational purposes only. Physicians and other health care professionals are encouraged to consult other sources ...
Full Text Available ... will allow you to take a more active role in your care. The information in these videos should not take the place of any advice you receive from your rheumatologist. Click A Link Below To Play Rheumatoid Arthritis: Symptoms and Diagnosis Rheumatoid Arthritis: What ...
Full Text Available ... are available, what is happening in the immune system and what other conditions are associated with RA. Learning more about your condition will allow you to take a more active role in your care. The information in these videos should not take the place ...
Manuel Calvelo Ríos
Full Text Available Video turns out to be an extremely useful tool for rural development. By rural development we mean the attempt to regulate the relations between countryside and city in terms more equitable for rural people. It is therefore a political decision.
Holte, Michael Boelstoft; Moeslund, Thomas B.
This paper presents a method for automatic recognition of human gestures. The method works with 3D image data from a range camera to achieve invariance to viewpoint. The recognition is based solely on motion from characteristic instances of the gestures. These instances are denoted 3D motion… as a gesture using a probabilistic edit distance method. The system has been trained on frontal images (0deg camera rotation) and tested on 240 video sequences from 0deg and 45deg. An overall recognition rate of 82.9% is achieved. The recognition rate is independent of the viewpoint, which shows that the method…
Full Text Available This essay is divided into two parts. The first one is a short description of the deficiencies of moral reflection, which seem to lead the discussion towards the concept of recognition. Charles Taylor and Axel Honneth, two of the protagonists of these debates, give very good reasons for turning the argument towards the issue of recognition, but they do not agree on its definition, on the way to recover the Hegelian thesis, or on how to approach the relationship between autonomy and recognition. The second part constitutes an analysis of the Hegelian conception of recognition, in order to highlight the essential link, rather than the rupture, between the notion of recognition and the conceptual model of free will or spirit.
Bourgonjon, Jeroen; Soetaert, Ronald
... by exploring a particular aspect of digitization that affects young people, namely video games. They explore the new social spaces which emerge in video game culture and how these spaces relate to community building and citizenship...
This article is an introduction to video screen capture. Basic information of two software programs, QuickTime for Mac and BlueBerry Flashback Express for PC, are also discussed. Practical applications for video screen capture are given.
Full Text Available Text mining deals with complex and unstructured texts. Usually a particular collection of texts specific to one or more domains is necessary. We have developed a customizable text classifier for users to mine the collection automatically. It derives from the sentence category of the HNC theory and corresponding techniques. It can start with a few texts, and it can adjust automatically or be adjusted by the user. The user can also control the number of domains chosen and decide the standard by which to choose the texts, based on demand and the abundance of materials. The performance of the classifier varies with the user's choice.
Full Text Available Preliminary data are reported from experiments in which Warrington's (1984) Recognition Memory Tests were given to patients with misidentification delusions, including the Capgras type, and to psychotic patients. The results showed a profound impairment in face recognition for most groups, especially those with the Capgras delusion. It was rare to find a patient whose score on the word test was anything but normal.
Full Text Available Our paper focuses on the graphical analysis domain. We propose an automatic image recognition technique. This approach consists of two main pattern recognition steps. First, it performs an image feature extraction operation on an input image set, using statistical dispersion features. Then, an unsupervised classification process is performed on the previously obtained graphical feature vectors. An automatic region-growing based clustering procedure is proposed and utilized in the classification stage.
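The two-stage pipeline above — dispersion-based feature vectors followed by region-growing-style unsupervised clustering — can be sketched with toy data. The specific features (standard deviation and range), the leader-style growing rule, and the distance threshold are illustrative assumptions, not the paper's exact procedure:

```python
def dispersion_features(values):
    """Statistical dispersion features of one image's intensity sample:
    (standard deviation, range)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return (var ** 0.5, max(values) - min(values))

def grow_clusters(vectors, tau):
    """Region-growing-style clustering: each vector joins the first cluster
    whose seed is within Euclidean distance tau, otherwise it seeds a new one."""
    seeds, labels = [], []
    for v in vectors:
        for i, s in enumerate(seeds):
            if sum((a - b) ** 2 for a, b in zip(v, s)) ** 0.5 <= tau:
                labels.append(i)
                break
        else:
            seeds.append(v)
            labels.append(len(seeds) - 1)
    return labels

# Two hypothetical groups of image samples: flat (low dispersion) vs. textured (high).
samples = [[10, 10, 11, 10], [12, 12, 12, 13], [0, 255, 0, 255], [10, 240, 20, 250]]
feats = [dispersion_features(s) for s in samples]
labels = grow_clusters(feats, tau=50.0)
```

The two flat samples collapse into one cluster and the two high-contrast samples into another, which is the behaviour the unsupervised classification stage relies on.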
Pasch, H. L.
An overview of video coding is presented. The aim is not to give a technical summary of possible coding techniques, but to address subjects related to video compression in general and to the transmission of compressed video in more detail. Bit rate reduction is in general possible by removing redundant information; removing information the eye does not use anyway; and reducing the quality of the video. The codecs which are used for reducing the bit rate can be divided into two groups: Constant Bit rate Codecs (CBCs), which keep the bit rate constant but vary the video quality; and Variable Bit rate Codecs (VBCs), which keep the video quality constant by varying the bit rate. VBCs can in general reach a higher video quality than CBCs using less bandwidth, but need a transmission system that allows the bandwidth of a connection to fluctuate in time. The current and the next generation of the PSTN do not allow this; ATM might. There are several factors which influence the quality of video: the bit error rate of the transmission channel, slip rate, packet loss rate/packet insertion rate, end-to-end delay, phase shift between voice and video, and bit rate. Based on the bit rate of the coded video, the following classification of coded video can be made: High Definition Television (HDTV); Broadcast Quality Television (BQTV); video conferencing; and video telephony. The properties of these classes are given. The video conferencing and video telephony equipment available now and in the next few years can be divided into three categories: conforming to the 1984 CCITT standard for video conferencing; conforming to the 1988 CCITT standard; and conforming to no standard.
Online videos are an increasingly important way technology is contributing to the improvement of physics teaching. Students and teachers have begun to rely on online videos to provide them with content knowledge and instructional strategies. Online audiences are expecting greater production value, and departments are sometimes requesting educators to post video pre-labs or to flip our classrooms. In this article, I share my advice on creating engaging physics videos.
Potter, Ray; Roberts, Deborah
This guide aims to provide an introduction to Desktop Video Conferencing. You may be familiar with video conferencing, where participants typically book a designated conference room and communicate with another group in a similar room on another site via a large screen display. Desktop video conferencing (DVC), as the name suggests, allows users to video conference from the comfort of their own office, workplace or home via a desktop/laptop Personal Computer. DVC provides live audio and visua...
47 CFR § 79.3 (2010-10-01), Closed Captioning and Video Description of Video Programming: Video description of video programming. (a) Definitions. For purposes of this section the following definitions shall apply: (1…
Full Text Available This article presents tips on how to use video in qualitative research. The author states that, though there many complex and powerful computer programs for working with video, the work done in qualitative research does not require those programs. For this work, simple editing software is sufficient. Also presented is an easy and efficient method of transcribing video clips.
Lovink, G.; Somers Miles, R.
Video Vortex Reader II is the Institute of Network Cultures' second collection of texts that critically explore the rapidly changing landscape of online video and its use. With the success of YouTube ('2 billion views per day') and the rise of other online video sharing platforms, the moving image
Full Text Available This paper presents an object occlusion detection algorithm using object depth information that is estimated by automatic camera calibration. The object occlusion problem is a major factor to degrade the performance of object tracking and recognition. To detect an object occlusion, the proposed algorithm consists of three steps: (i) automatic camera calibration using both moving objects and a background structure; (ii) object depth estimation; and (iii) detection of occluded regions. The proposed algorithm estimates the depth of the object without extra sensors but with a generic red, green and blue (RGB) camera. As a result, the proposed algorithm can be applied to improve the performance of object tracking and object recognition algorithms for video surveillance systems.
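Step (iii) — flagging occluded regions once per-object depths are known — can be sketched with axis-aligned bounding boxes: where two tracked boxes overlap, the object farther from the camera is the occluded one. The box coordinates, IDs, and depths below are hypothetical:

```python
def overlap(a, b):
    """Intersection rectangle of two (x1, y1, x2, y2) boxes, or None."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def occluded_regions(objects):
    """Given (object_id, box, depth) triples, report, for each overlapping
    pair, the object that is occluded: the one with the larger depth."""
    regions = []
    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            (id_a, box_a, d_a), (id_b, box_b, d_b) = objects[i], objects[j]
            region = overlap(box_a, box_b)
            if region is not None:
                regions.append((id_b if d_b > d_a else id_a, region))
    return regions

# Hypothetical tracked objects with depths estimated from camera calibration.
objs = [("person", (0, 0, 4, 4), 2.0), ("car", (2, 2, 8, 8), 5.0)]
res = occluded_regions(objs)
```

The nearer "person" box partially covers the farther "car", so the car is reported as occluded inside the intersection rectangle.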
Buggey, Tom; Ogle, Lindsey
Video self-modeling (VSM) first appeared on the psychology and education stage in the early 1970s. The practical applications of VSM were limited by lack of access to tools for editing video, which is necessary for almost all self-modeling videos. Thus, VSM remained in the research domain until the advent of camcorders and VCR/DVD players and,…
Otrel-Cass, Kathrin; Khalid, Md. Saifuddin
With an interest in learning that is set in collaborative situations, the data session presents excerpts from video data produced by two of fifteen students from a class of a 5th semester techno-anthropology course. Students used video cameras to capture the time they spent working with a scientist… video, nature of the interactional space, and material and spatial semiotics.
Epley, Hannah K.
There is a need for Extension professionals to show clientele the benefits of their program. This article shares how promotional videos are one way of reaching audiences online. An example is given on how a promotional video has been used and developed using iMovie software. Tips are offered for how professionals can create a promotional video and…
I Made Oka Widyantara
Full Text Available This paper aims to analyze an Internet-based streaming video service over communication media with variable bit rates. The proposed scheme is Dynamic Adaptive Streaming over HTTP (DASH), which runs over the Internet using the Hyper Text Transfer Protocol (HTTP). DASH technology allows a video to be segmented into several packages that will be streamed. The initial DASH stage compresses the video source to lower bit rates using the H.26 video codec. The compressed video is then segmented using MP4Box, which generates streaming packets of a specified duration. These packets are assembled into a streaming manifest in the Media Presentation Description (MPD) format defined by MPEG-DASH. The MPEG-DASH video stream runs on a platform with an integrated bitdash player. With this scheme, the video has several bit-rate variants, which gives rise to the concept of scalability of streaming video services on the client side. The main target of the mechanism is a smooth MPEG-DASH streaming video display on the client. The simulation results show that the scalable video streaming scheme based on MPEG-DASH is able to improve the quality of the image displayed on the client side, where video buffering can be made constant and smooth for the duration of video playback.
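The client-side scalability described above rests on throughput-based rate adaptation: the player measures its download throughput and picks the highest-bitrate representation from the MPD's ladder that fits. A minimal sketch — the representation names, bitrates, and safety factor are hypothetical, not values from the paper:

```python
def pick_representation(throughput_kbps, representations, safety=0.8):
    """Throughput-based DASH adaptation: choose the highest-bitrate
    representation whose bitrate fits within safety * measured throughput;
    fall back to the lowest representation when nothing fits."""
    affordable = [r for r in representations if r[1] <= safety * throughput_kbps]
    if affordable:
        return max(affordable, key=lambda r: r[1])
    return min(representations, key=lambda r: r[1])

# Hypothetical bit-rate ladder produced by compressing and segmenting the source.
ladder = [("240p", 400), ("480p", 1200), ("720p", 2500)]
choice_fast = pick_representation(4000, ladder)   # plenty of bandwidth
choice_slow = pick_representation(600, ladder)    # congested link
```

The safety factor keeps the chosen bitrate below the measured throughput, which is what lets the buffer stay steady during playback.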
Full Text Available While it has been established that using full-body motion to play active video games results in increased levels of energy expenditure, there is little information on the classification of human movement during active video game play in relation to fundamental movement skills. The aim of this study was to validate software utilising Kinect sensor motion capture technology to recognise fundamental movement skills (FMS) during active video game play. Two human assessors rated jumping and side-stepping, and these assessments were compared to the Kinect Action Recognition Tool (KART) to establish a level of agreement and to determine the number of movements completed during five minutes of active video game play, for 43 children (mean age 12 years 7 months ± 1 year 6 months). Inter-rater reliability between the two human raters was higher for the jump (r = 0.94, p < .01) than the sidestep (r = 0.87, p < .01), although both were excellent. Excellent reliability was also found between the human raters and the KART system for the jump (r = 0.84, p < .01), and moderate reliability for the sidestep (r = 0.6983, p < .01) during game play, demonstrating that both humans and KART had higher agreement for jumps than sidesteps in the game play condition. The results of the study provide confidence that the Kinect sensor can be used to count the number of jumps and sidesteps during five minutes of active video game play with a similar level of accuracy as human raters. However, in contrast to humans, the KART system required a fraction of the time to analyse and tabulate the results.
In the middle of the twentieth century, long before video games were even imagined as a mode of popular entertainment, religious theorist Mircea Eliade argued that the recognition of the "sacred...
Full Text Available Abstract Background Speaker detection is an important component of many human-computer interaction applications, such as multimedia indexing or ambient intelligent systems. This work addresses the problem of detecting the current speaker in audio-visual sequences. The detector requires only simple equipment, since a single camera and microphone meet its needs. Method A multimodal pattern recognition framework is proposed, with solutions provided for each step of the process, namely the feature generation and extraction steps, the classification, and the evaluation of the system performance. The decision is based on the estimation of the synchrony between the audio and the video signals. Prior to the classification, an information-theoretic framework is applied to extract optimized audio features using video information. The classification step is then defined through a hypothesis-testing framework in order to obtain confidence levels associated with the classifier outputs, thereby allowing an evaluation of the performance of the whole multimodal pattern recognition system. Results Through the hypothesis-testing approach, the classifier performance can be given as a ratio of detection to false-alarm probabilities. Above all, the hypothesis tests give means for measuring the efficiency of the whole pattern recognition process. In particular, the gain offered by the proposed feature extraction step can be evaluated. As a result, it is shown that introducing such a feature extraction step increases the ability of the classifier to produce good relative instance scores, and therefore the performance of the pattern recognition process. Conclusion The powerful capacities of hypothesis tests as an evaluation tool are exploited to assess the performance of a multimodal pattern recognition process. In particular, the advantage of performing or not performing a feature extraction step prior to the classification is evaluated. Although the proposed framework is
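The "ratio of detection to false-alarm probabilities" above can be illustrated with a minimal sketch: given synchrony scores observed when the person is the speaker (H1) and when they are not (H0), a decision threshold yields the two probabilities. The score values below are invented for illustration.

```python
def detection_rates(pos_scores, neg_scores, threshold):
    """Detection and false-alarm probabilities for a score threshold.

    pos_scores: synchrony scores when the person IS the speaker (H1).
    neg_scores: synchrony scores when the person is NOT the speaker (H0).
    """
    p_detect = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    p_false = sum(s >= threshold for s in neg_scores) / len(neg_scores)
    return p_detect, p_false

# Hypothetical audio-visual synchrony scores.
speaker = [0.9, 0.8, 0.75, 0.6, 0.85]
non_speaker = [0.2, 0.4, 0.55, 0.3, 0.1]

pd, pfa = detection_rates(speaker, non_speaker, threshold=0.5)
print(pd, pfa)  # the ratio pd/pfa summarises detector performance
```

Sweeping the threshold traces out the full detection/false-alarm trade-off, which is what the hypothesis-testing evaluation exploits.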
Full Text Available Abstract Scale-invariant feature transform (SIFT transforms a grayscale image into scale-invariant coordinates of local features that are invariant to image scale, rotation, and changing viewpoints. Because of its scale-invariant properties, SIFT has been successfully used for object recognition and content-based image retrieval. The biggest drawback of SIFT is that it uses only grayscale information and misses important visual information regarding color. In this paper, we present the development of a novel color feature extraction algorithm that addresses this problem, and we also propose a new clustering strategy using clustering ensembles for video shot detection. Based on Fibonacci lattice-quantization, we develop a novel color global scale-invariant feature transform (CGSIFT for better description of color contents in video frames for video shot detection. CGSIFT first quantizes a color image, representing it with a small number of color indices, and then uses SIFT to extract features from the quantized color index image. We also develop a new space description method using small image regions to represent global color features as the second step of CGSIFT. Clustering ensembles focusing on knowledge reuse are then applied to obtain better clustering results than using single clustering methods for video shot detection. Evaluation of the proposed feature extraction algorithm and the new clustering strategy using clustering ensembles reveals very promising results for video shot detection.
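The clustering-ensemble step can be illustrated with a co-association (evidence-accumulation) sketch: several base clusterings each vote on whether two frames belong together, and the averaged votes form a consensus similarity matrix that is more robust than any single clustering. The base labelings below are invented for illustration; the paper's actual ensemble construction may differ.

```python
def co_association(labelings):
    """Average co-association matrix from several base clusterings.

    labelings: one label list per base clustering; labelings[k][j] is
    the cluster id assigned to frame j by base clustering k.
    Entry m[i][j] is the fraction of clusterings grouping i and j together.
    """
    n = len(labelings[0])
    m = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += 1.0 / len(labelings)
    return m

# Three hypothetical base clusterings of five video frames.
base = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0],
]
consensus = co_association(base)
print(consensus[0][1], consensus[0][2])
```

Thresholding or re-clustering the consensus matrix then yields the final shot boundaries, reusing the knowledge contained in all base partitions.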
Hickman, Simon J
Internet video sharing sites allow the free dissemination of educational material. This study investigated the quality and educational content of videos of eye movement disorders posted on such sites. Educational neurological eye movement videos were identified by entering the titles of the eye movement abnormalities into the search boxes of the video sharing sites; suggested links were also followed from each video. The number of views, likes, and dislikes for each video was recorded. The videos were then rated for their picture and sound quality. Their educational value was assessed according to whether the video included a description of the eye movement abnormality, the anatomical location of the lesion (if appropriate), and the underlying diagnosis. Three hundred fifty-four such videos were found on YouTube and Vimeo, with a mean of 6,443 views per video (range, 1-195,957). One hundred nineteen (33.6%) had no form of commentary about the eye movement disorder shown apart from the title. Forty-seven (13.3%) contained errors in the title or in the text. Eighty (22.6%) had excellent educational value, describing the eye movement abnormality, the anatomical location of the lesion, and the underlying diagnosis; of these, 30 also had good picture and sound quality. The videos with excellent educational value had a mean of 9.84 "likes" per video compared with 2.37 for those without a commentary (P ...). Videos with excellent educational value and good picture and sound quality had a mean of 10.23 "likes" per video (P = 0.004 vs videos with no commentary). There was no significant difference in the mean number of "dislikes" between videos that had no commentary or contained errors and those with excellent educational value. A large number of eye movement videos are freely available on these sites; however, due to the lack of peer review, a significant number have poor educational value because they have no commentary or contain errors. The number of "likes
Full Text Available The purpose of this article is to show how learning design and scaffolding can be used to create a framework for student-produced video for examinations in higher education. The article takes as its starting point the problem that educational institutions must handle and coordinate teaching within both the subject domain and the media domain, and must ensure a balance between subject-specific and media-specific approaches. Distributing the task across several teaching resources requires more coordination, but it avoids the problem of demanding dual expertise of teachers for media productions. Based on the Larnaca Declaration's perspectives on learning design and primarily on Jerome Bruner's principles of scaffolding, a model is assembled for supporting video production by students in higher education. By applying this model to teaching sessions and courses, subject teachers and media teachers gain a tool to focus and coordinate their efforts toward the goal of students producing and using video for examinations.
No milestone has proven as elusive as the always-approaching "year of the LAN," but the "year of the scanner" might claim the silver medal. Desktop scanners have been around almost as long as personal computers. And everyone thinks they are used for obvious desktop-publishing and business tasks like scanning business documents, magazine articles and other pages, and translating those words into files your computer understands. But, until now, the reality fell far short of the promise. Because it's true that scanners deliver an accurate image of the page to your computer, but the software to recognize this text has been woefully disappointing. Old optical-character recognition (OCR) software recognized such a limited range of pages as to be virtually useless to real users. (For example, one OCR vendor specified 12-point Courier font from an IBM Selectric typewriter: the same font in 10-point, or from a Diablo printer, was unrecognizable!) Computer dealers have told me the chasm between OCR expectations and reality is so broad and deep that nine out of ten prospects leave their stores in disgust when they learn the limitations. And this is a very important, very unfortunate gap. Because the promise of recognition -- what people want it to do -- carries with it tremendous improvements in our productivity and ability to get tons of written documents into our computers where we can do real work with it. The good news is that a revolutionary new development effort has led to the new technology of "page recognition," which actually does deliver the promise we've always wanted from OCR. I'm sure every reader appreciates the breakthrough represented by the laser printer and page-makeup software, a combination so powerful it created new reasons for buying a computer. A similar breakthrough is happening right now in page recognition: the Macintosh (and, I must admit, other personal computers) equipped with a moderately priced scanner and OmniPage software (from Caere
Belonging to the wider academic field of computer vision, video analytics has aroused a phenomenal surge of interest since the current millennium. Video analytics is intended to solve the problem of the incapability of exploiting video streams in real time for the purpose of detection or anticipation. It involves analyzing the videos using algorithms that detect and track objects of interest over time and that indicate the presence of events or suspect behavior involving these objects.The aims of this book are to highlight the operational attempts of video analytics, to identify possi
There has been a phenomenal growth in video applications over the past few years. An accurate traffic model of Variable Bit Rate (VBR) video is necessary for performance evaluation of a network design and for generating synthetic traffic that can be used for benchmarking a network. A large number of models for VBR video traffic have been proposed in the literature for different types of video in the past 20 years. Here, the authors have classified and surveyed these models and have also evaluated the models for H.264 AVC and MVC encoded video and discussed their findings.
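One of the simplest model families covered by such surveys is an autoregressive model of per-frame bit rate; the sketch below generates a synthetic VBR trace from an AR(1) process. The choice of AR(1) and all parameter values here are illustrative assumptions, not results from the survey.

```python
import random

def ar1_vbr_trace(n_frames, mean_rate, phi=0.9, sigma=0.1, seed=42):
    """Synthetic per-frame bit rates (bps) from an AR(1) process.

    x[t] = mean + phi * (x[t-1] - mean) + Gaussian noise, with rates
    clamped at zero since a bit rate cannot be negative. phi controls
    frame-to-frame correlation; sigma scales the noise to the mean.
    """
    rng = random.Random(seed)   # seeded for reproducible benchmarking
    x = mean_rate
    trace = []
    for _ in range(n_frames):
        x = mean_rate + phi * (x - mean_rate) + rng.gauss(0.0, sigma * mean_rate)
        trace.append(max(x, 0.0))
    return trace

trace = ar1_vbr_trace(1000, mean_rate=2_000_000)
print(len(trace), min(trace) >= 0.0)
```

Traces like this can be fed into a network simulator as benchmark traffic; richer models from the literature add scene-change and GOP structure on top of this correlation.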
Yamada, Kaho; Yoshida, Takeshi; Sumi, Kazuhiko; Habe, Hitoshi; Mitsugami, Ikuhisa
Recently, dense trajectories have been shown to be a successful video representation for action recognition, demonstrating state-of-the-art results on a variety of datasets. However, when these trajectories are applied to gesture recognition, recognizing similar and fine-grained motions is problematic. In this paper, we propose a new method in which dense trajectories are calculated in segmented regions around detected human body parts. Spatial segmentation is achieved by body part detection. Temporal segmentation is performed over a fixed number of video frames. The proposed method removes background video noise and can recognize similar and fine-grained motions. Only a few video datasets are available for gesture classification; therefore, we constructed a new gesture dataset and evaluated the proposed method on it. The experimental results show that the proposed method outperforms the original dense trajectories.
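The fixed-length temporal segmentation step can be sketched as splitting the frame sequence into windows of a fixed number of frames; dense trajectories would then be computed per body-part region within each window. The window length and the frame list below are illustrative, not the paper's settings.

```python
def temporal_segments(frames, seg_len):
    """Split a frame sequence into fixed-length temporal segments.

    The final segment is kept even if it is shorter than seg_len.
    """
    return [frames[i:i + seg_len] for i in range(0, len(frames), seg_len)]

# Ten frame indices split into windows of four frames each.
print(temporal_segments(list(range(10)), 4))
```

Keeping segments short limits each trajectory to one sub-motion, which is what lets the method separate similar, fine-grained gestures.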