Tian, Shu; Yin, Xu-Cheng; Su, Ya; Hao, Hong-Wei
Video text extraction plays an important role in multimedia understanding and retrieval. Most previous research efforts are conducted within individual frames. A few recent methods, which pay attention to text tracking using multiple frames, however, do not effectively mine the relations among text detection, tracking and recognition. In this paper, we propose a generic Bayesian-based framework of Tracking based Text Detection And Recognition (T2DAR) from web videos for embedded captions, which is composed of three major components, i.e., text tracking, tracking based text detection, and tracking based text recognition. In this unified framework, text tracking is first conducted by tracking-by-detection. Tracking trajectories are then revised and refined with detection or recognition results. Text detection or recognition is finally improved with multi-frame integration. Moreover, a challenging video text (embedded caption text) database (USTB-VidTEXT) is constructed and made publicly available. A variety of experiments on this dataset verify that our proposed approach largely improves the performance of text detection and recognition from web videos.
Ramesh Mahadev Kagalkar
Full Text Available Sign language recognition has emerged as a vital area of research in computer vision. A problem faced by researchers is that instances of signs vary in both motion and appearance. In this paper a novel approach for recognizing various alphabets of Kannada sign language is proposed, where continuous video sequences of the signs are considered. The system consists of three stages: preprocessing, feature extraction and classification. The preprocessing stage includes skin filtering and histogram matching. Eigenvalues and eigenvectors are used in the feature extraction stage, and finally an eigenvalue-weighted Euclidean distance is employed to recognize the sign. The system deals with bare hands, allowing the user to interact with it in a natural manner. We considered different alphabets in the video sequences and achieved a success rate of 95.25%.
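The final classification step described in this abstract, an eigenvalue-weighted Euclidean distance over eigen-features, can be sketched as follows (a minimal illustration under assumed inputs, not the authors' implementation; `eigen_features`, `classify` and the toy templates are hypothetical names):

```python
import numpy as np

def eigen_features(frames, k=3):
    """Project flattened frames onto the top-k eigenvectors of their covariance."""
    X = np.asarray(frames, dtype=float)
    X = X - X.mean(axis=0)
    cov = X.T @ X / max(len(X) - 1, 1)
    vals, vecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]      # keep the top-k
    return vals[order], X @ vecs[:, order]

def weighted_euclidean(a, b, weights):
    """Eigenvalue-weighted Euclidean distance between two feature vectors."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum(weights * d * d)))

def classify(query, templates, weights):
    """Return the label of the template with the smallest weighted distance."""
    return min(templates, key=lambda lbl: weighted_euclidean(query, templates[lbl], weights))
```

With unit weights the measure reduces to the ordinary Euclidean distance; weighting by eigenvalues emphasizes the feature directions that carry the most variance.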
Satish S Hiremath
Full Text Available An important task in content-based video indexing is to extract text information from videos. The challenges involved in text extraction and recognition are the variation of illumination on each video frame with text, text present on complex backgrounds, and different font sizes of the text. Using various image processing algorithms such as morphological operations, blob detection and histograms of oriented gradients, character recognition of video subtitles is implemented. Segmentation, feature extraction and classification are the major steps of character recognition. Several experimental results are shown to demonstrate the performance of the proposed algorithm.
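A crude histogram-of-oriented-gradients cell, one of the features mentioned above, might be computed like this (a simplified sketch; real HOG adds block normalization and bilinear vote interpolation, which are omitted here):

```python
import numpy as np

def orientation_histogram(patch, bins=9):
    """Histogram of gradient orientations for one image patch (a crude HOG cell).

    Gradients are taken with simple central differences; each pixel votes
    into an orientation bin with a weight equal to its gradient magnitude.
    """
    p = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(p)                       # derivatives along rows, cols
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # unsigned orientation in [0, pi)
    idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    hist = np.zeros(bins)
    np.add.at(hist, idx.ravel(), mag.ravel())     # magnitude-weighted votes
    return hist
```

A vertical edge produces a purely horizontal gradient, so its votes land in the first bin; a horizontal edge lands in the middle bin.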
Quehl, Bernhard; Yang, Haojin; Sack, Harald
Video texts are closely related to the content of a video. They provide a valuable source for indexing and interpretation of video data. Text detection and recognition tasks in images or videos typically distinguish between overlay and scene text. Overlay text is artificially superimposed on the image at the time of editing, while scene text is text captured by the recording system. Typically, OCR systems are specialized on one kind of text type. However, in video images both types of text can be found. In this paper, we propose a method to automatically distinguish between overlay and scene text to dynamically control and optimize post-processing steps following text detection. Based on a feature combination, a Support Vector Machine (SVM) is trained to classify scene and overlay text. We show how this distinction between overlay and scene text improves the word recognition rate. The accuracy of the proposed methods has been evaluated using publicly available test data sets.
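The overlay-versus-scene classification step could be sketched with a plain linear SVM trained by subgradient descent on hinge loss (a toy stand-in for the authors' feature combination and SVM; the 2-D features below are made up for illustration):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty) by batch subgradient descent.

    X: (n, d) feature matrix; y: labels in {-1, +1}. Returns weights w and bias b.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = rng.normal(scale=0.01, size=d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                       # points violating the margin
        gw = lam * w - (y[active, None] * X[active]).mean(axis=0) if active.any() else lam * w
        gb = -y[active].mean() if active.any() else 0.0
        w, b = w - lr * gw, b - lr * gb
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)
```

In practice one would use an established SVM library; this toy version just makes the decision rule concrete.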
Swapnil Vitthal Tathe
Full Text Available Advancement in computer vision technology and the availability of video capturing devices such as surveillance cameras have evoked new video processing applications. Research in video face recognition is mostly biased towards law enforcement applications. Applications involve human recognition based on face and iris, human-computer interaction, behavior analysis, video surveillance, etc. This paper presents a face tracking framework that is capable of face detection using Haar features, recognition using Gabor feature extraction, matching using correlation scores and tracking using a Kalman filter. The method has a good recognition rate for real-life videos and robust performance under changes due to illumination, environmental factors, scale, pose and orientation.
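The Kalman-filter tracking component mentioned above can be illustrated with a 1-D constant-velocity filter (a minimal sketch; the paper's tracker and its noise settings are not specified, so `q` and `r` here are assumed values):

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=0.25):
    """1-D constant-velocity Kalman filter: state = [position, velocity].

    measurements: sequence of noisy positions; returns filtered positions.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])    # state transition
    H = np.array([[1.0, 0.0]])               # we observe position only
    Q = q * np.eye(2)                        # process noise covariance
    R = np.array([[r]])                      # measurement noise covariance
    x = np.array([[measurements[0]], [0.0]])
    P = np.eye(2)
    out = []
    for z in measurements:
        x = F @ x                            # predict
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + R                  # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        x = x + K @ (np.array([[z]]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        out.append(float(x[0, 0]))
    return out
```

For face tracking the same recurrence is applied per coordinate of the bounding box centre.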
Karaoglu, S.; van Gemert, J.C.; Gevers, T.
We propose to use text recognition to aid in visual object class recognition. To this end we first propose a new algorithm for text detection in natural images. The proposed text detection is based on saliency cues and a context fusion step. The algorithm does not need any parameter tuning and can …
Most biometric systems employed for human recognition require physical contact with, or close proximity to, a cooperative subject. Far more challenging is the ability to reliably recognize individuals at a distance, when viewed from an arbitrary angle under real-world environmental conditions. Gait and face data are the two biometrics that can be most easily captured from a distance using a video camera. This comprehensive and logically organized text/reference addresses the fundamental problems associated with gait and face-based human recognition, from color and infrared video data that are
Zhou, Shaohua; Krüger, Volker; Chellappa, Rama
Recognition of human faces using a gallery of still or video images and a probe set of videos is systematically investigated using a probabilistic framework. In still-to-video recognition, where the gallery consists of still images, a time series state space model is proposed to fuse temporal … demonstrate that, due to the propagation of the identity variable over time, a degeneracy in the posterior probability of the identity variable is achieved to give improved recognition. The gallery is generalized to videos in order to realize video-to-video recognition. An exemplar-based learning strategy … of the identity variable produces the recognition result. The model formulation is very general and allows a variety of image representations and transformations. Experimental results using videos collected by NIST/USF and CMU illustrate the effectiveness of this approach for both still-to-video and video …
Full Text Available Automated video object recognition is a topic of emerging importance in both defense and civilian applications. This work describes an accurate and low-power neuromorphic architecture and system for real-time automated video object recognition. Our system, Neuromorphic Visual Understanding of Scenes (NEOVUS), is inspired by recent findings in computational neuroscience on feed-forward object detection and classification pipelines for processing and extracting relevant information from visual data. The NEOVUS architecture is inspired by the ventral (what) and dorsal (where) streams of the mammalian visual pathway and combines retinal processing, form-based and motion-based object detection, and convolutional neural network based object classification. Our system was evaluated by the Defense Advanced Research Projects Agency (DARPA) under the NEOVISION2 program on a variety of urban area video datasets collected from both stationary and moving platforms. The datasets are challenging as they include a large number of targets in cluttered scenes with varying illumination and occlusion conditions. The NEOVUS system was also mapped to commercially available off-the-shelf hardware. The dynamic power requirement for the system, which includes a 5.6-Mpixel retinal camera processed by object detection and classification algorithms at 30 frames per second, was measured at 21.7 Watts (W), for an effective energy consumption of 5.4 nanoJoules (nJ) per bit of incoming video. In a systematic evaluation of five different teams by DARPA on three aerial datasets, the NEOVUS demonstrated the best performance, with the highest recognition accuracy and at least three orders of magnitude lower energy consumption than two independent state-of-the-art computer vision systems. These unprecedented results show that the NEOVUS has the potential to revolutionize automated video object recognition towards enabling practical low-power and mobile video processing applications.
Soleymani, Mohammad; Pantic, Maja; Pun, Thierry
This paper presents a user-independent emotion recognition method with the goal of recovering affective tags for videos using electroencephalogram (EEG), pupillary response and gaze distance. We first selected 20 video clips with extrinsic emotional content from movies and online resources. Then,
Craciunescu, Razvan; Mihovska, Albena Dimitrova; Kyriazakos, Sofoklis
Gesture recognition is a topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms. Gestures can originate from any bodily motion or state but commonly originate from the face or hand. Current research focuses on emotion recognition from the face and on hand gesture recognition. Gesture recognition enables humans to communicate with the machine and interact naturally without any mechanical devices. This paper investigates the possibility to use non-audio/video sensors in order to design a low-cost gesture recognition device …
Full Text Available This paper describes a mobile device which tries to give the blind or visually impaired access to text information. Three key technologies are required for this system: text detection, optical character recognition, and speech synthesis. Blind users and the mobile environment imply two strong constraints. First, pictures will be taken without control over camera settings and without a priori information on the text (font or size) and background. The second issue is to link several techniques together with an optimal compromise between computational constraints and recognition efficiency. We will present the overall description of the system from text detection to OCR error correction.
Bouaziz, Baseem; Zlitni, Tarek; Mahdi, Walid
In this paper, we propose a spatial temporal video-text detection technique which proceeds in two principal steps: potential text region detection and a filtering process. In the first step we dynamically divide each pair of consecutive video frames into sub-blocks in order to detect change. A significant difference between homologous blocks implies the appearance of an important object which may be a text region. The temporal redundancy is then used to filter these regions and forms an effective …
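The first step, flagging sub-blocks whose content changes sharply between consecutive frames, might look like this (a simplified sketch; the block size and threshold are assumed, and the paper divides frames dynamically rather than on a fixed grid):

```python
import numpy as np

def candidate_blocks(frame_a, frame_b, block=4, thresh=10.0):
    """Split two consecutive frames into block x block tiles and flag tiles
    whose mean absolute difference exceeds a threshold (possible text appearance).

    Returns the (row, col) top-left corners of the flagged tiles.
    """
    a = np.asarray(frame_a, dtype=float)
    b = np.asarray(frame_b, dtype=float)
    h, w = a.shape
    flags = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            diff = np.abs(a[i:i + block, j:j + block] - b[i:i + block, j:j + block]).mean()
            if diff > thresh:
                flags.append((i, j))
    return flags
```

Flagged tiles would then go to the filtering stage, where temporal redundancy separates stable text regions from transient motion.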
Full Text Available A novel video shot boundary recognition method is proposed, which includes two stages: video feature extraction and shot boundary recognition. Firstly, we use adaptive locality preserving projections (ALPP) to extract video features. Unlike locality preserving projections, we define the discriminating similarity with mode prior probabilities and an adaptive neighborhood selection strategy, which makes ALPP more suitable for preserving the local structure and label information of the original data. Secondly, we use an optimized multiple kernel support vector machine to classify video frames into boundary and nonboundary frames, in which the weights of different types of kernels are optimized with an ant colony optimization method. Experimental results show the effectiveness of our method.
Li, Shuohao; Han, Anqi; Chen, Xu; Yin, Xiaoqing; Zhang, Jun
Recognizing text in images captured in the wild is a fundamental preprocessing task for many computer vision and machine learning applications and has gained significant attention in recent years. This paper proposes an end-to-end trainable deep review neural network for scene text recognition, which is a combination of feature extraction, feature reviewing, feature attention, and sequence recognition. Our model can generate the predicted text without any segmentation or grouping algorithm. Because the attention model in the feature attention stage lacks global modeling ability, a review network is applied to extract the global context of sequence data in the feature reviewing stage. We perform rigorous experiments across a number of standard benchmarks, including IIIT5K, SVT, ICDAR03, and ICDAR13 datasets. Experimental results show that our model is comparable to or outperforms state-of-the-art techniques.
U.S. Geological Survey, Department of the Interior — The ViTexOCR script presents a new method for extracting navigation data from videos with text overlays using optical character recognition (OCR) software. Over the...
Rashmi B Hiremath
Full Text Available Sign language is a way of expressing oneself with body language, where expressions, intentions, or sentiments are conveyed by physical behaviours, for example facial expressions, body posture, gestures, eye movements, touch and the use of space. Non-verbal communication exists in both animals and humans, but this article concentrates on the interpretation of human non-verbal or sign language into Hindi textual expression. The proposed implementation uses image processing methods and artificial intelligence strategies to achieve sign video recognition. To carry out the proposed task it applies image processing methods such as frame-analysis-based tracking, edge detection, wavelet transform, erosion, dilation, blur elimination and noise elimination on training videos. It also uses elliptical Fourier descriptors (referred to as SIFT) for shape feature extraction and principal component analysis for feature set optimization and reduction. For result analysis, this paper uses videos from different categories, such as signs for weeks, months, relations, etc. A database of extracted outcomes is compared with the signer's input video by a trained fuzzy inference system.
Full Text Available The paper describes a system of hand gesture recognition by image processing for human-robot interaction. The recognition and interpretation of the hand postures acquired through a video camera allow the control of the robotic arm activity: motion (translation and rotation in 3D) and tightening/releasing the clamp. A gesture dictionary was defined, and heuristic algorithms for recognition were developed and tested. The system can be used for academic and industrial purposes, especially for activities where the movements of the robotic arm are not previously scheduled, to train the robot more easily than with a remote control. Besides the gesture dictionary, the novelty of the paper consists in a new technique for detecting the relative positions of the fingers in order to recognize the various hand postures, and in the achievement of a robust system for controlling robots by postures of the hands.
Mosleh, Ali; Bouguila, Nizar; Ben Hamza, Abdessamad
We present a two stage framework for automatic video text removal to detect and remove embedded video texts and fill-in their remaining regions by appropriate data. In the video text detection stage, text locations in each frame are found via an unsupervised clustering performed on the connected components produced by the stroke width transform (SWT). Since SWT needs an accurate edge map, we develop a novel edge detector which benefits from the geometric features revealed by the bandlet transform. Next, the motion patterns of the text objects of each frame are analyzed to localize video texts. The detected video text regions are removed, then the video is restored by an inpainting scheme. The proposed video inpainting approach applies spatio-temporal geometric flows extracted by bandlets to reconstruct the missing data. A 3D volume regularization algorithm, which takes advantage of bandlet bases in exploiting the anisotropic regularities, is introduced to carry out the inpainting task. The method does not need extra processes to satisfy visual consistency. The experimental results demonstrate the effectiveness of both our proposed video text detection approach and the video completion technique, and consequently the entire automatic video text removal and restoration process.
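The connected-component grouping that follows the stroke width transform can be illustrated with a basic 4-connectivity flood fill (a generic sketch, not the authors' SWT pipeline):

```python
def connected_components(mask):
    """Label 4-connected components of a binary mask (list of lists of 0/1).

    Returns a dict mapping component id -> list of (row, col) pixels.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    comps, next_id = {}, 1
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                # Iterative flood fill from this unvisited foreground pixel.
                stack, pixels = [(r, c)], []
                labels[r][c] = next_id
                while stack:
                    y, x = stack.pop()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_id
                            stack.append((ny, nx))
                comps[next_id] = pixels
                next_id += 1
    return comps
```

In an SWT-based detector the mask would hold pixels with consistent stroke width, and each component becomes a candidate character for the subsequent clustering step.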
Yu, Litao; Yang, Yang; Huang, Zi; Wang, Peng; Song, Jingkuan; Shen, Heng Tao
In recent years, the task of event recognition from videos has attracted increasing interest in the multimedia area. While most of the existing research was mainly focused on exploring visual cues to handle relatively small-granular events, it is difficult to directly analyze video content without any prior knowledge. Therefore, synthesizing both the visual and semantic analysis is a natural way for video event understanding. In this paper, we study the problem of Web video event recognition, where Web videos often describe large-granular events and carry limited textual information. Key challenges include how to accurately represent event semantics from incomplete textual information and how to effectively explore the correlation between visual and textual cues for video event understanding. We propose a novel framework to perform complex event recognition from Web videos. In order to compensate for the insufficient expressive power of visual cues, we construct an event knowledge base by deeply mining semantic information from ubiquitous Web documents. This event knowledge base is capable of describing each event with comprehensive semantics. By utilizing this base, the textual cues for a video can be significantly enriched. Furthermore, we introduce a two-view adaptive regression model, which explores the intrinsic correlation between the visual and textual cues of the videos to learn reliable classifiers. Extensive experiments on two real-world video data sets show the effectiveness of our proposed framework and prove that the event knowledge base indeed helps improve the performance of Web video event recognition.
Zhang, Jianguang; Han, Yahong; Tang, Jinhui; Hu, Qinghua; Jiang, Jianmin
Human action recognition has been well explored in applications of computer vision. Many successful action recognition methods have shown that action knowledge can be effectively learned from motion videos or still images. For the same action, the appropriate action knowledge learned from different types of media, e.g., videos or images, may be related. However, less effort has been made to improve the performance of action recognition in videos by adapting the action knowledge conveyed from images to videos. Most of the existing video action recognition methods suffer from the problem of lacking sufficient labeled training videos. In such cases, over-fitting would be a potential problem and the performance of action recognition is restrained. In this paper, we propose an adaptation method to enhance action recognition in videos by adapting knowledge from images. The adapted knowledge is utilized to learn the correlated action semantics by exploring the common components of both labeled videos and images. Meanwhile, we extend the adaptation method to a semi-supervised framework which can leverage both labeled and unlabeled videos. Thus, the over-fitting can be alleviated and the performance of action recognition is improved. Experiments on public benchmark datasets and real-world datasets show that our method outperforms several other state-of-the-art action recognition methods.
Bao, Tianlong; Ding, Chunhui; Karmoshi, Saleem; Zhu, Ming
Face recognition has been widely studied recently, while video-based face recognition still remains a challenging task because of the low quality and large intra-class variation of video captured face images. In this paper, we focus on two scenarios of video-based face recognition: 1) Still-to-Video (S2V) face recognition, i.e., querying a still face image against a gallery of video sequences; 2) Video-to-Still (V2S) face recognition, in contrast to the S2V scenario. A novel method is proposed in this paper to transfer still and video face images to a Euclidean space by a carefully designed convolutional neural network, and Euclidean metrics are then used to measure the distance between still and video images. Identities of still and video images that group as pairs are used as supervision. In the training stage, a joint loss function that measures the Euclidean distance between the predicted features of training pairs and expanding vectors of still images is optimized to minimize the intra-class variation, while the inter-class variation is guaranteed due to the large margin of still images. Transferred features are finally learned via the designed convolutional neural network. Experiments are performed on the COX face dataset. Experimental results show that our method achieves reliable performance compared with other state-of-the-art methods.
Hong, Tao; Srikantan, Geetha; Zandy, V. C.; Fang, Chi; Srihari, Sargur N.
Cherry Blossom is a machine-printed Japanese document recognition system developed at CEDAR in past years. This paper focuses on the character recognition part of the system. For Japanese character classification, two feature sets are used in the system: one is the local stroke direction feature; the other is the gradient, structural and concavity feature. Based on each of these features, two different classifiers are designed: one is the so-called minimum error (ME) subspace classifier; the other is the fast nearest-neighbor (FNN) classifier. Although the original version of the FNN classifier uses Euclidean distance measurement, its new version uses both Euclidean distance and the distance calculation defined in the ME subspace method. This integration improved performance significantly. The number of character classes handled by these classifiers is about 3,300 (including alphanumeric, kana and level-1 JIS Kanji). The classifiers were trained and tested on 200-ppi character images from the CEDAR Japanese character image CD-ROM.
Han, Yahong; Yang, Yi; Yan, Yan; Ma, Zhigang; Sebe, Nicu; Zhou, Xiaofang
To improve both the efficiency and accuracy of video semantic recognition, we can perform feature selection on the extracted video features to select a subset of features from the high-dimensional feature set for a compact and accurate video data representation. Provided the number of labeled videos is small, supervised feature selection could fail to identify the relevant features that are discriminative to target classes. In many applications, abundant unlabeled videos are easily accessible. This motivates us to develop semisupervised feature selection algorithms to better identify the relevant video features, which are discriminative to target classes, by effectively exploiting the information underlying the huge amount of unlabeled video data. In this paper, we propose a framework of video semantic recognition by semisupervised feature selection via spline regression (S²FS²R). Two scatter matrices are combined to capture both the discriminative information and the local geometry structure of labeled and unlabeled training videos: a within-class scatter matrix encoding discriminative information of labeled training videos, and a spline scatter output from a local spline regression encoding data distribution. An l2,1-norm is imposed as a regularization term on the transformation matrix to ensure it is sparse in rows, making it particularly suitable for feature selection. To efficiently solve S²FS²R, we develop an iterative algorithm and prove its convergence. In the experiments, three typical tasks of video semantic recognition, namely video concept detection, video classification, and human action recognition, are used to demonstrate that the proposed S²FS²R achieves better performance compared with the state-of-the-art methods.
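The row-sparsity regularizer at the heart of this formulation, the l2,1-norm, is easy to state concretely (a minimal sketch; `selected_features` is a hypothetical helper for reading off the non-zero rows):

```python
import numpy as np

def l21_norm(W):
    """l2,1-norm: the sum of the Euclidean norms of the rows of W.

    Penalizing this drives entire rows of W to zero, so the surviving
    rows pick out the selected features.
    """
    W = np.asarray(W, dtype=float)
    return float(np.sqrt((W * W).sum(axis=1)).sum())

def selected_features(W, tol=1e-8):
    """Indices of rows with non-negligible norm (the selected features)."""
    W = np.asarray(W, dtype=float)
    row_norms = np.sqrt((W * W).sum(axis=1))
    return [i for i, n in enumerate(row_norms) if n > tol]
```

Unlike an entrywise l1 penalty, the l2,1-norm couples all entries in a row, so a feature is either kept for every output dimension or dropped entirely.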
Full Text Available In recent years, video surveillance and monitoring have gained importance because of security and safety concerns. Banks, borders, airports, stores, and parking areas are the important application areas. There are two main parts in scenario recognition: low-level processing, including moving object detection, object tracking, and feature extraction; and high-level processing, including event start-end point detection, activity detection for each frame, and scenario recognition for sequences of images. Through this work we have developed new features, namely RUD (relative upper density), RMD (relative middle density) and RLD (relative lower density), and we have used other features such as aspect ratio, width, height, and color of the object. The high-level part is the focus of our research, where different pattern recognition and classification methods are implemented and experimental results are analyzed. We looked into several methods of classification: decision tree, frequency domain classification, neural network-based classification, Bayes classifier, and pattern recognition methods such as control charts and hidden Markov models. The control chart approach, which is a decision methodology, gives more promising results than the other methodologies. Overlapping between events is one of the problems, hence we applied a fuzzy logic technique to solve it. After using this method the total accuracy increased from 95.6% to 97.2%.
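The control-chart decision rule that performed best above can be sketched as a Shewhart-style mean plus/minus k-sigma test (a generic illustration; the paper's actual chart parameters are not given, so `k=3` is an assumption):

```python
def control_limits(samples, k=3.0):
    """Mean +/- k*sigma control limits computed from in-control samples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    sigma = var ** 0.5
    return mean - k * sigma, mean + k * sigma

def out_of_control(samples, value, k=3.0):
    """True when a new observation falls outside the control limits,
    i.e. the frame is flagged as part of an unusual event."""
    lo, hi = control_limits(samples, k)
    return not (lo <= value <= hi)
```

Here the in-control samples would be feature values (e.g. RUD/RMD/RLD) from frames of normal activity, and out-of-limit frames mark candidate event boundaries.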
Haque, Mohammad Ahsanul; Nasrollahi, Kamal; Moeslund, Thomas B.
Different biometric traits such as face appearance and heartbeat signals from Electrocardiogram (ECG)/Phonocardiogram (PCG) are widely used in human identity recognition. Recent advances in facial video based measurement of cardio-physiological parameters such as heartbeat rate, respiratory rate, and blood volume pressure provide the possibility of extracting the heartbeat signal from facial video instead of using obtrusive ECG or PCG sensors on the body. This paper proposes the Heartbeat Signal from Facial Video (HSFV) as a new biometric trait for human identity recognition, for the first time …
Ukhanova, Ann; Støttrup-Andersen, Jesper; Forchhammer, Søren
Definition of video quality requirements for video surveillance poses new questions in the area of quality assessment. This paper presents a quality assessment experiment for an automatic license plate recognition scenario. We explore the influence of compression by the H.264/AVC and H.265/HEVC standards on the recognition performance. We compare logarithmic and logistic functions for quality modeling. Our results show that a logistic function can better describe the dependence of recognition performance on quality for both compression standards. We observe that automatic license plate recognition …
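The logistic quality model compared above has the familiar sigmoid form (a generic sketch; the fitted parameter values from the experiment are not reproduced here, so `a`, `b` and `c` below are placeholders):

```python
import math

def logistic_model(q, a, b, c=1.0):
    """Logistic mapping from a quality score q to a recognition rate in (0, c).

    a controls the slope and b is the quality value at which the rate is c/2.
    """
    return c / (1.0 + math.exp(-a * (q - b)))
```

Unlike a logarithmic model, this curve saturates at both ends, which matches the observation that recognition performance plateaus at very low and very high quality.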
Krüger, Volker; Zhou, Shaohua; Chellappa, Rama
… -temporal relations: this allows the system to use dynamics as well as to generate warnings when 'implausible' situations occur, or to circumvent these altogether. We have studied the effectiveness of temporal integration for recognition purposes using face recognition as an example problem. Face recognition … is a prominent problem and has been studied more extensively than almost any other recognition problem. An observation is that face recognition works well in ideal conditions; if those conditions are not met, however, all present algorithms break down disgracefully. This problem appears to be general … will use the face recognition problem as a study example. Probabilistic methods are attractive in this context as they allow a systematic handling of uncertainty and an elegant way of fusing temporal information.
Full Text Available This paper presents an approach and an implementation of a named entity extractor for the Slovene language, based on a machine learning approach. It is designed as a supervised algorithm based on Conditional Random Fields and is trained on the ssj500k annotated corpus of Slovene. The corpus, which is available under a Creative Commons CC-BY-NC-SA licence, is annotated with morphosyntactic tags, as well as named entities for people, locations, organisations, and miscellaneous names. The paper discusses the influence of morphosyntactic tags, lexicons and conjunctions of features of neighbouring words. An important contribution of this investigation is that morphosyntactic tags benefit named entity extraction. Using all the best-performing features, the recognizer reaches a precision of 74% and a recall of 72%, with stronger performance on personal and geographical named entities, followed by organizations, and poor performance on the miscellaneous entities, since this class is very diverse and consequently difficult to predict. A major contribution of the paper is also showing the benefits of splitting the class of miscellaneous entities into organizations and other entities, which in turn improves performance even on personal and organizational names. The software developed in this research is freely available under the Apache 2.0 licence at http://ailab.ijs.si/~tadej/slner.zip, while development versions are available at https://github.com/tadejs/slner.
MS received 6 November 2000; revised 16 December 2002.
Abstract. Recognition of text recorded in Pitman shorthand language (PSL) is an interesting research problem. Automatic reading of PSL and generating equivalent English text is very challenging. The most important task involved here is the accurate recognition of Pitman stroke patterns, which constitute "text" in PSL. The paper …
Full Text Available In this paper we propose a new approach for facial micro-expression recognition. For this purpose the Eulerian Video Magnification (EVM) method is used to retrieve the subtle motions of the face. The result of this method is a magnified image sequence. In this study the numerical tests are performed on two databases: Spontaneous Micro-expression (SMIC) and Chinese Academy of Sciences Micro-Expression (CASME). We evaluate our proposed method in two phases using the eigenface method. In phase 1 we recognize the type of a micro-expression, for example emotional versus unemotional, in the SMIC database. Phase 2 classifies the recognized micro-expression as negative versus positive in the SMIC database, and as happiness versus disgust in the CASME database. The results show that using the eigenface method with the EVM method for the retrieval of subtle motions of the face increases the performance of micro-expression recognition. Moreover, the proposed approach is more accurate and promising than previous works in micro-expression recognition.
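The core idea of Eulerian magnification, amplifying small temporal variations of each pixel, can be caricatured in a few lines (a heavily simplified sketch: the real method applies a spatial pyramid and a temporal bandpass filter, both replaced here by simple mean removal):

```python
import numpy as np

def magnify_temporal(signal, alpha=10.0):
    """Amplify deviations of a per-pixel temporal signal from its mean.

    A crude stand-in for Eulerian magnification: the temporal bandpass is
    replaced by mean removal; alpha is the amplification factor.
    """
    s = np.asarray(signal, dtype=float)
    mean = s.mean(axis=0)             # per-pixel temporal mean
    return mean + alpha * (s - mean)  # amplified deviations added back
```

Applied frame-by-frame to a face video, such amplification makes subtle micro-expression motion large enough for a downstream eigenface classifier to pick up.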
Sudhaker Samuel RD
Full Text Available The first step in an automatic face recognition system is to localize the face region in a cluttered background and carefully segment the face from each frame of a video sequence. In this paper, we propose a fast and efficient algorithm for segmenting a face suitable for recognition from a video sequence. The cluttered background is first subtracted from each frame; then, in the foreground regions, a coarse face region is found using skin colour. Using a dynamic template matching approach, the face is then efficiently segmented. The proposed algorithm is fast and suitable for real-time video sequences, and is invariant to large scale and pose variations. The segmented face is then handed over to a recognition algorithm based on principal component analysis and linear discriminant analysis. The online face detection, segmentation, and recognition algorithms take an average of 0.06 seconds on a 3.2 GHz P4 machine.
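The principal component analysis stage of the recognition step can be sketched as an eigenface-style projection with nearest-neighbour matching (a minimal illustration; the linear discriminant analysis stage is omitted and the toy gallery below is made up):

```python
import numpy as np

def fit_pca(X, k):
    """PCA via SVD on mean-centered data; returns (mean, top-k components)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]               # components are rows of vt

def project(x, mean, components):
    """Coordinates of a sample in the PCA subspace."""
    return components @ (np.asarray(x, dtype=float) - mean)

def nearest(query, gallery, labels, mean, components):
    """Nearest-neighbour identity in the PCA subspace."""
    q = project(query, mean, components)
    dists = [np.linalg.norm(q - project(g, mean, components)) for g in gallery]
    return labels[int(np.argmin(dists))]
```

In a real eigenface system each gallery entry is a flattened, normalized face image; here tiny 4-D vectors stand in for them.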
Kimura, Marcia L. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Erikson, Rebecca L. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Lombardo, Nicholas J. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
The Pacific Northwest National Laboratory (PNNL) will produce a non-cooperative (i.e., not posing for the camera) facial recognition video data set for research purposes to evaluate and enhance facial recognition systems technology. The aggregate data set consists of 1) videos capturing PNNL role players and public volunteers in three key operational settings, 2) photographs of the role players for enrolling in an evaluation database, and 3) ground truth data that documents when the role player is within various camera fields of view. PNNL will deliver the aggregate data set to DHS, who may then choose to make it available to other government agencies interested in evaluating and enhancing facial recognition systems. The three operational settings that will be the focus of the video collection effort include: 1) unidirectional crowd flow, 2) bi-directional crowd flow, and 3) linear and/or serpentine queues.
Ruolin, Zhu; Jianbo, Liu; Yuan, Zhang; Xiaoyu, Wu
The technology of computer vision is used in the training of military shooting. In order to overcome the limitations of bullet-hole recognition based on video image analysis, which suffers from over-detection or missed detection, this paper adopts the support vector machine algorithm and a convolutional neural network to extract and recognize bullet holes in digital video, and compares their performance. It extracts HOG features of bullet holes and trains an SVM classifier quickly, even though the target is in an outdoor environment. Experiments show that the support vector machine algorithm used in this paper realizes fast and efficient extraction and recognition of bullet holes, improving the efficiency of shooting training.
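The HOG-plus-SVM pipeline described above can be sketched roughly as follows. This is a deliberately simplified stand-in, not the paper's implementation: a single global orientation histogram replaces the full block-normalized HOG descriptor, and the synthetic "target patch" generator, function names and parameters are all assumptions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def patch_feature(patch, bins=9):
    """Much-simplified HOG stand-in: one global histogram of gradient
    orientations weighted by gradient magnitude, plus the mean gradient
    magnitude as a crude 'edge energy' cue."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientations
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    n = np.linalg.norm(hist)
    hist = hist / n if n > 0 else hist
    return np.concatenate([hist, [mag.mean()]])

def make_patch(has_hole, rng, size=24):
    """Hypothetical synthetic target patch: bright paper with noise,
    optionally with a dark disc standing in for a bullet hole."""
    img = 0.9 + 0.02 * rng.standard_normal((size, size))
    if has_hole:
        yy, xx = np.mgrid[:size, :size]
        img[(yy - size // 2) ** 2 + (xx - size // 2) ** 2 < (size // 5) ** 2] = 0.1
    return img

def make_classifier():
    """Feature scaling followed by a linear SVM."""
    return make_pipeline(StandardScaler(), SVC(kernel="linear"))
```

A real system would compute cell/block-normalized HOG over a sliding window; the classifier interface, however, is the same.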
Full Text Available Text-independent speaker recognition systems such as those based on Gaussian mixture models (GMMs) do not include time sequence information (TSI) within the model itself. The level of importance of TSI in speaker recognition is an interesting question and one addressed in this paper. Recent work has shown that the utilisation of higher-level information such as idiolect, pronunciation, and prosodics can be useful in reducing speaker recognition error rates. In accordance with these developments, the aim of this paper is to show that as more data becomes available, the basic GMM can be enhanced by utilising TSI, even in a text-independent mode. This paper presents experimental work incorporating TSI into the conventional GMM. The resulting system, known as the segmental mixture model (SMM), embeds dynamic time warping (DTW) into a GMM framework. Results are presented on the 2000-speaker SpeechDat Welsh database which show improved speaker recognition performance with the SMM.
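The DTW component that the SMM embeds can be sketched with the classic dynamic-programming recurrence. This is only the textbook DTW distance, not the SMM itself; how the paper combines it with GMM likelihoods is not reproduced here.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping between two feature sequences
    (rows are frames); returns the cumulative alignment cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor paths
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because the alignment path can stretch and compress time, a sequence and its time-warped copy score much closer than two genuinely different sequences.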
Full Text Available The use of video sequences for face recognition has been relatively less studied compared to image-based approaches. In this paper, we present an analysis-by-synthesis framework for face recognition from video sequences that is robust to large changes in facial pose and lighting conditions. This requires tracking the video sequence, as well as recognition algorithms that are able to integrate information over the entire video; we address both these problems. Our method is based on a recently obtained theoretical result that can integrate the effects of motion, lighting, and shape in generating an image using a perspective camera. This result can be used to estimate the pose and structure of the face and the illumination conditions for each frame in a video sequence in the presence of multiple point and extended light sources. We propose a new inverse compositional estimation approach for this purpose. We then synthesize images using the face model estimated from the training data corresponding to the conditions in the probe sequences. Similarity between the synthesized and the probe images is computed using suitable distance measurements. The method can handle situations where the pose and lighting conditions in the training and testing data are completely disjoint. We show detailed performance analysis results and recognition scores on a large video dataset.
Hunt, Tamerah N.
Context: Concussion management is potentially complicated by the lack of reporting due to poor educational intervention in youth athletics. Objective: Determine if a concussion-education video developed for high school athletes will increase the reporting of concussive injuries and symptom recognition in this group. Design: Cross-sectional,…
Boon, Josua; Hoenes, Frank; Ben Hadj Ali, Majdi
This paper presents an alternative method for typed character recognition by way of the textual context. The approach here is word-oriented and uses no a priori knowledge about the typical appearance of characters. It leads back to an approach suggested by R. G. Casey where text recognition is considered as solving a substitution cipher, or cryptogram. Character images are considered only in order to distinguish or group (cluster) them. The recognition information used is provided by dictionaries. The overall procedure can be divided into three principal steps: (1) a ciphertext-like symbolic representation of the text is generated; (2) in an initialization phase, only a few but reliable word recognitions are striven for; (3) the resulting partial symbol-character assignments are sufficient to initiate the subsequent relaxation of the recognition process. Whereas Casey uses several ambiguous alternatives for word recognition, the approach here is based on acquiring a few, but reliable, recognition alternatives. Thus, instead of a spell-check program, a dictionary with a heuristic-driven look-up control combined with an appropriate access mechanism is used.
Ryoo, M. S.; Matthies, Larry
In this evaluation paper, we discuss convolutional neural network (CNN)-based approaches for human activity recognition. In particular, we investigate CNN architectures designed to capture temporal information in videos and their applications to the human activity recognition problem. There have been multiple previous works using CNN features for videos. These include CNNs using 3-D XYT convolutional filters, CNNs using pooling operations on top of per-frame image-based CNN descriptors, and recurrent neural networks that learn temporal changes in per-frame CNN descriptors. We experimentally compare some of these representative CNNs on first-person human activity videos. We especially focus on videos from a robot's viewpoint, captured during its operations and human-robot interactions.
Giménez, Adrià; Juan, Alfons
Hidden Markov Models (HMMs) are now widely used in off-line handwritten text recognition. As in speech recognition, they are usually built from shared, embedded HMMs at symbol level, in which state-conditional probability density functions are modelled with Gaussian mixtures. In contrast to speech recognition, however, it is unclear which kind of real-valued features should be used and, indeed, very different feature sets are in use today. In this paper, we propose to bypass feature extraction and directly feed columns of raw, binary image pixels into embedded Bernoulli mixture HMMs, that is, embedded HMMs in which the emission probabilities are modelled with Bernoulli mixtures. The idea is to ensure that no discriminative information is filtered out during feature extraction, which in some sense is integrated into the recognition model. Good empirical results are reported on the well-known IAM database.
Kirsh, Steven J; Mounts, Jeffrey R W
This study assessed the speed of recognition of facial emotional expressions (happy and angry) as a function of violent video game play. Color photos of calm facial expressions were morphed into either an angry or a happy facial expression. Participants were asked to make a speeded identification of the emotion (happiness or anger) during the morph. Typically, happy faces are identified faster than angry faces (the happy-face advantage). Results indicated that playing a violent video game led to a reduction in the happy-face advantage. Implications of these findings are discussed with respect to current models of aggressive behavior. (c) 2007 Wiley-Liss, Inc.
Huo, Hongwen; Feng, Jufu
We present a novel online face recognition approach for video stream in this paper. Our method includes two stages: pre-training and online training. In the pre-training phase, our method observes interactions, collects batches of input data, and attempts to estimate their distributions (Box-Cox transformation is adopted here to normalize rough estimates). In the online training phase, our method incrementally improves classifiers' knowledge of the face space and updates it continuously with incremental eigenspace analysis. The performance achieved by our method shows its great potential in video stream processing.
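The Box-Cox normalization step mentioned above can be written down directly from its standard definition. This is the generic transform only, a hedged sketch: how the paper chooses the exponent λ for its distribution estimates is not specified here.

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox power transform for positive data.

    (x**lam - 1) / lam for lam != 0; the lam -> 0 limit is log(x).
    Used to make rough, skewed estimates look more Gaussian."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam
```

In practice λ is usually chosen by maximum likelihood over the observed data (e.g., `scipy.stats.boxcox` does this automatically).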
Guan, Yu; Li, Chang-Tsun; Choudhury, Sruti Das
In this paper, we propose a gait recognition method for extremely low frame-rate videos. Unlike the popular temporal reconstruction-based methods, the proposed method uses the average gait over the whole sequence as the input feature template. Assuming the effects caused by an extremely low frame rate or large gait fluctuations are intra-class variations that the gallery data fails to capture, we build a general model based on the random subspace method. More specifically, a number of weak classi...
Bourennane, Salah; Fossati, Caroline; Ketchantang, William
Among existing biometrics, iris recognition systems are among the most accurate personal biometric identification systems. However, the acquisition of a workable iris image requires strict cooperation of the user; otherwise, the image will be rejected by a verification module because of its poor quality, inducing a high false reject rate (FRR). The FRR may also increase when iris localization fails or when the pupil is too dilated. To improve the existing methods, we propose to use video sequences acquired in real time by a camera. In order to keep the same computational load to identify the iris, we propose a new method to estimate the iris characteristics. First, we propose a new iris texture characterization based on the Fourier-Mellin transform, which is less sensitive to pupil dilations than previous methods. Then, we develop a new iris localization algorithm that is robust to variations of quality (partial occlusions due to eyelids and eyelashes, light reflections, etc.), and finally, we introduce a fast new criterion for selecting suitable images from an iris video sequence for accurate recognition. The accuracy of each step of the algorithm in the whole proposed recognition process is tested and evaluated using our own iris video database and several public image databases, such as CASIA, UBIRIS, and BATH.
Full Text Available Event recognition is the most fundamental and critical task in event-based natural language processing systems. Existing event recognition methods based on rules and shallow neural networks have certain limitations. For example, extracting features using methods based on rules is difficult, and methods based on shallow neural networks converge too quickly to a local minimum, resulting in low recognition precision. To address these problems, we propose the Chinese emergency event recognition model based on deep learning (CEERM). Firstly, we use a word segmentation system to segment sentences. According to event elements labeled in the CEC 2.0 corpus, we classify words into five categories: trigger words, participants, objects, time and location. Each word is vectorized according to the following six feature layers: part of speech, dependency grammar, length, location, distance between trigger word and core word, and trigger word frequency. We obtain deep semantic features of words by training a feature vector set using a deep belief network (DBN), then analyze those features in order to identify trigger words by means of a back-propagation neural network. Extensive testing shows that the CEERM achieves excellent recognition performance, with a maximum F-measure value of 85.17%. Moreover, we propose the dynamic-supervised DBN, which adds supervised fine-tuning to a restricted Boltzmann machine layer by monitoring its training performance. Test analysis reveals that the new DBN improves recognition performance and effectively controls the training time. Although the F-measure increases to 88.11%, the training time increases by only 25.35%.
Many papers have been concerned with the recognition of Latin, Chinese and Japanese characters. However, although almost a third of a billion people worldwide, in several different languages, use Arabic characters for writing, little research progress has been achieved towards the automatic recognition of Arabic characters, in both on-line and off-line settings. This is a result of the lack of adequate support in terms of funding and other utilities such as Arabic text databases, dictionaries, etc., and of course of the cursive nature of its writing rules. The main theme of this paper is the automatic recognition of Arabic printed text using the machine learning algorithm C4.5. Symbolic machine learning algorithms are designed to accept example descriptions in the form of feature vectors which include a label that identifies the class to which an example belongs. The output of the algorithm is a set of rules that classifies unseen examples based on generalization from the training set. This ability to generalize is the main attraction of machine learning for handwriting recognition. Samples of a character can be preprocessed into a feature vector representation for presentation to a machine learning algorithm that creates rules for recognizing characters of the same class. Symbolic machine learning has several advantages over other learning methods: it is fast in training and in recognition, generalizes well, is noise tolerant, and the symbolic representation is easy to understand. The technique can be divided into three major steps. The first step is pre-processing, in which the original image is transformed into a binary image utilizing a 300 dpi scanner and the connected components are then formed. Second, global features of the input Arabic word are extracted, such as the number of subwords, the number of peaks within each subword, and the number and position of the complementary characters. Finally, machine learning C4.5 is used for character classification to generate a decision tree.
Full Text Available This paper presents a new approach for a text-based video content retrieval system. The proposed scheme consists of three main processes: key frame extraction, text localization and keyword matching. For the key frame extraction, we propose a Maximally Stable Extremal Region (MSER)-based feature which is oriented to segmenting shots of the video with different text contents. In the text localization process, in order to form the text lines, the MSERs in each key frame are clustered based on their similarity in position, size, color, and stroke width. Then, the Tesseract OCR engine is used for recognizing the text regions. In this work, to improve the recognition results, we input four images obtained from different pre-processing methods to the Tesseract engine. Finally, the target keyword for querying is matched against the OCR results based on an approximate string search scheme. The experiments show that, by using the MSER feature, the videos can be segmented with an efficient number of shots and provide better precision and recall in comparison with sum-of-absolute-difference and edge-based methods.
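The approximate string search used for keyword matching is typically built on edit distance, so that noisy OCR output still matches the query. The sketch below shows that idea with the standard Levenshtein distance; the tolerance threshold and helper names are assumptions, not the paper's exact scheme.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic DP, kept to two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            # deletion, insertion, or substitution (free if chars match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def approx_contains(ocr_text, keyword, max_dist=1):
    """True if some OCR word is within max_dist edits of the query keyword."""
    return any(edit_distance(w.lower(), keyword.lower()) <= max_dist
               for w in ocr_text.split())
```

With `max_dist=1`, a one-character OCR error such as "BREAKLNG" still matches the query "breaking".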
Whitman, Lucy S.; Lewis, Colin; Oakley, John P.
Atmospheric scattering causes significant degradation in the quality of video images, particularly when imaging over long distances. The principal problem is the reduction in contrast due to scattered light. It is known that when the scattering particles are not too large compared with the imaging wavelength (i.e., Mie scattering), then high spatial resolution information may be contained within a low-contrast image. Unfortunately this information is not easily perceived by a human observer, particularly when using a standard video monitor. A secondary problem is the difficulty of achieving a sharp focus, since automatic focus techniques tend to fail in such conditions. Recently several commercial colour video processing systems have become available. These systems use various techniques to improve image quality in low-contrast conditions whilst retaining colour content. They produce improvements in subjective image quality in some situations, particularly in conditions of haze and light fog. There is also some evidence that video enhancement leads to improved ATR performance when used as a pre-processing stage. Psychological literature indicates that low contrast levels generally lead to a reduction in the performance of human observers in carrying out simple visual tasks. The aim of this paper is to present the results of an empirical study on object recognition in adverse viewing conditions. The chosen visual task was vehicle number plate recognition at long ranges (500 m and beyond). Two different commercial video enhancement systems are evaluated using the same protocol. The results show an increase in effective range, with some differences between the different enhancement systems.
Bowman, Elizabeth K.; Turek, Matt; Tunison, Paul; Porter, Reed; Thomas, Steve; Gintautas, Vadas; Shargo, Peter; Lin, Jessica; Li, Qingzhe; Gao, Yifeng; Li, Xiaosheng; Mittu, Ranjeev; Rosé, Carolyn Penstein; Maki, Keith; Bogart, Chris; Choudhari, Samrihdi Shree
Today's warfighters operate in a highly dynamic and uncertain world, and face many competing demands. Asymmetric warfare and the new focus on small, agile forces has altered the framework by which time critical information is digested and acted upon by decision makers. Finding and integrating decision-relevant information is increasingly difficult in data-dense environments. In this new information environment, agile data algorithms, machine learning software, and threat alert mechanisms must be developed to automatically create alerts and drive quick response. Yet these advanced technologies must be balanced with awareness of the underlying context to accurately interpret machine-processed indicators and warnings and recommendations. One promising approach to this challenge brings together information retrieval strategies from text, video, and imagery. In this paper, we describe a technology demonstration that represents two years of tri-service research seeking to meld text and video for enhanced content awareness. The demonstration used multisource data to find an intelligence solution to a problem using a common dataset. Three technology highlights from this effort include 1) Incorporation of external sources of context into imagery normalcy modeling and anomaly detection capabilities, 2) Automated discovery and monitoring of targeted users from social media text, regardless of language, and 3) The concurrent use of text and imagery to characterize behaviour using the concept of kinematic and text motifs to detect novel and anomalous patterns. Our demonstration provided a technology baseline for exploiting heterogeneous data sources to deliver timely and accurate synopses of data that contribute to a dynamic and comprehensive worldview.
Full Text Available In this paper, a novel approach for identifying normal and obscene videos is proposed. In order to classify different episodes of a video independently and discard the need to process all frames, first, key frames are extracted and skin regions are detected for groups of video frames starting with key frames. In the second step, three different features are extracted for each episode of the video: 1) structural features based on single-frame information, 2) features based on the spatiotemporal volume, and 3) motion-based features. The PCA-LDA method is then applied to reduce the size of the structural features and select the more distinctive ones. For the final step, we use a fuzzy or a Weighted Support Vector Machine (WSVM) classifier to identify video episodes. We also employ a multilayer Kohonen network as an initial clustering algorithm to increase the ability to discriminate the extracted features into the two classes of videos. Features based on motion and periodicity characteristics increase the efficiency of the proposed algorithm in videos with bad illumination and skin colour variation. The proposed method is evaluated using 1100 videos in different environmental and illumination conditions. The experimental results show a correct recognition rate of 94.2% for the proposed algorithm.
Singha, Joyeeta; Das, Karen
Sign Language Recognition has emerged as one of the important areas of research in Computer Vision. The difficulty faced by researchers is that the instances of signs vary with both motion and appearance. Thus, in this paper a novel approach for recognizing various alphabets of Indian Sign Language is proposed where continuous video sequences of the signs have been considered. The proposed system comprises three stages: a preprocessing stage, feature extraction and classification. The preprocessing stage includes skin filtering and histogram matching. Eigenvalues and eigenvectors were considered for the feature extraction stage, and finally an eigenvalue-weighted Euclidean distance is used to recognize the sign. It deals with bare hands, thus allowing the user to interact with the system in a natural way. We have considered 24 different alphabets in the video sequences and attained a success rate of 96.25%.
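The eigenvalue-weighted Euclidean distance used in the final matching stage is a plain weighted metric: components associated with larger eigenvalues contribute more to the distance. A minimal sketch, with the weighting convention assumed (the paper's exact normalization may differ):

```python
import numpy as np

def eig_weighted_distance(f1, f2, eigvals):
    """Euclidean distance between two feature vectors, with each
    squared component difference weighted by its eigenvalue."""
    return float(np.sqrt(np.sum(eigvals * (f1 - f2) ** 2)))
```

A probe sign is then assigned the label of the training template with the smallest weighted distance.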
Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve
With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed. It is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, the object's contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, the Bayes theory is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those by either of the individual classifiers, and our framework can be adapted to different categories of Web pages.
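The discrete-text classifier described above applies the naive Bayes rule to word counts. The sketch below shows that general technique on a hypothetical two-class example; the training data, class names and Laplace smoothing details are illustrative assumptions, not the paper's corpus or exact estimator.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Word-count naive Bayes with Laplace (add-one) smoothing."""
    counts = {c: Counter() for c in set(labels)}
    priors = Counter(labels)
    for doc, c in zip(docs, labels):
        counts[c].update(doc.lower().split())
    vocab = {w for cnt in counts.values() for w in cnt}
    return counts, priors, vocab, len(docs)

def classify_nb(doc, counts, priors, vocab, n_docs):
    """Pick the class maximizing log P(class) + sum log P(word | class)."""
    best, best_lp = None, -math.inf
    for c, cnt in counts.items():
        total = sum(cnt.values())
        lp = math.log(priors[c] / n_docs)
        for w in doc.lower().split():
            lp += math.log((cnt[w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

The same machinery yields the class-membership probability the paper feeds into its text/image fusion step.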
The aim of this study was to determine whether video or text was more effective at knowledge transfer and retention. In this study, knowledge transfer with video and text was similar, and text consumed fewer resources to create.
Full Text Available Biomedical Text Mining (BioTM) targets the extraction of significant information from biomedical archives. BioTM encompasses Information Retrieval (IR) and Information Extraction (IE). Information Retrieval retrieves the relevant biomedical literature documents from various repositories such as PubMed, MedLine, etc., based on a search query. The IR process ends with the generation of a corpus of the relevant documents retrieved from the publication databases. The IE task includes preprocessing of the documents, Named Entity Recognition (NER) and relationship extraction. This process draws on natural language processing, data mining techniques and machine learning algorithms. The preprocessing task includes tokenization, stop-word removal, shallow parsing, and part-of-speech tagging. The NER phase involves recognition of well-defined objects such as genes, proteins or cell lines. This process leads to the next phase, the extraction of relationships (IE). The work was based on the machine learning algorithm Conditional Random Fields (CRF).
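The first preprocessing steps named above (tokenization and stop-word removal) can be sketched in a few lines; shallow parsing and POS tagging are omitted. The stop-word list and function name here are illustrative assumptions.

```python
def preprocess(sentence,
               stop_words=frozenset({"the", "of", "and", "in", "a", "is"})):
    """Tokenize, lower-case and drop stop words -- the first IE steps
    described above (shallow parsing and POS tagging are omitted)."""
    tokens = [t.strip(".,;:()").lower() for t in sentence.split()]
    return [t for t in tokens if t and t not in stop_words]
```

The surviving content words are what an NER component (e.g., a CRF tagger) would then label as genes, proteins, cell lines, and so on.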
Riad I. Hammoud
Full Text Available We describe two advanced video analysis techniques, including video indexed by voice annotations (VIVA) and multi-media indexing and explorer (MINER). VIVA utilizes analyst call-outs (ACOs) in the form of chat messages (voice-to-text) to associate labels with video target tracks, to designate spatial-temporal activity boundaries and to augment video tracking in challenging scenarios. Challenging scenarios include low-resolution sensors, moving targets and target trajectories obscured by natural and man-made clutter. MINER includes: (1) a fusion of graphical track and text data using probabilistic methods; (2) an activity pattern learning framework to support querying an index of activities of interest (AOIs) and targets of interest (TOIs) by movement type and geolocation; and (3) a user interface to support streaming multi-intelligence data processing. We also present an activity pattern learning framework that uses the multi-source associated data as training to index a large archive of full-motion videos (FMV). VIVA and MINER examples are demonstrated for wide aerial/overhead imagery over common data sets, affording an improvement in tracking from video data alone, leading to 84% detection with modest misdetection/false alarm results due to the complexity of the scenario. The novel use of ACOs and chat messages in video tracking paves the way for user interaction, correction and preparation of situation awareness reports.
Rurainsky, J.; Eisert, P.
We present a complete system for the automatic creation of talking head video sequences from text messages. Our system converts the text into MPEG-4 Facial Animation Parameters and synthetic voice. A user selected 3D character will perform lip movements synchronized to the speech data. The 3D models created from a single image vary from realistic people to cartoon characters. A voice selection for different languages and gender as well as a pitch shift component enables a personalization of the animation. The animation can be shown on different displays and devices ranging from 3GPP players on mobile phones to real-time 3D render engines. Therefore, our system can be used in mobile communication for the conversion of regular SMS messages to MMS animations.
Ge, Zhenhao; Sharma, Sudhendu R.; Smith, Mark J. T.
Various algorithms for text-independent speaker recognition have been developed through the decades, aiming to improve both accuracy and efficiency. This paper presents a novel PCA/LDA-based approach that is faster than traditional statistical model-based methods and achieves competitive results. First, the performance based on only PCA and only LDA is measured; then a mixed model, taking advantage of both methods, is introduced. A subset of the TIMIT corpus composed of 200 male speakers is used for enrollment, validation and testing. The best results achieve 100%, 96% and 95% classification rates at population levels of 50, 100 and 200, using 39-dimensional MFCC features with delta and double delta. These results are based on 12 seconds of text-independent speech for training and 4 seconds of data for testing. They are comparable to conventional MFCC-GMM methods, but require significantly less time to train and operate.
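A PCA-then-LDA chain of the kind described can be sketched as a two-stage pipeline: PCA compresses and decorrelates the MFCC feature vectors, and LDA then finds the directions that best separate speakers. This is a generic sketch on synthetic data, not the paper's mixed model; the component count and data generator are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

def build_recognizer(n_pca=10):
    """PCA for compression/decorrelation, then LDA for class separation.
    LDA's predict() acts as the closed-set speaker classifier."""
    return make_pipeline(PCA(n_components=n_pca),
                         LinearDiscriminantAnalysis())
```

In the paper the inputs would be 39-dimensional MFCC+delta+double-delta frames pooled per utterance rather than the synthetic vectors used in this illustration.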
Heutte, L.; Paquet, T.; Nosary, A.; Hernoux, C.
This communication investigates the automatic reading of unconstrained omniwriter handwritten texts. It shows how to endow the reading system with the learning faculties necessary to adapt recognition to each writer's handwriting. In the first part of this communication, we explain how the
Full Text Available We present results of a study into the performance of a variety of different image transform-based feature types for speaker-independent visual speech recognition of isolated digits. This includes the first reported use of features extracted using a discrete curvelet transform. The study will show a comparison of some methods for selecting features of each feature type and show the relative benefits of both static and dynamic visual features. The performance of the features will be tested on both clean video data and also video data corrupted in a variety of ways to assess each feature type's robustness to potential real-world conditions. One of the test conditions involves a novel form of video corruption we call jitter which simulates camera and/or head movement during recording.
Dhakal, Shanti; Rahnemoonfar, Maryam
Measuring water quality of bays, estuaries, and gulfs is a complicated and time-consuming process. The YSI Sonde is an instrument used to measure water quality parameters such as pH, temperature, salinity, and dissolved oxygen. This instrument is taken to water bodies on a boat trip, and researchers note down the different parameters shown on the instrument's display monitor. In this project, a mobile application is developed for the Android platform that allows a user to take a picture of the YSI Sonde monitor, extract text from the image and store it in a file on the phone. The image captured by the application is first processed to remove perspective distortion. The probabilistic Hough line transform is used to identify lines in the image, and the corners of the image are then obtained by determining the intersections of the detected horizontal and vertical lines. The image is warped using the perspective transformation matrix obtained from the corner points of the source and destination images, hence removing the perspective distortion. The mathematical morphology black-hat operation is used to correct the shading of the image. The image is binarized using Otsu's binarization technique and is then passed to Optical Character Recognition (OCR) software for character recognition. The extracted information is stored in a file on the phone and can be retrieved later for analysis. The algorithm was tested on 60 different images of the YSI Sonde with different perspective features and shading. Experimental results, in comparison to ground-truth results, demonstrate the effectiveness of the proposed method.
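The Otsu binarization step used before OCR picks the gray-level threshold that maximizes the between-class variance of the resulting foreground/background split. A minimal NumPy sketch of the standard algorithm (not the app's code, which presumably uses a library routine):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: return the threshold maximizing between-class variance.

    `gray` holds intensities in [0, 255]; pixels <= threshold form class 0."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    omega = np.cumsum(p)                       # class-0 probability
    mu = np.cumsum(p * np.arange(256))         # cumulative mean
    mu_t = mu[-1]                              # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0       # guard the omega = 0 or 1 ends
    return int(np.argmax(sigma_b))
```

For a bimodal display image (dark digits on a bright LCD, or vice versa), the returned threshold lands between the two intensity modes.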
Klare, Brendan; Burge, Mark
We assess the impact of the H.264 video codec on the match performance of automated face recognition in surveillance and mobile video applications. A set of two hundred access control (90-pixel inter-pupillary distance) and distance surveillance (45-pixel inter-pupillary distance) videos taken under non-ideal imaging and facial recognition (e.g., pose, illumination, and expression) conditions were matched using two commercial face recognition engines. The first study evaluated automated face recognition performance on access control and distance surveillance videos at CIF and VGA resolutions using the H.264 baseline profile at nine bitrates ranging from 8 kbps to 2048 kbps. In our experiments, video signals could be compressed down to 128 kbps before a significant drop in face recognition performance occurred. The second study evaluated automated face recognition on mobile devices at QCIF, iPhone, and Android resolutions for each of the H.264 PDA profiles. Rank-one match performance, cumulative match scores, and failure-to-enroll rates are reported.
Oh, Sangmin; Hoogs, Anthony; Perera, Amitha; Cuntoor, Naresh; Chen, Chia-Chih
A Large-scale Benchmark Dataset for Event Recognition in Surveillance Video. In addition to the stationary dataset, the VIRAT Video Dataset includes downsampled versions obtained by downsampling the original HD videos to lower frame rates and pixel resolutions, targeting the relatively unexplored setting in which video frame rates and pixel resolutions are low.
This dissertation develops a novel system for object recognition in videos. The input of the system is a set of unconstrained videos containing a known set of objects. The output is the locations and categories for each object in each frame across all videos. Initially, a shot boundary detection algorithm is applied to the videos to divide them into multiple sequences separated by the identified shot boundaries. Since each of these sequences still contains moderate content variations, we furt...
Gait is a unique biometric feature perceptible at larger distances, and the gait representation approach plays a key role in a video sensor-based gait recognition system. The Class Energy Image is one of the most important appearance-based gait representation methods and has received much attention. In this paper, we review the expressions and meanings of various Class Energy Image approaches and analyze the information contained in Class Energy Images. Furthermore, the effectiveness and robustness of these approaches are compared on benchmark gait databases. We outline the research challenges and provide promising future directions for the field. To the best of our knowledge, this is the first review that focuses on the Class Energy Image. It can serve as a useful reference in the literature on video sensor-based gait representation.
Huang, Zhiwu; Shan, Shiguang; Wang, Ruiping; Zhang, Haihong; Lao, Shihong; Kuerban, Alifu; Chen, Xilin
Face recognition with still face images has been widely studied, while research on video-based face recognition is relatively inadequate, especially in terms of benchmark datasets and comparisons. Real-world video-based face recognition applications require techniques for three distinct scenarios: 1) Video-to-Still (V2S); 2) Still-to-Video (S2V); and 3) Video-to-Video (V2V), respectively taking video or still images as query or target. To the best of our knowledge, few datasets and evaluation protocols have been established to benchmark all three scenarios. In order to facilitate the study of this specific topic, this paper contributes a benchmarking and comparative study based on a newly collected still/video face database, named COX Face DB. Specifically, we make three contributions. First, we collect and release a large-scale still/video face database to simulate video surveillance with three different video-based face recognition scenarios (i.e., V2S, S2V, and V2V). Second, for benchmarking the three scenarios designed on our database, we review and experimentally compare a number of existing set-based methods. Third, we further propose a novel Point-to-Set Correlation Learning (PSCL) method and experimentally show that it can serve as a promising baseline for V2S/S2V face recognition on COX Face DB. Extensive experimental results clearly demonstrate that video-based face recognition needs more effort, and that COX Face DB is a good benchmark database for evaluation.
Wang, Chao; Wang, Yunhong; Zhang, Zhaoxiang
This paper addresses the problem of tracking and recognizing faces via incremental local sparse representation. First, a robust face tracking algorithm is proposed that employs local sparse appearance and a covariance pooling method. In the subsequent face recognition stage, using a novel template update strategy that combines incremental subspace learning, our recognition algorithm adapts the template to appearance changes and reduces the influence of occlusion and illumination variation. This leads to robust video-based face tracking and recognition with desirable performance. In the experiments, we test the quality of face recognition on real-world noisy videos from the YouTube database, which includes 47 celebrities. Our proposed method achieves a high face recognition rate of 95% across all videos. The proposed face tracking and recognition algorithms are also tested on a set of noisy videos under heavy occlusion and illumination variation. The tracking results on challenging benchmark videos demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods. On the challenging dataset in which faces undergo occlusion and illumination variation, and in tracking and recognition experiments under significant pose variation on the University of California, San Diego (Honda/UCSD) database, our proposed method also consistently demonstrates a high recognition rate.
Vidakis, Nikolaos; Kavallakis, George; Triantafyllidis, Georgios
This paper presents a scheme for creating an emotion index of cover song music video clips by recognizing and classifying facial expressions of the artist in the video. More specifically, it fuses effective and robust algorithms employed for expression recognition, along with the use of a neural network system using the features extracted by the SIFT algorithm. We also support the need for this fusion of different expression recognition algorithms because of the way emotions are linked to facial expressions in music video clips.
Abnormal running behavior frequently occurs in robbery and other criminal cases. In order to identify these behaviors, a method to detect and recognize abnormal running behavior is presented based on spatiotemporal parameters. Meanwhile, to obtain more accurate spatiotemporal parameters and improve the real-time performance of the algorithm, a multi-target tracking algorithm is presented based on the intersection area among the minimum enclosing rectangles of the moving objects. The algorithm can effectively judge and exclude the intersection of multiple targets and interference, which makes the tracking algorithm more accurate and more robust. Experimental results show that the combination of these two algorithms can effectively detect and recognize abnormal running behavior in surveillance videos.
Divakaran, Ajay; Radhakrishnan, Regunathan; Xiong, Ziyou; Casey, Michael
In earlier work, Casey describes a generalized sound recognition framework based on reduced-rank spectra and minimum-entropy priors. This approach enables successful recognition of a wide variety of sounds, such as male speech, female speech, music, and animal sounds. In this work, we apply this recognition framework to news video to enable quick video browsing. We identify speaker change positions in broadcast news using the sound recognition framework. We combine the speaker change positions with color and motion cues from the video and are able to locate the beginning of each of the topics covered by the news video. We can thus skim the video by merely playing a small portion starting from each of the locations where one of the principal cast begins to speak. In combination with our motion-based video browsing approach, our technique provides simple automatic news video browsing. While similar work has been done before, our approach is simpler and faster than competing techniques, and provides a rich framework for further analysis and description of content.
Diaz, Ruth L; Wong, Ulric; Hodgins, David C; Chiu, Carina G; Goghari, Vina M
Violent video game playing has been associated with both positive and negative effects on cognition. We examined whether playing two or more hours of violent video games a day, compared to not playing video games, was associated with a different pattern of recognition of five facial emotions, while controlling for general perceptual and cognitive differences that might also occur. Undergraduate students were categorized as violent video game players (n = 83) or non-gamers (n = 69) and completed a facial recognition task, consisting of an emotion recognition condition and a control condition of gender recognition. Additionally, participants completed questionnaires assessing their video game and media consumption, aggression, and mood. Violent video game players recognized fearful faces both more accurately and quickly and disgusted faces less accurately than non-gamers. Desensitization to violence, constant exposure to fear and anxiety during game playing, and the habituation to unpleasant stimuli, are possible mechanisms that could explain these results. Future research should evaluate the effects of violent video game playing on emotion processing and social cognition more broadly. © 2015 Wiley Periodicals, Inc.
This study tests the effects of tutorial format (i.e. video vs. text) on student attitudes and performance in online computing education. A one-factor within-subjects experiment was conducted in an undergraduate Computer Information Systems course. Subjects were randomly assigned to complete two Excel exercises online: one with a video tutorial…
Gelfuso, Andrea; Dennis, Danielle V.
In this article, we theoretically explore how the deliberate use of video during literacy field experiences creates a text that can be read by triad members and can ameliorate the problem of relying on memory to engage in reflective conversations about literacy teaching and learning. The use of video, tools, and interactions with knowledgeable…
Fragmentary record: describes a multi-biometric recognition system combining face recognition (Advanced SDK) and iris recognition (SIRIS SDK), supported by a server-side ABIS system, in a border-security context (facilitating the flow of legitimate goods and travellers across borders and aligning/coordinating security systems for goods, cargo, and baggage); includes citation fragments on evaluating commercial off-the-shelf face recognition systems on the Chokepoint dataset.
An automatic recognition framework for human facial expressions from monocular video with an uncalibrated camera is proposed. The expression characteristics are first acquired from a deformable template similar to a facial muscle distribution. After regularization, the time sequences of the trait changes in space-time under a complete expression production are arranged line by line in a matrix. Next, the matrix dimensionality is reduced by neighborhood-preserving embedding, a manifold learning method. Finally, the refined matrix containing the expression trait information is recognized by a classifier that integrates a hidden conditional random field (HCRF) and a support vector machine (SVM). In an experiment on the Cohn-Kanade database, the proposed method showed a comparatively higher recognition rate than the individual HCRF or SVM methods in direct recognition from two-dimensional face traits. Moreover, the proposed method was more robust than the typical Kotsia method, because the former captures more structural characteristics of the data to be classified in space-time.
Kinnunen, Tomi; Sahidullah, Md; Kukanov, Ivan
Text-dependent automatic speaker verification naturally calls for the simultaneous verification of speaker identity and spoken content. These two tasks can be achieved with automatic speaker verification (ASV) and utterance verification (UV) technologies. While both have been addressed previously in the literature, a treatment of simultaneous speaker and utterance verification with a modern, standard database is so far lacking. This is despite the burgeoning demand for voice biometrics in a plethora of practical security applications. With the goal of improving overall verification performance, this paper reports different strategies for simultaneous ASV and UV in the context of short-duration, text-dependent speaker verification. Experiments performed on the recently released RedDots corpus are reported for three different ASV systems and four different UV systems. Results show that the combination…
Fragmentary record: a comparative performance analysis of face recognition algorithms for video available in the OpenCV library, in particular Fisherfaces, Eigenfaces, and LBPH; one approach used the generic FeatureDetector class in OpenCV (version 2.4.1), which allowed automatic extraction of features, and a graph presents the performance comparison among the implemented algorithms.
Fragmentary record: discusses fast quantization via random forests (collections of binary decision trees) and the semantic texton forests proposed by Shotton et al.; includes citation fragments on trademark detection in sports video and on randomized clustering forests for image classification (Moosmann F, Nowak E, Jurie F, 2008, IEEE Trans Pattern Anal).
Nasrollahi, Kamal; Moeslund, Thomas B.
Face recognition systems are very sensitive to the quality and resolution of their input face images. This makes such systems unreliable when working with long surveillance video sequences unless selection and enhancement algorithms are employed. On the other hand, processing all the frames of such video sequences with any enhancement or face recognition algorithm is demanding. Thus, there is a need for a mechanism to summarize the input video sequence to a set of key-frames and then apply an enhancement algorithm to this subset. This paper presents a system doing exactly that. The system uses face quality assessment to select the key-frames and a hybrid super-resolution to enhance the face image quality. The suggested system, which employs a linear associator face recognizer to evaluate the enhanced results, has been tested on real surveillance video sequences, and the experimental results…
Goutsu, Yusuke; Kobayashi, Takaki; Obara, Junya; Kusajima, Ikuo; Takeichi, Kazunari; Takano, Wataru; Nakamura, Yoshihiko
Gesture recognition is used in many practical applications such as human-robot interaction, medical rehabilitation, and sign language. With ongoing motion sensor development, multiple data sources have become available, leading to the rise of multi-modal gesture recognition. Since our previous approach to gesture recognition depended on a unimodal system, it was difficult to classify similar motion patterns. To solve this problem, a novel approach that integrates motion, audio, and video models is proposed, using a dataset captured with Kinect. The proposed system recognizes observed gestures using the three models; their recognition results are integrated by the proposed framework, and the fused output becomes the final result. The motion and audio models are learned using hidden Markov models; a random forest classifier is used to learn the video model. In experiments testing the performance of the proposed system, the motion and audio models most suitable for gesture recognition are chosen by varying the feature vectors and learning methods. Additionally, the unimodal and multi-modal models are compared with respect to recognition accuracy. All experiments are conducted on the dataset provided by the organizer of MMGRC, a workshop for the Multi-Modal Gesture Recognition Challenge. The comparison shows that the multi-modal model composed of the three models achieves the highest recognition rate, indicating that the complementary relationship among the three models improves the accuracy of gesture recognition. The proposed system provides application technology for understanding human actions of daily life more precisely.
Talavera Martínez, Estefanía
Nowadays, there is an upsurge of interest in using lifelogging devices. Such devices generate huge amounts of image data; consequently, the need for automatic methods for analyzing and summarizing these data is drastically increasing. We present a new method for familiar scene recognition in…
The task of human hand trajectory tracking and gesture trajectory recognition based on synchronized color and depth video is considered. To this end, for hand tracking, a joint observation model with hand cues of skin saliency, motion, and depth is integrated into a particle filter in order to move particles to local peaks in the likelihood. The proposed hand tracking method, namely the salient skin, motion, and depth based particle filter (SSMD-PF), improves tracking accuracy considerably when the signer performs gestures toward the camera in front of moving, cluttered backgrounds. For gesture recognition, a shape-order context descriptor based on shape context is introduced, which can describe the gesture in the spatiotemporal domain. The efficient shape-order context descriptor reveals the shape relationship and embeds gesture sequence order information into the descriptor. Moreover, the shape-order context leads to a robust score for gesture invariance. Our approach is complemented with experimental results on the challenging hand-signed digits datasets and an American Sign Language dataset, which corroborate the performance of the proposed techniques.
This study investigated whether video-based materials can facilitate second language learners' text comprehension at the levels of macrostructure and microstructure. Three classes inclusive of 98 Chinese-speaking university students joined this study. The three classes were randomly assigned to three treatment groups: on-screen text (T Group),…
Dissanayake, Cheryl; Shembrey, Joh; Suddendorf, Thomas
Two studies are reported which investigate delayed video self-recognition (DSR) in children with autistic disorder and Asperger's disorder relative to one another and to their typically developing peers. A secondary aim was to establish whether DSR ability is dependent on metarepresentational ability. Children's verbal and affective responses to…
Digital text formats that allow a close interaction between writing and video represent new possibilities and challenges for the communication of educational content. What are the premises for functional and appropriate communication through web-based, multimedia text formats? This article explores the digital writing-video format from a structural, theoretical perspective. To begin with, the two media's respective characteristics are discussed and compared as carriers of complex signs. Thereafter, the focus is on how writing and video elements can be accommodated to web media. Finally, the article discusses the conditions for optimal coordination and interaction between the two media types within the framework of an integrated design. A design example is presented.
The purpose of this study is to develop a plausible method to code and compile Buddhist texts from original Tibetan scripts into Romanized form. Using a GUI based on object-oriented design, a dictionary of Tibetan characters can easily be made for Buddhist literature researchers. It is hoped that a computer system capable of highly accurate character recognition will be actively used by scholars engaged in Buddhist literature research. In the present study, an efficient automatic recognition method for Tibetan characters is established. In the experiments performed, the recognition rate achieved is 99.4% over 28,954 characters.
Lai, Wei-Sheng; Huang, Yujia; Joshi, Neel; Buehler, Christopher; Yang, Ming-Hsuan; Kang, Sing Bing
We present a system for converting a fully panoramic (360°) video into a normal field-of-view (NFOV) hyperlapse for an optimal viewing experience. Our system exploits visual saliency and semantics to non-uniformly sample in space and time for generating hyperlapses. In addition, users can optionally choose objects of interest for customizing the hyperlapses. We first stabilize an input 360° video by smoothing the rotation between adjacent frames and then compute regions of interest and saliency scores. An initial hyperlapse is generated by optimizing the saliency and motion smoothness, followed by saliency-aware frame selection. We further smooth the result using an efficient 2D video stabilization approach that adaptively selects the motion model to generate the final hyperlapse. We validate the design of our system by showing results for a variety of scenes and comparing against the state-of-the-art method through a large-scale user study.
Wingenbach, Tanja S H; Ashwin, Chris; Brosnan, Mark
There has been much research on sex differences in the ability to recognise facial expressions of emotions, with results generally showing a female advantage in reading emotional expressions from the face. However, most of the research to date has used static images and/or 'extreme' examples of facial expressions. Therefore, little is known about how expression intensity and dynamic stimuli might affect the commonly reported female advantage in facial emotion recognition. The current study investigated sex differences in accuracy of response (Hu; unbiased hit rates) and response latencies for emotion recognition using short video stimuli (1sec) of 10 different facial emotion expressions (anger, disgust, fear, sadness, surprise, happiness, contempt, pride, embarrassment, neutral) across three variations in the intensity of the emotional expression (low, intermediate, high) in an adolescent and adult sample (N = 111; 51 male, 60 female) aged between 16 and 45 (M = 22.2, SD = 5.7). Overall, females showed more accurate facial emotion recognition compared to males and were faster in correctly recognising facial emotions. The female advantage in reading expressions from the faces of others was unaffected by expression intensity levels and emotion categories used in the study. The effects were specific to recognition of emotions, as males and females did not differ in the recognition of neutral faces. Together, the results showed a robust sex difference favouring females in facial emotion recognition using video stimuli of a wide range of emotions and expression intensity variations.
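The accuracy measure reported above, the unbiased hit rate (Hu), corrects raw hit rates for indiscriminate overuse of a response category. A minimal sketch of Wagner's (1993) formula computed from a confusion matrix (the function name is ours; this is an illustration, not the study's analysis code):

```python
def unbiased_hit_rates(confusion):
    """Wagner's unbiased hit rate Hu per category.

    confusion[i][j] = number of stimuli of category i labelled as category j.
    Hu_i = hits_i^2 / (stimuli presented of i * total uses of response i),
    so a participant who answers "fear" to everything gets no credit for
    a perfect fear hit rate.
    """
    n = len(confusion)
    row = [sum(confusion[i]) for i in range(n)]                      # stimuli per category
    col = [sum(confusion[i][j] for i in range(n)) for j in range(n)]  # responses per label
    return [
        (confusion[i][i] ** 2) / (row[i] * col[i]) if row[i] and col[i] else 0.0
        for i in range(n)
    ]
```

For a perfectly accurate responder the Hu for every category is 1.0; any bias toward one response label pulls that label's Hu below its raw hit rate.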
Freed, Erin; Long, Debra; Rodriguez, Tonantzin; Franks, Peter; Kravitz, Richard L; Jerant, Anthony
To compare the effects of two health information texts on patient recognition memory, a key aspect of comprehension. Randomized controlled trial (N=60), comparing the effects of experimental and control colorectal cancer (CRC) screening texts on recognition memory, measured using a statement recognition test, accounting for response bias (score range -0.91 to 5.34). The experimental text had a lower Flesch-Kincaid reading grade level (7.4 versus 9.6), was more focused on addressing screening barriers, and employed more comparative tables than the control text. Recognition memory was higher in the experimental group (2.54 versus 1.09, t=-3.63, P=0.001), including after adjustment for age, education, and health literacy (β=0.42, 95% CI: 0.17, 0.68, P=0.001), and in analyses limited to persons with college degrees (β=0.52, 95% CI: 0.18, 0.86, P=0.004) or no self-reported health literacy problems (β=0.39, 95% CI: 0.07, 0.71, P=0.02). An experimental CRC screening text improved recognition memory, including among patients with high education and self-assessed health literacy. CRC screening texts comparable to our experimental text may be warranted for all screening-eligible patients, if such texts improve screening uptake. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
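The abstract above does not state which bias-corrected recognition score was used, only that response bias was accounted for. As an illustration of the general idea, one standard signal-detection measure is d', which separates sensitivity from the tendency to answer "old" (the function and correction shown are a generic sketch, not the study's scoring method):

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity: d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each cell) keeps both rates away
    from 0 and 1, so the inverse-normal transform stays finite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)
```

A participant who says "old" to everything has a high hit rate but an equally high false-alarm rate, so d' stays near zero; genuine memory pushes d' above zero.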
Borup, Jered; West, Richard E.; Thomas, Rebecca
In this study we examined student and instructor perceptions of text and video feedback in technology integration courses that combined face-to-face with online instruction for teacher candidates. Items from the Feedback Environment Scale (Steelman et al. 2004) were used to measure student perceptions of feedback quality and delivery. Independent…
Abdous, M'hammed; He, Wu
Because of their capacity to sift through large amounts of data, text mining and data mining are enabling higher education institutions to reveal valuable patterns in students' learning behaviours without having to resort to traditional survey methods. In an effort to uncover live video streaming (LVS) students' technology-related problems and to…
Pedersen, Kamilla; Moeller, Martin Holdgaard; Paltved, Charlotte
OBJECTIVES: The aim of this study was to explore medical students' learning experiences with didactic teaching formats using either text-based or video-based patient cases with similar content. The authors explored how the two different patient case formats influenced students… Students taught with video-based patient cases, in contrast, often referred to the patient cases when highlighting new insights, including the importance of patient perspectives when communicating with patients. CONCLUSION: The format of patient cases included in teaching may have a substantial impact…
Monsoriu, Juan A; Gimenez, Marcos H; Riera, Jaime; Vidaurre, Ana [Departamento de Fisica Aplicada, Universidad Politecnica de Valencia, E-46022 Valencia (Spain)
The applications of the digital video image to the investigation of physical phenomena have increased enormously in recent years. The advances in computer technology and image recognition techniques allow the analysis of more complex problems. In this work, we study the movement of a damped coupled oscillation system. The motion is considered as a linear combination of two normal modes, i.e. the symmetric and antisymmetric modes. The image of the experiment is recorded with a video camera and analysed by means of software developed in our laboratory. The results show a very good agreement with the theory.
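The decomposition into symmetric and antisymmetric normal modes described above can be sketched numerically: each mass's displacement is a linear combination of the two damped modes, and the modes are recovered as the sum and difference of the tracked positions. The parameters below are illustrative, not those of the experiment:

```python
import numpy as np

def mode_coordinates(x1, x2):
    """Decompose two-oscillator displacements into normal-mode coordinates.

    Symmetric mode: both masses move together; antisymmetric: in opposition.
    """
    return (x1 + x2) / 2.0, (x1 - x2) / 2.0

# Synthesize a damped two-mode motion (illustrative parameters).
t = np.linspace(0.0, 10.0, 2000)
w_s, w_a, gamma = 2.0, 3.0, 0.1                      # mode frequencies, damping
q_s = 1.0 * np.exp(-gamma * t) * np.cos(w_s * t)     # symmetric mode
q_a = 0.5 * np.exp(-gamma * t) * np.cos(w_a * t)     # antisymmetric mode
x1, x2 = q_s + q_a, q_s - q_a                        # each mass combines both modes
```

Applying `mode_coordinates` to the video-tracked positions of the two masses recovers each mode's damped cosine, whose frequency and decay rate can then be fitted against theory.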
Medical entity recognition, a basic task in the language processing of clinical data, has been extensively studied for admission notes written in alphabetic languages such as English. However, much less work has been done on unstructured texts written in Chinese, or on differentiating Chinese drug names between traditional Chinese medicine and Western medicine. Here, we propose a novel cascade-type Chinese medication entity recognition approach that integrates a sentence category classifier based on a support vector machine with conditional random field-based medication entity recognition. We hypothesized that this approach could avoid the side effects of abundant negative samples and improve the performance of named entity recognition on admission notes written in Chinese. We applied this approach to a test set of 324 Chinese admission notes manually annotated by medical experts. Our data demonstrate that this approach achieved 94.2% precision, 92.8% recall, and 93.5% F-measure for the recognition of traditional Chinese medicine drug names, and 91.2% precision, 92.6% recall, and 91.7% F-measure for the recognition of Western medicine drug names. The differences in F-measure were significant compared with the baseline systems.
The paper discusses Optical Character Recognition (OCR) of historical texts of the 18th-20th centuries in the Romanian language using the Cyrillic script. We distinguish three epochs (approximately the 18th, 19th, and 20th centuries) with different usage of the Cyrillic alphabet in Romanian and, correspondingly, different approaches to OCR. We developed historical alphabets and sets of glyph recognition templates specific to each epoch. Dictionaries in the proper alphabets and orthographies were also created. In addition, virtual keyboards, fonts, transliteration utilities, etc. were developed. The resulting technology and toolset permit successful recognition of historical Romanian texts in the Cyrillic script. After transliteration to the modern Latin script, we obtain barrier-free access to historical documents.
Bu, Jiang; Lao, Song-Yan; Bai, Liang
Nowadays, applications such as automatic video indexing, keyword-based video search, and TV commercial detection can be developed by detecting and recognizing billboard trademarks. We propose a hierarchical solution for real-time billboard trademark recognition in various sports videos. Billboard frames are detected in the first level, where a fuzzy decision tree with easily computed features is employed to accelerate the process. In the second level, color and regional SIFT features are combined for the first time to describe the appearance of trademarks, and shared nearest neighbor (SNN) clustering with the χ² distance is utilized instead of traditional K-means clustering to construct the SIFT vocabulary; finally, Latent Semantic Analysis (LSA) based SIFT vocabulary matching is performed on the template trademark and the candidate regions in the billboard frame. Preliminary experiments demonstrate the effectiveness of the hierarchical solution, and real-time constraints are also met by our solution.
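The χ² distance used here for clustering SIFT descriptors can be written in a few lines. A minimal sketch (the function name is ours, and plain histograms stand in for the paper's SIFT bag-of-words vectors):

```python
def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two non-negative histograms.

    d(h1, h2) = 0.5 * sum_i (h1_i - h2_i)^2 / (h1_i + h2_i); the small
    eps guards against division by zero in empty bins. Compared with
    Euclidean distance, differences in sparsely populated bins weigh
    relatively more, which often suits visual-word histograms.
    """
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))
```

In an SNN clustering step, this distance would replace the Euclidean metric when ranking each descriptor's nearest neighbours.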
In this paper, we describe how information obtained from multiple views using a network of cameras can be effectively combined to yield a reliable and fast human activity recognition system. First, we present a score-based fusion technique for combining information from multiple cameras that can handle the arbitrary orientation of the subject with respect to the cameras and that does not rely on a symmetric deployment of the cameras. Second, we describe how longer, variable-duration, interleaved action sequences can be recognized in real time based on multi-camera data that is continuously streaming in. Our framework does not depend on any particular feature extraction technique, and as a result, the proposed system can easily be integrated on top of existing implementations for view-specific classifiers and feature descriptors. For implementation and testing of the proposed system, we have used computationally simple locality-specific motion information extracted from the spatio-temporal shape of a human silhouette as our feature descriptor. This lends itself to an efficient distributed implementation, while maintaining a high frame capture rate. We demonstrate the robustness of our algorithms by implementing them on a portable multi-camera video sensor network testbed and evaluating system performance under different camera network configurations.
Ohyama, Wataru; Suzuki, Koushi; Wakabayashi, Tetsushi
An algorithm for recognition and defect detection of dot-matrix text printed on products is proposed. Extraction and recognition of dot-matrix text involves several difficulties not present in standard camera-based OCR: the appearance of dot-matrix characters is corrupted and broken by illumination, complex background texture, and other standard characters printed on product packages. We propose a dot-matrix text extraction and recognition method that does not require any user interaction. The method employs the detected locations of corner points and classification scores. An evaluation experiment using 250 images shows that the recall and precision of extraction are 78.60% and 76.03%, respectively. The recognition accuracy for correctly extracted characters is 94.43%. Detecting printing defects in dot-matrix text is also important in production settings to avoid shipping defective products. We therefore also propose a detection method for printing defects in dot-matrix characters. The method constructs a feature vector whose elements are the classification scores of each character class and employs a support vector machine to classify four types of printing defect. The detection accuracy of the proposed method is 96.68%.
Celli, Fabio; Poesio, Massimo
We present PR2, a personality recognition system available online that performs instance-based classification of Big5 personality types from unstructured text using language-independent features. It has been tested on English and Italian, achieving an f-measure of up to .68.
Lin, Wutao; Ji, Donghong; Lu, Yanan
Information extraction from clinical texts enables medical workers to identify patients' problems faster and makes intelligent diagnosis possible in the future. There has been considerable work on disorder mention recognition in clinical narratives, but recognition of more complicated disorder mentions, such as overlapping ones, remains an open issue. This paper proposes a multi-label structured Support Vector Machine (SVM) based method for disorder mention recognition. We present a multi-label scheme that can be used in complicated entity recognition tasks. We performed three sets of experiments to evaluate our model. Our best F1-score on the 2013 Conference and Labs of the Evaluation Forum data set is 0.7343. There are six types of labels in our multi-label scheme, all of which are represented by 24-bit binary numbers; the binary digits of each label encode information about different disorder mentions. Our multi-label method can recognize not only disorder mentions formed by contiguous or discontiguous words but also mentions whose spans overlap with each other. The experiments indicate that our multi-label structured SVM model outperforms a conditional random field (CRF) model on this disorder mention recognition task, and that our multi-label scheme surpasses the baseline. In particular, for overlapping disorder mentions, the F1-score of our multi-label scheme is 0.1428 higher than that of the baseline BIOHD1234 scheme. This multi-label structured SVM based approach is demonstrated to work well on this disorder recognition task. The novel multi-label scheme we presented is superior to the baseline, and it can be used in other models to solve various types of complicated entity recognition tasks as well.
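The abstract does not spell out its 24-bit encoding, but the underlying idea of multi-label tagging for overlapping mentions can be sketched as follows: each token carries a bit vector with one bit per concurrent mention "slot", so a token shared by two overlapping mentions simply has two bits set. The slot assignment and vector width here are illustrative assumptions, not the paper's scheme.

```python
# Illustrative multi-label token tagging for overlapping entity mentions.
# Tokens inside several mentions accumulate several set bits.

def encode_overlapping(tokens, mentions, n_slots=4):
    """mentions: list of (start, end) token spans, end exclusive."""
    labels = [0] * len(tokens)
    for slot, (start, end) in enumerate(mentions):
        bit = 1 << (slot % n_slots)          # one bit per mention slot
        for i in range(start, end):
            labels[i] |= bit                 # shared tokens get multiple bits
    return labels

tokens = ["left", "atrium", "dilated", "and", "ventricle", "enlarged"]
# two mentions sharing the token "left"
mentions = [(0, 3), (0, 1)]
print(encode_overlapping(tokens, mentions))  # → [3, 1, 1, 0, 0, 0]
```

Decoding reverses the process: each bit plane, read independently, yields one contiguous or discontiguous mention.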
Khosla, Deepak; Moore, Christopher K.; Chelian, Suhas
This paper presents a bio-inspired method for spatio-temporal recognition in static and video imagery. It builds upon and extends our previous work on a bio-inspired Visual Attention and object Recognition System (VARS). The VARS approach locates and recognizes objects in a single frame. This work presents two extensions of VARS. The first is a Scene Recognition Engine (SCE) that learns to recognize spatial relationships between objects that compose a particular scene category in static imagery. This could be used to recognize the category of a scene, e.g., office vs. kitchen. The second is an Event Recognition Engine (ERE) that recognizes spatio-temporal sequences or events in video sequences. This extension uses a working memory model to recognize events and behaviors in video imagery by maintaining and recognizing ordered spatio-temporal sequences. The working memory model is based on an ARTSTORE neural network that combines an ART-based neural network with a cascade of sustained temporal order recurrent (STORE) neural networks. A series of Default ARTMAP classifiers ascribes event labels to these sequences. Our preliminary studies have shown that this extension is robust to variations in an object's motion profile. We evaluated the performance of the SCE and ERE on real datasets. The SCE module was tested on a visual scene classification task using the LabelMe dataset. The ERE was tested on real-world video footage of vehicles and pedestrians in a street scene. Our system is able to recognize the events in this footage involving vehicles and pedestrians.
Dissanayake, Cheryl; Shembrey, Joh; Suddendorf, Thomas
Two studies are reported which investigate delayed video self-recognition (DSR) in children with autistic disorder and Asperger's disorder relative to one another and to their typically developing peers. A secondary aim was to establish whether DSR ability is dependent on metarepresentational ability. Children's verbal and affective responses to their image were also measured. Three groups of male children between 5 and 9 years, comprising 15 with high-functioning autistic disorder (HFA), 12 with Asperger's disorder (AspD), and 15 typically developing (TD) children, participated in Study 1. Study 2 included two groups of younger children (18 HFA; 18 TD) aged 4 to 7 years. Participant groups in each study were equally able to recognize themselves using delayed video feedback, and responded to their marked image with positive affect. This was so even amongst children with HFA who were impaired in their performance on false belief tasks, casting doubt on a metarepresentational basis of DSR.
Lv, Zhuowen; Xing, Xianglei; Wang, Kejun; Guan, Donghai
Gait is a unique biometric feature perceptible at larger distances, and the gait representation approach plays a key role in a video sensor-based gait recognition system. The Class Energy Image is one of the most important appearance-based gait representation methods and has received considerable attention. In this paper, we review the expressions and meanings of various Class Energy Image approaches and analyze the information contained in Class Energy Images. Furthermore, the effectiveness and robustness of these approaches are compared on benchmark gait databases. We outline the research challenges and provide promising future directions for the field. To the best of our knowledge, this is the first review that focuses on the Class Energy Image, and it can serve as a useful reference in the literature on video sensor-based gait representation approaches. PMID:25574935
Wang, Haochang; Li, Yu
As a new branch of data mining and knowledge discovery, biomedical text mining is currently progressing rapidly. Biomedical named entity (BNE) recognition is a basic technique in biomedical knowledge discovery, and its performance directly affects further discovery and processing in biomedical texts. In this paper, we present an improved method based on a co-decision matrix framework for Biomedical Named Entity Recognition (BNER). The relativity between classifiers is exploited by using the co-decision matrix to exchange decision information among classifiers. Experiments were carried out on the GENIA corpus, with a best result of 75.9% F-score. The experimental results show that the proposed co-decision matrix framework can yield promising performance.
Nona Heydari Esfahani
Full Text Available In this paper, robust text-independent speaker recognition is considered. The proposed method operates on manually silence-removed utterances that are segmented into smaller speech units containing a few phones and at least one vowel. The segments are the basic units for long-term feature extraction. Sub-band entropy is extracted directly in each segment. A robust vowel detection method is then applied to each segment to separate a high-energy vowel that is used as the unit for pitch frequency and formant extraction. By applying a clustering technique, the extracted short-term features, namely MFCC coefficients, are combined with the long-term features. Experiments using an MLP classifier show that the average speaker recognition accuracy is 97.33% for clean speech and 61.33% in a noisy environment at −2 dB SNR, which shows an improvement over other conventional methods.
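As a rough illustration of the sub-band entropy feature mentioned above, the following sketch computes the spectral entropy of equal-width sub-bands of one segment's power spectrum. The band layout, sample rate, and normalization are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def subband_entropy(frame, n_bands=8):
    """Spectral sub-band entropy of one speech segment (illustrative sketch)."""
    spec = np.abs(np.fft.rfft(frame)) ** 2           # power spectrum
    bands = np.array_split(spec, n_bands)            # equal-width sub-bands
    entropies = []
    for band in bands:
        p = band / (band.sum() + 1e-12)              # normalize band to a pmf
        entropies.append(-np.sum(p * np.log2(p + 1e-12)))
    return np.array(entropies)

rng = np.random.default_rng(0)
frame = rng.standard_normal(512)                     # stand-in speech segment
print(subband_entropy(frame).shape)                  # (8,)
```

Flat (noise-like) bands yield high entropy, while bands dominated by a few harmonics yield low entropy, which is what makes the feature informative for speech.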
Naz, Saeeda; Umar, Arif Iqbal; Ahmed, Riaz; Razzak, Muhammad Imran; Rashid, Sheikh Faisal; Shafait, Faisal
The recognition of Arabic script and its derivatives such as Urdu, Persian, and Pashto is a difficult task due to the complexity of the script. Urdu text recognition is particularly difficult due to its Nasta'liq writing style, whose complex calligraphic nature presents major issues for recognition owing to diagonal writing, high cursiveness, context sensitivity, and overlapping of characters. Therefore, work done on recognition of Arabic script cannot be directly applied to Urdu recognition. We present Multi-dimensional Long Short-Term Memory (MDLSTM) Recurrent Neural Networks with an output layer designed for sequence labeling to recognize printed Urdu text-lines written in the Nasta'liq style. Experiments show that MDLSTM attained a recognition accuracy of 98% on unconstrained Urdu Nasta'liq printed text, which significantly outperforms state-of-the-art techniques.
Dolbin, A. V.; Rozaliev, V. L.; Orlova, Y. A.
This work is devoted to the semantic analysis of texts written in natural language. The main goal of the research was to compare latent Dirichlet allocation and latent semantic analysis for identifying elements of human appearance in text. The completeness of information retrieval was chosen as the efficiency criterion for comparing the methods. However, choosing only one method was insufficient for achieving high recognition rates, so additional methods were used to find references to the personality in the text. All of these methods are based on the created information model, which represents a person's appearance.
Niu, Li; Xu, Xinxing; Chen, Lin; Duan, Lixin; Xu, Dong
In this paper, we propose new approaches for action and event recognition by leveraging a large number of freely available Web videos (e.g., from Flickr video search engine) and Web images (e.g., from Bing and Google image search engines). We address this problem by formulating it as a new multi-domain adaptation problem, in which heterogeneous Web sources are provided. Specifically, we are given different types of visual features (e.g., the DeCAF features from Bing/Google images and the trajectory-based features from Flickr videos) from heterogeneous source domains and all types of visual features from the target domain. Considering the target domain is more relevant to some source domains, we propose a new approach named multi-domain adaptation with heterogeneous sources (MDA-HS) to effectively make use of the heterogeneous sources. In MDA-HS, we simultaneously seek for the optimal weights of multiple source domains, infer the labels of target domain samples, and learn an optimal target classifier. Moreover, as textual descriptions are often available for both Web videos and images, we propose a novel approach called MDA-HS using privileged information (MDA-HS+) to effectively incorporate the valuable textual information into our MDA-HS method, based on the recent learning using privileged information paradigm. MDA-HS+ can be further extended by using a new elastic-net-like regularization. We solve our MDA-HS and MDA-HS+ methods by using the cutting-plane algorithm, in which a multiple kernel learning problem is derived and solved. Extensive experiments on three benchmark data sets demonstrate that our proposed approaches are effective for action and event recognition without requiring any labeled samples from the target domain.
Mansoor Al-A'ali; Jamil Ahmad
This paper presents a novel technique based on feature extraction and on dynamic cursor sizing for the recognition of Arabic text. The most challenging area in Arabic OCR (AOCR) research is the segmentation of words into their sub-words and individual characters. Several rules are defined that govern the size and movement of the cursor through each segment. The features obtained from each segment are termed strokes, and each segment is defined by a number of strokes, where each stroke...
Schlipsing, Marc; Salmen, Jan; Tschentscher, Marc
Computer-aided sports analysis is demanded by coaches and the media. Image processing and machine learning techniques that allow for "live" recognition and tracking of players exist. But these methods are far from collecting and analyzing event data fully autonomously. To generate accurate results...... collection, annotation, and learning as an offline task. A semi-automatic labeling of training data and robust learning given few examples from unbalanced classes are required. We present a real-time system acquiring and analyzing video sequences from soccer matches. It estimates each player's position...
Sanal Kumar, K. P.; Bhavani, R., Dr.
Egocentric vision is a unique, human-centric perspective in computer vision. The recognition of egocentric actions is a challenging task that can help in assisting elderly people, disabled patients, and so on. In this work, life-logging activity videos are taken as input, organized into two category levels: a top level and a second level. Recognition uses features such as the Histogram of Oriented Gradients (HOG), the Motion Boundary Histogram (MBH), and trajectory features. The features are fused together to act as a single feature vector, which is then reduced using Principal Component Analysis (PCA). The reduced features are provided as input to classifiers: a Support Vector Machine (SVM), k-nearest neighbor (kNN), and a combined SVM and kNN (combined SVM-kNN). These classifiers are evaluated, and the combined SVM-kNN provides better results than other classifiers in the literature.
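The fuse-reduce-classify pipeline described above can be sketched with scikit-learn (assumed available). The random arrays stand in for HOG/MBH/trajectory descriptors, and soft-vote probability averaging is one plausible reading of "combined SVM-kNN"; none of the hyperparameters are the paper's.

```python
# Sketch: early feature fusion -> PCA reduction -> soft-vote SVM+kNN.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
hog  = rng.standard_normal((60, 32))   # stand-in HOG descriptors
mbh  = rng.standard_normal((60, 32))   # stand-in MBH descriptors
traj = rng.standard_normal((60, 16))   # stand-in trajectory descriptors
X = np.hstack([hog, mbh, traj])        # fuse into a single feature vector
y = rng.integers(0, 2, size=60)        # toy binary activity labels

combined = VotingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("knn", KNeighborsClassifier(n_neighbors=5))],
    voting="soft")                     # average the two class-probability maps
model = make_pipeline(PCA(n_components=10), combined)
model.fit(X, y)
print(model.predict(X[:5]).shape)      # (5,)
```

In practice the soft vote lets the margin-based SVM and the local kNN compensate for each other's failure modes, which matches the reported gain of the combined classifier.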
Lebowsky, Fritz; Nicolas, Marina
High-end monitors and TVs based on LCD technology continue to increase their native display resolution to 4k by 2k and beyond. Subsequently, uncompressed pixel amplitude processing becomes costly not only when transmitting over cable or wireless communication channels, but also when processing with array processor architectures. For motion video content, spatial preprocessing from YCbCr 444 to YCbCr 420 is widely accepted. However, due to spatial low-pass filtering in the horizontal and vertical directions, the quality and readability of small text and graphics content is heavily compromised when color contrast is high in the chrominance channels. On the other hand, straightforward YCbCr 444 compression based on mathematical error coding schemes quite often lacks optimal adaptation to visually significant image content. We present a block-based memory compression architecture for text, graphics, and video enabling multidimensional error minimization with context-sensitive control of visually noticeable artifacts. As a result of analyzing image context locally, the number of operations per pixel can be significantly reduced, especially when implemented on array processor architectures. A comparative analysis against some competitive solutions highlights the effectiveness of our approach, identifies its current limitations with regard to high-quality color rendering, and illustrates remaining visual artifacts.
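The chroma subsampling step discussed above (YCbCr 4:4:4 to 4:2:0) can be illustrated with a naive 2x2 box filter, which makes clear why high-contrast chroma edges around small text get blurred; real encoders use longer filter kernels, so this is only a sketch.

```python
import numpy as np

def to_420(chroma):
    """Naive 2x2 box-filter chroma subsampling (YCbCr 4:4:4 -> 4:2:0).
    Each 2x2 block of chroma samples collapses to its mean."""
    h, w = chroma.shape                      # assumes even dimensions
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

cb = np.arange(16, dtype=float).reshape(4, 4)  # toy Cb plane
print(to_420(cb))                              # each 2x2 block averaged
```

A sharp chroma transition inside one 2x2 block is averaged away entirely, which is exactly the readability loss for small colored text that the paper targets.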
Ilgner, Justus; Düwel, Philip; Westhofen, Martin
We conducted a study to evaluate speech recognition software in an otorhinolaryngology unit and to assess its impact on productivity prior to general implementation. Current speech recognition software (IBM ViaVoice, version 10) was implemented on a personal computer with a 2-GHz central processing unit, 256 MB of RAM, and a 30-GB hard disk drive, with and without add-on professional vocabulary for otorhinolaryngology. This vocabulary was added by the automated analysis of an additional 12,257 documents from our department. We compared the word recognition error rates for three different text types and determined their impact on the amount of surgeon's time that was invested in the production of an error-free document. Although error rates without any professional vocabulary database were rather high (operation reports: 38.72%; consultation notes: 27.77%), the patient information was edited with a satisfactory result (10.65%). Best results were obtained with the specialty-related vocabulary database added by the analysis of our own documents (operation reports: 5.45%; consultation notes: 5.21%). An increase in productivity compared with that of conventional transcription was found at an error rate of less than 16%.
McIntosh, Lindsey G; Park, Sohee
Social impairment is a core feature of schizophrenia, present from the pre-morbid stage and predictive of outcome, but the etiology of this deficit remains poorly understood. Successful and adaptive social interactions depend on one's ability to make rapid and accurate judgments about others in real time. Our surprising ability to form accurate first impressions from brief exposures, known as "thin slices" of behavior has been studied very extensively in healthy participants. We sought to examine affect and social trait judgment from thin slices of static or video stimuli in order to investigate the ability of schizophrenic individuals to form reliable social impressions of others. 21 individuals with schizophrenia (SZ) and 20 matched healthy participants (HC) were asked to identify emotions and social traits for actors in standardized face stimuli as well as brief video clips. Sound was removed from videos to remove all verbal cues. Clinical symptoms in SZ and delusional ideation in both groups were measured. Results showed a general impairment in affect recognition for both types of stimuli in SZ. However, the two groups did not differ in the judgments of trustworthiness, approachability, attractiveness, and intelligence. Interestingly, in SZ, the severity of positive symptoms was correlated with higher ratings of attractiveness, trustworthiness, and approachability. Finally, increased delusional ideation in SZ was associated with a tendency to rate others as more trustworthy, while the opposite was true for HC. These findings suggest that complex social judgments in SZ are affected by symptomatology. Copyright © 2014 Elsevier B.V. All rights reserved.
Chiang, Yao-Yi; Leyk, Stefan; Honarvar Nazari, Narges; Moghaddam, Sima; Tan, Tian Xiang
Converting geographic features (e.g., place names) in map images into a vector format is the first step for incorporating cartographic information into a geographic information system (GIS). With the advancement in computational power and algorithm design, map processing systems have been considerably improved over the last decade. However, the fundamental map processing techniques such as color image segmentation, (map) layer separation, and object recognition are sensitive to minor variations in graphical properties of the input image (e.g., scanning resolution). As a result, most map processing results would not meet user expectations if the user does not "properly" scan the map of interest, pre-process the map image (e.g., using compression or not), and train the processing system, accordingly. These issues could slow down the further advancement of map processing techniques as such unsuccessful attempts create a discouraged user community, and less sophisticated tools would be perceived as more viable solutions. Thus, it is important to understand what kinds of maps are suitable for automatic map processing and what types of results and process-related errors can be expected. In this paper, we shed light on these questions by using a typical map processing task, text recognition, to discuss a number of map instances that vary in suitability for automatic processing. We also present an extensive experiment on a diverse set of scanned historical maps to provide measures of baseline performance of a standard text recognition tool under varying map conditions (graphical quality) and text representations (that can vary even within the same map sheet). Our experimental results help the user understand what to expect when a fully or semi-automatic map processing system is used to process a scanned map with certain (varying) graphical properties and complexities in map content.
Koelstra, Sander; Yazdani, Ashkan; Soleymani, Mohammad; Mühl, Christian; Lee, Jong-Seok; Nijholt, Anton; Pun, Thierry; Ebrahimi, Touradj; Patras, Ioannis
Recently, the field of automatic recognition of users' affective states has gained a great deal of attention. Automatic, implicit recognition of affective states has many applications, ranging from personalized content recommendation to automatic tutoring systems. In this work, we present some promising results of our research in classification of emotions induced by watching music videos. We show robust correlations between users' self-assessments of arousal and valence and the frequency pow...
Full Text Available Background: The attention of national and foreign researchers has so far focused on the structural and semantic features of syntactic idioms. Automatic analysis of these peculiar units, which lie on the verge of syntax and phraseology, has not yet been carried out in the scientific literature. This issue requires theoretical understanding and practical implementation. Purpose: To create an algorithm for recognizing syntactic idioms with a one- or two-term core component in a text corpus. Results: Based on the results of previous theoretical studies, we highlighted a number of formal and statistical criteria that make it possible to distinguish syntactic idioms from other language units in a corpus of Ukrainian-language texts. The author developed a block diagram of syntactic idiom recognition, incorporating two branches constructed for sentences with a one-term and sentences with a two-term core component, respectively. The first branch is based on the presence of word repeats (full word concurrence or the presence of other word forms of the word) and the list of core components determined in previous stages of the study (є, це, то, не, так; як; з/із/зі, між, над, серед; а, але, зате, однак, проте). The second branch was created for the other type of syntactic idioms, those with a two-term core component. It takes into account the following properties of the analyzed units: the presence of combinations of service parts of speech, of service parts of speech with a pronoun or adverb, and of a pronoun and adverb; compliance of word combinations with the register of syntactic idiom core components, currently comprising 92 structures; a mutual-information association measure ≥9; etc. Discussion: The offered algorithm enables automatic identification of syntactic idioms in a text corpus and extraction of the contexts of their use; it can be used to improve automatic text processing and automated translation
El Moubtahij Hicham
Full Text Available This paper presents an analytical approach to an offline handwritten Arabic text recognition system. It is based on the Hidden Markov Model Toolkit (HTK) without explicit segmentation. The first phase is preprocessing, where the data is introduced into the system after quality enhancement. Then, a set of characteristics (local density features and statistical features) is extracted using a sliding-window technique. Subsequently, the resulting feature vectors are fed to the Hidden Markov Model Toolkit (HTK). The simple database "Arabic-Numbers" and IFN/ENIT are used to evaluate the performance of this system. Keywords: Hidden Markov Models (HMM), HMM Toolkit (HTK), sliding windows
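The sliding-window front end described above can be sketched as follows: a window slides along the text-line image and emits a small density-feature vector per position, producing the frame sequence an HMM would consume. The window width, step, and the particular density statistics are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def sliding_density_features(img, win=4, step=2):
    """Column-wise sliding-window ink-density features from a binary
    text-line image (sketch of an HMM front end)."""
    h, w = img.shape
    feats = []
    for x in range(0, w - win + 1, step):
        window = img[:, x:x + win]
        feats.append([window.mean(),            # overall ink density
                      window[:h // 2].mean(),   # upper-half density
                      window[h // 2:].mean()])  # lower-half density
    return np.array(feats)

img = np.zeros((8, 12))
img[2:6, 3:9] = 1                               # toy "stroke"
print(sliding_density_features(img).shape)      # (5, 3) frames x features
```

Each row of the result is one observation frame; HTK-style training then fits per-character HMMs over such frame sequences without ever segmenting the line into characters.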
Smith, Theodore S; Isaak, Matthew I; Senette, Christian G; Abadie, Brenton G
This study examined the effects of electronic communication distractions, including cell-phone and texting demands, on true and false recognition, specifically semantically related words presented and not presented on a computer screen. Participants were presented with 24 Deese-Roediger-McDermott (DRM) lists while manipulating the concurrent presence or absence of cell-phone and text-message distractions during study. In the DRM paradigm, participants study lists of semantically related words (e.g., mother, crib, and diaper) linked to a non-presented critical lure (e.g., baby). After studying the lists of words, participants are then requested to recall or recognize previously presented words. Participants often not only demonstrate high remembrance for presented words (true memory: crib), but also recollection for non-presented words (false memory: baby). In the present study, true memory was highest when participants were not presented with any distraction tasks during study of DRM words, but poorer when they were required to complete a cell-phone conversation or text-message task during study. False recognition measures did not statistically vary across distraction conditions. Signal detection analyses showed that participants better discriminated true targets (list items presented during study) from true target controls (items presented during study only) when cell-phone or text-message distractions were absent than when they were present. Response bias did not vary significantly across distraction conditions, as there were no differences in the likelihood that a participant would claim an item as "old" (previously presented) rather than "new" (not previously presented). Results of this study are examined with respect to both activation monitoring and fuzzy trace theories.
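The signal detection analysis mentioned above rests on the sensitivity index d' = z(hit rate) − z(false-alarm rate), where z is the inverse standard normal CDF. A minimal sketch with toy rates (not the study's data):

```python
from statistics import NormalDist

def dprime(hit_rate, fa_rate):
    """Signal-detection sensitivity d' = z(H) - z(FA)."""
    z = NormalDist().inv_cdf        # inverse standard normal CDF
    return z(hit_rate) - z(fa_rate)

# e.g., good discrimination of studied words from non-studied lures:
print(round(dprime(0.85, 0.30), 3))  # → 1.561
```

Higher d' means better discrimination of studied items from lures regardless of response bias; bias itself is captured separately by the criterion c = −(z(H) + z(FA))/2, which is the quantity the authors report as unchanged across distraction conditions.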
Tilley, Carol L.
With the increasing ranks of cell phone ownership is an increase in text messaging, or texting. During 2008, more than 2.5 trillion text messages were sent worldwide--that's an average of more than 400 messages for every person on the planet. Although many of the messages teenagers text each day are perhaps nothing more than "how r u?" or "c u…
Artyukhin, S. G.; Mestetskiy, L. M.
This paper presents an efficient framework for static gesture recognition based on data obtained from web cameras and the Kinect depth sensor (RGB-D data). Each gesture is given by a pair of images: a color image and a depth map. The database stores gestures by their feature descriptions, generated frame by frame for each gesture of the alphabet. The recognition algorithm takes as input a video sequence (a sequence of frames) for labeling and puts each frame in correspondence with a gesture from the database, or decides that no suitable gesture exists in the database. First, each frame of the video sequence is classified separately, without inter-frame information. Then, a run of successfully labeled frames with the same gesture is grouped into a single static gesture. We propose a combined method for frame segmentation using the depth map and the RGB image. The primary segmentation is based on the depth map; it gives information about the position of the hands and a rough hand border. Then, based on the color image, the border is refined and the shape of the hand is analyzed. The continuous skeleton method is used to generate features. We propose a method based on terminal skeleton branches, which makes it possible to determine the positions of the fingers and wrist. The classification feature for a gesture is a description of the positions of the fingers relative to the wrist. Experiments with the developed algorithm were carried out on the example of American Sign Language. An American Sign Language gesture has several components, including the shape of the hand, its orientation in space, and the type of movement. The accuracy of the proposed method is evaluated on a collected gesture database consisting of 2700 frames.
Xie, Zecheng; Sun, Zenghui; Jin, Lianwen; Ni, Hao; Lyons, Terry
Online handwritten Chinese text recognition (OHCTR) is a challenging problem as it involves a large-scale character set, ambiguous segmentation, and variable-length input sequences. In this paper, we exploit the outstanding capability of path signature to translate online pen-tip trajectories into informative signature feature maps, successfully capturing the analytic and geometric properties of pen strokes with strong local invariance and robustness. A multi-spatial-context fully convolutional recurrent network (MC-FCRN) is proposed to exploit the multiple spatial contexts from the signature feature maps and generate a prediction sequence while completely avoiding the difficult segmentation problem. Furthermore, an implicit language model is developed to make predictions based on semantic context within a predicting feature sequence, providing a new perspective for incorporating lexicon constraints and prior knowledge about a certain language in the recognition procedure. Experiments on two standard benchmarks, Dataset-CASIA and Dataset-ICDAR, yielded outstanding results, with correct rates of 97.50% and 96.58%, respectively, which are significantly better than the best result reported thus far in the literature.
Dat Tien Nguyen
Full Text Available Extracting powerful image features plays an important role in computer vision systems. Many methods have previously been proposed to extract image features for various computer vision applications, such as the scale-invariant feature transform (SIFT), speeded-up robust features (SURF), local binary patterns (LBP), the histogram of oriented gradients (HOG), and weighted HOG. Recently, the convolutional neural network (CNN) method for image feature extraction and classification in computer vision has been used in various applications. In this research, we propose a new gender recognition method for recognizing males and females in observation scenes of surveillance systems, based on feature extraction from visible-light and thermal camera videos through a CNN. Experimental results confirm the superiority of our proposed method over state-of-the-art recognition methods for the gender recognition problem using human body images.
M.Sc. (Computer Science) A video conference is an interactive meeting between two or more locations, facilitated by simultaneous two-way video and audio transmissions. People in a video conference, also known as participants, join video conferences for business and recreational purposes. In a typical video conference, every participant should be properly identified and authenticated if the information discussed during the conference is confidential. This preve...
Kortelainen, Jukka; Seppänen, Tapio
Emotions are fundamental to everyday life, affecting our communication, learning, perception, and decision making. Including emotions in human-computer interaction (HCI) can be seen as a significant step forward, offering great potential for developing advanced future technologies. Since the electrical activity of the brain is affected by emotions, the electroencephalogram (EEG) offers an interesting channel for improving HCI. In this paper, the selection of a subject-independent feature set for EEG-based emotion recognition is studied. We investigate the effect of different feature sets in classifying a person's arousal and valence while watching videos with emotional content. The classification performance is optimized by applying a sequential forward floating search algorithm for feature selection. The best classification rate (65.1% for arousal and 63.0% for valence) is obtained with a feature set containing power spectral features from the frequency band of 1-32 Hz. The proposed approach substantially improves on the classification rates reported in the literature. In the future, further analysis of the video-induced EEG changes, including topographical differences in the spectral features, is needed.
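Power spectral features of the kind used above can be sketched as summed FFT power inside conventional EEG bands within 1-32 Hz. The delta/theta/alpha/beta band edges and the sample rate below are standard conventions assumed for illustration, not the paper's exact feature list.

```python
import numpy as np

def band_powers(eeg, sr=128, bands=((1, 4), (4, 8), (8, 13), (13, 32))):
    """Summed spectral power per EEG band (delta/theta/alpha/beta sketch)."""
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / sr)    # frequency of each bin
    psd = np.abs(np.fft.rfft(eeg)) ** 2 / len(eeg)   # crude periodogram
    return np.array([psd[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in bands])

rng = np.random.default_rng(0)
eeg = rng.standard_normal(512)                       # 4 s of toy EEG at 128 Hz
print(band_powers(eeg).shape)                        # (4,)
```

Feature selection such as sequential forward floating search then operates over many such band-channel power values, adding and conditionally removing features to maximize classification accuracy.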
Zhang, Shaodian; Elhadad, Noémie
Named entity recognition is a crucial component of biomedical natural language processing, enabling information extraction and ultimately reasoning over and knowledge discovery from text. Much progress has been made in the design of rule-based and supervised tools, but they are often genre and task dependent. As such, adapting them to different genres of text or identifying new types of entities requires major effort in re-annotation or rule development. In this paper, we propose an unsupervised approach to extracting named entities from biomedical text. We describe a stepwise solution to tackle the challenges of entity boundary detection and entity type classification without relying on any handcrafted rules, heuristics, or annotated data. A noun phrase chunker followed by a filter based on inverse document frequency extracts candidate entities from free text. Classification of candidate entities into categories of interest is carried out by leveraging principles from distributional semantics. Experiments show that our system, especially the entity classification step, yields competitive results on two popular biomedical datasets of clinical notes and biological literature, and outperforms a baseline dictionary match approach. Detailed error analysis provides a road map for future work. PMID:23954592
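The IDF-based filtering step described above can be illustrated in a few lines: candidate phrases that occur in too many documents (low IDF) are dropped as generic, keeping rarer, more entity-like candidates. The tiny corpus, candidate list, and threshold below are invented for demonstration, and simple substring matching stands in for proper noun-phrase chunking:

```python
import math

docs = [
    "the patient was given aspirin for chest pain",
    "the patient denies chest pain",
    "aspirin was discontinued by the patient",
    "the patient reports mild headache",
]
candidates = ["the patient", "aspirin", "mild headache"]

def idf(phrase, docs):
    # inverse document frequency: low for phrases present in most documents
    df = sum(1 for d in docs if phrase in d)
    return math.log(len(docs) / (1 + df))

# keep only candidates with positive IDF; "the patient" appears everywhere
kept = [c for c in candidates if idf(c, docs) > 0.0]
print(kept)
```

The generic phrase "the patient" is filtered out while the entity-like candidates survive.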
Recognizing others’ emotional states is crucial for effective social interaction. While most facial emotion recognition tasks use explicit prompts that trigger consciously controlled processing, emotional faces are almost exclusively processed implicitly in real life. Recent attempts in social cognition suggest a dual-process perspective, whereby explicit and implicit processes largely operate independently. However, due to differences in methodology, the direct comparison of implicit and explicit social cognition has remained a challenge. Here, we introduce a new tool to comparably measure implicit and explicit processing aspects comprising basic and complex emotions in facial expressions. We developed two video-based tasks with similar answer formats to assess performance in the respective facial emotion recognition processes: Face Puzzle, implicit and explicit. To assess the tasks’ sensitivity to atypical social cognition and to infer interrelationship patterns between explicit and implicit processes in typical and atypical development, we included healthy adults (NT, n = 24) and adults with autism spectrum disorder (ASD, n = 24). Item analyses yielded good reliability of the new tasks. Group-specific results indicated sensitivity to subtle social impairments in high-functioning ASD. Correlation analyses with established implicit and explicit socio-cognitive measures were further in favor of the tasks’ external validity. Between-group comparisons provide first hints of differential relations between implicit and explicit aspects of facial emotion recognition processes in healthy compared to ASD participants. In addition, an increased magnitude of between-group differences in the implicit task was found for a speed-accuracy composite measure. The new Face Puzzle tool thus provides two new tasks to separately assess explicit and implicit social functioning, for instance, to measure subtle impairments as well as potential improvements due to social
Aldoory, Linda; Roberts, Erica Blue; Bushar, Jessica; Assini-Meytin, Luciana C
Infant mortality is associated with access to healthcare, knowledge, and health literacy. Text4baby, the largest national texting health initiative, seeks to address these factors. However, no research has examined the program's theoretical framework, an aspect that may impact its success. To address this gap, Text4baby's use of theory was evaluated through a content analysis of Text4baby messages and interviews with Text4baby content developers. We compared the main variables of health behavior theories framing Text4baby messages with the situational theory of publics and its factors of problem recognition and constraint recognition. The situational theory of publics provides an understanding of the types of publics that might emerge from Text4baby's audiences of pregnant women. Aware, latent, and active publics are defined by the situational theory and are created out of problem recognition and constraint recognition along with a level of personal involvement in the issue of prenatal health. We used content analysis and interviewing to explore how Text4baby prenatal messages were constructed using theory and to offer lessons learned for prenatal health campaigns. The multi-methodological approach to understanding meaning construction in the production of these text messages and how meaning played out in the messages is a useful framework for text message campaigns.
Jalal, Ahmad; Kamal, Shaharyar; Kim, Daijin
Recent advancements in depth video sensor technologies have made human activity recognition (HAR) realizable for elderly monitoring applications. Although conventional HAR utilizes RGB video sensors, HAR can be greatly improved with depth video sensors, which produce depth or distance information. In this paper, a depth-based life-logging HAR system is designed to recognize the daily activities of elderly people and turn these environments into an intelligent living space. Initially, a depth imaging sensor is used to capture depth silhouettes. Based on these silhouettes, human skeletons with joint information are produced, which are further used for activity recognition and generating life logs. The life-logging system is divided into two processes. First, the training system includes data collection using a depth camera, feature extraction, and training for each activity via Hidden Markov Models. Second, after training, the recognition engine starts to recognize the learned activities and produces life logs. The system was evaluated using life-logging features against principal component and independent component features and achieved satisfactory recognition rates against conventional approaches. Experiments conducted on the smart indoor activity datasets and the MSRDailyActivity3D dataset show promising results. The proposed system is directly applicable to any elderly monitoring system, such as monitoring healthcare problems for elderly people, or examining the indoor activities of people at home, office or hospital.
Wu, Yonghui; Jiang, Min; Lei, Jianbo; Xu, Hua
Rapid growth in electronic health record (EHR) use has led to an unprecedented expansion of available clinical data in electronic formats. However, much of the important healthcare information is locked in narrative documents. Therefore, Natural Language Processing (NLP) technologies, e.g., Named Entity Recognition, which identifies boundaries and types of entities, have been extensively studied to unlock important clinical information in free text. In this study, we investigated a novel deep learning method to recognize clinical entities in Chinese clinical documents using a minimal feature engineering approach. We developed a deep neural network (DNN) to generate word embeddings from a large unlabeled corpus through unsupervised learning and another DNN for the NER task. The experimental results showed that the DNN with word embeddings trained from the large unlabeled corpus outperformed the state-of-the-art CRF model in the minimal feature engineering setting, achieving the highest F1-score of 0.9280. Further analysis showed that word embeddings derived through unsupervised learning from a large unlabeled corpus remarkably improved the DNN with randomized embeddings, demonstrating the usefulness of unsupervised feature learning. PMID:26262126
Sartini, Emily Claire
The purpose of this study was to investigate the effects of explicit instruction combined with video prompting to teach text comprehension skills to students with autism spectrum disorder. Participants included 4 elementary school students with autism. A multiple probe across participants design was used to evaluate the intervention's…
Tyner, Bryan C.; Fienup, Daniel M.
Graphing is socially significant for behavior analysts; however, graphing can be difficult to learn. Video modeling (VM) may be a useful instructional method but lacks evidence for effective teaching of computer skills. A between-groups design compared the effects of VM, text-based instruction, and no instruction on graphing performance.…
Abstract Radiology reports describe the results of radiography procedures and have the potential of being a useful source of information which can bring benefits to health care systems around the world. One way to automatically extract information from the reports is by using Text Mining tools. The problem is that these tools are mostly developed for English and reports are usually written in the native language of the radiologist, which is not necessarily English. This creates an obstacle to the sharing of Radiology information between different communities. This work explores the solution of translating the reports to English before applying the Text Mining tools, probing the question of what translation approach should be used. We created MRRAD (Multilingual Radiology Research Articles Dataset), a parallel corpus of Portuguese research articles related to Radiology and a number of alternative translations (human, automatic and semi-automatic) to English. This is a novel corpus which can be used to move forward the research on this topic. Using MRRAD we studied which kind of automatic or semi-automatic translation approach is more effective on the Named-entity recognition task of finding RadLex terms in the English version of the articles. Considering the terms extracted from human translations as our gold standard, we calculated how similar to this standard were the terms extracted using other translations. We found that a completely automatic translation approach using Google leads to F-scores (between 0.861 and 0.868, depending on the extraction approach) similar to the ones obtained through a more expensive semi-automatic translation approach using Unbabel (between 0.862 and 0.870). To better understand the results we also performed a qualitative analysis of the type of errors found in the automatic and semi-automatic translations. Database URL: https://github.com/lasigeBioTM/MRRAD PMID:29220455
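The evaluation protocol above (terms extracted from human translations as gold standard, compared against terms extracted from other translations) reduces to set-overlap metrics. The term sets below are invented for illustration:

```python
# precision/recall/F1 between a gold term set (from human translation)
# and a candidate term set (from an automatic translation)
gold = {"pneumothorax", "pleural effusion", "thorax", "radiograph"}
auto = {"pneumothorax", "pleural effusion", "chest", "radiograph"}

tp = len(gold & auto)                 # terms found in both sets
precision = tp / len(auto)
recall = tp / len(gold)
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))
```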
In this paper a novel speaker recognition system is introduced. With advances in computer science, automated speaker recognition has become increasingly popular for aiding crime investigations and authorization processes. Here, a recurrent neural network approach is used to learn to identify ten speakers within a set of 21 audio books. Audio signals are processed via spectral analysis into Mel Frequency Cepstral Coefficients that serve as speaker-specific features, which are input to the ne...
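A rough sketch of the MFCC front end mentioned above: frame the signal, take the power spectrum, apply a triangular mel filterbank, then log and DCT. Parameter choices are typical defaults, not necessarily those of the paper, and the recurrent classifier is omitted:

```python
import numpy as np

def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13):
    # 25 ms frames with 10 ms hop, Hamming-windowed
    win, hop = int(0.025 * sr), int(0.010 * sr)
    frames = np.stack([signal[i:i + win] * np.hamming(win)
                       for i in range(0, len(signal) - win, hop)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # triangular mel filterbank over the FFT bins
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fb.T + 1e-10)
    # DCT-II decorrelates the log-mel energies; keep the first n_ceps
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of a 440 Hz tone
feats = mfcc(sig)
print(feats.shape)
```

One second of 16 kHz audio yields a (frames x coefficients) feature matrix that a sequence model can consume.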
Scherr, Sebastian; Arendt, Florian; Schäfer, Markus
Suicide is a global public health problem. Media impact on suicide is well confirmed and there are several recommendations on how media should and should not report on suicide to minimize the risk of copycat behavior. Those media guidelines have been developed to improve responsible reporting on suicide (RRS). Although such guidelines are used in several countries, we lack empirical evidence on their causal effect on actual journalistic news writing. We conducted an experiment with journalism students (N = 78) in Germany in which we tested whether exposure to awareness material promoting RRS influences news writing. As a supplement to the widely used text-based material, we tested the impact of a video in which a suicide expert presents the guidelines. A video was used as a supplement to text partly due to its potential benefit for prevention efforts over the Internet. We chose a low-budget production process allowing easy reproduction in different countries by local suicide experts. In the experiment, participants were either exposed to written, audio-visual, or no awareness material. Afterwards, participants read numerous facts of an ostensible suicide event and were asked to write a factual suicide news story based on these facts. Analyses indicate that awareness material exposure helped to improve RRS with the awareness video showing the strongest effects. We recommend that suicide prevention should use instructive awareness videos about RRS complementary to text-based awareness material.
Lei, Jianbo; Tang, Buzhou; Lu, Xueqin; Gao, Kaihua; Jiang, Min; Xu, Hua
Named entity recognition (NER) is one of the fundamental tasks in natural language processing. In the medical domain, there have been a number of studies on NER in English clinical notes; however, very limited NER research has been carried out on clinical notes written in Chinese. The goal of this study was to systematically investigate features and machine learning algorithms for NER in Chinese clinical text. We randomly selected 400 admission notes and 400 discharge summaries from Peking Union Medical College Hospital in China. For each note, four types of entity (clinical problems, procedures, laboratory tests, and medications) were annotated according to a predefined guideline. Two-thirds of the 400 notes were used to train the NER systems and one-third for testing. We investigated the effects of different types of feature, including bag-of-characters, word segmentation, part-of-speech, and section information, and of different machine learning algorithms, including conditional random fields (CRF), support vector machines (SVM), maximum entropy (ME), and structural SVM (SSVM), on the Chinese clinical NER task. All classifiers were trained on the training dataset and evaluated on the test set, and micro-averaged precision, recall, and F-measure were reported. Our evaluation on the independent test set showed that most types of feature were beneficial to Chinese NER systems, although the improvements were limited. The system achieved the highest performance by combining word segmentation and section information, indicating that these two types of feature complement each other. When the same types of optimized feature were used, CRF and SSVM outperformed SVM and ME. More specifically, SSVM achieved the highest performance of the four algorithms, with F-measures of 93.51% and 90.01% for admission notes and discharge summaries, respectively. Published by the BMJ Publishing Group Limited.
Wu, Yonghui; Xu, Jun; Jiang, Min; Zhang, Yaoyun; Xu, Hua
Clinical Named Entity Recognition (NER) is a critical task for extracting important patient information from clinical text to support clinical and translational research. This study explored the neural word embeddings derived from a large unlabeled clinical corpus for clinical NER. We systematically compared two neural word embedding algorithms and three different strategies for deriving distributed word representations. Two neural word embeddings were derived from the unlabeled Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) II corpus (403,871 notes). The results from both 2010 i2b2 and 2014 Semantic Evaluation (SemEval) data showed that the binarized word embedding features outperformed other strategies for deriving distributed word representations. The binarized embedding features improved the F1-score of the Conditional Random Fields based clinical NER system by 2.3% on i2b2 data and 2.4% on SemEval data. The combined feature from the binarized embeddings and the Brown clusters improved the F1-score of the clinical NER system by 2.9% on i2b2 data and 2.7% on SemEval data. Our study also showed that the distributed word embedding features derived from a large unlabeled corpus can be better than the widely used Brown clusters. Further analysis found that the neural word embeddings captured a wide range of semantic relations, which could be discretized into distributed word representations to benefit the clinical NER system. The low-cost distributed feature representation can be adapted to any other clinical natural language processing research. PMID:26958273
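One simple way to picture "binarized word embedding features" is thresholding each continuous embedding dimension, e.g. at its mean over the vocabulary, so that a CRF can consume discrete indicators. This particular scheme is an assumption for illustration and may differ from the paper's exact discretization:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["fever", "cough", "aspirin", "discharge"]
emb = rng.normal(size=(len(vocab), 8))        # toy 8-d word embeddings

# binarize: 1 if the value exceeds that dimension's mean over the vocabulary
mean = emb.mean(axis=0)
binary = (emb > mean).astype(int)

# discrete, CRF-friendly indicator features per word
features = {w: [f"dim{j}={b}" for j, b in enumerate(binary[i])]
            for i, w in enumerate(vocab)}
print(features["fever"][:3])
```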
This objective minimizes the quadratic error between the original video descriptions Y and the reconstructed translations obtained from A and S. … For this purpose, we parse the grammatical structure of title captions using a probabilistic … according to Eq. (2). Then the video embedding is learned separately, by minimizing the error of predicting the embedded descriptions from the videos. [Figure 3 residue: Terms from the VideoStory46K dataset occurring in …]
E.M. Van Mulligen (Erik M.); Z. Afzal (Zubair); S.A. Akhondi (Saber); D. Vo (Dang); J.A. Kors (Jan)
We participated in task 2 of the CLEF eHealth 2016 challenge. Two subtasks were addressed: entity recognition and normalization in a corpus of French drug labels and Medline titles, and ICD-10 coding of French death certificates. For both subtasks we used a dictionary-based approach.
Samson, S; Zatorre, R J
The role of the left and right temporal lobes in memory for songs (words sung to a tune) was investigated. Patients who had undergone focal cerebral excision for the relief of intractable epilepsy, along with normal control subjects, were tested in 2 recognition memory tasks. The goal of Experiment 1 was to examine recognition of words and of tunes when they were presented together in an unfamiliar song. In Experiment 2, memory for spoken words and tunes sung without words was independently tested in 2 separate recognition tasks. The results clearly showed (a) a deficit after left temporal lobectomy in recognition of text, whether sung to a tune or spoken without musical accompaniment, (b) impaired melody recognition when the tune was sung with new words following left or right temporal lobectomy, and (c) impaired melody recognition in the absence of lyrics following right but not left temporal lobectomy. The different role of each temporal lobe in memorizing songs provides evidence for the use of dual memory codes. The verbal code is consistently related to left temporal lobe structures, whereas the melodic code may depend on either or both temporal lobe mechanisms, according to the type of encoding involved.
A key element to online learning is the ability to create a sense of presence to improve learning outcomes. This quasi-experimental study evaluated the impact of interactive video communication versus text-based feedback and found a significant difference between the 2 groups related to teaching, social, and cognitive presence. Recommendations to enhance presence should focus on providing timely feedback, interactive learning experiences, and opportunities for students to establish relationships with peers and faculty.
Pedersen, Kamilla; Holdgaard, Martin Møller; Paltved, Charlotte
… on students' patient-centeredness. Video-based patient cases are probably more effective than text-based patient cases in fostering patient-centered perspectives in medical students. Teachers sharing stories from their own clinical experiences stimulate both engagement and excitement, but may also provoke … perceptions of psychiatric patients and students' reflections on meeting and communicating with psychiatric patients. METHODS: The authors conducted group interviews with 30 medical students who volunteered to participate in interviews and applied inductive thematic content analysis to the transcribed … Students taught with video-based patient cases, in contrast, often referred to the patient cases when highlighting new insights, including the importance of patient perspectives when communicating with patients. CONCLUSION: The format of patient cases included in teaching may have a substantial impact …
The increase in the number of elderly people living independently calls for special care in the form of healthcare monitoring systems. Recent advancements in depth video technologies have made human activity recognition (HAR) realizable for elderly healthcare applications. In this paper, a depth video-based novel method for HAR is presented using robust multi-features and embedded Hidden Markov Models (HMMs) to recognize the daily life activities of elderly people living alone in indoor environments such as smart homes. In the proposed HAR framework, depth maps are first analyzed by a temporal motion identification method to segment human silhouettes from the noisy background and to compute the depth silhouette area for each activity, so that human movements can be tracked in a scene. Several representative features, including invariant, multi-view differentiation and spatiotemporal body-joint features, are fused together to capture gradient orientation change, intensity differentiation, temporal variation and local motion of specific body parts. These features are then processed by the dynamics of their respective class and learned, modeled, trained and recognized with a specific embedded HMM having active feature values. Furthermore, we construct a new online human activity dataset with a depth sensor to evaluate the proposed features. Our experiments on three depth datasets demonstrate that the proposed multi-features are efficient and robust over state-of-the-art features for human action and activity recognition.
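The recognition step, one HMM per activity with the most likely model winning, can be sketched with a discrete-observation forward algorithm. States, symbols, and parameters below are hand-set toy values, not quantities learned from depth silhouettes:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    # log-likelihood of a discrete observation sequence under one HMM,
    # with per-step normalization for numerical stability
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        ll += np.log(s)
        alpha = alpha / s
    return ll

# toy models: (initial probs, transition matrix, emission matrix)
models = {
    "walking": (np.array([0.5, 0.5]),
                np.array([[0.1, 0.9], [0.9, 0.1]]),
                np.array([[0.90, 0.05, 0.05], [0.05, 0.90, 0.05]])),
    "sitting": (np.array([0.9, 0.1]),
                np.array([[0.9, 0.1], [0.1, 0.9]]),
                np.array([[0.05, 0.05, 0.90], [0.10, 0.10, 0.80]])),
}

obs = [0, 1, 0, 1, 0, 1]   # alternating silhouette symbols suggest walking
best = max(models, key=lambda name: forward_loglik(obs, *models[name]))
print(best)
```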
Bajpai, Anvita; Pathangay, Vinod
In this paper, presence of the speaker-specific suprasegmental information in the Linear Prediction (LP) residual signal is demonstrated. The LP residual signal is obtained after removing the predictable part of the speech signal. This information, if added to existing speaker recognition systems based on segmental and subsegmental features, can result in better performing combined system. The speaker-specific suprasegmental information can not only be perceived by listening to the residual, but can also be seen in the form of excitation peaks in the residual waveform. However, the challenge lies in capturing this information from the residual signal. Higher order correlations among samples of the residual are not known to be captured using standard signal processing and statistical techniques. The Hilbert envelope of residual is shown to further enhance the excitation peaks present in the residual signal. A speaker-specific pattern is also observed in the autocorrelation sequence of the Hilbert envelope, and further in the statistics of this autocorrelation sequence. This indicates the presence of the speaker-specific suprasegmental information in the residual signal. In this work, no distinction between voiced and unvoiced sounds is done for extracting these features. Support Vector Machine (SVM) is used to classify the patterns in the variance of the autocorrelation sequence for the speaker recognition task.
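The Hilbert-envelope-plus-autocorrelation representation described above can be sketched with SciPy; the "residual" below is a synthetic impulse train plus noise standing in for a real LP residual:

```python
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(0)
residual = 0.05 * rng.normal(size=800)
residual[::100] += 1.0            # excitation peaks every 100 samples

# envelope of the analytic signal enhances the excitation peaks
envelope = np.abs(hilbert(residual))
env = envelope - envelope.mean()
autocorr = np.correlate(env, env, mode="full")[len(env) - 1:]
autocorr /= autocorr[0]

# the strongest off-zero autocorrelation peak sits near the 100-sample
# spacing of the excitation peaks
lag = 50 + int(np.argmax(autocorr[50:150]))
print(lag)
```

Statistics of this autocorrelation sequence (as the abstract suggests) can then feed an SVM.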
Imran, Ali Shariq; Moreno Celleri, Alejandro Manuel; Cheikh, Faouzi Alaya
The usage of non-scripted lecture videos as a part of learning material is becoming an everyday activity in most of higher education institutions due to the growing interest in flexible and blended education. Generally these videos are delivered as part of Learning Objects (LO) through various
Holte, Michael Boelstoft; Tran, Cuong; Trivedi, Mohan
… human-computer interaction (HCI), assisted living, gesture-based interactive games, intelligent driver assistance systems, movies, 3D TV and animation, physical therapy, autonomous mental development, smart environments, sport motion analysis, video surveillance, and video annotation. Next, we review and categorize recent...
Chen, S C; Shao, C L; Liang, C K; Lin, S W; Huang, T H; Hsieh, M C; Yang, C H; Luo, C H; Wuo, C M
In this paper, we present a text input system for the seriously disabled using lip image recognition based on LabVIEW. The system can be divided into a software subsystem and a hardware subsystem. In the software subsystem, we adopt image processing techniques to recognize whether the mouth is open or closed, depending on the relative distance between the upper lip and the lower lip. In the hardware subsystem, the parallel port built into the PC is used to transmit the recognized mouth status to the Morse-code text input system. Integrating the software subsystem with the hardware subsystem, we implement a text input system using lip image recognition programmed in the LabVIEW language. We hope the system can help the seriously disabled communicate with normal people more easily.
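The mouth-state-to-Morse idea can be sketched as follows; the lip distances, thresholds, and timing rules are invented for illustration (the original system implements this with LabVIEW image processing):

```python
def mouth_states(lip_distances, open_threshold=12.0):
    # per-frame mouth state from the upper/lower lip distance (pixels)
    return ["open" if d > open_threshold else "closed" for d in lip_distances]

def to_morse(states, dash_frames=5):
    # a run of open frames shorter than dash_frames is a dot, else a dash
    symbols, run = [], 0
    for s in states + ["closed"]:           # sentinel flushes the last run
        if s == "open":
            run += 1
        elif run:
            symbols.append("-" if run >= dash_frames else ".")
            run = 0
    return "".join(symbols)

# simulated lip distances per frame: short open, long open, short open
frames = [5, 5, 15, 15, 5, 14, 15, 16, 15, 14, 5, 15, 15, 5]
code = to_morse(mouth_states(frames))
print(code)
```

The simulated sequence encodes dot-dash-dot, i.e. Morse "R".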
Textual information embedded in multimedia can provide a vital tool for indexing and retrieval. A lot of work has been done in the field of text localization and detection because of its fundamental importance. One of the biggest challenges of text detection is dealing with variation in font sizes and image resolution. This problem is aggravated by undersegmentation or oversegmentation of the regions in an image. The paper addresses this problem by proposing a solution using a novel fuzzy-based method. This paper advocates a postprocessing segmentation method that can solve the problem of variation in text sizes and image resolution. The methodology is tested on the ICDAR 2011 Robust Reading Challenge dataset, which amply proves the strength of the recommended method.
… microtext) or a document (e.g., using Sphinx or Apache NLP) as an automated approach. Previous work in natural language full-text searching … natural language processing (NLP) based module. The heart of the structured text processing module includes the following seven key word banks … [acronym glossary residue: … Features Tracker, MHT = Multiple Hypothesis Tracking, MIL = Multiple Instance Learning, NLP = Natural Language Processing, OAB = Online AdaBoost, OF = Optic Flow]
… in the realm of academic research in the Type 3 environment. 13) Face Recognition to Improve Voice/Iris Biometrics: here, the system uses face recognition as a supplementary biometric to increase confidence in a match made using a different biometric (for example iris, voice, or fingerprints) … 14) Soft biometrics to improve face recognition … Estimated readiness: the e-Gate environment was not evaluated in …
In this study, traffic signs are recognized and identified from video images taken through a video camera. To accomplish this aim, a traffic sign recognition program has been developed in the MATLAB/Simulink environment. The target traffic signs are recognized in the video image with the developed program.
Micro-expressions play an essential part in understanding non-verbal communication and deceit detection. They are involuntary, brief facial movements that are shown when a person is trying to conceal something. Automatic analysis of micro-expressions is challenging due to their low amplitude and short duration (they occur as fast as 1/15 to 1/25 of a second). We propose a full micro-expression analysis system consisting of a high-speed image acquisition setup and a software framework which can detect the frames in which micro-expressions occurred as well as determine the type of the emerged expression. The detection and classification methods use fast and simple motion descriptors based on absolute image differences. The recognition module only involves the computation of several 2D Gaussian probabilities. The software framework was tested on two publicly available high-speed micro-expression databases, and the whole system was used to acquire new data. The experiments we performed show that our solution outperforms state-of-the-art works which use more complex and computationally intensive descriptors.
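The absolute-image-difference descriptor mentioned above can be sketched in a few lines of NumPy: the mean absolute difference between consecutive frames peaks on the frames where a brief movement occurs. The synthetic frames below stand in for high-speed video:

```python
import numpy as np

rng = np.random.default_rng(0)
# 10 identical 32x32 frames with a brief local change on frame 5 only
frames = np.tile(rng.integers(0, 255, size=(32, 32)), (10, 1, 1)).astype(float)
frames[5, 10:14, 10:14] += 40.0

# per-pair mean absolute difference between consecutive frames
diffs = np.mean(np.abs(np.diff(frames, axis=0)), axis=(1, 2))
onset = int(np.argmax(diffs)) + 1   # +1: diff i is between frames i and i+1
print(onset)
```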
Lalys, Florent; Riffaud, Laurent; Bouget, David; Jannin, Pierre
The need for better integration of the new generation of Computer-Assisted Surgical (CAS) systems has recently been emphasized. One necessity for achieving this objective is to retrieve data from the Operating Room (OR) with different sensors, then to derive models from these data. Recently, the use of videos from cameras in the OR has demonstrated its efficiency. In this paper, we propose a framework to assist in the development of systems for the automatic recognition of high-level surgical tasks using microscope video analysis. We validated its use on cataract procedures. The idea is to combine state-of-the-art computer vision techniques with time series analysis. The first step of the framework consisted in the definition of several visual cues for extracting semantic information, thereby characterizing each frame of the video. Five different image-based classifiers were therefore implemented. A pupil segmentation step was also applied for dedicated visual cue detection. Time series classification algorithms were then applied to model time-varying data. Dynamic Time Warping (DTW) and Hidden Markov Models (HMM) were tested. This association combined the advantages of all methods for a better understanding of the problem. The framework was finally validated through various studies. Six binary visual cues were chosen along with 12 phases to detect, obtaining accuracies of 94%. PMID:22203700
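DTW, one of the two time-series classifiers tested above, can be sketched with the standard dynamic-programming recurrence. The 1-D cue sequences below are toy stand-ins for real per-frame visual cues:

```python
import numpy as np

def dtw(a, b):
    # classic DTW distance with absolute-difference local cost
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

template = [0, 0, 1, 1, 0]           # cue profile of a known phase
same = [0, 0, 0, 1, 1, 1, 0]         # same shape, different speed
other = [1, 1, 0, 0, 1]              # different phase profile
print(dtw(template, same), dtw(template, other))
```

DTW absorbs the speed difference, so the stretched sequence matches the template perfectly while the different profile does not.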
Chen, Yukun; Lasko, Thomas A; Mei, Qiaozhu; Denny, Joshua C; Xu, Hua
Named entity recognition (NER), a sequential labeling task, is one of the fundamental tasks for building clinical natural language processing (NLP) systems. Machine learning (ML) based approaches can achieve good performance, but they often require large amounts of annotated samples, which are expensive to build due to the requirement of domain experts in annotation. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize the annotation cost while maximizing the performance of ML-based models. In this study, our goal was to develop and evaluate both existing and new AL methods for a clinical NER task to identify concepts of medical problems, treatments, and lab tests from the clinical notes. Using the annotated NER corpus from the 2010 i2b2/VA NLP challenge that contained 349 clinical documents with 20,423 unique sentences, we simulated AL experiments using a number of existing and novel algorithms in three different categories including uncertainty-based, diversity-based, and baseline sampling strategies. They were compared with the passive learning that uses random sampling. Learning curves that plot performance of the NER model against the estimated annotation cost (based on number of sentences or words in the training set) were generated to evaluate different active learning and the passive learning methods and the area under the learning curve (ALC) score was computed. Based on the learning curves of F-measure vs. number of sentences, uncertainty sampling algorithms outperformed all other methods in ALC. Most diversity-based methods also performed better than random sampling in ALC. To achieve an F-measure of 0.80, the best method based on uncertainty sampling could save 66% annotations in sentences, as compared to random sampling. For the learning curves of F-measure vs. number of words, uncertainty sampling methods again outperformed all other methods in ALC. To achieve 0.80 in F-measure, in comparison to random
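Least-confidence uncertainty sampling, the family of strategies that performed best above, can be sketched with a simple pool-based loop. The 2-D toy data and classifier below stand in for the clinical NER setting (which queries whole sentences, not points):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# seed with one labeled example per class
labeled = [int(np.flatnonzero(y == 0)[0]), int(np.flatnonzero(y == 1)[0])]
pool = [i for i in range(len(X)) if i not in labeled]

clf = LogisticRegression()
for _ in range(10):
    clf.fit(X[labeled], y[labeled])
    # query the pool sample the model is least confident about
    confidence = clf.predict_proba(X[pool]).max(axis=1)
    pick = pool.pop(int(np.argmin(confidence)))
    labeled.append(pick)

acc = clf.fit(X[labeled], y[labeled]).score(X, y)
print(len(labeled), round(acc, 2))
```

With only 12 labels, the queried boundary points already give high accuracy on this separable toy problem, mirroring the annotation savings reported above.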
Vision is only a part of a system that converts visual information into knowledge structures. These structures drive the vision process, resolving ambiguity and uncertainty via feedback, and provide image understanding, which is an interpretation of visual information in terms of these knowledge models. These mechanisms provide reliable recognition if the object is occluded or cannot be recognized as a whole. It is hard to split the entire system apart, and reliable solutions to the target recognition problems are possible only within the solution of a more generic Image Understanding Problem. The brain reduces informational and computational complexity, using implicit symbolic coding of features, hierarchical compression, and selective processing of visual information. Biologically inspired Network-Symbolic representation, where both systematic structural/logical methods and neural/statistical methods are parts of a single mechanism, is the most feasible basis for such models. It converts visual information into relational Network-Symbolic structures, avoiding artificial precise computations of 3-dimensional models. Network-Symbolic Transformations derive abstract structures, which allow for invariant recognition of an object as an exemplar of a class. Active vision helps create consistent models. Attention, separation of figure from ground, and perceptual grouping are special kinds of network-symbolic transformations. Such Image/Video Understanding Systems will reliably recognize targets.
Vision is only a part of a larger system that converts visual information into knowledge structures. These structures drive the vision process, resolving ambiguity and uncertainty via feedback, and provide image understanding, which is an interpretation of visual information in terms of these knowledge models. This mechanism provides reliable recognition when the target is occluded or cannot be recognized. It is hard to split the entire system apart, and reliable solutions to target recognition problems are possible only within the solution of a more generic Image Understanding Problem. The brain reduces informational and computational complexities by using implicit symbolic coding of features, hierarchical compression, and selective processing of visual information. A biologically inspired Network-Symbolic representation, where both systematic structural/logical methods and neural/statistical methods are parts of a single mechanism, converts visual information into relational Network-Symbolic structures, avoiding artificial precise computations of 3-dimensional models. The logic of visual scenes can be captured in Network-Symbolic models and used for disambiguation of visual information. Network-Symbolic Transformations derive abstract structures, which allow for invariant recognition of an object as an exemplar of a class. Active vision helps build consistent, unambiguous models. Such Image/Video Understanding Systems will be able to reliably recognize targets in real-world conditions.
Uddin, Md. Zia
In this paper, a novel spatiotemporal feature-based method is proposed to recognize facial expressions from depth video. Independent Component Analysis (ICA) spatial features of the depth faces of facial expressions are first augmented with optical flow motion features. Then, the augmented features are enhanced by Fisher Linear Discriminant Analysis (FLDA) to make them robust. The features are then used to train Hidden Markov Models (HMMs) that model different facial expressions, which are later used to recognize the appropriate expression from a test depth video. The experimental results show superior performance of the proposed approach over conventional methods.
Full Text Available This paper proposes a supervised classification approach for the real-time pattern recognition of sows in an animal supervision system (asup). Our approach offers the possibility of foreground subtraction in an asup’s image processing module where there is a lack of statistical information regarding the background. A set of 7 farrowing sessions of sows, during day and night, was captured (approximately 7 days/sow) and used for this study. The frames of these recordings were grabbed with a time shift of 20 s. A collection of 215 frames of 7 different sows with the same lighting condition was marked and used as the training set. Based on small neighborhoods around a point, a number of image local features are defined, and their separability and performance metrics are compared. For the classification task, a feed-forward neural network (NN) is studied and a realistic configuration in terms of an acceptable level of accuracy and computation time is chosen. The results show that the dense neighborhood feature (d.3 × 3) is the smallest local set of features with an acceptable level of separability, while it has no negative effect on the complexity of the NN. The results also confirm that a significant amount of the desired pattern is accurately detected, even in situations where a portion of the body of a sow is covered by the crate’s elements. The performance of the proposed feature set coupled with our chosen configuration reached the rate of 8.5 fps. The true positive rate (TPR) of the classifier is 84.6%, while the false negative rate (FNR) is only about 3%. A comparison between linear logistic regression and the NN shows the highly non-linear nature of our proposed set of features.
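The d.3 × 3 feature described above amounts to using the raw values of a pixel's 3 × 3 neighborhood as the classifier input; a minimal sketch (the image and coordinates are illustrative, not from the study):

```python
# Sketch of the dense 3x3 neighborhood feature (d.3x3): the per-pixel
# classifier input is simply the 9 intensity values around that pixel.
# The toy image below is illustrative.

def dense_3x3_feature(image, row, col):
    """Flatten the 3x3 neighborhood around (row, col) into a 9-value vector."""
    return [image[r][c]
            for r in range(row - 1, row + 2)
            for c in range(col - 1, col + 2)]

image = [
    [10, 10, 10, 10],
    [10, 200, 210, 10],
    [10, 205, 215, 10],
    [10, 10, 10, 10],
]
feature = dense_3x3_feature(image, 1, 1)
```

The feed-forward NN would then take such 9-value vectors as its input layer.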
Humes, Larry E; Burk, Matthew H; Coughlin, Maureen P; Busey, Thomas A; Strauser, Lauren E
To examine age-related differences in auditory speech recognition and visual text recognition performance for parallel sets of stimulus materials in the auditory and visual modalities. In addition, the effects of variation in the rate of presentation of stimuli in each modality were investigated in each age group. A mixed-model design was used in which 3 independent groups (13 young adults with normal hearing, 10 elderly adults with normal hearing, and 16 elderly hearing-impaired adults) listened to auditory speech tests (a sentence-in-noise task, time-compressed monosyllables, and a speeded-spelling task) and viewed visual text-based analogs of the auditory tests. All auditory speech materials were presented so that the amplitude of the speech signal was at least 15 dB above threshold through 4000 Hz. Analyses of the group data revealed that, when baseline levels of performance were used as covariates in the group analyses, the only significant group difference was that both elderly groups performed worse than the young group on the auditory speeded-speech tasks. Analysis of individual data, using correlations, factor analysis, and linear regression, was generally consistent with the group data and revealed significant, moderate correlations of performance for similar tasks across modalities, but stronger correlations across tasks within a modality. This suggests that performance on these tasks was mediated both by a common underlying factor, such as cognitive processing, and by modality-specific processing. Performance on the measures of auditory processing of speech examined here was closely associated with performance on parallel measures of the visual processing of text obtained from the same participants. Young and older adults demonstrated comparable abilities in the use of contextual information in each modality, but older adults, regardless of hearing status, had more difficulty with fast presentation of auditory speech stimuli than young adults. There were no
Full Text Available We previously developed an intelligent agent to engage with users in virtual drama improvisation. The intelligent agent was able to perform sentence-level affect detection from user inputs with strong emotional indicators. However, we noticed that many inputs with weak or no affect indicators also contain emotional implication but were regarded as neutral expressions by the previous interpretation. In this paper, we employ latent semantic analysis to perform topic theme detection and identify target audiences for such inputs. We also discuss how such semantic interpretation of the dialog contexts is used to interpret affect more appropriately during virtual improvisation. In addition, in order to build a reliable affect analyser, it is important to detect and combine weak affect indicators from other channels such as body language. Such emotional body language detection also provides a nonintrusive channel for detecting users’ experience without interfering with the primary task. Thus, we also make an initial exploration of affect detection from several universally accepted emotional gestures.
Huysmans, Elke; Bolk, Elske; Zekveld, Adriana A; Festen, Joost M; de Groot, Annette M B; Goverts, S Theo
The authors first examined the influence of moderate to severe congenital hearing impairment (CHI) on the correctness of samples of elicited spoken language. Then, the authors used this measure as an indicator of linguistic proficiency and examined its effect on performance in language reception, independent of bottom-up auditory processing. In groups of adults with normal hearing (NH, n = 22), acquired hearing impairment (AHI, n = 22), and moderate to severe CHI (n = 21), the authors assessed linguistic proficiency by analyzing the morphosyntactic correctness of their spoken language production. Language reception skills were examined with a task for masked sentence recognition in the visual domain (text), at a readability level of 50%, using grammatically correct sentences and sentences with distorted morphosyntactic cues. The actual performance on the tasks was compared between groups. Adults with CHI made more morphosyntactic errors in spoken language production than adults with NH, while no differences were observed between the AHI and NH group. This outcome pattern sustained when comparisons were restricted to subgroups of AHI and CHI adults, matched for current auditory speech reception abilities. The data yielded no differences between groups in performance in masked text recognition of grammatically correct sentences in a test condition in which subjects could fully take advantage of their linguistic knowledge. Also, no difference between groups was found in the sensitivity to morphosyntactic distortions when processing short masked sentences, presented visually. These data showed that problems with the correct use of specific morphosyntactic knowledge in spoken language production are a long-term effect of moderate to severe CHI, independent of current auditory processing abilities. However, moderate to severe CHI generally does not impede performance in masked language reception in the visual modality, as measured in this study with short, degraded
Kroll, Christine; von der Werth, Monika; Leuck, Holger; Stahl, Christoph; Schertler, Klaus
For Intelligence, Surveillance, Reconnaissance (ISR) missions of manned and unmanned air systems, typical electro-optical payloads provide high-definition video data which has to be exploited with respect to relevant ground targets in real-time by automatic/assisted target recognition software. Airbus Defence and Space has been developing the required technologies for real-time sensor exploitation for years and has combined the latest advances of Deep Convolutional Neural Networks (CNN) with a proprietary high-speed Support Vector Machine (SVM) learning method into a powerful object recognition system, with impressive results on relevant high-definition video scenes compared to conventional target recognition approaches. This paper describes the principal requirements for real-time target recognition in high-definition video for ISR missions and the Airbus approach of combining an invariant feature extraction using pre-trained CNNs with the high-speed training and classification ability of a novel frequency-domain SVM training method. The frequency-domain approach allows for a highly optimized implementation for General Purpose Computation on a Graphics Processing Unit (GPGPU) and also for efficient training on large training samples. The selected CNN, which is pre-trained only once on domain-extrinsic data, reveals a highly invariant feature extraction. This allows for significantly reduced adaptation and training of the target recognition method for new target classes and mission scenarios. A comprehensive training and test dataset was defined and prepared using relevant high-definition airborne video sequences. The assessment concept is explained and performance results are given using the established precision-recall diagrams, average precision and runtime figures on representative test data. A comparison to legacy target recognition approaches shows the impressive performance increase achieved by the proposed CNN+SVM machine-learning approach and the capability of real-time high
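The average-precision figure used in such precision-recall assessments can be computed from a ranked detection list as sketched here (the ranking is illustrative, not from the paper's test data):

```python
# Sketch of average precision (AP) over a ranked list of detections, as used
# in precision-recall evaluation. 1 = true target, 0 = false alarm.

def average_precision(ranked_labels):
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)  # precision at each hit
    return sum(precisions) / max(hits, 1)

ap = average_precision([1, 0, 1, 1, 0])  # illustrative ranked detections
```

AP is the mean of the precision values measured at each correctly retrieved target, so it rewards rankings that place true targets early.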
to a particular observation O and choose the Ai with maximum a priori probability. While the LTL-based framework in Section 3 provides a deterministic plan recognition technique that is not flexible enough to incorporate probability distributions of the various a priori events, in most
Srilatha, V.; Venkatesh, Veeramuthu
Trustworthy contextual data for human action recognition of a remotely monitored person who requires medical care should be generated to avoid hazardous situations and also to provide ubiquitous services in home-based care. This is difficult for numerous reasons.
Li, Wen; Chen, Lin; Xu, Dong; Van Gool, Luc
In this work, we propose a new framework for recognizing RGB images or videos by leveraging a set of labeled RGB-D data, in which the depth features can be additionally extracted from the depth images or videos. We formulate this task as a new unsupervised domain adaptation (UDA) problem, in which we aim to take advantage of the additional depth features in the source domain and also cope with the data distribution mismatch between the source and target domains. To handle the domain distribution mismatch, we propose to learn an optimal projection matrix to map the samples from both domains into a common subspace such that the domain distribution mismatch can be reduced. Moreover, we also propose different strategies to effectively utilize the additional depth features. To simultaneously cope with the above two issues, we formulate a unified learning framework called domain adaptation from multi-view to single-view (DAM2S). By defining various forms of regularizers in our DAM2S framework, different strategies can be readily incorporated to learn robust SVM classifiers for classifying the target samples. We conduct comprehensive experiments, which demonstrate the effectiveness of our proposed methods for recognizing RGB images and videos by learning from RGB-D data.
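The DAM2S projection itself is learned jointly with the classifiers, but the underlying idea of reducing the distribution mismatch between domains can be illustrated with a crude per-dimension mean-alignment step; this is purely a sketch of the concept, not the paper's formulation:

```python
# Crude illustration of reducing domain distribution mismatch: shift the
# source-domain features so their per-dimension mean matches the target
# mean. This is NOT the learned projection of DAM2S, just the core idea.

def column_means(data):
    n, d = len(data), len(data[0])
    return [sum(row[i] for row in data) / n for i in range(d)]

def mean_align(source, target):
    """Translate source samples so both domains share the same mean."""
    sm, tm = column_means(source), column_means(target)
    return [[x + (tm[i] - sm[i]) for i, x in enumerate(row)] for row in source]

source = [[1.0, 2.0], [3.0, 4.0]]    # labeled RGB-D domain (toy)
target = [[10.0, 0.0], [12.0, 2.0]]  # unlabeled RGB domain (toy)
aligned = mean_align(source, target)
```

A learned projection generalizes this by mapping both domains into a common subspace rather than merely translating one of them.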
Woodham, Luke A; Ellaway, Rachel H; Round, Jonathan; Vaughan, Sophie; Poulton, Terry; Zary, Nabil
The impact of the use of video resources in primarily paper-based problem-based learning (PBL) settings has been widely explored. Although it can provide many benefits, the use of video can also hamper the critical thinking of learners in contexts where learners are developing clinical reasoning. However, the use of video has not been explored in the context of interactive virtual patients for PBL. A pilot study was conducted to explore how undergraduate medical students interpreted and evaluated information from video- and text-based materials presented in the context of a branched interactive online virtual patient designed for PBL. The goal was to inform the development and use of virtual patients for PBL and to inform future research in this area. An existing virtual patient for PBL was adapted for use in video and provided as an intervention to students in the transition year of the undergraduate medicine course at St George's, University of London. Survey instruments were used to capture student and PBL tutor experiences and perceptions of the intervention, and a formative review meeting was run with PBL tutors. Descriptive statistics were generated for the structured responses and a thematic analysis was used to identify emergent themes in the unstructured responses. Analysis of student responses (n=119) and tutor comments (n=18) yielded 8 distinct themes relating to the perceived educational efficacy of information presented in video and text formats in a PBL context. Although some students found some characteristics of the videos beneficial, when asked to express a preference for video or text the majority of those that responded to the question (65%, 65/100) expressed a preference for text. Student responses indicated that the use of video slowed the pace of PBL and impeded students' ability to review and critically appraise the presented information. Our findings suggest that text was perceived to be a better source of information than video in virtual
Ellaway, Rachel H; Round, Jonathan; Vaughan, Sophie; Poulton, Terry; Zary, Nabil
Background The impact of the use of video resources in primarily paper-based problem-based learning (PBL) settings has been widely explored. Although it can provide many benefits, the use of video can also hamper the critical thinking of learners in contexts where learners are developing clinical reasoning. However, the use of video has not been explored in the context of interactive virtual patients for PBL. Objective A pilot study was conducted to explore how undergraduate medical students interpreted and evaluated information from video- and text-based materials presented in the context of a branched interactive online virtual patient designed for PBL. The goal was to inform the development and use of virtual patients for PBL and to inform future research in this area. Methods An existing virtual patient for PBL was adapted for use in video and provided as an intervention to students in the transition year of the undergraduate medicine course at St George’s, University of London. Survey instruments were used to capture student and PBL tutor experiences and perceptions of the intervention, and a formative review meeting was run with PBL tutors. Descriptive statistics were generated for the structured responses and a thematic analysis was used to identify emergent themes in the unstructured responses. Results Analysis of student responses (n=119) and tutor comments (n=18) yielded 8 distinct themes relating to the perceived educational efficacy of information presented in video and text formats in a PBL context. Although some students found some characteristics of the videos beneficial, when asked to express a preference for video or text the majority of those that responded to the question (65%, 65/100) expressed a preference for text. Student responses indicated that the use of video slowed the pace of PBL and impeded students’ ability to review and critically appraise the presented information. Conclusions Our findings suggest that text was perceived to be a
Full Text Available The article examines the texts of political advertising video clips issued by the candidates for presidency in France during the campaign before the first round of elections in 2017. The mentioned examples of media texts are analysed from the compositional point of view as well as from that of the content particularities which are directly connected to the text structure. In general, the majority of the studied clips have a similar structure and consist of three parts: introduction, main part and conclusion. However, as a result of the research, a range of advantages marking well-structured videos was revealed. These include: addressing the voters and stating the speech topic clearly at the beginning of the clip, a relevant attention-grabbing opening phrase, consistency and clarity of the information presentation, appropriate use of additional video plots, conclusion at the end of the clip.
Verbal methods of realisation of addresser-addressee relations in French political media texts (through the example of the texts of political videos issued by the candidates for the French 2017 presidential election
Dmitrieva Anastasia Valerievna
Full Text Available The article deals with the addresser-addressee relations in the texts of French political advertising video clips from the verbal, textual point of view. The texts of video clips issued by the candidates for the French 2017 presidential election during the first round of the campaign serve as the material for this article. The aim of the article is to determine how the candidates (i.e., the addressers) effectuate their relations with the voters (i.e., the addressees) in the texts of their videos. As a result, a range of rhetorical methods used by the candidates was identified, allowing them to attract maximum attention of the target audience, make the addressees trust the addresser, and produce the desired perlocutionary effect.
Kei Long Cheung
Full Text Available Computer-tailored programs may help to prevent overweight and obesity, which are worldwide public health problems. This study investigated (1) the 12-month effectiveness of a video- and text-based computer-tailored intervention on energy intake, physical activity, and body mass index (BMI), and (2) the role of educational level in intervention effects. A randomized controlled trial in The Netherlands was conducted, in which adults were allocated to a video-based condition, text-based condition, or control condition, with baseline, 6-month, and 12-month follow-up. Outcome variables were self-reported BMI, physical activity, and energy intake. Mixed-effects modelling was used to investigate intervention effects and potential interaction effects. Compared to the control group, the video intervention group was effective regarding energy intake after 6 months (least squares means (LSM) difference = −205.40, p = 0.00) and 12 months (LSM difference = −128.14, p = 0.03). Only the video intervention resulted in a lower average daily energy intake after one year (d = 0.12). Educational level and BMI did not seem to interact with this effect. No intervention effects on BMI and physical activity were found. The video computer-tailored intervention was effective on energy intake after one year. This effect was not dependent on educational level or BMI category, suggesting that video tailoring can be effective for a broad range of risk groups and may be preferred over text tailoring.
Thomas, N. Luke; Du, Yingzi; Muttineni, Sriharsha; Mang, Shing; Sran, Dylan
This paper presents a low-cost method for providing biometric verification for applications that do not require large database sizes. Existing portable iris recognition systems are typically self-contained and expensive. For some applications, low cost is more important than extremely discerning matching ability. In these instances, the proposed system could be implemented at low cost, with adequate matching performance for verification. Additionally, the proposed system could be used in conjunction with any image based biometric identification system. A prototype system was developed and tested on a small database, with promising preliminary results.
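A verification-mode matcher of the kind described can be as simple as thresholding the fractional Hamming distance between two binary iris codes; the codes and the 0.32 threshold below are illustrative, not the system's actual parameters:

```python
# Sketch of low-cost biometric verification by fractional Hamming distance
# between binary iris codes. Codes and threshold are illustrative.

def hamming_distance(code_a, code_b):
    """Fraction of bit positions at which the two codes disagree."""
    assert len(code_a) == len(code_b)
    return sum(a != b for a, b in zip(code_a, code_b)) / len(code_a)

def verify(code_a, code_b, threshold=0.32):
    """Accept the claimed identity if the codes are close enough."""
    return hamming_distance(code_a, code_b) < threshold

enrolled = "1011001110100101"
probe_same = "1011001110100100"   # 1 of 16 bits differs
probe_other = "0100110001011010"  # all 16 bits differ
```

Verification (one-to-one comparison against a claimed identity) avoids the large-database search that drives up the cost of identification systems.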
Full Text Available Various studies have discussed the pedagogical potential of video game play in the classroom, but resistance to such texts remains high. The study presented here discusses the case of one young boy who, having failed to learn to read in the public school system, was able to learn in a private Sudbury-model school where video games were not only allowed but considered important learning tools. Findings suggest that the incorporation of such new texts in today’s public schools has the potential to motivate and enhance the learning of children.
Aghdam, Mehran Alizadeh; Ogawa, Makoto; Iwahashi, Toshihiko; Hosokawa, Kiyohito; Kato, Chieri; Inohara, Hidenori
The purpose of this study was to assess whether or not high frame rate (HFR) videos recorded using high-speed digital imaging (HSDI) improve the visual recognition of the motions of the laryngopharyngeal structures during pharyngeal swallow in fiberoptic endoscopic evaluation of swallowing (FEES). Five healthy subjects were asked to swallow 0.5 ml water under fiberoptic nasolaryngoscopy. The endoscope was connected to a high-speed camera, which recorded the laryngopharyngeal view throughout the swallowing process at 4000 frames/s (fps). Each HFR video was then copied and downsampled into a standard frame rate (SFR) video version (30 fps). Fifteen otorhinolaryngologists observed all of the HFR/SFR videos in random order and rated them on a four-point ordinal scale reflecting the degree of visual recognition of the rapid laryngopharyngeal structure motions just before the 'white-out' phenomenon. Significantly higher scores, reflecting better visibility, were seen for the HFR videos compared with the SFR videos for the following laryngopharyngeal structures: the posterior pharyngeal wall (p = 0.001), left pharyngeal wall (p = 0.015), right lateral pharyngeal wall (p = 0.035), tongue base (p = 0.005), and epiglottis tilting (p = 0.005). However, whether visualized with HFR or SFR, 'certainly clear observation' of the laryngeal structures was achieved in <50% of cases, because not all the motions were necessarily captured in each video. These results demonstrate that the use of HSDI in FEES makes perception of the laryngopharyngeal motions during pharyngeal swallow easier in comparison to SFR videos with equivalent image quality, due to the ability of HSDI to depict the laryngopharyngeal motions in a continuous manner.
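Downsampling a 4000 fps clip to 30 fps amounts to keeping roughly every 133rd frame; a sketch of the index selection (the exact resampling scheme the authors used is not stated, so this is one plausible implementation):

```python
# Sketch of downsampling a high-frame-rate clip to a standard frame rate by
# keeping evenly spaced frames. One plausible scheme, not necessarily the
# authors' exact method.

def downsample_indices(n_frames, src_fps=4000, dst_fps=30):
    """Indices of the source frames kept in the downsampled clip."""
    step = src_fps / dst_fps                   # ~133.3 source frames per kept frame
    kept = int(n_frames * dst_fps / src_fps)   # frames in the output clip
    return [round(i * step) for i in range(kept)]

indices = downsample_indices(4000)  # one second of 4000 fps video
```

The discarded frames are exactly what the SFR version loses: any laryngopharyngeal motion shorter than about 33 ms can fall entirely between two kept frames.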
This book collects the papers presented at two workshops during the 23rd International Conference on Pattern Recognition (ICPR): the Third Workshop on Video Analytics for Audience Measurement (VAAM) and the Second International Workshop on Face and Facial Expression Recognition (FFER) from Real... Topics include: re-identification, consumer behavior analysis, utilizing pupillary response for task difficulty measurement, logo detection, saliency prediction, classification of facial expressions, face recognition, face verification, age estimation, super-resolution, pose estimation, and pain recognition...
Roy, Partha Pratim
Resource description: 13 September 2011. With the advent of research in Document Image Analysis and Recognition (DIAR), an important line of research has been explored on the indexing and retrieval of graphics-rich documents. It aims at finding relevant documents relying on segmentation and recognition of text and graphics components in non-standard layouts, where commercial OCRs cannot be applied due to complexity. This thesis is focused on text information extraction approaches in ...
Anderson-Inman, Lynne; Terrazas-Arellanes, Fatima E.
Expanded captions are designed to enhance educational value by linking unfamiliar words to one of three types of information: vocabulary definitions, labeled illustrations, or concept maps. This study investigated the effects of expanded captions versus standard captions on the comprehension of educational video materials on DVD by secondary…
Hung, Yu-Wan; Higgins, Steve
This study investigates the different learning opportunities enabled by text-based and video-based synchronous computer-mediated communication (SCMC) from an interactionist perspective. Six Chinese-speaking learners of English and six English-speaking learners of Chinese were paired up as tandem (reciprocal) learning dyads. Each dyad participated…
Walsh-Buhi, Eric R.; Helmy, Hannah; Harsch, Kristin; Rella, Natalie; Godcharles, Cheryl; Ogunrunde, Adejoke; Lopez Castillo, Humberto
Objective: This paper reports on a pilot study evaluating the feasibility and acceptability of a text- and mobile video-based intervention to educate women and men attending college about non-daily contraception, with a particular focus on long-acting reversible contraception (LARC). A secondary objective is to describe the process of intervention…
fifin naili rizkiyah
Full Text Available Abstract: This research is aimed at finding out how the Process-Genre Based Approach strategy with YouTube videos as the media is employed to improve the students' ability in writing hortatory exposition texts. This study uses a collaborative classroom action research design following the procedures of planning, implementing, observing, and reflecting. The procedures of carrying out the strategy are: (1) relating several issues/cases to the students' background knowledge and introducing the generic structures and linguistic features of hortatory exposition text as the BKoF stage, (2) analyzing the generic structure and the language features used in the text and getting a model of how to write a hortatory exposition text by using the YouTube video as the MoT stage, (3) writing a hortatory exposition text collaboratively in a small group and in pairs through process writing as the JCoT stage, and (4) writing a hortatory exposition text individually as the ICoT stage. The result shows that the use of the Process-Genre Based Approach and YouTube videos can improve the students' ability in writing hortatory exposition texts. The percentage of students achieving a score above the minimum passing grade (70) improved from only 15.8% (3 out of 19 students) in the preliminary study to 100% (22 students) in Cycle 1. Besides, the scores for each aspect (content, organization, vocabulary, grammar, and mechanics) also improved. Key Words: writing ability, hortatory exposition text, process-genre based approach, youtube video
Nguyen, Dat Tien; Kim, Ki Wan; Hong, Hyung Gil; Koo, Ja Hyung; Kim, Min Cheol; Park, Kang Ryoung
Extracting powerful image features plays an important role in computer vision systems. Many methods have previously been proposed to extract image features for various computer vision applications, such as the scale-invariant feature transform (SIFT), speed-up robust feature (SURF), local binary patterns (LBP), histogram of oriented gradients (HOG), and weighted HOG. Recently, the convolutional neural network (CNN) method for image feature extraction and classification in computer vision has been used in various applications. In this research, we propose a new gender recognition method for recognizing males and females in observation scenes of surveillance systems based on feature extraction from visible-light and thermal camera videos through CNN. Experimental results confirm the superiority of our proposed method over state-of-the-art recognition methods for the gender recognition problem using human body images. PMID:28335510
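One simple way to combine the visible-light and thermal channels is score-level fusion of per-channel classifier outputs; the weight and threshold below are illustrative, and this is not necessarily the fusion rule the cited system uses:

```python
# Sketch of score-level fusion of visible-light and thermal camera channels
# for gender recognition. The weight and threshold are illustrative, and
# this is not necessarily the fusion rule of the cited system.

def fuse_scores(visible_score, thermal_score, w_visible=0.6):
    """Weighted sum of the two channels' 'male' probabilities."""
    return w_visible * visible_score + (1 - w_visible) * thermal_score

def predict_gender(visible_score, thermal_score, threshold=0.5):
    fused = fuse_scores(visible_score, thermal_score)
    return "male" if fused >= threshold else "female"
```

In practice each per-channel score would come from a CNN applied to that camera's human body image; fusing at the score level lets either channel compensate when the other is unreliable (e.g., visible light at night).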
Gur, Michal; Nir, Vered; Teleshov, Anna; Bar-Yoseph, Ronen; Manor, Eynav; Diab, Gizelle; Bentur, Lea
Background Poor communications between cystic fibrosis (CF) patients and health-care providers may result in gaps in knowledge and misconceptions about medication usage, and can lead to poor adherence. We aimed to assess the feasibility of using WhatsApp and Skype to improve communications. Methods This single-centre pilot study included CF patients who were older than eight years of age, assigned to two groups: one without intervention (control group), and one with intervention. Each patient from the intervention group received Skype-based online video chats and WhatsApp messages from members of the multidisciplinary CF team. Cystic Fibrosis Questionnaire-Revised (CFQ-R) scores, knowledge and adherence based on CF My Way, and patient satisfaction were evaluated before and after three months. Feasibility was assessed by session attendance, acceptability, and a satisfaction survey. Descriptive analysis and paired and non-paired t-tests were used as applicable. Results Eighteen patients were recruited to this feasibility study (nine in each group). Each intervention group participant had between four and six Skype video chats and received 22-45 WhatsApp messages. In this small study, CFQ-R scores, knowledge, adherence and patient satisfaction were similar in both groups before and after the three-month intervention. Conclusions A telehealth-based approach, using Skype video chats and WhatsApp messages, was feasible and acceptable in this pilot study. A larger and longer multi-centre study is warranted to examine the efficacy of these interventions to improve knowledge, adherence and communication.
Full Text Available Abstract Action recognition from video is a problem that has many important applications to human motion analysis. In real-world settings, the viewpoint of the camera cannot always be fixed relative to the subject, so view-invariant action recognition methods are needed. Previous view-invariant methods use multiple cameras in both the training and testing phases of action recognition or require storing many examples of a single action from multiple viewpoints. In this paper, we present a framework for learning a compact representation of primitive actions (e.g., walk, punch, kick, sit) that can be used for video obtained from a single camera for simultaneous action recognition and viewpoint estimation. Using our method, which models the low-dimensional structure of these actions relative to viewpoint, we show recognition rates on a publicly available dataset previously only achieved using multiple simultaneous views.
Full Text Available This paper proposes a novel framework for facial expression analysis using dynamic and static information in video sequences. First, based on an incremental formulation, a discriminative deformable face alignment method is adapted to locate facial points, correct in-plane head rotation, and separate the facial region from the background. Then, a spatial-temporal motion local binary pattern (LBP) feature is extracted and integrated with a Gabor multiorientation fusion histogram to give descriptors which reflect the static and dynamic texture information of facial expressions. Finally, a one-versus-one strategy based multiclass support vector machine (SVM) classifier is applied to classify facial expressions. Experiments on the Cohn-Kanade (CK+) facial expression dataset illustrate that the integrated framework outperforms methods using single descriptors. Compared with other state-of-the-art methods on the CK+, MMI, and Oulu-CASIA VIS datasets, our proposed framework performs better.
Eka Bayu Pramanca
Full Text Available This research examines how two different techniques affect students' ability to write descriptive texts at SMP N 2 Metro. The objectives of this research are (1) to determine whether students' descriptive-text writing ability differs when taught with YouTube Downloaded Video media versus Serial Pictures media, and (2) to determine which of the two media is more effective for descriptive-text writing instruction. The implemented method is a quantitative, true experimental research design, with pre-tests and post-tests conducted in both an experimental and a control class. The study was carried out in the first grade of SMP N 2 Metro in the 2012/2013 academic year. The population in this research is 7 classes with a total of 224 students, from which 2 classes were taken as samples by cluster random sampling: class VII.1 as the experimental class and class VII.2 as the control class. The instruments of the research are tests, treatment and a post-test. The data were analyzed with a t-test, yielding t_count = 3.96 against t_table = 2.06; since the criterion is that Ha is accepted if t_count > t_table, the alternative hypothesis is accepted. There is therefore a difference in students' writing ability between the YouTube Downloaded Video and Serial Pictures media, and the YouTube Downloaded Video media is the more effective of the two for students' writing ability. This result is consistent with previous studies, and the technique is therefore recommended for writing instruction, especially for descriptive text, so that students may feel engaged and enjoy the learning process.
A. I. Logvin
Full Text Available The article discusses an algorithm for automatic runway detection in video sequences. The main stages of the algorithm are presented, and some methods to increase the reliability of recognition are described.
Full Text Available We present a user-based method that detects regions of interest within a video in order to provide video skims and video summaries. Previous research in video retrieval has focused on content-based techniques, such as pattern recognition algorithms that attempt to understand the low-level features of a video. We are proposing a pulse modeling method, which makes sense of a web video by analyzing users' Replay interactions with the video player. In particular, we have modeled the user information seeking behavior as a time series and the semantic regions as a discrete pulse of fixed width. Then, we have calculated the correlation coefficient between the dynamically detected pulses at the local maximums of the user activity signal and the pulse of reference. We have found that users' Replay activity significantly matches the important segments in information-rich and visually complex videos, such as lecture, how-to, and documentary. The proposed signal processing of user activity is complementary to previous work in content-based video retrieval and provides an additional user-based dimension for modeling the semantics of a social video on the web.
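The pulse-modeling idea can be sketched as follows; the window sizes, the local-maximum test, and the scoring scheme here are illustrative assumptions rather than the authors' exact formulation:

```python
import numpy as np

def replay_pulse_scores(activity, pulse_width=5, margin=5):
    """Score candidate video segments by correlating windows of a user
    Replay-activity time series with a rectangular reference pulse of
    fixed width. Returns (time, correlation) pairs at local maxima."""
    win = pulse_width + 2 * margin
    pulse = np.zeros(win)
    pulse[margin:margin + pulse_width] = 1.0   # discrete pulse of fixed width
    half = win // 2
    a = np.asarray(activity, dtype=float)
    scores = []
    for t in range(half, len(a) - half):
        w = a[t - half:t - half + win]
        # keep only local maxima of the activity signal, skip flat windows
        if a[t] < w.max() or w.std() == 0:
            continue
        r = np.corrcoef(w, pulse)[0, 1]        # correlation coefficient
        scores.append((t, r))
    return scores
```

A burst of Replay activity that matches the pulse shape scores close to 1, which is the cue for marking that segment as a region of interest.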
Koelstra, S.; Yazdani, A.; Soleymani, M.; Mühl, C.; Lee, J.-L.; Nijholt, Antinus; Pun, T.; Ebrahimi, T.; Patras, I.; Yao, Y.; Sun, R.; Poggio, T.; Liu, J.; Zhong, N.; Huang, J.
Recently, the field of automatic recognition of users' affective states has gained a great deal of attention. Automatic, implicit recognition of affective states has many applications, ranging from personalized content recommendation to automatic tutoring systems. In this work, we present some
Yu, Yiqing; Liu, Huayong; Wang, Hongbin; Zhou, Dongru
In this paper, we propose content-based video retrieval, a kind of retrieval by semantic content. Because video data is composed of multimodal information streams, such as visual, auditory and textual streams, we describe a strategy that uses multimodal analysis to automatically parse sports video. The paper first defines the basic structure of a sports video database system, and then introduces a new approach that integrates visual stream analysis, speech recognition, speech signal processing and text extraction to realize video retrieval. The experimental results for TV sports video of football games indicate that the multimodal analysis is effective for video retrieval, whether by quickly browsing tree-like video clips or by inputting keywords within a predefined domain.
This book collects the papers presented at two workshops during the 23rd International Conference on Pattern Recognition (ICPR): the Third Workshop on Video Analytics for Audience Measurement (VAAM) and the Second International Workshop on Face and Facial Expression Recognition (FFER) from Real World Videos. The workshops were run on December 4, 2016, in Cancun in Mexico. The two workshops together received 13 papers. Each paper was then reviewed by at least two expert reviewers in the field. In all, 11 papers were accepted to be presented at the workshops. The topics covered in the papers…
Full Text Available Rheumatoid Arthritis Educational Video Series: a series of five videos designed to help patients learn more about rheumatoid arthritis (RA).
Tuna, Tayfun; Subhlok, Jaspal; Barker, Lecia; Shah, Shishir; Johnson, Olin; Hovey, Christopher
Videos of classroom lectures have proven to be a popular and versatile learning resource. A key shortcoming of the lecture video format is accessing the content of interest hidden in a video. This work meets this challenge with an advanced video framework featuring topical indexing, search, and captioning (ICS videos). Standard optical character recognition (OCR) technology was enhanced with image transformations for extraction of text from video frames to support indexing and search. The images and text on video frames are analyzed to divide lecture videos into topical segments. The ICS video player integrates indexing, search, and captioning in video playback, providing instant access to the content of interest. This video framework has been used by more than 70 courses in a variety of STEM disciplines and assessed by more than 4000 students. Results presented from the surveys demonstrate the value of the videos as a learning resource and the role played by videos in a student's learning process. Survey results also establish the value of indexing and search features in a video platform for education. This paper reports on the development and evaluation of the ICS videos framework and over 5 years of usage experience in several STEM courses.
Full Text Available NEI YouTube Videos: Amblyopia, an embedded video in the NEI YouTube video series.
Shadiev, Rustam; Wu, Ting-Ting; Huang, Yueh-Min
In this study, we provide STR-texts to non-native English speaking students during English lectures to facilitate learning, attention, and meditation. We carry out an experiment to test the feasibility of our approach. Our results show that the participants in the experimental group both outperform those in the control group on the post-tests and…
Nakhmani, Arie; Tannenbaum, Allen
We propose two novel distance measures, normalized between 0 and 1, and based on normalized cross-correlation for image matching. These distance measures explicitly utilize the fact that for natural images there is a high correlation between spatially close pixels. Image matching is used in various computer vision tasks, and the requirements on the distance measure are application-dependent. Image recognition applications require measures that are more robust to shift and rotation. In contrast, registration and tracking applications require better localization and noise tolerance. In this paper, we explore different advantages of our distance measures, and compare them to other popular measures, including Normalized Cross-Correlation (NCC) and Image Euclidean Distance (IMED). We show which of the proposed measures is more appropriate for tracking, and which is appropriate for image recognition tasks.
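A generic sketch of turning normalized cross-correlation into a distance normalized between 0 and 1; the mapping d = (1 - NCC)/2 is illustrative, not the paper's two specific measures:

```python
import numpy as np

def ncc_distance(a, b):
    """Distance in [0, 1] derived from normalized cross-correlation:
    identical patches give 0, anti-correlated patches give 1."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()                         # remove mean so NCC measures
    b = b - b.mean()                         # correlation, not brightness
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0 if np.allclose(a, b) else 1.0
    ncc = float(np.dot(a, b) / denom)        # NCC in [-1, 1]
    return (1.0 - ncc) / 2.0                 # mapped to [0, 1]
```

Because the patches are mean-subtracted and norm-normalized, the measure is invariant to brightness offset and contrast scaling, which is why NCC-style measures are popular for matching under illumination changes.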
Kardava, Irakli; Tadyszak, Krzysztof; Gulua, Nana; Jurga, Stefan
For greater flexibility in environmental perception by artificial intelligence, supporting software modules are needed that can automate the creation of language-specific syntax and perform further analysis for relevant decisions based on semantic functions. With the proposed approach, pairs of formal rules can be created for given sentences (in the case of natural languages) or statements (in the case of special languages) with the help of computer vision, speech recognition or an editable-text conversion system, for further automatic improvement. In other words, we have developed an approach that can significantly improve the automation of the training process of an artificial intelligence, which as a result gives it a higher level of self-development skills independent of us (the users). Based on this approach, we have developed a software demo version, which includes the algorithm and software code implementing all of the above-mentioned components (computer vision, speech recognition and an editable-text conversion system). The program has the ability to work in multi-stream mode and simultaneously create a syntax based on information received from several sources.
Mukhtar, Omar; Setlur, Srirangaraj; Govindaraju, Venu
Urdu is a language spoken in the Indian subcontinent by an estimated 130-270 million speakers. At the spoken level, Urdu and Hindi are considered dialects of a single language because of shared vocabulary and the similarity in grammar. At the written level, however, Urdu is much closer to Arabic because it is written in Nastaliq, the calligraphic style of the Persian-Arabic script. Therefore, a speaker of Hindi can understand spoken Urdu but may not be able to read written Urdu because Hindi is written in Devanagari script, whereas an Arabic writer can read the written words but may not understand the spoken Urdu. In this chapter we present an overview of written Urdu. Prior research in handwritten Urdu OCR is very limited. We present (perhaps) the first system for recognizing handwritten Urdu words. On a data set of about 1300 handwritten words, we achieved an accuracy of 70% for the top choice, and 82% for the top three choices.
Elbouz, Marwa; Alfalou, Ayman; Brosseau, Christian
Home automation is being implemented in more and more homes of the elderly and disabled in order to maintain their independence and safety. For that purpose, we propose and validate a surveillance video system which detects various posture-based events. One of the novel points of this system is the use of adapted VanderLugt correlator (VLC) and joint transform correlator (JTC) techniques to make decisions on the identity of a patient and his three-dimensional (3-D) position in order to overcome the problem of crowded environments. We propose a fuzzy logic technique to reach decisions on the subject's behavior. Our system is focused on the goals of accuracy, convenience, and cost, and in addition does not require any devices attached to the subject. The system permits one to study and model subject responses to behavioral change intervention, because several levels of alarm can be incorporated according to the different situations considered. Our algorithm performs a fast 3-D recovery of the subject's head position by locating the eyes within the face image, and involves model-based prediction and optical correlation techniques to guide the tracking procedure. The object detection is based on the (hue, saturation, value) color space. The system also involves an adapted fuzzy logic control algorithm to make decisions based on the information given to the system. Furthermore, the principles described here are applicable to a very wide range of situations and robust enough to be implementable in ongoing experiments.
Ankit Kumar Agrawal
Full Text Available Abstract The amount of images and videos being shared by users is increasing exponentially, but applications that perform video analytics are severely lacking or work on limited sets of data. It is also challenging to perform analytics with low time complexity. Object recognition is the primary step in video analytics. We implement a robust method to extract objects from data which is in an unstructured format and cannot be processed directly by relational databases. In this study we present our report with results after performance evaluation and compare them with results from MATLAB.
Full Text Available Video has become an interactive medium of communication in everyday life. The sheer volume of video makes it extremely difficult to browse through and find the required data. Hence extraction of key frames from the video, which represent an abstract of the entire video, becomes necessary. The aim of video shot detection is to find the positions of the shot boundaries, so that key frames can be selected from each shot for subsequent processing such as video summarization, indexing, etc. For most surveillance applications, like video summarization, face recognition, etc., a hardware (real-time) implementation of these algorithms becomes necessary. Here in this paper we present an architecture for simultaneous access of consecutive frames, which are then used for the implementation of various video shot detection algorithms. We also present the real-time implementation of three video shot detection algorithms using the above-mentioned architecture on FPGAs (Field Programmable Gate Arrays).
Vandelanotte, Corneel; Duncan, Mitch J; Plotnikoff, Ronald C; Mummery, W Kerry
In randomized controlled trials, participants cannot choose their preferred intervention delivery mode and thus might refuse to participate or not engage fully if assigned to a nonpreferred group. This might underestimate the true effectiveness of behavior-change interventions. To examine whether receiving interventions either matched or mismatched with participants' preferred delivery mode would influence the effectiveness of a Web-based physical activity intervention. Adults (n = 863), recruited via email, were randomly assigned to one of three intervention delivery modes (text based, video based, or combined) and received fully automated, Internet-delivered personal advice about physical activity. Personalized intervention content, based on the theory of planned behavior and the stages of change concept, was identical across groups. Online, self-assessed questionnaires measuring physical activity were completed at baseline, 1 week, and 1 month. Physical activity advice acceptability and website usability were assessed at 1 week. Before randomization, participants were asked which delivery mode they preferred, to categorize them as matched or mismatched. Time spent on the website was measured throughout the intervention. We applied intention-to-treat, repeated-measures analyses of covariance to assess group differences. Attrition was high (575/863, 66.6%), though equal between groups (t(863) = 1.31, P = .19). At 1-month follow-up, 93 participants were categorized as matched and 195 as mismatched. They preferred text mode (493/803, 61.4%) over combined (216/803, 26.9%) and video modes (94/803, 11.7%). After the intervention, 20% (26/132) of matched-group participants and 34% (96/282) in the mismatched group changed their delivery mode preference. Time effects were significant for all physical activity outcomes (total physical activity: F(2,801) = 5.07, P = .009; number of activity sessions: F(2,801) = 7.52, P < .001; walking: F(2,801) = 8.32, P < .001; moderate physical…
Full Text Available Recently face recognition has been attracting much attention in the society of network multimedia information access. Areas such as network security, content indexing and retrieval, and video compression benefit from face recognition technology because "people" are the center of attention in a lot of video. Network access control via face recognition not only makes it virtually impossible for hackers to steal one's "password", but also increases the user-friendliness of human-computer interaction. Indexing and/or retrieving video data based on the appearances of particular persons will be useful for users such as news reporters, political scientists, and moviegoers. For the applications of videophone and teleconferencing, the assistance of face recognition also provides a more efficient coding scheme. In this paper, we give an introductory course on this new information processing technology. The paper shows the readers the generic framework for the face recognition system, and the variants that are frequently encountered by the face recognizer. Several famous face recognition algorithms, such as eigenfaces and neural networks, will also be explained.
Full Text Available The analysis of video acquired with a wearable camera is a challenge that the multimedia community is facing with the proliferation of such sensors in various applications. In this paper, we focus on the problem of automatic visual place recognition in a weakly constrained environment, targeting the indexing of video streams by topological place recognition. We propose to combine several machine learning approaches in a time-regularized framework for image-based place recognition indoors. The framework combines the power of multiple visual cues and integrates the temporal continuity information of video. We extend it with a computationally efficient semi-supervised method leveraging unlabeled video sequences for an improved indexing performance. The proposed approach was applied to challenging video corpora. Experiments on a public and a real-world video sequence database show the gain brought by the different stages of the method.
Full Text Available In video surveillance applications, trained operators watch a number of screens simultaneously to detect potential security threats. Looking for such events in real time, in multiple videos simultaneously, is cognitively challenging for human operators. This study suggests that there is a significant need to use an automated video analysis system to aid human perception of security events in video surveillance applications. In this paper the performance of humans in observing a simulated environment is studied and quantified. Furthermore, this paper proposes an automated mechanism to detect events before they occur by means of an automated intent recognition system. Upon the detection of a potential event the proposed mechanism communicates the location of such potential threat to the human operator to redirect attention to the areas of interest within the video. Studying the improvements achieved by applying the intent recognition into the simulated video surveillance application in a two phase trial supports the need for an automated event detection approach in improving human video surveillance performance. Moreover, this paper presents a comparison of the performance in video surveillance with and without the aid of the intent recognition mechanism.
Full Text Available The understanding of ecosystem dynamics in deep-sea areas is to date limited by technical constraints on sampling repetition. We have elaborated a morphometry-based protocol for automated video-image analysis in which animal movement tracking (by frame subtraction) is accompanied by species identification from animals' outlines using Fourier Descriptors and standard K-Nearest Neighbours methods. One week of footage from a permanent video station located at 1,100 m depth in Sagami Bay (Central Japan) was analysed. Out of 150,000 frames (1 per 4 s), a subset of 10,000 was analyzed by a trained operator to increase the efficiency of the automated procedure. Error estimation for the automated and the trained-operator procedures was computed as a measure of protocol performance. Three displacing species were identified as the most recurrent: Zoarcid fishes (eelpouts), red crabs (Paralomis multispina), and snails (Buccinum soyomaruae). Species identification with KNN thresholding produced better results in automated motion detection. Results were discussed under the assumption that this technological bottleneck currently constrains the exploration of the deep sea.
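The frame-subtraction step of such a motion-tracking protocol can be sketched as follows; the threshold value and helper names are illustrative assumptions:

```python
import numpy as np

def moving_pixels(prev_frame, frame, thresh=25):
    """Detect moving pixels by frame subtraction: absolute difference
    between consecutive grayscale frames, then thresholding."""
    diff = np.abs(frame.astype(int) - prev_frame.astype(int))
    return diff > thresh                     # boolean motion mask

def motion_bbox(mask):
    """Bounding box (top, left, bottom, right) of the motion mask,
    or None if nothing moved; the box outlines the moving animal."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return ys.min(), xs.min(), ys.max(), xs.max()
```

The region inside the bounding box is what would then be handed to the outline-based species identification stage.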
Full Text Available Random numbers are very useful in simulation, chaos theory, game theory, information theory, pattern recognition, probability theory, quantum mechanics, statistics, and statistical mechanics. Random numbers are especially helpful in cryptography. In this work, the proposed random number generators draw on white noise from audio and video (A/V) sources, extracted from a high-resolution IPCAM, a WEBCAM, and MPEG-1 video files. The proposed generator acts as a true random number generator when applied to live video sources from the IPCAM and the WEBCAM with microphone, and as a pseudorandom number generator when applied to video sources from an MPEG-1 video file. In addition, when the 15 statistical tests of NIST SP 800-22 Rev. 1a are applied to the random numbers generated by the proposed generator, around 98% of them pass the 15 statistical tests. Furthermore, audio and video sources are easy to find; hence, the proposed generator is a qualified, convenient, and efficient random number generator.
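One common way to extract candidate random bits from noisy A/V samples is sketched below as an illustration (least-significant bits followed by von Neumann debiasing); this is an assumed extraction scheme, not necessarily the one used in the paper:

```python
import numpy as np

def debiased_bits(samples):
    """Extract candidate random bits from noisy A/V samples: take each
    sample's least-significant bit, then apply von Neumann debiasing
    (keep the first bit of each unequal pair, drop 00 and 11 pairs)."""
    lsb = np.asarray(samples, dtype=np.uint64) & 1
    pairs = lsb[: len(lsb) // 2 * 2].reshape(-1, 2)
    keep = pairs[:, 0] != pairs[:, 1]        # discard equal pairs
    return pairs[keep, 0].tolist()
```

Von Neumann debiasing removes constant bias from the raw bit stream at the cost of throwing away at least half of the bits, which is why noise-based generators need a high sample rate.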
Full Text Available This paper presents a method of speech recognition using pattern recognition techniques. Learning consists of determining the unique characteristics of a word (its cepstral coefficients) by eliminating those characteristics that differ from one utterance of the word to another. For learning and recognition, the system builds a dictionary of words by determining the characteristics of each word to be used in recognition. Determining the characteristics of an audio signal consists of the following steps: removing noise, sampling the signal, applying a Hamming window, switching to the frequency domain through the Fourier transform, calculating the magnitude spectrum, filtering the data, and determining the cepstral coefficients.
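The core of the feature-extraction steps listed above can be sketched as follows; this computes the real cepstrum of a single frame and is a minimal illustration, not the paper's exact implementation (which also removes noise and filters the data):

```python
import numpy as np

def real_cepstrum(frame, n_coeffs=13):
    """Cepstral coefficients of one audio frame: Hamming window ->
    FFT -> magnitude spectrum -> log -> inverse FFT (real cepstrum)."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    mag = np.abs(np.fft.fft(x))              # magnitude spectrum
    log_mag = np.log(mag + 1e-10)            # small offset avoids log(0)
    ceps = np.fft.ifft(log_mag).real         # real cepstrum
    return ceps[:n_coeffs]                   # keep the low-order coefficients
```

The low-order coefficients describe the spectral envelope of the frame, which is what gets stored in the word dictionary and compared during recognition.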
Is video becoming "the new black" in academia, and if so, what are the challenges? The integration of video in research methodology (for collection, analysis) is well known, but the use of "academic video" for dissemination is relatively new (Eriksson and Sørensen). The focus of this paper is academic video, or short video essays produced for the explicit purpose of communicating research processes, topics, and research-based knowledge (see the journal of academic videos: www.audiovisualthinking.org). Video is increasingly used in popular showcases for video online, such as YouTube and Vimeo, as well… This raises questions of our media literacy pertaining to authoring multimodal texts (visual, verbal, audial, etc.) in research practice and the status of multimodal texts in academia. The implications of academic video extend to wider issues of how researchers harness opportunities to author different types of texts…
Full Text Available Purpose: The research results presented here aim to improve the theoretical basis of computer vision and artificial intelligence in dynamical systems. The proposed approach to object detection and recognition is based on probabilistic fundamentals to ensure the required level of correct object recognition. Methods: The presented approach is grounded in probabilistic methods, statistical methods of probability density estimation, and computer-based simulation at the verification stage of development. Results: The proposed approach to object detection and recognition for video stream data processing has shown several advantages in comparison with existing methods, owing to its simple realization and short data processing time. The presented results of experimental verification look plausible for object detection and recognition in video streams. Discussion: The approach can be implemented in dynamical systems within changeable environments, such as remotely piloted aircraft systems, and can be a part of the artificial intelligence in navigation and control systems.
Wang, Jiang; Wu, Ying
Action recognition technology has many real-world applications in human-computer interaction, surveillance, video retrieval, retirement home monitoring, and robotics. The commoditization of depth sensors has also opened up further applications that were not feasible before. This text focuses on feature representation and machine learning algorithms for action recognition from depth sensors. After presenting a comprehensive overview of the state of the art, the authors then provide in-depth descriptions of their recently developed feature representations and machine learning techniques, includi
Stanczyk, Nicola E; Smit, Eline S; Schulz, Daniela N; de Vries, Hein; Bolman, Catherine; Muris, Jean W M; Evers, Silvia M A A
Although evidence exists for the effectiveness of web-based smoking cessation interventions, information about the cost-effectiveness of these interventions is limited. The study investigated the cost-effectiveness and cost-utility of two web-based computer-tailored (CT) smoking cessation interventions (video- vs. text-based CT) compared to a control condition that received general text-based advice. In a randomized controlled trial, respondents were allocated to the video-based condition (N = 670), the text-based condition (N = 708) or the control condition (N = 721). Societal costs, smoking status, and quality-adjusted life years (QALYs; EQ-5D-3L) were assessed at baseline and at six- and twelve-month follow-up. The incremental costs per abstinent respondent and per QALY gained were calculated. To account for uncertainty, bootstrapping techniques and sensitivity analyses were carried out. No significant differences were found between the three conditions regarding demographics, baseline values of outcomes and societal costs over the three months prior to baseline. Analyses using prolonged abstinence as the outcome measure indicated that from a willingness to pay of €1,500, the video-based intervention was likely to be the most cost-effective treatment, whereas from a willingness to pay of €50,400, the text-based intervention was likely to be the most cost-effective. With regard to cost-utilities, when quality of life was used as the outcome measure, the control condition had the highest probability of being the most preferable treatment. Sensitivity analyses yielded comparable results. The video-based CT smoking cessation intervention was the most cost-effective treatment for smoking abstinence after twelve months, varying the willingness to pay per abstinent respondent from €0 up to €80,000. With regard to cost-utility, the control condition seemed to be the most preferable treatment. Probably, more time will be required to assess changes in quality of life.
Full Text Available We present a global overview of image- and video-processing-based methods to help the communication of hearing-impaired people. Two directions of communication have to be considered: from a hearing person to a hearing-impaired person and vice versa. In this paper, firstly, we describe sign language (SL) and the cued speech (CS) language, which are two different languages used by the deaf community. Secondly, we present existing tools which employ SL and CS video processing and recognition for the automatic communication between deaf people and hearing people. Thirdly, we present the existing tools for reverse communication, from hearing people to deaf people, that involve SL and CS video synthesis.
Full Text Available In this work, we address the use of object recognition techniques to annotate what is shown where in online video collections. These annotations make it possible to retrieve specific video scenes for object-related text queries, which is not possible with the manually generated metadata used by current portals. We are not the first to present object annotations generated with content-based analysis methods. However, the proposed framework possesses some outstanding features that offer good prospects for its application in real video portals. Firstly, it can easily be used as a background module in any video environment. Secondly, it is based not on a fixed analysis chain but on an extensive recognition infrastructure that can be used with all kinds of visual features, matching, and machine learning techniques. New recognition approaches can be integrated into this infrastructure with low development costs, and the recognition approaches used can be reconfigured even on a running system. Thus, this framework might also benefit from future advances in computer vision. Thirdly, we present an automatic selection approach to support the use of different recognition strategies for different objects. Last but not least, visual analysis can be performed efficiently on distributed, multi-processor environments, and a database schema is presented to store the resulting video annotations, as well as the off-line generated low-level features, in a compact form. We achieve promising results in an annotation case study and in the instance search task of the TRECVID 2011 challenge.
Poh, N.; Chan, C.H.; Kittler, J.; Marcel, S.; Mc Cool, C.; Argones Rúa, E.; Alba Castro, J.L.; Villegas, M.; Paredes, R.; Štruc, V.; Pavešić, N.; Salah, A.A.; Fang, H.; Costen, N.
Person recognition using facial features, e.g., mug-shot images, has long been used in identity documents. However, due to the widespread use of web-cams and mobile devices embedded with a camera, it is now possible to realize facial video recognition, rather than resorting to just still images. In
Full Text Available ... search for current job openings visit HHS USAJobs Home > NEI YouTube Videos > NEI YouTube Videos: Amblyopia NEI YouTube Videos YouTube Videos Home Age-Related Macular Degeneration Amblyopia Animations Blindness Cataract ...
Full Text Available ... Amaurosis Low Vision Refractive Errors Retinopathy of Prematurity Science Spanish Videos Webinars NEI YouTube Videos: Amblyopia Embedded video for NEI YouTube Videos: Amblyopia NEI Home Contact Us A-Z Site Map NEI on Social Media Information in Spanish (Información en español) Website, ...
This comprehensive and accessible text/reference presents an overview of the state of the art in video coding technology. Specifically, the book introduces the tools of the AVS2 standard, describing how AVS2 can help to achieve a significant improvement in coding efficiency for future video networks and applications by incorporating smarter coding tools such as scene video coding. Topics and features: introduces the basic concepts in video coding, and presents a short history of video coding technology and standards; reviews the coding framework, main coding tools, and syntax structure of AVS2.
Amir, Arnon; Srinivasan, Savitha; Efrat, Alon
The amount of digital video being shot, captured, and stored is growing at a rate faster than ever before. The large amount of stored video is not penetrable without efficient video indexing, retrieval, and browsing technology. Most prior work in the field can be roughly categorized into two classes. One class is based on image processing techniques, often called content-based image and video retrieval, in which video frames are indexed and searched for visual content. The other class is based on spoken document retrieval, which relies on automatic speech recognition and text queries. Both approaches have major limitations. In the first approach, semantic queries pose a great challenge, while the second, speech-based approach, does not support efficient video browsing. This paper describes a system where speech is used for efficient searching and visual data for efficient browsing, a combination that takes advantage of both approaches. A fully automatic indexing and retrieval system has been developed and tested. Automated speech recognition and phonetic speech indexing support text-to-speech queries. New browsable views are generated from the original video. A special synchronized browser allows instantaneous, context-preserving switching from one view to another. The system was successfully used to produce searchable-browsable video proceedings for three local conferences.
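The speech-indexing idea described above, in which text queries are resolved against time-coded recognition output, can be sketched minimally as an inverted index from words to timestamps. This is an illustrative sketch only, not the authors' system; the words and timings below are invented stand-ins for ASR output.

```python
# Minimal inverted index over a time-coded speech transcript: text queries
# return the moments in the video where the words were spoken.
from collections import defaultdict

def build_index(transcript):
    """transcript: list of (word, start_seconds) pairs from speech recognition."""
    index = defaultdict(list)
    for word, start in transcript:
        index[word.lower()].append(start)
    return index

def search(index, query):
    """Return sorted timestamps where any query word was spoken."""
    hits = []
    for word in query.lower().split():
        hits.extend(index.get(word, []))
    return sorted(hits)

transcript = [("video", 1.2), ("indexing", 1.8), ("and", 2.1),
              ("retrieval", 2.4), ("video", 7.5)]
idx = build_index(transcript)
print(search(idx, "video retrieval"))  # [1.2, 2.4, 7.5]
```

A real system of this kind would index phonetic units rather than whole words to tolerate recognition errors, but the lookup structure is the same.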
Full Text Available Person identification plays an important role in the semantic analysis of video content. This paper presents a novel method to automatically label persons in video sequences captured by a fixed camera. Instead of leveraging traditional face recognition approaches, we deal with the task of person identification by fusing information extracted from camera video with information from motion sensor platforms, such as smart phones, carried on human bodies. More specifically, a sequence of motion features extracted from the camera video is compared with each of those collected from the accelerometers of smart phones. When strong correlation is detected, identity information transmitted from the corresponding smart phone is used to identify the phone's wearer. To test the feasibility and efficiency of the proposed method, extensive experiments were conducted, achieving impressive performance.
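The correlation-based matching step described above can be illustrated with a short sketch: a motion-feature sequence derived from video is compared against each phone's accelerometer trace, and the best-correlated phone supplies the identity. The data, names, and the use of plain Pearson correlation are assumptions for illustration, not the paper's implementation.

```python
# Match a video-derived motion sequence against candidate accelerometer
# traces; the phone whose trace correlates most strongly labels the person.
import numpy as np

def best_match(video_motion, phone_traces):
    """phone_traces: dict of phone_id -> 1-D motion sequence (equal length)."""
    v = (video_motion - video_motion.mean()) / video_motion.std()
    best_id, best_r = None, -np.inf
    for phone_id, trace in phone_traces.items():
        t = (trace - trace.mean()) / trace.std()
        r = float(np.mean(v * t))  # Pearson correlation for equal-length sequences
        if r > best_r:
            best_id, best_r = phone_id, r
    return best_id, best_r

rng = np.random.default_rng(0)
walk = rng.standard_normal(200)                    # motion seen in video
traces = {"alice": walk + 0.1 * rng.standard_normal(200),  # same motion + noise
          "bob": rng.standard_normal(200)}                 # unrelated motion
print(best_match(walk, traces)[0])  # alice
```

In practice the sequences would first be resampled to a common rate and aligned in time before correlating.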
Full Text Available The discourse of crisis in the humanities is juxtaposed with an analysis of remix video practices to suggest that the cognitive and cultural engagement feared lost in the former appears with frequency and enthusiasm in the latter. Whether humanists focus on the deleterious effects of the digital or celebrate the digital humanities but resist a turn to computation, their anxieties turn to the disappearance of textual analysis, aesthetics, critique, and self-reflection. Remix video, as exemplified by mashups, trailer remixes, and vids, depends on these same competencies for the creation and circulation of its works. Remix video is not the answer to the crises of the humanities; rather, the recognition of a common set of practices, skills, and values underpinning scholars' and video practitioners' work provides the basis for a coalitional approach: identification of shared opportunities to promote and engage potential participants in the modes of thinking and production that contend with complex cultural ideas.
Mohan M. Trivedi
Full Text Available We present a multilevel system architecture for intelligent environments equipped with omnivideo arrays. In order to gain unobtrusive human awareness, real-time 3D human tracking as well as robust video-based face detection and tracking and face recognition algorithms are needed. We first propose a multiprimitive face detection and tracking loop to crop face videos as the front end of our face recognition algorithm. Both skin-tone and elliptical detections are used for robust face searching, and view-based face classification is applied to the candidates before updating the Kalman filters for face tracking. For video-based face recognition, we propose three decision rules on the facial video segments. The majority rule and discrete HMM (DHMM rule accumulate single-frame face recognition results, while continuous density HMM (CDHMM works directly with the PCA facial features of the video segment for accumulated maximum likelihood (ML decision. The experiments demonstrate the robustness of the proposed face detection and tracking scheme and the three streaming face recognition schemes with 99% accuracy of the CDHMM rule. We then experiment on the system interactions with single person and group people by the integrated layers of activity awareness. We also discuss the speech-aided incremental learning of new faces.
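Of the three decision rules described above, the majority rule is the simplest: frame-level recognition labels are accumulated over a video segment and the most frequent identity wins. A minimal sketch (the frame labels are invented; the HMM-based rules are more involved and not shown):

```python
# Majority-rule decision over per-frame face recognition results.
from collections import Counter

def majority_rule(frame_labels):
    """Return the identity predicted most often across the segment's frames."""
    return Counter(frame_labels).most_common(1)[0][0]

segment = ["anna", "anna", "ben", "anna", "ben", "anna"]
print(majority_rule(segment))  # anna
```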
Full Text Available A fast pedestrian recognition algorithm based on multisensor fusion is presented in this paper. Firstly, potential pedestrian locations are estimated by laser radar scanning in world coordinates, and their corresponding candidate regions in the image are then located via camera calibration and a perspective mapping model. To avoid the time consumed in training and recognition by large numbers of feature vector dimensions, a region-of-interest-based integral histogram of oriented gradients (ROI-IHOG) feature extraction method is then proposed. A support vector machine (SVM) classifier is trained on a novel pedestrian sample dataset adapted to the urban road environment for online recognition. Finally, we test the validity of the proposed approach with several video sequences from realistic urban road scenarios. Reliable and timely performance is demonstrated by our multisensor fusion method.
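The efficiency gain behind an integral histogram of oriented gradients can be sketched briefly: a cumulative per-orientation histogram is computed once over the image, after which the gradient histogram of any rectangular ROI costs four array lookups instead of a rescan. This is an illustrative sketch of the general integral-histogram idea, not the paper's ROI-IHOG code; the bin count and toy image are assumptions.

```python
# Integral orientation histogram: precompute once, then query any ROI cheaply.
import numpy as np

def integral_orientation_histogram(img, n_bins=8):
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)             # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    h, w = img.shape
    hist = np.zeros((h + 1, w + 1, n_bins))             # zero-padded integral image
    for b in range(n_bins):
        layer = np.where(bins == b, mag, 0.0)           # magnitude falling in bin b
        hist[1:, 1:, b] = layer.cumsum(0).cumsum(1)
    return hist

def roi_histogram(ih, top, left, bottom, right):
    """Orientation histogram of img[top:bottom, left:right] via four lookups."""
    return (ih[bottom, right] - ih[top, right]
            - ih[bottom, left] + ih[top, left])

img = np.outer(np.arange(16), np.ones(16))              # toy vertical ramp image
ih = integral_orientation_histogram(img)
full = roi_histogram(ih, 0, 0, 16, 16)
patch = roi_histogram(ih, 2, 3, 10, 12)
```

The payoff is that classifying many candidate ROIs (as the laser radar proposes) reuses one precomputation rather than recomputing gradients per region.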
Besser, Jana; Zekveld, Adriana A; Kramer, Sophia E; Rönnberg, Jerker; Festen, Joost M
In this research, the authors aimed to increase the analogy between Text Reception Threshold (TRT; Zekveld, George, Kramer, Goverts, & Houtgast, 2007) and Speech Reception Threshold (SRT; Plomp & Mimpen, 1979) and to examine the TRT's value in estimating cognitive abilities that are important for speech comprehension in noise. The authors administered 5 TRT versions, SRT tests in stationary (SRT(STAT)) and modulated (SRT(MOD)) noise, and 2 cognitive tests: a reading span (RSpan) test for working memory capacity and a letter-digit substitution test for information-processing speed. Fifty-five adults with normal hearing (18-78 years, M = 44 years) participated. The authors examined mutual associations of the tests and their predictive value for the SRTs with correlation and linear regression analyses. SRTs and TRTs were well associated, also when controlling for age. Correlations for the SRT(STAT) were generally lower than for the SRT(MOD). The cognitive tests were correlated with the SRTs only when age was not controlled for. Age and the TRTs were the only significant predictors of SRT(MOD). SRT(STAT) was predicted by level of education and some of the TRT versions. TRTs and SRTs are robustly associated, nearly independent of age. The association between SRTs and RSpan is largely age dependent. The TRT test and the RSpan test measure different nonauditory components of linguistic processing relevant for speech perception in noise.
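The linear regression analysis described above (predicting SRT(MOD) from TRT and age) can be illustrated with ordinary least squares on synthetic numbers. The data below are invented for illustration only; no coefficient here reflects the study's findings.

```python
# Ordinary least squares: predict SRT(MOD) from TRT and age (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
n = 55                                        # the study tested 55 adults
age = rng.uniform(18, 78, n)
trt = -2.0 + 0.02 * age + 0.3 * rng.standard_normal(n)       # invented TRT scores
srt_mod = -8.0 + 1.5 * trt + 0.01 * age + 0.2 * rng.standard_normal(n)

X = np.column_stack([np.ones(n), trt, age])   # intercept, TRT, age predictors
coef, *_ = np.linalg.lstsq(X, srt_mod, rcond=None)
pred = X @ coef
r2 = 1 - np.sum((srt_mod - pred) ** 2) / np.sum((srt_mod - srt_mod.mean()) ** 2)
```

Comparing such a fit with and without the age column is one way to check, as the authors did, whether an association survives controlling for age.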
Chernyshov Alexander V.
Full Text Available The article focuses on the origins of the song video as a TV and Internet genre. In addition, it considers problems of screen-image creation depending on the musical form and the text of a song, in connection with relevant principles of accent and phraseological video editing and filming techniques, as well as with additional frames and sound elements.
This book provides a unique view of human activity recognition, especially fine-grained human activity structure learning, human-interaction recognition, RGB-D data based action recognition, temporal decomposition, and causality learning in unconstrained human activity videos. The techniques discussed give readers tools that provide a significant improvement over existing methodologies of video content understanding by taking advantage of activity recognition. It links multiple popular research fields in computer vision, machine learning, human-centered computing, human-computer interaction, image classification, and pattern recognition. In addition, the book includes several key chapters covering multiple emerging topics in the field. Contributed by top experts and practitioners, the chapters present key topics from different angles and blend both methodology and application, composing a solid overview of the human activity recognition techniques.
Full Text Available As academics we study, research and teach audiovisual media, yet rarely disseminate and mediate through it. Today, developments in production technologies have enabled academic researchers to create videos and mediate audiovisually. In academia it is taken for granted that everyone can write a text. Is it now time to assume that everyone can make a video essay? Using the online journal of academic videos Audiovisual Thinking and the videos published in it as a case study, this article seeks to reflect on the emergence and legacy of academic audiovisual dissemination. Anchoring academic video and audiovisual dissemination of knowledge in two critical traditions, documentary theory and semiotics, we will argue that academic video is in fact already present in a variety of academic disciplines, and that academic audiovisual essays are bringing trends and developments that have long been part of academic discourse to their logical conclusion.
Bornoe, Nis; Barkhuus, Louise
Microblogging is a recently popular phenomenon and, with the increasing trend for video cameras to be built into mobile phones, a new type of microblogging has entered the arena of electronic communication: video microblogging. In this study we examine video microblogging, which is the broadcasting of short videos. A series of semi-structured interviews offers an understanding of why and how video microblogging is used and what the users post and broadcast.
Habibian, A.; Snoek, C.G.M.
Representing videos using vocabularies composed of concept detectors appears promising for generic event recognition. While many have recently shown the benefits of concept vocabularies for recognition, studying the characteristics of a universal concept vocabulary suited for representing events is
This international bestseller and essential reference is the "bible" for digital video engineers and programmers worldwide. By far the most informative analog and digital video reference available, it includes the hottest new trends and cutting-edge developments in the field. Video Demystified, Fourth Edition is a "one stop" reference guide for the various digital video technologies. The fourth edition is completely updated with all-new chapters on MPEG-4, H.264, SDTV/HDTV, ATSC/DVB, and streaming video (video over DSL, Ethernet, etc.), as well as discussions of the latest standards throughout. The accompanying CD-ROM is updated to include a unique set of video test files in the newest formats. *This essential reference is the "bible" for digital video engineers and programmers worldwide *Contains all-new chapters on MPEG-4, H.264, SDTV/HDTV, ATSC/DVB, and Streaming Video *Completely revised with all the latest and most up-to-date industry standards.
Full Text Available Image recognition is a technology which can be used in various applications such as medical image recognition systems, security, defense video tracking, and factory automation. In this paper we present a novel pipelined architecture of an adaptive integrated Artificial Neural Network (ANN) for image recognition. In our proposed work we combine the spiking-neuron concept with an ANN to achieve an efficient architecture for image recognition. The ANN is trained on a set of training images and the target outputs are identified. Real-time video is captured and then converted into frames for testing, and the images are recognized. The machine can operate at up to 40 frames/sec using images acquired from the camera. The system has been implemented on an XC3S400 SPARTAN-3 Field-Programmable Gate Array.
Finnemann, Niels Ole
the print medium, rather than written text or speech. In the late 20th century, the notion of text was subject to increasing criticism, as in the question raised within literary text theory: is there a text in this class? At the same time, the notion was expanded to include extra-linguistic sign modalities (images, videos). Thus, a basic question is this: should electronic text be included in the expanded notion of text as a new digital sign modality added to the repertoire of modalities, or should it be included as a sign modality which is both an independent modality and a container in which other…
Full Text Available A new noncooperative iris recognition method is proposed. In this method, the iris features are extracted using a Gabor descriptor. The feature extraction and comparison are scale, deformation, rotation, and contrast-invariant. It works with off-angle and low-resolution iris images. The Gabor wavelet is incorporated with scale-invariant feature transformation (SIFT for feature extraction to better extract the iris features. Both the phase and magnitude of the Gabor wavelet outputs were used in a novel way for local feature point description. Two feature region maps were designed to locally and globally register the feature points and each subregion in the map is locally adjusted to the dilation/contraction/deformation. We also developed a video-based non-cooperative iris recognition system by integrating video-based non-cooperative segmentation, segmentation evaluation, and score fusion units. The proposed method shows good performance for frontal and off-angle iris matching. Video-based recognition methods can improve non-cooperative iris recognition accuracy.
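The score fusion unit mentioned above can be illustrated with one common recipe: min-max normalize each frame's matching scores, then average across frames (the sum rule) before taking the final decision. This is a generic sketch with invented scores, not the paper's specific fusion scheme.

```python
# Min-max normalized sum-rule fusion of per-frame iris matching scores.
import numpy as np

def fuse_scores(frame_scores):
    """frame_scores: (n_frames, n_gallery) matrix of matching scores.
    Returns one fused score per gallery identity."""
    s = np.asarray(frame_scores, float)
    lo = s.min(axis=1, keepdims=True)
    hi = s.max(axis=1, keepdims=True)
    s = (s - lo) / np.where(hi > lo, hi - lo, 1.0)   # per-frame min-max norm
    return s.mean(axis=0)                            # sum rule (averaged)

scores = [[0.9, 0.2, 0.1],   # frame 1: scores against gallery identities A, B, C
          [0.7, 0.4, 0.2],   # frame 2
          [0.8, 0.1, 0.3]]   # frame 3
fused = fuse_scores(scores)
print(int(np.argmax(fused)))  # 0 -> identity A
```

Fusing after per-frame normalization keeps one badly segmented frame from dominating the segment-level decision.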
Noel F. Peden
Full Text Available Capturing video at a conference is easy; doing it so the product is useful is another matter. Many subtle problems come into play before the video and audio obtained can be used to create a final product. This article discusses what the author learned over two years of shooting and editing video for the Code4Lib conference.
Today's youth are situated in a complex information ecology that includes video games and print texts. At the basic level, video game play itself is a form of digital literacy practice. If we widen our focus from the "individual player + technology" to the online communities that play them, we find that video games also lie at the nexus of a…
Porter, Guy; Starcevic, Vladan; Berle, David; Fenech, Pauline
It has been increasingly recognized that some people develop problem video game use, defined here as excessive use of video games resulting in various negative psychosocial and/or physical consequences. The main objectives of the present study were to identify individuals with problem video game use and compare them with those without problem video game use on several variables. An international, anonymous online survey was conducted, using a questionnaire with provisional criteria for problem video game use, which the authors have developed. These criteria reflect the crucial features of problem video game use: preoccupation with and loss of control over playing video games and multiple adverse consequences of this activity. A total of 1945 participants completed the survey. Respondents who were identified as problem video game users (n = 156, 8.0%) differed significantly from others (n = 1789) on variables that provided independent, preliminary validation of the provisional criteria for problem video game use. They played longer than planned and with greater frequency, and more often played even though they did not want to and despite believing that they should not do it. Problem video game users were more likely to play certain online role-playing games, found it easier to meet people online, had fewer friends in real life, and more often reported excessive caffeine consumption. People with problem video game use can be identified by means of a questionnaire and on the basis of the present provisional criteria, which require further validation. These findings have implications for recognition of problem video game users among individuals, especially adolescents, who present to mental health services. Mental health professionals need to acknowledge the public health significance of the multiple negative consequences of problem video game use.
Johnson, Don; Johnson, Mike
The process of digital capture, editing, and archiving video has become an important aspect of documenting arthroscopic surgery. Recording the arthroscopic findings before and after surgery is an essential part of the patient's medical record. The hardware and software have become more affordable to purchase, but the learning curve to master the software is steep. Digital video is captured at the time of arthroscopy to a hard disk, and written to a CD at the end of the operative procedure. The process of obtaining video of open procedures is more complex. Outside video of the procedure is recorded on digital tape with a digital video camera. The camera must be plugged into a computer to capture the video on the hard disk. Adobe Premiere software is used to edit the video and render the finished video to the hard drive. This finished video is burned onto a CD. We outline the choice of computer hardware and software for the manipulation of digital video. The techniques of backup and archiving the completed projects and files also are outlined. The uses of digital video for education and the formats that can be used in PowerPoint presentations are discussed.
Full Text Available During their lifetime, people learn to recognize thousands of faces that they interact with. Face perception refers to an individual's understanding and interpretation of the face, particularly the human face, especially in relation to the associated information processing in the brain. The proportions and expressions of the human face are important for identifying origin, emotional tendencies, health qualities, and some social information. From birth, faces are important in an individual's social interaction. Face perception is very complex, as the recognition of facial expressions involves extensive and diverse areas of the brain. Our main goal is to present specialized studies of human faces and to highlight the importance of attractiveness in their retention. We will see that there are many factors that influence face recognition.
Mounîm A. El-Yacoubi
Full Text Available We present an autonomous assistive robotic system for human activity recognition from video sequences. Due to the large variability inherent to video capture from a non-fixed robot (as opposed to a fixed camera), as well as the robot's limited computing resources, the implementation has been guided by robustness to this variability and by memory and computing speed efficiency. To accommodate motion speed variability across users, we encode motion using dense interest point trajectories. Our recognition model harnesses the dense interest point bag-of-words representation through an intersection kernel-based SVM that better accommodates the large intra-class variability stemming from a robot operating in different locations and conditions. To contextually assess the engine as implemented in the robot, we compare it with the most recent approaches to human action recognition performed on public datasets (non-robot-based), including a novel approach of our own that is based on a two-layer SVM-hidden conditional random field sequential recognition model. The latter's performance is among the best within the recent state of the art. We show that our robot-based recognition engine, while less accurate than the sequential model, nonetheless shows good performance, especially given the adverse test conditions of the robot relative to those of a fixed camera.
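The histogram intersection kernel behind the SVM in this abstract is simple to state; the sketch below computes it over toy bag-of-words histograms (the vocabulary size and counts are invented for illustration, not taken from the paper).

```python
# Histogram intersection kernel over bag-of-words descriptors,
# as consumed by a precomputed-kernel SVM. Toy data only.

def intersection_kernel(h1, h2):
    """K(h1, h2) = sum_i min(h1[i], h2[i])."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def gram_matrix(histograms):
    """Full kernel (Gram) matrix between all pairs of histograms."""
    return [[intersection_kernel(x, y) for y in histograms]
            for x in histograms]

# Toy bag-of-words histograms over a 4-word trajectory vocabulary.
bows = [[3, 0, 1, 2], [2, 1, 1, 2], [0, 4, 0, 1]]
K = gram_matrix(bows)
```

The Gram matrix `K` can then be passed to any SVM implementation that accepts a precomputed kernel.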
include: re-identification, consumer behavior analysis, utilizing pupillary response for task difficulty measurement, logo detection, saliency prediction, classification of facial expressions, face recognition, face verification, age estimation, super-resolution, pose estimation, and pain recognition...
Moezzi, Saied; Katkere, Arun L.; Jain, Ramesh C.
Interactive video and television viewers should have the power to control their viewing position. To make this a reality, we introduce the concept of Immersive Video, which employs computer vision and computer graphics technologies to provide remote users a sense of complete immersion when viewing an event. Immersive Video uses multiple videos of an event, captured from different perspectives, to generate a full 3D digital video of that event. That is accomplished by assimilating important information from each video stream into a comprehensive, dynamic, 3D model of the environment. Using this 3D digital video, interactive viewers can then move around the remote environment and observe the events taking place from any desired perspective. Our Immersive Video System currently provides interactive viewing and `walkthrus' of staged karate demonstrations, basketball games, dance performances, and typical campus scenes. In its full realization, Immersive Video will be a paradigm shift in visual communication which will revolutionize television and video media, and become an integral part of future telepresence and virtual reality systems.
Habash, Nizar; Olive, Joseph; Christianson, Caitlin; McCary, John
Machine translation (MT) from text, the topic of this chapter, is perhaps the heart of the GALE project. Beyond being a well defined application that stands on its own, MT from text is the link between the automatic speech recognition component and the distillation component. The focus of MT in GALE is on translating from Arabic or Chinese to English. The three languages represent a wide range of linguistic diversity and make the GALE MT task rather challenging and exciting.
This thesis is based on a detailed analysis of various topics related to the question of whether video games can be art. First, it analyzes the current academic discussion on this subject and confronts the differing opinions of both supporters and objectors of the idea that video games can be a full-fledged art form. The second aim of this paper is to analyze the properties that are inherent to video games, in order to find the reason why the cultural elite considers video games as i...
Trybula, Walter J.
Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…
This is the first text to provide a unified and self-contained introduction to visual pattern recognition and machine learning. It is useful as a general introduction to artificial intelligence and knowledge engineering, and no previous knowledge of pattern recognition or machine learning is necessary. It covers the basics of various pattern recognition and machine learning methods. Translated from Japanese, the book also features chapter exercises, keywords, and summaries.
Full Text Available This article deals with a recognition system using an algorithm based on the Principal Component Analysis (PCA) technique. The recognition system consists only of a PC and an integrated video camera. The algorithm is developed in MATLAB language and calculates the eigenfaces, considered as features of the face. The PCA technique matches the facial test image against the training prototype vectors; the matching score is calculated between their coefficient vectors. If the matching score is high, we have the best recognition. The results of the algorithm based on the PCA technique are very good, even if the person looks at the video camera from one side.
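The matching step this abstract describes can be sketched as follows: project both the test image and the training prototypes onto an eigenface basis and compare coefficient vectors by Euclidean distance. The basis and image vectors below are hand-made toy values; in the article the eigenfaces come out of the PCA itself.

```python
# Eigenface matching step on toy 4-pixel "images". The orthonormal
# basis is assumed here, not computed, purely for illustration.
import math

def project(vec, basis, mean):
    """Coefficient vector of a mean-centered image in the eigenface basis."""
    centered = [v - m for v, m in zip(vec, mean)]
    return [sum(c * b for c, b in zip(centered, e)) for e in basis]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

mean = [2.0, 2.0, 2.0, 2.0]
basis = [[0.5, 0.5, 0.5, 0.5],        # assumed "eigenfaces"
         [0.5, 0.5, -0.5, -0.5]]
train = [[4, 4, 0, 0], [0, 0, 4, 4]]  # two prototype images
test = [3, 5, 0, 0]                   # resembles the first prototype

coeffs = [project(t, basis, mean) for t in train]
q = project(test, basis, mean)
best = min(range(len(train)), key=lambda i: euclidean(q, coeffs[i]))
```

Here `best` indexes the closest training prototype in coefficient space, which is the recognition decision.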
Full Text Available Face recognition systems are now being used in many applications such as border crossings, banks, and mobile payments. The wide-scale deployment of facial recognition systems has attracted intensive attention to the reliability of face biometrics against spoof attacks, where a photo, a video, or a 3D mask of a genuine user’s face can be used to gain illegitimate access to facilities or services. Though several face antispoofing or liveness detection methods (which determine at the time of capture whether a face is live or spoof) have been proposed, the issue is still unsolved due to the difficulty of finding discriminative and computationally inexpensive features and methods for spoof attacks. In addition, existing techniques use the whole face image or complete video for liveness detection. However, certain face regions (video frames) are often redundant or correspond to clutter in the image (video), generally leading to low performance. Therefore, we propose seven novel methods to find discriminative image patches, which we define as regions that are salient, instrumental, and class-specific. Four well-known classifiers, namely support vector machine (SVM), Naive Bayes, Quadratic Discriminant Analysis (QDA), and Ensemble, are then used to distinguish between genuine and spoof faces using a voting-based scheme. Experimental analysis on two publicly available databases (Idiap REPLAY-ATTACK and CASIA-FASD) shows promising results compared to existing works.
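The voting-based fusion the abstract mentions reduces to a majority vote over per-patch predictions. The sketch below shows only that final step; the patch labels are invented, and the upstream classifiers are assumed to exist.

```python
# Majority-vote fusion over per-patch live/spoof predictions.
# Patch labels here are made up for illustration.
from collections import Counter

def vote(patch_labels):
    """Return the label predicted for the most patches."""
    return Counter(patch_labels).most_common(1)[0][0]

labels = ["live", "spoof", "live", "live", "spoof"]
decision = vote(labels)
```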
Maria J. Santofimia
Full Text Available Smart Spaces, Ambient Intelligence, and Ambient Assisted Living are environmental paradigms that strongly depend on their capability to recognize human actions. While most solutions rest on sensor value interpretations and video analysis applications, few have realized the importance of incorporating common-sense capabilities to support the recognition process. Unfortunately, human action recognition cannot be successfully accomplished by only analyzing body postures. On the contrary, this task should be supported by profound knowledge of human agency nature and its tight connection to the reasons and motivations that explain it. The combination of this knowledge and the knowledge about how the world works is essential for recognizing and understanding human actions without committing common-senseless mistakes. This work demonstrates the impact that episodic reasoning has in improving the accuracy of a computer vision system for human action recognition. This work also presents formalization, implementation, and evaluation details of the knowledge model that supports the episodic reasoning.
Pattern recognition is a scientific discipline that is becoming increasingly important in the age of automation and information handling and retrieval. Pattern Recognition, 2e covers the entire spectrum of pattern recognition applications, from image analysis to speech recognition and communications. This book presents cutting-edge material on neural networks (a set of linked microprocessors that can form associations and use pattern recognition to "learn") and enhances student motivation by approaching pattern recognition from the designer's point of view. A direct result of more than 10
conclude and give a peek at our future work in Section 7. II. RELATED WORK An early attempt at extracting spatio-temporal features was Laptev and... extract HoGs and HoFs along with motion boundary histograms (MBH), making the approach robust to camera motion. Noguchi and Yanai  proposed... extracted features remains a problem. The next step was the introduction of relationships between large, stable trajectory clusters. Sun et al. [9
Battiato, Sebastiano; Farinella, Giovanni
Computer vision is the science and technology of making machines that see. It is concerned with the theory, design and implementation of algorithms that can automatically process visual data to recognize objects, track and recover their shape and spatial layout. The International Computer Vision Summer School - ICVSS was established in 2007 to provide both an objective and clear overview and an in-depth analysis of the state-of-the-art research in Computer Vision. The courses are delivered by world renowned experts in the field, from both academia and industry, and cover both theoretical and practical aspects of real Computer Vision problems. The school is organized every year by University of Cambridge (Computer Vision and Robotics Group) and University of Catania (Image Processing Lab). Different topics are covered each year.This edited volume contains a selection of articles covering some of the talks and tutorials held during the last editions of the school. The chapters provide an in-depth overview o...
Nortvig, Anne Mette; Sørensen, Birgitte Holm
This project’s aim was to support and facilitate master’s students’ preparation and collaboration by making video podcasts of short lectures available on YouTube prior to students’ first face-to-face seminar. The empirical material stems from group interviews, from statistical data created through YouTube analytics and from surveys answered by students after the seminar. The project sought to explore how video podcasts support learning and reflection online and how students use and reflect on the integration of online activities in the videos. Findings showed that students engaged actively…
Yang, Su; Zhu, Qing
The goal of sign language recognition (SLR) is to translate sign language into text and provide a convenient communication tool between deaf and hearing people. In this paper, we formulate an appropriate model based on a convolutional neural network (CNN) combined with a Long Short-Term Memory (LSTM) network in order to accomplish continuous recognition. With the strong ability of the CNN, the information in pictures captured from Chinese sign language (CSL) videos can be learned and transformed into vectors. Since a video can be regarded as an ordered sequence of frames, an LSTM model is employed to connect with the fully-connected layer of the CNN. As a recurrent neural network (RNN), it is suitable for sequence learning tasks, with the capability of recognizing patterns defined by temporal distance. Compared with a traditional RNN, LSTM performs better at storing and accessing information. We evaluate this method on our self-built dataset including 40 daily vocabularies. The experimental results show that the recognition method with CNN-LSTM can achieve a high recognition rate with small training sets, which will meet the needs of a real-time SLR system.
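To make the LSTM's gating concrete, the sketch below runs one scalar LSTM cell over a short sequence in plain Python. The weights are tiny made-up values; in the paper's architecture, the CNN's per-frame feature vector would play the role of `x` at each step.

```python
# One-dimensional LSTM cell stepped over a toy sequence.
# Weights are invented scalars, purely to show the gate arithmetic.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """w maps each gate to (input weight, recurrent weight, bias)."""
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])   # input gate
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])   # forget gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])   # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2]) # candidate
    c = f * c_prev + i * g          # cell state carries long-range memory
    h = o * math.tanh(c)            # hidden state exposed to the next layer
    return h, c

w = {k: (0.5, 0.1, 0.0) for k in "ifog"}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.5]:          # stand-in for per-frame CNN features
    h, c = lstm_step(x, h, c, w)
```

The gating is why the cell state `c` can persist information across many frames, which is what makes LSTM suitable for the temporal side of continuous SLR.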
Full Text Available As the public education system in Northern Ontario continues to take a downward spiral, a plethora of secondary school students are being placed in an alternative educational environment. Juxtaposing the two educational settings reveals very similar methods and characteristics of educating our youth, as opposed to using a truly alternative approach to education. This video reviews the relationship between public education and alternative education in a remote Northern Ontario setting. It is my belief that the traditional methods of teaching are not appropriate for educating at-risk students in alternative schools. Paper-and-pencil worksheets do not motivate these students to learn and succeed. Alternative education should emphasize experiential learning, a just-in-time curriculum based on every unique individual, and the student's true passion for everyday life. Cameron Culbert was born on February 3rd, 1977 in North Bay, Ontario. His teenage years were split between attending public school and his willed curriculum on the ski hill. Culbert spent 10 years (1996-2002 & 2006-2010) competing for Canada as an alpine ski racer. His passion for teaching and coaching began as an athlete and has now transferred into the classroom and the community. A graduate of Nipissing University (BA, BEd, MEd), Cameron's research interests are alternative education, physical education and technology in the classroom. Currently Cameron is an active educator and coach in Northern Ontario.
Full Text Available Video tracking is one of the processes in digital video and motion picture post-production. Video tracking helps realize the visual concept during production and is an integral part of visual effects making. This paper presents how the tracking process works and its benefits for visual needs, especially for video and motion picture production. Aspects of the tracking process, including common failure cases, are also made clear in this discussion.
Full Text Available In this paper we join a growing body of studies that learn from vernacular video analysts quite what video analysis as an intelligible course of action might be. Rather than pursuing epistemic questions regarding video as a number of other studies of video analysis have done, our concern here is with the crafts of producing the filmic. As such we examine how audio and video clips are indexed and brought to hand during the logging process, how a first assembly of the film is built at the editing bench and how logics of shot sequencing relate to wider concerns of plotting, genre and so on. In its conclusion we make a number of suggestions about the future directions of studying video and film editors at work. URN: urn:nbn:de:0114-fqs0803378
Full Text Available This paper presents a method for recognizing human actions from a single query action video. We propose an action recognition scheme based on the ordinal measure of accumulated motion, which is robust to variations in appearance. To this end, we first define the accumulated motion image (AMI) using image differences. Then the AMI of the query action video is resized to a subimage by intensity averaging, and a rank matrix is generated by ordering the sample values in the subimage. By computing the distances from the rank matrix of the query action video to the rank matrices of all local windows in the target video, local windows close to the query action are detected as candidates. To find the best match among the candidates, their energy histograms, which are obtained by projecting AMI values in the horizontal and vertical directions, respectively, are compared with those of the query action video. The proposed method does not require any preprocessing task such as learning or segmentation. To justify the efficiency and robustness of our approach, experiments are conducted on various datasets.
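The AMI and rank-matrix steps this abstract describes can be sketched in a few lines: accumulate absolute frame differences, then replace intensities by their ordinal ranks. The frames below are toy 4x4 grids, and the resizing step is omitted for brevity.

```python
# Accumulated motion image and ordinal rank matrix on toy frames.
# Real use would operate on full video frames and a resized subimage.

def ami(frames):
    """Accumulated motion image: sum of absolute frame differences."""
    h, w = len(frames[0]), len(frames[0][0])
    out = [[0.0] * w for _ in range(h)]
    for a, b in zip(frames, frames[1:]):
        for y in range(h):
            for x in range(w):
                out[y][x] += abs(a[y][x] - b[y][x])
    return out

def rank_matrix(img):
    """Rank of each cell when all values are sorted ascending."""
    flat = [(v, i) for i, v in enumerate(v for row in img for v in row)]
    ranks = [0] * len(flat)
    for r, (_, i) in enumerate(sorted(flat)):
        ranks[i] = r
    return ranks

f0 = [[0] * 4 for _ in range(4)]
f1 = [[0] * 4 for _ in range(4)]
f1[1][1] = 9                      # motion concentrated at one cell
motion = ami([f0, f1])
r = rank_matrix(motion)
```

Comparing rank matrices instead of raw intensities is what gives the ordinal measure its robustness to appearance changes: any monotone intensity change leaves the ranks untouched.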
Full Text Available Surveillance videos contain a considerable amount of data, wherein the information interesting to the user is sparsely distributed. Researchers construct video synopses that contain key information extracted from a surveillance video for efficient browsing and analysis. The geospatial–temporal information of a surveillance video plays an important role in the efficient description of video content, yet current approaches to video synopsis lack the introduction and analysis of geospatial–temporal information. Owing to the problems mentioned above, this paper proposes an approach called “surveillance video synopsis in GIS”. Based on an integration model of video moving objects and GIS, the virtual visual field and the expression model of the moving object are constructed by spatially locating and clustering the trajectory of the moving object. The subgraphs of the moving object are reconstructed frame by frame in a virtual scene. Results show that the approach described in this paper comprehensively analyzed and created fusion expression patterns between video dynamic information and geospatial–temporal information in GIS and reduced the playback time of video content.
Denikin Anton A.
Full Text Available The article considers the aesthetic and practical possibilities of sound (sound design) in video games and interactive applications. It outlines the key features of game sound, such as simulation, representativeness, interactivity, immersion, randomization, and audio-visuality. The author defines the basic terminology in the study of game audio and identifies significant aesthetic differences between film sound and sound in video game projects. It is an attempt to determine techniques of art analysis for approaches to the study of video games, including the aesthetics of their sounds. The article offers a range of research methods, considering video game scoring as a contemporary creative practice.
Full Text Available In the past few years there has been an explosion in the use of digital video data. Many people have personal computers at home, and with the help of the Internet users can easily share video files on their computers. This makes possible the unauthorized use of digital media, and without adequate protection systems the authors and distributors have no means to prevent it. Digital watermarking techniques can help these systems to be more effective by embedding secret data right into the video stream. This makes minor changes in the frames of the video, but these changes are almost imperceptible to the human visual system. The embedded information can include copyright data, access control, etc. A robust watermark is resistant to various distortions of the video, so it cannot be removed without affecting the quality of the host medium. In this paper I propose a video watermarking scheme that fulfills the requirements of a robust watermark.
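To illustrate the embed/extract roundtrip, the sketch below writes watermark bits into the least significant bits of frame pixels. Note this LSB scheme is deliberately naive and is not the robust, transform-domain watermark the paper proposes; it only shows why the changes are imperceptible (each pixel moves by at most 1).

```python
# Naive LSB watermarking of a grayscale frame (toy pixel values).
# A robust watermark would embed in a transform domain instead.

def embed(pixels, bits):
    """Overwrite the LSB of each pixel with one watermark bit."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

def extract(pixels, n):
    """Read the first n watermark bits back out of the LSBs."""
    return [p & 1 for p in pixels[:n]]

frame = [120, 121, 200, 55, 17, 86]
mark = [1, 0, 1, 1, 0, 0]
marked = embed(frame, mark)
recovered = extract(marked, len(mark))
```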
.... Experiments have been completed comparing the effects of several types of facial motion on face recognition, the effects of face familiarity on recognition from video clips taken at a distance...
Wetzel, C. Douglas; And Others
This volume is a blend of media research, cognitive science research, and tradecraft knowledge regarding video production techniques. The research covers: visual learning; verbal-auditory information; news broadcasts; the value of motion and animation in film and video; simulation (including realism and fidelity); the relationship of text and…
This chapter focuses on methodological issues that arise when using (digital) video for research communication, not least online. Video has long been used in research for data collection and research communication. With digitization and the internet, however, new opportunities and challenges have arisen for communicating and distributing research results to different target groups via video. At the same time, classic methodological issues, such as the researcher's positioning in relation to what is being studied, remain relevant. Both classic and new issues are discussed in the chapter, which frames the discussion around different possible positionings: communicator, storyteller, or dialogist. These positions relate to genres within 'academic video'. Finally, a methodological toolbox is presented with tools for planning…
Full Text Available Previous research has been inconsistent on whether violent video games exert positive and/or negative effects on cognition. In particular, attentional bias in facial affect processing after violent video game exposure continues to be controversial. The aim of the present study was to investigate attentional bias in facial recognition after short-term exposure to violent video games and to characterize the neural correlates of this effect. To accomplish this, participants were exposed to either neutral or violent video games for 25 min, and then event-related potentials (ERPs) were recorded during two emotional search tasks. The first search task assessed attentional facilitation, in which participants were required to identify an emotional face from a crowd of neutral faces. In contrast, the second task measured disengagement, in which participants were required to identify a neutral face from a crowd of emotional faces. Our results found a significant presence of the ERP component N2pc during the facilitation task; however, no differences were observed between the two video game groups. This finding does not support a link between attentional facilitation and violent video game exposure. Comparatively, during the disengagement task, N2pc responses were not observed when participants viewed happy faces following violent video game exposure; however, a weak N2pc response was observed after neutral video game exposure. These results provide only inconsistent support for the disengagement hypothesis, suggesting that participants found it difficult to separate a neutral face from a crowd of emotional faces.
Karpenko, Alexandre; Aarabi, Parham
In this paper, we present a large database of over 50,000 user-labeled videos collected from YouTube. We develop a compact representation called "tiny videos" that achieves high video compression rates while retaining the overall visual appearance of the video as it varies over time. We show that frame sampling using affinity propagation (an exemplar-based clustering algorithm) achieves the best trade-off between compression and video recall. We use this large collection of user-labeled videos in conjunction with simple data mining techniques to perform related video retrieval, as well as classification of images and video frames. The classification results achieved by tiny videos are compared with the tiny images framework for a variety of recognition tasks. The tiny images data set consists of 80 million images collected from the Internet. These are the largest labeled research data sets of videos and images available to date. We show that tiny videos are better suited for classifying scenery and sports activities, while tiny images perform better at recognizing objects. Furthermore, we demonstrate that combining the tiny images and tiny videos data sets improves classification precision in a wider range of categories.
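A crude version of the "tiny video" idea can be sketched as sampling a subset of frames and shrinking each by block averaging. The paper selects frames with affinity propagation; the uniform sampling and 2x2 averaging below are deliberate simplifications on invented toy frames.

```python
# Simplified "tiny video": uniform frame sampling + 2x2 block averaging.
# The paper uses affinity propagation for frame selection instead.

def shrink(frame):
    """Average non-overlapping 2x2 blocks of a grayscale frame."""
    h, w = len(frame), len(frame[0])
    return [[(frame[y][x] + frame[y][x + 1] +
              frame[y + 1][x] + frame[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def tiny_video(frames, step=2):
    """Keep every step-th frame, shrunk to a quarter of its pixels."""
    return [shrink(f) for f in frames[::step]]

# Four toy 2x2 frames with slowly increasing brightness.
frames = [[[i * 10 + 0, i * 10 + 2], [i * 10 + 4, i * 10 + 6]]
          for i in range(4)]
tv = tiny_video(frames)
```

Even this toy version preserves the coarse appearance over time (the brightness trend survives) while discarding most of the pixels, which is the trade-off the representation is built around.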
Bolle, R. M.; Yeo, B.-L.; Yeung, M.
Digital video databases are becoming more and more pervasive, and finding video of interest in large databases is rapidly becoming a problem. Intelligent means of quick content-based video retrieval and rapid content-based video viewing are, therefore, an important topic of research. Video is a rich source of data: it contains visual and audio information, and in many cases there is text associated with the video. Content-based video retrieval should use all this information in an efficient and effective way. From a human perspective, a video query can be viewed as an iterated sequence of navigating, searching, browsing, and viewing. This paper addresses video search in terms of these phases.
Full Text Available Video processing source code for algorithms and tools used in software media pipelines (e.g. image scalers, colour converters, etc.). The currently available source code is written in C++, together with its associated libraries and DirectShow filters.
Simon, Michael; Fischer, Amber; Petrov, Plamen
Unmanned aerial vehicles (UAVs) capture real-time video data of military targets while keeping the warfighter at a safe distance. This keeps soldiers out of harm's way while they perform intelligence, surveillance and reconnaissance (ISR) and close-air support troops in contact (CAS-TIC) situations. The military also wants to use UAV video to achieve force multiplication. One method of achieving effective force multiplication involves fielding numerous UAVs with cameras and having multiple videos processed simultaneously by a single operator. However, monitoring multiple video streams is difficult for operators when the videos are of low quality. To address this challenge, we researched several promising video enhancement algorithms that focus on improving video quality. In this paper, we discuss our video enhancement suite and provide examples of video enhancement capabilities, focusing on stabilization, dehazing, and denoising. We provide results that show the effects of our enhancement algorithms on target detection and tracking algorithms. These results indicate that there is potential to assist the operator in identifying and tracking relevant targets with aided target recognition even on difficult video, increasing the force multiplier effect of UAVs. This work also forms the basis for human factors research into the effects of enhancement algorithms on ISR missions.
Mu, Meiru; Spreeuwers, Lieuwe Jan; Veldhuis, Raymond N.J.
It is still challenging to recognize faces reliably in videos from mobile camera, although mature automatic face recognition technology for still images has been available for quite some time. Suppose we want to be alerted when suspects appear in the recording of a police Body-Cam, even a good face matcher on still images would give many false alarms due to the uncontrolled conditions. This paper presents an approach to identify faces in videos from mobile cameras. A commercial face matcher F...
Full Text Available Video streaming over the Internet has gained significant popularity during the last years, and academia and industry have made a great research effort in this direction. In this scenario, scalable video coding (SVC) has emerged as an important video standard to provide more functionality to video transmission and storage applications. This paper proposes and evaluates two strategies based on scalable video coding for P2P video streaming services. In the first strategy, SVC is used to offer differentiated video quality to peers with heterogeneous capacities. The second strategy uses SVC to reach a homogeneous video quality between different videos from different sources. The obtained results show that our proposed strategies enable a system to improve its performance and introduce benefits such as differentiated video quality for clients with heterogeneous capacities and variable network conditions.
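As an illustration of the first strategy, differentiated quality can be achieved by letting each peer subscribe to as many SVC layers as its download capacity allows. The sketch below is not from the paper; the layer bitrates and the greedy selection rule are illustrative assumptions.

```python
# Toy sketch (not the paper's code): choosing how many SVC layers a peer
# can subscribe to, given cumulative layer bitrates and peer capacity.
# The layer bitrates (kbps) are illustrative values, not measured data.

def select_svc_layers(layer_bitrates, capacity_kbps):
    """Return the number of layers (base + enhancements) whose cumulative
    bitrate fits within the peer's download capacity."""
    total = 0
    layers = 0
    for rate in layer_bitrates:
        if total + rate > capacity_kbps:
            break
        total += rate
        layers += 1
    return layers

LAYERS = [400, 300, 500]  # base layer + two enhancement layers (kbps)

print(select_svc_layers(LAYERS, 350))   # -> 0: cannot even fetch the base layer
print(select_svc_layers(LAYERS, 800))   # -> 2: base + first enhancement
print(select_svc_layers(LAYERS, 1500))  # -> 3: full quality
```

Because enhancement layers depend on the base layer, selection stops at the first layer that does not fit.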
Full Text Available The first rural video experiences took place in Peru and Mexico. The Peruvian project is known as CESPAC (Centro de Servicios de Pedagogía Audiovisual para la Capacitación). With external financing from the FAO, it was started in the 1970s. The Mexican project was named PRODERITH (Programa de Desarrollo Rural Integrado del Trópico Húmedo). Its rural video component was particularly successful at the grassroots level. The evaluation concluded that rural video, as a social communication system for development, is excellent and low-cost.
Full Text Available This paper presents a new approach for fall detection from partially-observed depth-map video sequences. The proposed approach utilizes the 3D skeletal joint positions obtained from the Microsoft Kinect sensor to build a view-invariant descriptor for human activity representation, called the motion-pose geometric descriptor (MPGD). Furthermore, we have developed a histogram-based representation (HBR) based on the MPGD to construct a length-independent representation of the observed video subsequences. Using the constructed HBR, we formulate the fall detection problem as a posterior-maximization problem in which the posterior probability for each observed video subsequence is estimated using a multi-class SVM (support vector machine) classifier. Then, we combine the computed posterior probabilities from all of the observed subsequences to obtain an overall class posterior probability of the entire partially-observed depth-map video sequence. To evaluate the performance of the proposed approach, we have utilized the Kinect sensor to record a dataset of depth-map video sequences that simulates four fall-related activities of elderly people: walking, sitting, falling from standing and falling from sitting. Then, using the collected dataset, we have developed three evaluation scenarios based on the number of unobserved video subsequences in the testing videos: a fully-observed video sequence scenario, a single unobserved video subsequence of random length scenario, and a two unobserved video subsequences of random lengths scenario. Experimental results show that the proposed approach achieved an average recognition accuracy of 93.6%, 77.6% and 65.1% in recognizing the activities during the first, second and third evaluation scenarios, respectively. These results demonstrate the feasibility of the proposed approach to detect falls from partially-observed videos.
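The combination step can be sketched as follows. Averaging the per-subsequence posteriors is one plausible combination rule; the paper's exact formulation may differ, and the class names and probabilities below are hypothetical, not the paper's data.

```python
# Illustrative sketch: combining per-subsequence class posteriors into an
# overall posterior for a partially-observed sequence.

def combine_posteriors(subseq_posteriors):
    """subseq_posteriors: list of dicts mapping class -> P(class | subsequence).
    Returns a normalized overall posterior over the classes."""
    classes = subseq_posteriors[0].keys()
    combined = {c: sum(p[c] for p in subseq_posteriors) for c in classes}
    total = sum(combined.values())
    return {c: v / total for c, v in combined.items()}

# Hypothetical SVM outputs for three observed subsequences of one video:
posteriors = [
    {"walking": 0.6, "sitting": 0.2, "falling": 0.2},
    {"walking": 0.5, "sitting": 0.1, "falling": 0.4},
    {"walking": 0.7, "sitting": 0.2, "falling": 0.1},
]
overall = combine_posteriors(posteriors)
print(max(overall, key=overall.get))  # -> walking
```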
Botella, Guillermo; García, Carlos; Meyer-Bäse, Uwe
This contribution focuses on different topics covered by the special issue titled 'Hardware Implementation of Machine Vision Systems', including FPGAs, GPUs, embedded systems, and multicore implementations for image analysis such as edge detection, segmentation, pattern recognition and object recognition/interpretation, image enhancement/restoration, image/video compression, image similarity and retrieval, satellite image processing, medical image processing, motion estimation, neuromorphic and bioinspired vision systems, video processing, image formation and physics-based vision, 3D processing/coding, scene understanding, and multimedia.
Full Text Available This paper studies a video image processing system for vehicle detection and counting, comprising video-based vehicle detection, image processing of vehicle targets, and vehicle counting. Vehicle detection uses the inter-frame difference method together with vehicle shadow segmentation. The image processing stage applies color-to-grayscale conversion, image segmentation, mathematical morphology analysis and image filling to the detected targets before extracting the target vehicle. The counting stage counts the detected vehicles. The system uses inter-frame video differencing to detect vehicles and completes the counting by adding a frame around each vehicle and comparing it with the boundary, achieving a high recognition rate, fast operation and ease of use. The purpose of this paper is to raise the level of modernization and automation in traffic management, and the study can serve as a reference for the future development of related applications.
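The inter-frame difference step can be sketched as follows; this is an illustrative toy example, not the system's code, and the threshold and pixel values are assumptions.

```python
# Minimal sketch of inter-frame difference detection on two synthetic
# grayscale "frames" (lists of pixel rows).

def frame_difference(prev, curr, threshold=30):
    """Return a binary mask: 1 where |curr - prev| exceeds the threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

prev_frame = [[10, 10, 10],
              [10, 10, 10],
              [10, 10, 10]]
curr_frame = [[10, 10, 10],
              [10, 200, 200],   # a bright "vehicle" enters the scene
              [10, 10, 10]]

mask = frame_difference(prev_frame, curr_frame)
moving_pixels = sum(sum(row) for row in mask)
print(mask)           # -> [[0, 0, 0], [0, 1, 1], [0, 0, 0]]
print(moving_pixels)  # -> 2
```

In a real pipeline the mask would then be cleaned by morphology and shadow segmentation before counting.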
Full Text Available Object tracking is an important and fundamental task in computer vision and its high-level applications, e.g., intelligent surveillance, motion-based recognition, video indexing, traffic monitoring and vehicle navigation. However, the recent widespread use of wireless consumer cameras often produces low-quality videos with frame-skipping, which makes object tracking difficult. Previous tracking methods, for example, generally depend heavily on object appearance or motion continuity and cannot be directly applied to frame-skipping videos. In this paper, we propose an improved particle filter for object tracking to overcome the frame-skipping difficulties. The novelty of our particle filter lies in using the detection result of erratic motion to ameliorate the transition model for a better trial distribution. Experimental results show that the proposed approach improves the tracking accuracy in comparison with the state-of-the-art methods, even when both the object and the camera are in motion.
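A minimal sketch of the idea, assuming a 1D state and Gaussian noise (the paper's actual transition model and parameter values are not reproduced here): when erratic motion is detected, the transition noise is widened so the trial distribution can still cover the object's jump across skipped frames.

```python
import math
import random

# Toy 1D particle-filter step with a detection-informed transition model.
# All noise parameters are illustrative assumptions.

def step(particles, observation, erratic, base_noise=1.0, obs_noise=2.0):
    # Transition: widen the spread when erratic motion is detected.
    noise = base_noise * (5.0 if erratic else 1.0)
    moved = [p + random.gauss(0.0, noise) for p in particles]
    # Weight each particle by an (unnormalized) Gaussian observation likelihood.
    weights = [math.exp(-((m - observation) ** 2) / (2 * obs_noise ** 2))
               for m in moved]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Resample with replacement proportionally to the weights.
    return random.choices(moved, weights=weights, k=len(particles))

random.seed(0)
particles = [0.0] * 200
# The object jumps from ~0 to ~10 because of skipped frames:
particles = step(particles, observation=10.0, erratic=True)
estimate = sum(particles) / len(particles)
print(round(estimate, 1))  # pulled strongly toward the observation at 10
```

With a narrow transition (erratic=False) the particles could not reach the jumped observation, and the weights would collapse onto the tail of the cloud.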
Indian language; Oriya script; character segmentation; handwriting recognition. 1. Introduction. Segmentation of handwritten text into lines, words and characters is one of the important steps in the handwritten script recognition process. The task of individual text-line segmentation from unconstrained handwritten documents ...
Full Text Available Smoke detection is a key part of fire recognition in forest fire surveillance video, since the smoke produced by forest fires is visible well before the flames. The performance of smoke video detection algorithms is often affected by smoke-like objects such as heavy fog. This paper presents a novel forest fire smoke video detection method based on spatiotemporal features and dynamic texture features. First, Kalman filtering is used to segment candidate smoke regions. Then, each candidate smoke region is divided into small blocks. The spatiotemporal energy feature of each block is extracted by computing the energy features of its 8-neighboring blocks in the current frame and its two adjacent frames. The flutter direction angle is computed by analyzing the centroid motion of the segmented regions in one candidate smoke video clip. The Local Binary Motion Pattern (LBMP) is used to define dynamic texture features of smoke videos. Finally, smoke video is recognized by the Adaboost algorithm. The experimental results show that the proposed method can effectively detect smoke images recorded from different scenes.
Lecca, Michela; Smolka, Bogdan
This text covers state-of-the-art color image and video enhancement techniques. The book examines the multivariate nature of color image/video data as it pertains to contrast enhancement, color correction (equalization, harmonization, normalization, balancing, constancy, etc.), noise removal and smoothing. This book also discusses color and contrast enhancement in vision sensors and applications of image and video enhancement. · Focuses on enhancement of color images/video · Addresses algorithms for enhancing color images and video · Presents coverage on super resolution, restoration, inpainting, and colorization.
Full Text Available This paper presents the transmission of a Digital Video Broadcasting system with streaming video at 640x480 resolution over different IQ rates and modulations. In video transmission, distortion often occurs, so the received video has bad quality. Key-frame selection algorithms are flexible to changes in a video, but with these methods the temporal information of a video sequence is omitted. To minimize distortion between the original and received video, we added a methodology using a sequential distortion minimization algorithm. Its aim was to create a new video, better than the original, without significant loss of content between the original and received video, corrected sequentially. The reliability of video transmission was observed based on a constellation diagram, with the best result at an IQ rate of 2 MHz and 8 QAM modulation. Video transmission was also investigated with and without SEDIM (Sequential Distortion Minimization Method). The experimental results showed that the average PSNR (Peak Signal to Noise Ratio) of video transmission using SEDIM increased from 19.855 dB to 48.386 dB, and the average SSIM (Structural Similarity) increased by 10.49%. The experimental results and comparison of the proposed method showed good performance. A USRP board was used as the RF front-end at 2.2 GHz.
Full Text Available Conventional video traces (which characterize the video encoding frame sizes in bits and frame quality in PSNR) are limited to evaluating loss-free video transmission. To evaluate robust video transmission schemes for lossy network transport, experiments with actual video are generally required. To circumvent the need for experiments with actual videos, we propose in this paper an advanced video trace framework. The two main components of this framework are (i) advanced video traces, which combine the conventional video traces with a parsimonious set of visual content descriptors, and (ii) quality prediction schemes that, based on the visual content descriptors, provide an accurate prediction of the quality of the reconstructed video after lossy network transport. We conduct extensive evaluations using a perceptual video quality metric as well as the PSNR, in which we compare the visual quality predicted based on the advanced video traces with the visual quality determined from experiments with actual video. We find that the advanced video trace methodology accurately predicts the quality of the reconstructed video after frame losses.
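For reference, the PSNR figure used throughout trace-based evaluation is computed from the mean squared error between original and reconstructed frames; the tiny 2x2 frames below are illustrative, not trace data.

```python
import math

# PSNR of a reconstructed frame against the original, for 8-bit samples
# (peak value 255). Frames are lists of pixel rows.

def psnr(original, reconstructed, max_value=255.0):
    diffs = [(o - r) ** 2 for orow, rrow in zip(original, reconstructed)
             for o, r in zip(orow, rrow)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)

orig = [[52, 55], [61, 59]]
recon = [[52, 54], [60, 59]]  # small reconstruction error (MSE = 0.5)
print(round(psnr(orig, recon), 2))  # -> 51.14
```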
Mihalache Sergiu; Stoica Mihaela-Zoica
From birth, faces are important in the individual's social interaction. Face perceptions are very complex, as the recognition of facial expressions involves extensive and diverse areas in the brain...
Full Text Available A video surveillance system senses and tracks threatening activity in a real-time environment. It guards against security threats with the help of visual devices which gather video information, such as CCTVs and IP (Internet Protocol) cameras. Video surveillance has become a key tool for addressing problems in public security. Such systems are mostly deployed on IP-based networks, so all the security threats that exist in IP-based applications may also threaten video surveillance applications. As a result, cybercrime, illegal video access, video mishandling and other abuses may increase. Hence, this paper proposes an intelligent security model for video surveillance systems which ensures safety and provides secured access to video.
Ren, Huamin; Liu, Weifeng; Olsen, Søren Ingvor
Understanding behaviors is the core of video content analysis, which is highly related to two important applications: abnormal event detection and action recognition. Dictionary learning, as one of the mid-level representations, is an important step to process a video. It has achieved state...
Tavormina, Maurilio Giuseppe Maria; Tavormina, Romina
The frequent and protracted use of video games, with its serious personal, family and social consequences, is no longer just a pleasant pastime and could lead to mental and physical health problems. Although there is no official recognition of Internet video game addiction as a mild mental health disorder, further scientific research is needed.
Zhou, Xiang; Ross, Lars; Lehn-Schiøler, Tue
BACKGROUND: It is well known that under noisy conditions, viewing a speaker's articulatory movement aids the recognition of spoken words. Conventionally it is thought that the visual input disambiguates otherwise confusing auditory input. HYPOTHESIS: In contrast, we hypothesize that it is the temporal synchronicity of the visual input that aids parsing of the auditory stream. More specifically, we expected that purely temporal information, which does not convey information such as place of articulation, may facilitate word recognition. METHODS: To test this prediction we used temporal features of audio to generate an artificial talking-face video and measured word recognition performance on simple monosyllabic words. RESULTS: When presenting words together with the artificial video we find that word recognition is improved over purely auditory presentation. The effect is significant (p...
Tan, Zheng-Hua; Lindberg, Børge
The enthusiasm of deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR in the mobile context, covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within command and control, text entry and search are presented, with an emphasis on mobile text entry.
Full Text Available The aim of this paper is to present video quality prediction models for objective, non-intrusive prediction of H.264 encoded video for all content types, combining parameters in both the physical and application layers over Universal Mobile Telecommunication System (UMTS) networks. In order to characterize the Quality of Service (QoS) level, a learning model based on an Adaptive Neural Fuzzy Inference System (ANFIS) and a second model based on non-linear regression analysis are proposed to predict the video quality in terms of the Mean Opinion Score (MOS). The objective of the paper is two-fold: first, to find the impact of QoS parameters on end-to-end video quality for H.264 encoded video; second, to develop learning models based on ANFIS and non-linear regression analysis to predict video quality over UMTS networks by considering the impact of radio link loss models. The loss models considered are 2-state Markov models. Both models are trained with a combination of physical and application layer parameters and validated with an unseen dataset. Preliminary results show that good prediction accuracy was obtained from both models. The work should help in the development of a reference-free video prediction model and QoS control methods for video over UMTS networks.
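The 2-state Markov (Gilbert-style) loss model mentioned above can be sketched as follows; the transition probabilities are illustrative assumptions, not values from the paper.

```python
import random

# 2-state Markov packet-loss model: state G (good, no loss) and state B
# (bad, loss), with transition probabilities p (G->B) and q (B->G).

def simulate_losses(n, p=0.05, q=0.5, seed=42):
    rng = random.Random(seed)
    state = "G"
    losses = []
    for _ in range(n):
        losses.append(1 if state == "B" else 0)
        if state == "G":
            state = "B" if rng.random() < p else "G"
        else:
            state = "G" if rng.random() < q else "B"
    return losses

trace = simulate_losses(10000)
loss_rate = sum(trace) / len(trace)
# The stationary loss probability is p / (p + q) ~= 0.091 for these values;
# the simulated rate should be close to it.
print(round(loss_rate, 3))
```

Unlike independent (Bernoulli) losses, this model produces bursty loss runs, which is what makes it useful for evaluating video over radio links.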
Maslow, Katie; Mezey, Mathy
Many hospital patients with dementia have no documented dementia diagnosis. In some cases, this is because they have never been diagnosed. Recognition of Dementia in Hospitalized Older Adults proposes several approaches that hospital nurses can use to increase recognition of dementia. This article describes the Try This approaches, how to implement them, and how to incorporate them into a hospital's current admission procedures. For a free online video demonstrating the use of these approaches, go to http://links.lww.com/A216.
de Jong, Franciska M.G.; Gauvain, Jean-Luc; den Hartog, Jurgen; den Hartog, Jeremy; Netter, Klaus
This paper describes the Olive project which aims to support automated indexing of video material by use of human language technologies. Olive is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which serve as the
VanDeventer, Stephanie S.; White, James A.
Investigates the display of expert behavior by seven outstanding video game-playing children ages 10 and 11. Analyzes observation and debriefing transcripts for evidence of self-monitoring, pattern recognition, principled decision making, qualitative thinking, and superior memory, and discusses implications for educators regarding the development…
Full Text Available The highly efficient and robust stitching of aerial video captured by unmanned aerial vehicles (UAVs) is a challenging problem in the field of robot vision. Existing commercial image stitching systems have seen success with offline stitching tasks, but they cannot guarantee high-speed performance when dealing with online aerial video sequences. In this paper, we present a novel system which has the unique ability to stitch high-frame-rate aerial video at a speed of 150 frames per second (FPS). In addition, rather than using a high-speed vision platform such as FPGA or CUDA, our system runs on a normal personal computer. To achieve this, after careful comparison of the existing invariant features, we choose the FAST corner and a binary descriptor for efficient feature extraction and representation, and present a spatially and temporally coherent filter to fuse the UAV motion information into the feature matching. The proposed filter can remove the majority of feature correspondence outliers and significantly increase the speed of robust feature matching by up to 20 times. To achieve a balance between robustness and efficiency, a dynamic key-frame-based stitching framework is used to reduce the accumulation of errors. Extensive experiments on challenging UAV datasets demonstrate that our approach can break through the speed limitation and generate an accurate stitched image for aerial video stitching tasks.
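A simplified sketch of the coherence-filtering idea (not the paper's implementation): estimate the dominant inter-frame displacement and reject matches that deviate from it. The tolerance and match coordinates below are assumptions.

```python
# Coherence filter for feature matches: the global displacement is estimated
# as the median match displacement, and matches deviating from it beyond a
# tolerance are discarded as outliers.

def coherence_filter(matches, tol=5.0):
    """matches: list of ((x1, y1), (x2, y2)) correspondences between frames."""
    dxs = sorted(x2 - x1 for (x1, _), (x2, _) in matches)
    dys = sorted(y2 - y1 for (_, y1), (_, y2) in matches)
    mdx, mdy = dxs[len(dxs) // 2], dys[len(dys) // 2]
    return [((x1, y1), (x2, y2)) for (x1, y1), (x2, y2) in matches
            if abs((x2 - x1) - mdx) <= tol and abs((y2 - y1) - mdy) <= tol]

matches = [((0, 0), (10, 2)), ((5, 5), (15, 7)), ((3, 1), (13, 3)),
           ((2, 2), (40, 30))]  # the last match is an outlier
inliers = coherence_filter(matches)
print(len(inliers))  # -> 3
```

In the paper's setting the expected displacement would come from UAV motion information rather than the match median, but the filtering principle is the same.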
Full Text Available in conjunction with the technical aspects of video display in browsers, when varying media formats are used. The <video> tag used in this work renders videos from two sources with different MIME types. Feeds from the video sources, namely YouTube and UCT...
Clintin P. Davis-Stober
Full Text Available The Recognition Heuristic (Gigerenzer and Goldstein, 1996; Goldstein and Gigerenzer, 2002) makes the counter-intuitive prediction that a decision maker utilizing less information may do as well as, or outperform, an idealized decision maker utilizing more information. We lay a theoretical foundation for the use of single-variable heuristics such as the Recognition Heuristic as an optimal decision strategy within a linear modeling framework. We identify conditions under which over-weighting a single predictor is a minimax strategy among a class of a priori chosen weights based on decision heuristics, with respect to a measure of statistical lack of fit we call "risk". These strategies, in turn, outperform standard multiple regression as long as the amount of data available is limited. We also show that, under related conditions, weighting only one variable and ignoring all others produces the same risk as ignoring the single variable and weighting all others. This approach has the advantage of generalizing beyond the original environment of the Recognition Heuristic to situations with more than two choice options, binary or continuous representations of recognition, and other single-variable heuristics. We analyze the structure of data used in some prior recognition tasks and find that it matches the sufficient conditions for optimality in our results. Rather than being a poor or adequate substitute for a compensatory model, the Recognition Heuristic closely approximates an optimal strategy when a decision maker has finite data about the world.
Full Text Available Video is a popular and motivating medium in schools. Using video in the language classroom helps language teachers in many different ways. Video, for instance, brings the outside world into the language classroom, providing the class with many different topics and reasons to talk. It can provide comprehensible input to the learners through contextualised models of language use. It also offers good opportunities to introduce native English speech into the language classroom. In this article I will try to show what the benefits of using video are and, at the end, I present an instrument to select and classify video materials.
Helen Gail Prosser
Full Text Available Northern Lakes College in north-central Alberta is the first post-secondary institution in Canada to use the Media on Demand digital video system to stream large video files between dispersed locations (Karlsen). Staff and students at distant locations of Northern Lakes College are now viewing more than 350 videos using video streaming technology. This has been made possible by SuperNet, a high-capacity broadband network that connects schools, hospitals, libraries and government offices throughout the province of Alberta (Alberta SuperNet). This article describes the technical process of implementing video streaming at Northern Lakes College from March 2005 until March 2006.
I. S. Rubina
Full Text Available The paper deals with image interpolation methods and their applicability to eliminating artifacts related both to the dynamic properties of objects in video sequences and to the algorithms used in the encoding steps. The main drawback of existing methods is their high computational complexity, which is unacceptable in video processing. Interpolation of signal samples for blocking-effect elimination at the output of the conversion encoding is proposed as part of the study. It was necessary to develop methods for improving the compression ratio and the quality of the reconstructed video data by eliminating the blocking effect on the borders of segments through intraframe interpolation of video sequence segments. The core of the developed methods is the application of an adaptive recursive algorithm with an adaptively-sized interpolation kernel, both with and without consideration of the brightness gradient at the boundaries of objects and video sequence blocks. In the theoretical part of the research, methods of information theory (RD-theory and data redundancy elimination), pattern recognition and digital signal processing, as well as probability theory, are used. In the experimental part of the research, the compression algorithms were implemented in software and compared with existing algorithms. The proposed methods were compared with the simple averaging algorithm and the adaptive algorithm of central sample interpolation. The advantage of the algorithm based on adaptive kernel-size selection is a 30% increase in compression ratio, and the advantage of the modified algorithm is a 35% increase in compression ratio compared with existing interpolation algorithms, with the quality of the reconstructed video sequence improved by 3% compared to compression without interpolation. The findings will be
Jain, Anil K.; Namboodiri, Anoop M.; Jung, Keechul
Many document images contain both text and non-text (images, line drawings, etc.) regions. An automatic segmentation of such an image into text and non-text regions is extremely useful in a variety of applications. Identification of text regions helps in text recognition applications, while the classification of an image into text and non-text regions helps in processing the individual regions differently in applications like page reproduction and printing. One of the main approaches to text detection is based on modeling the text as a texture. We present a method based on a combination of neural networks (texture-based) and connected component analysis to detect text in color documents with busy foreground and background. The proposed method achieves an accuracy of 96% (by area) on a test set of 40 documents.
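The connected component analysis stage can be sketched as a standard 4-connectivity labelling pass over a binary mask of candidate text pixels; the mask below is synthetic, and this is a generic illustration rather than the authors' code.

```python
from collections import deque

# Minimal connected-component labelling (4-connectivity) of the kind used to
# group candidate text pixels into regions.

def label_components(grid):
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and labels[r][c] == 0:
                current += 1                     # start a new component
                queue = deque([(r, c)])
                labels[r][c] = current
                while queue:                     # breadth-first flood fill
                    y, x = queue.popleft()
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return current, labels

mask = [[1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1]]
count, _ = label_components(mask)
print(count)  # -> 2 candidate regions
```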
158 pages This study aims to construct a systematic approach to the classification of narrative usage in video games. The most recent dominant approaches to reading a video game text – narratology and ludology – are discussed. By inquiring into the place of interactivity and autonomy inside the discourse of video game narrative, a classification is proposed. Consequently, six groups of video games are determined, depending on the levels of combination of narration and ludic context. These Six Degr...
Juan José Rodríguez Soler
Full Text Available Online channels in financial institutions allow customers with disabilities to access services in a way that is convenient for them. However, one of the current challenges of this sector is to improve web accessibility and to incorporate technological resources that provide access to multimedia and video content, which has become a new form of internet communication. The present work shows in detail the strategy followed in designing and developing the new video player used by Bankinter for these purposes.
Full Text Available A novel motion-adaptive deinterlacing algorithm with edge-pattern recognition and hybrid motion detection is introduced. The great variety of video contents makes the processing of assorted motion, edges, textures, and the combination of them very difficult with a single algorithm. The edge-pattern recognition algorithm introduced in this paper exhibits the flexibility in processing both textures and edges which need to be separately accomplished by line average and edge-based line average before. Moreover, predicting the neighboring pixels for pattern analysis and interpolation further enhances the adaptability of the edge-pattern recognition unit when motion detection is incorporated. Our hybrid motion detection features accurate detection of fast and slow motion in interlaced video and also the motion with edges. Using only three fields for detection also renders higher temporal correlation for interpolation. The better performance of our deinterlacing algorithm with higher content-adaptability and less memory cost than the state-of-the-art 4-field motion detection algorithms can be seen from the subjective and objective experimental results of the CIF and PAL video sequences.
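The line-average baseline that the abstract's edge-pattern method is compared against can be sketched as follows: the missing lines of one field are filled with the average of the lines above and below. Rows marked None stand in for the absent field; this is the simple intra-field baseline, not the paper's algorithm.

```python
def line_average(frame):
    """Fill missing rows (None) of an interlaced frame by averaging
    the neighbouring rows -- the classic line-average baseline."""
    out = []
    n = len(frame)
    for i, row in enumerate(frame):
        if row is not None:
            out.append(list(row))
            continue
        above = frame[i - 1] if i > 0 else frame[i + 1]
        below = frame[i + 1] if i + 1 < n else frame[i - 1]
        out.append([(a + b) // 2 for a, b in zip(above, below)])
    return out

frame = [[10, 20], None, [30, 40]]
full = line_average(frame)  # middle row becomes [20, 30]
```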
Full Text Available The quality of the smartphone camera enables us to capture high-quality pictures at high resolution, so we can perform different types of recognition on these images. Face detection is one of these types of recognition that is very common in our society. We use it every day on Facebook to tag friends in our pictures. It is also used in video games alongside the Kinect, or in security to allow access to private places only to authorized persons. These are just some examples of the use of facial recognition, because in modern society, detection and facial recognition tend to surround us everywhere. The aim of this article is to create an application for smartphones that can recognize human faces. The main goal of this application is to grant access to certain areas or rooms only to authorized persons. For example, we can speak here of hospitals or educational institutions where there are rooms that only certain employees can enter. Of course, this type of application can cover a wide range of uses, such as helping people suffering from Alzheimer's to recognize the people they love, helping persons who cannot remember the names of their relatives, or automatically capturing the faces of our own children when they smile.
Since their first inception, automatic reading systems have evolved substantially, yet the recognition of handwriting remains an open research problem due to its substantial variation in appearance. With the introduction of Markovian models to the field, a promising modeling and recognition paradigm was established for automatic handwriting recognition. However, no standard procedures for building Markov model-based recognizers have yet been established. This text provides a comprehensive overview of the application of Markov models in the field of handwriting recognition, covering both hidden
Gordon, Shayna L; Porto, Dennis A; Ozog, David M; Council, M Laurin
The use of video can enhance the learning experience by demonstrating procedural techniques that are difficult to relay in writing. Several peer-reviewed journals allow publication of videos alongside articles to complement the written text. The purpose of this article is to instruct the dermatologic surgeon on how to create and edit a video using a smartphone to accompany an article. The authors describe simple tips to optimize surgical videography. The video that accompanies this article further demonstrates the techniques described. Creating a surgical video requires little experience or equipment and can be completed in a modest amount of time. Making and editing a video to accompany an article can be accomplished by following the simple recommendations in this article. In addition, the increased use of video in dermatologic surgery education can enhance the learning opportunity.
Full Text Available Video game accessibility may not seem of significance to some, and it may sound trivial to anyone who does not play video games. This assumption is false. With the digitalization of our culture, video games are an ever increasing part of our life. They contribute to peer to peer interactions, education, music and the arts. A video game can be created by hundreds of musicians and artists, and they can have production budgets that exceed modern blockbuster films. Inaccessible video games are analogous to movie theaters without closed captioning or accessible facilities. The movement to have accessible video games is small, unorganized and misdirected. Just like the other battles to make society accessible were accomplished through legislation and law, the battle for video game accessibility must be focused toward the law and not the market.
Octavio José Salcedo Parra
Full Text Available The motivation for characterizing voice and video traffic lies in the need of service-provider companies to maintain information transport networks with capacities that match user requirements, and to determine in a timely manner how the technical elements of those networks affect their performance, given that each type of service is affected to a greater or lesser degree by elements such as jitter, delay, and packet loss. The present work shows several cases of traffic characterization for both voice and video, in which a variety of techniques are applied to different types of service.
Baktashmotlagh, Mahsa; Harandi, Mehrtash; Lovell, Brian C; Salzmann, Mathieu
Low-dimensional representations are key to the success of many video classification algorithms. However, the commonly-used dimensionality reduction techniques fail to account for the fact that only part of the signal is shared across all the videos in one class. As a consequence, the resulting representations contain instance-specific information, which introduces noise in the classification process. In this paper, we introduce non-linear stationary subspace analysis: a method that overcomes this issue by explicitly separating the stationary parts of the video signal (i.e., the parts shared across all videos in one class), from its non-stationary parts (i.e., the parts specific to individual videos). Our method also encourages the new representation to be discriminative, thus accounting for the underlying classification problem. We demonstrate the effectiveness of our approach on dynamic texture recognition, scene classification and action recognition.
Full Text Available The explosive growth of information technology in the last decade has made a considerable impact on the design and construction of systems for human-machine communication, which is becoming increasingly important in many aspects of life. Amongst other speech processing tasks, a great deal of attention has been devoted to developing procedures that identify people from their voices, and the design and construction of speaker recognition systems has been a fascinating enterprise pursued over many decades. This paper introduces speaker recognition in general and discusses its relevant parameters in relation to system performance.
Full Text Available Much of the attention paid to video in foreign language teaching is focused upon a relatively small amount of commercially produced and distributed material. This paper briefly describes the development of this material in the EFL/ESL field, looks at some current issues and concerns, and considers future possibilities with particular reference to computer-assisted interactive video.
Full Text Available Background: Psychogenic tremor is the most common psychogenic movement disorder. It has characteristic clinical features that can help distinguish it from other tremor disorders. There is no diagnostic gold standard and the diagnosis is based primarily on clinical history and examination. Despite proposed diagnostic criteria, the diagnosis of psychogenic tremor can be challenging. While there are numerous studies evaluating psychogenic tremor in the literature, there are no publications that provide a video/visual guide that demonstrate the clinical characteristics of psychogenic tremor. Educating clinicians about psychogenic tremor will hopefully lead to earlier diagnosis and treatment. Methods: We selected videos from the database at the Parkinson's Disease Center and Movement Disorders Clinic at Baylor College of Medicine that illustrate classic findings supporting the diagnosis of psychogenic tremor.Results: We include 10 clinical vignettes with accompanying videos that highlight characteristic clinical signs of psychogenic tremor including distractibility, variability, entrainability, suggestibility, and coherence.Discussion: Psychogenic tremor should be considered in the differential diagnosis of patients presenting with tremor, particularly if it is of abrupt onset, intermittent, variable and not congruous with organic tremor. The diagnosis of psychogenic tremor, however, should not be simply based on exclusion of organic tremor, such as essential, parkinsonian, or cerebellar tremor, but on positive criteria demonstrating characteristic features. Early recognition and management are critical for good long-term outcome.
Full Text Available Multi-view action recognition has gained great interest in video surveillance, human-computer interaction, and multimedia retrieval, where multiple cameras of different types are deployed to provide complementary fields of view. Fusion of multiple camera views evidently leads to more robust decisions on both tracking multiple targets and analysing complex human activities, especially where there are occlusions. In this paper, we incorporate the marginalised stacked denoising autoencoder (mSDA) algorithm to further improve the bag-of-words (BoW) representation in terms of robustness and usefulness for multi-view action recognition. The resulting representations are fed into three simple fusion strategies, as well as a multiple kernel learning algorithm, at the classification stage. Based on the internal evaluation, the codebook size of the BoW representation and the number of layers of mSDA may not significantly affect recognition performance. According to results on three multi-view benchmark datasets, the proposed framework improves recognition performance across all three datasets and sets record recognition performance, beating the state-of-the-art algorithms in the literature. It is also capable of performing real-time action recognition at a frame rate ranging from 33 to 45 frames per second, which could be further improved by using more powerful machines in future applications.
Full Text Available The lip movement of a speaker is very informative for many applications of speech signal processing, such as multi-modal speech recognition and password authentication without a speech signal. However, collecting multi-modal speech information requires a video camera, a large amount of memory, a video interface, and a high-speed processor to extract lip movement in real time. Such a system tends to be expensive and large, which is one of the reasons preventing the use of multi-modal speech processing. In this study, we have developed a simple infrared lip movement sensor mounted on a headset, making it possible to acquire lip movement with a PDA, mobile phone, or notebook PC. The sensor consists of an infrared LED and an infrared phototransistor, and measures lip movement by the light reflected from the mouth region. In experiments, we achieved a 66% word recognition rate using lip movement features alone. This result shows that our sensor can be utilized as a tool for multi-modal speech processing when combined with a microphone mounted on the headset.
Valletta, Clement, Ed.; And Others
The document contains scripts, study guides, and discussion questions for two ethnic dramas suitable for ethnic studies at the secondary school level. The first, "A Glass Rose," an adaptation of the novel by Richard Bankowsky, depicts the hopes, dreams, and problems of a Polish immigrant family who reside in an ethnic neighborhood in an…
Full Text Available Abstract The count of malware attacks exploiting the internet is increasing day by day and has become a serious threat. The latest malware spreads through media players, embedded in video clips of a funny nature to lure end users. Once it is executed and installed, the behavior of the malware is in the malware author's hands. The malware spreads through the Internet, USB drives, and shared files and folders, keeping its presence concealed. The funny video, named after a film celebrity, was the malware variant collected from the laptop of a terror outfit organization. It runs in the background and contains malicious code that steals sensitive user information, such as banking credentials (username and password), and sends it to a remote host called command and control. The stolen data is directed to an email address encapsulated in the malicious code. The malware can also spread through USB and other devices. In summary, the analysis reveals the presence of malicious code in an executable video file and describes its behavior.
Smith, Rachel Charlotte; Christensen, Kasper Skov; Iversen, Ole Sejer
We introduce Video Design Games to train educators in teaching design. The Video Design Game is a workshop format consisting of three rounds in which participants observe, reflect and generalize based on video snippets from their own practice. The paper reports on a Video Design Game workshop...
Loktev Alexey Alexeevich
Full Text Available Comprehensive distributed safety, control, and monitoring systems applied by companies and organizations of different ownership structure play a substantial role in the present-day society. Video surveillance elements that ensure image processing and decision making in automated or automatic modes are the essential components of new systems. This paper covers the modeling of video surveillance systems installed in buildings, and the algorithm, or pattern, of video camera placement with due account for nearly all characteristics of buildings, detection and recognition facilities, and cameras themselves. This algorithm will be subsequently implemented as a user application. The project contemplates a comprehensive approach to the automatic placement of cameras that take account of their mutual positioning and compatibility of tasks. The project objective is to develop the principal elements of the algorithm of recognition of a moving object to be detected by several cameras. The image obtained by different cameras will be processed. Parameters of motion are to be identified to develop a table of possible options of routes. The implementation of the recognition algorithm represents an independent research project to be covered by a different article. This project consists in the assessment of the degree of complexity of an algorithm of camera placement designated for identification of cases of inaccurate algorithm implementation, as well as in the formulation of supplementary requirements and input data by means of intercrossing sectors covered by neighbouring cameras. The project also contemplates identification of potential problems in the course of development of a physical security and monitoring system at the stage of the project design development and testing. The camera placement algorithm has been implemented as a software application that has already been pilot tested on buildings and inside premises that have irregular dimensions. The
Ostrowski, Jeffrey R.; Sarhan, Nabil J.
The popularity of social media has grown dramatically over the World Wide Web. In this paper, we analyze the video popularity distribution of well-known social video websites (YouTube, Google Video, and the AOL Truveo Video Search engine) and characterize their workload. We identify trends in the categories, lengths, and formats of those videos, as well as characterize the evolution of those videos over time. We further provide an extensive analysis and comparison of video content amongst the main regions of the world.
Segmentation of handwritten text into lines, words and characters is one of the important steps in the handwritten text recognition process. In this paper we propose a water reservoir concept-based scheme for segmentation of unconstrained Oriya handwritten text into individual characters. Here, at first, the text image is ...
Full Text Available Phacoemulsification is one of the most advanced surgeries to treat cataract. However, conventional surgeries suffer from a low level of automation and over-reliance on the surgeon's skill. Alternatively, one promising scenario is to use video processing and pattern recognition technologies to automatically detect the cataract grade and intelligently control the release of ultrasonic energy during the operation. Unlike cataract grading in diagnosis systems with static images, dynamic surgical videos introduce complicated backgrounds, unexpected noise, and varied information. Here we develop a VidEo-Based Intelligent Recognition and Decision (VEBIRD) system, which breaks new ground by providing a generic framework for automatically tracking the operation process and classifying the cataract grade in microscope videos of phacoemulsification cataract surgery. VEBIRD comprises a robust eye (iris) detector with a randomized Hough transform to precisely locate the eye against a noisy background, an effective probe tracker with Tracking-Learning-Detection to track the operation probe through the dynamic process, and an intelligent decider with discriminative learning to recognize the cataract grade in the complicated video. Experiments with a variety of real microscope videos of phacoemulsification verify VEBIRD's effectiveness.
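The randomized Hough transform used for the eye (iris) detector can be sketched as follows: repeatedly sample three edge points, compute their circumcircle, and vote in an accumulator; the most-voted circle wins. The edge points here are synthetic; a real detector would obtain them from an edge map of the microscope frame.

```python
import math
import random
from collections import Counter

def circumcircle(p1, p2, p3):
    """Center and radius of the circle through three points, rounded to
    integers for accumulator bucketing; None if the points are collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-9:
        return None
    ux = ((x1**2 + y1**2) * (y2 - y3) + (x2**2 + y2**2) * (y3 - y1)
          + (x3**2 + y3**2) * (y1 - y2)) / d
    uy = ((x1**2 + y1**2) * (x3 - x2) + (x2**2 + y2**2) * (x1 - x3)
          + (x3**2 + y3**2) * (x2 - x1)) / d
    r = math.hypot(x1 - ux, y1 - uy)
    return round(ux), round(uy), round(r)

def randomized_hough_circle(edge_points, trials=500, seed=0):
    """Vote for (cx, cy, r) triples from random 3-point samples."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(trials):
        c = circumcircle(*rng.sample(edge_points, 3))
        if c is not None:
            votes[c] += 1
    return votes.most_common(1)[0][0]

# Synthetic edge points on a circle of radius 5 centred at (10, 10):
pts = [(10 + 5 * math.cos(i * math.pi / 8), 10 + 5 * math.sin(i * math.pi / 8))
       for i in range(16)]
found = randomized_hough_circle(pts)  # -> (10, 10, 5)
```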
Yang, Jie Chi; Huang, Yi Ting; Tsai, Chi Cheng; Chung, Ching I.; Wu, Yu Chieh
In recent years, using video as a learning resource has received a lot of attention and has been successfully applied to many learning activities. In comparison with text-based learning, video learning integrates more multimedia resources, which usually motivate learners more than texts. However, one of the major limitations of video learning is…
Höferlin, Markus Johannes
The amount of video data recorded world-wide is tremendously growing and has already reached hardly manageable dimensions. It originates from a wide range of application areas, such as surveillance, sports analysis, scientific video analysis, surgery documentation, and entertainment, and its analysis represents one of the challenges in computer science. The vast amount of video data renders manual analysis by watching the video data impractical. However, automatic evaluation of video material...
Robert C. Lorenz
Full Text Available Video games contain elaborate reinforcement and reward schedules that have the potential to maximize motivation. Neuroimaging studies suggest that video games might have an influence on the reward system. However, it is not clear whether reward-related properties represent a precondition that biases an individual towards playing video games, or whether these changes are the result of playing video games. Therefore, we conducted a longitudinal study to explore reward-related functional predictors in relation to video gaming experience, as well as functional changes in the brain in response to video game training. Fifty healthy participants were randomly assigned to a video game training group (TG) or a control group (CG). Before and after the training/control period, functional magnetic resonance imaging (fMRI) was conducted using a non-video-game-related reward task. At pretest, both groups showed the strongest activation in the ventral striatum (VS) during reward anticipation. At posttest, the TG showed very similar VS activity compared to pretest; in the CG, the VS activity was significantly attenuated. This longitudinal study revealed that video game training may preserve reward responsiveness in the ventral striatum in a retest situation over time. We suggest that video games are able to keep striatal responses to reward flexible, a mechanism which might be of critical value for applications such as therapeutic cognitive training.
Mølgaard, Lasse Lohilahti; Jørgensen, Kasper Winther
Speaker recognition is basically divided into speaker identification and speaker verification. Verification is the task of automatically determining if a person really is the person he or she claims to be. This technology can be used as a biometric feature for verifying the identity of a person in applications like banking by telephone and voice mail. The focus of this project is speaker identification, which consists of mapping a speech signal from an unknown speaker to a database of known speakers, i.e. the system has been trained with a number of speakers which it can recognize.
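The identification step described above (mapping an unknown speaker to the closest enrolled speaker) can be sketched as a nearest-neighbour search over feature vectors. The two-dimensional features and speaker names here are toy assumptions; a real system would compare longer vectors produced by a feature-extraction front end such as MFCCs.

```python
def identify_speaker(unknown, database):
    """Return the name of the enrolled speaker whose feature vector is
    closest (Euclidean distance) to the unknown speaker's vector."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(database, key=lambda name: dist(unknown, database[name]))

enrolled = {"alice": [1.0, 0.2], "bob": [-0.5, 1.1]}  # hypothetical features
who = identify_speaker([0.9, 0.1], enrolled)  # -> "alice"
```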
Gjødsbøl, Iben Mundbjerg; Svendsen, Mette Nordahl
This article investigates how a person with dementia is made up through intersubjective acts of recognition. Based on ethnographic fieldwork in a Danish memory clinic, we show that identification of disease requires patients to be substituted by their relatives in constructing believable medical ... to misrecognize and humiliate the person under examination. The article ends by proposing that dementia be the condition that forces us to rethink our ways of recognizing persons more generally. Thus, dementia diagnostics provide insights into different enactments of the person that invite us to explore practices...
It has recently been found that during recognition memory tests participants' pupils dilate more when they view old items compared to novel items. This thesis sought to replicate this novel "Pupil Old/New Effect" (PONE) and to determine its relationship to implicit and explicit mnemonic processes, the veracity of participants' responses, and the analogous Event-Related Potential (ERP) old/new effect. Across 9 experiments, pupil-size was measured with a video-based eye-tracker during a varie...
Full Text Available Recognizing the actions of others from visual stimuli is a crucial aspect of human perception that allows individuals to respond to social cues. Humans are able to discriminate between similar actions despite transformations, like changes in viewpoint or actor, that substantially alter the visual appearance of a scene. This ability to generalize across complex transformations is a hallmark of human visual intelligence. Advances in understanding action recognition at the neural level have not always translated into precise accounts of the computational principles underlying what representations of action sequences are constructed by human visual cortex. Here we test the hypothesis that invariant action discrimination might fill this gap. Recently, the study of artificial systems for static object perception has produced models, Convolutional Neural Networks (CNNs), that achieve human-level performance in complex discriminative tasks. Within this class, architectures that better support invariant object recognition also produce image representations that better match those implied by human and primate neural data. However, whether these models produce representations of action sequences that support recognition across complex transformations and closely follow neural representations of actions remains unknown. Here we show that spatiotemporal CNNs accurately categorize video stimuli into action classes, and that deliberate model modifications that improve performance on an invariant action recognition task lead to data representations that better match human neural recordings. Our results support our hypothesis that performance on invariant discrimination dictates the neural representations of actions computed in the brain. These results broaden the scope of the invariant recognition framework for understanding visual intelligence from perception of inanimate objects and faces in static images to the study of human perception of action sequences.
Full Text Available This paper focuses on modifications to an institutional repository system using the open source DSpace software to support playback of digital videos embedded within item pages. The changes were made in response to the formation and quick startup of an event capture group within the library that was charged with creating and editing video recordings of library events and speakers. This paper specifically discusses the selection of video formats, changes to the visual theme of the repository to allow embedded playback and captioning support, and modifications and bug fixes to the file downloading subsystem to enable skip-ahead playback of videos via byte-range requests. This paper also describes workflows for transcoding videos in the required formats, creating captions, and depositing videos into the repository.
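Skip-ahead playback of the kind described above relies on the server honouring byte-range requests. A minimal sketch of the single-range case of the `Range: bytes=start-end` header (RFC 7233), ignoring multipart ranges, which a repository's download subsystem would need to parse:

```python
def parse_range(header, file_size):
    """Parse a single-range `Range: bytes=...` header into inclusive byte
    offsets (start, end), or None if the header is absent or unsatisfiable."""
    if not header or not header.startswith("bytes="):
        return None
    spec = header[len("bytes="):]
    start_s, _, end_s = spec.partition("-")
    if start_s == "":                      # suffix form: last N bytes
        length = int(end_s)
        return (max(0, file_size - length), file_size - 1)
    start = int(start_s)
    end = int(end_s) if end_s else file_size - 1
    if start >= file_size:
        return None                        # 416 Range Not Satisfiable
    return (start, min(end, file_size - 1))

parse_range("bytes=500-999", 10000)   # -> (500, 999)
parse_range("bytes=9500-", 10000)     # -> (9500, 9999)
parse_range("bytes=-500", 10000)      # -> (9500, 9999)
```

The server then replies with status 206 and a matching `Content-Range` header; players use exactly these requests when the user seeks ahead.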
Full Text Available Modern trends in crime control include a variety of technological innovations, including video surveillance systems. The aim of this paper is to review the implementation of video surveillance in contemporary context, considering fundamental theoretical aspects, the legislation and the effectiveness in controlling crime. While considering the theoretical source of ideas on the implementation of video surveillance, priority was given to the concept of situational prevention that focuses on the contextual factors of crime. Capacities for the implementation of video surveillance in Serbia are discussed based on the analysis of the relevant international and domestic legislation, the shortcomings in regulation of this area and possible solutions. Special attention was paid to the effectiveness of video surveillance in public places, in schools and prisons. Starting from the results of studies of video surveillance effectiveness, strengths and weaknesses of these measures and recommendations for improving practice were discussed.
We present a novel approach to lexical error recovery on textual input. An advanced robust tokenizer has been implemented that can not only correct spelling mistakes, but also recover from segmentation errors. Apart from the orthographic considerations taken, the tokenizer also makes use of linguistic expectations extracted from a training corpus. The idea is to arrange Hidden Markov Models (HMM) in multiple layers where the HMMs in each layer are responsible for different aspects of the processing of the input. We report on experimental evaluations with alternative probabilistic language models to guide the lexical error recovery process.
Full Text Available Designing an effective and high-performance network requires accurate characterization and modeling of network traffic. The modeling of video frame sizes is normally applied in simulation studies and mathematical analysis, and in generating streams for testing and compliance purposes. Moreover, video traffic is assumed to be a major source of multimedia traffic in future heterogeneous networks. Therefore, the statistical distribution of video data can be used as input for performance modeling of networks. The findings of this paper comprise the theoretical definition of the distribution that appears most relevant to the video trace in terms of its statistical properties, and the identification of the best-fitting distribution using both a graphical method and a hypothesis test. The data set used in this article consists of layered video traces generated with the Scalable Video Codec (SVC) video compression technique from three different movies.
Asif Ali Laghari
Full Text Available Video sharing on social clouds is popular among users around the world. High-Definition (HD) videos have large file sizes, so storing them in cloud storage and streaming them with high quality from the cloud to the client are significant problems for service providers. Social clouds compress the videos to save storage and to stream over slow networks while providing quality of service (QoS). Compression decreases video quality relative to the original, and parameters are changed during online play as well as after download. Degradation of video quality due to compression decreases the quality of experience (QoE) of end users. To assess the QoE of video compression, we conducted subjective QoE experiments by uploading, sharing, and playing videos from social clouds. Three popular social clouds, Facebook, Tumblr, and Twitter, were selected to upload and play videos online for users. The QoE was recorded using a questionnaire in which users reported their experience of the video quality they perceived. Results show that Facebook and Twitter compressed HD videos more than the other clouds; however, Facebook delivered better quality in its compressed videos than Twitter. Users therefore assigned low ratings to Twitter for online video quality compared to Tumblr, which provided high-quality online playback with less compression.
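Subjective QoE questionnaires of the kind described above are conventionally summarized as a Mean Opinion Score (MOS) per platform. A minimal sketch, with entirely hypothetical ratings (the paper does not publish its raw questionnaire data):

```python
def mean_opinion_score(ratings):
    """Average 1-5 questionnaire ratings into a Mean Opinion Score (MOS)."""
    return sum(ratings) / len(ratings)

# Hypothetical per-platform ratings from a QoE questionnaire (1 = bad, 5 = excellent).
ratings = {
    "Tumblr":   [5, 4, 5, 4, 5],
    "Facebook": [4, 4, 3, 4, 4],
    "Twitter":  [2, 3, 2, 2, 3],
}
mos = {platform: mean_opinion_score(r) for platform, r in ratings.items()}
best = max(mos, key=mos.get)
```

With these made-up numbers, Tumblr ranks highest, mirroring the direction of the reported findings.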
Full Text Available This paper focuses on the text categorization of Slovak text corpora using latent Dirichlet allocation. Our goal is to build text subcorpora that contain similar text documents. We want to use these better organized text subcorpora to build more robust language models that can be used in the area of speech recognition systems. Our previous research in the area of text categorization showed that we can achieve better results with categorized text corpora. In this paper we used latent Dirichlet allocation for text categorization. We divided initial text corpus into 2, 5, 10, 20 or 100 subcorpora with various iterations and save steps. Language models were built on these subcorpora and adapted with linear interpolation to judicial domain. The experiment results showed that text categorization using latent Dirichlet allocation can improve the system for automatic speech recognition by creating the language models from organized text corpora.
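The abstract's adaptation step, "adapted with linear interpolation to judicial domain", can be sketched as mixing a subcorpus unigram model with a background model. The probabilities and vocabulary below are hypothetical, and real systems interpolate full n-gram models rather than this toy unigram case:

```python
def interpolate_lm(p_domain, p_background, lam=0.7):
    """Linear interpolation of two unigram language models:
    P(w) = lam * P_domain(w) + (1 - lam) * P_background(w)."""
    vocab = set(p_domain) | set(p_background)
    return {w: lam * p_domain.get(w, 0.0) + (1.0 - lam) * p_background.get(w, 0.0)
            for w in vocab}

# Hypothetical unigram probabilities: an LDA-selected judicial subcorpus vs. the full corpus.
p_judicial = {"court": 0.020, "ruling": 0.010}
p_full     = {"court": 0.002, "video": 0.005}
p_mixed = interpolate_lm(p_judicial, p_full, lam=0.7)
```

Words absent from one model fall back to the other, weighted accordingly, which is what makes the interpolated model more robust than either component alone.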
Full Text Available ... Program Growth and Nutrition Program Celiac Disease Program | Videos Contact the Celiac Disease Program 1-617-355-6058 Visit the Celiac ... live happy and productive lives. Each of our video segments provides practical information about celiac disease from real-life families, as well as health ...
Full Text Available ... Lessons? Visit KidsHealth in the Classroom What Other Parents Are Reading Folic Acid and Pregnancy Medical Care ... Special Needs: Planning for Adulthood (Video) KidsHealth > For Parents > Special Needs: Planning for Adulthood (Video) Print A ...
Full Text Available ... ease and allow children with celiac disease to live happy and productive lives. Each of our video segments provides practical information ... Hospital About Us Giving to Boston Children's Newsroom Quality and Patient Safety Research + Innovation Videos Contact Us ...
Full Text Available Deficits in social cognition including facial affect recognition and their detrimental effects on functional outcome are well established in schizophrenia. Structured training can have substantial effects on social cognitive measures including facial affect recognition. Elucidating training effects on cortical mechanisms involved in facial affect recognition may identify causes of dysfunctional facial affect recognition in schizophrenia and foster remediation strategies. In the present study, 57 schizophrenia patients were randomly assigned to (a) computer-based facial affect training that focused on affect discrimination and working memory in 20 daily 1-hour sessions, (b) similarly intense, targeted cognitive training on auditory-verbal discrimination and working memory, or (c) treatment as usual. Neuromagnetic activity was measured before and after training during a dynamic facial affect recognition task (5 s videos showing human faces gradually changing from neutral to fear or to happy expressions). Effects on 10–13 Hz (alpha) power during the transition from neutral to emotional expressions were assessed via MEG based on previous findings that alpha power increase is related to facial affect recognition and is smaller in schizophrenia than in healthy subjects. Targeted affect training improved overt performance on the training tasks. Moreover, alpha power increase during the dynamic facial affect recognition task was larger after affect training than after treatment-as-usual, though similar to that after targeted perceptual–cognitive training, indicating somewhat nonspecific benefits. Alpha power modulation was unrelated to general neuropsychological test performance, which improved in all groups. Results suggest that specific neural processes supporting facial affect recognition, evident in oscillatory phenomena, are modifiable. This should be considered when developing remediation strategies targeting social cognition in schizophrenia.
Full Text Available Based on video recordings of the movement of patients with epilepsy, this paper proposed a human action recognition scheme to detect distinct motion patterns and to distinguish the normal status from the abnormal status of epileptic patients. The scheme first extracts local features and holistic features, which are complementary to each other. Afterwards, a support vector machine is applied for classification. Based on the experimental results, this scheme obtains a satisfactory classification result and provides a fundamental analysis towards human-robot interaction with socially assistive robots in caring for patients with epilepsy (or other patients with brain disorders) in order to protect them from injury.
Hahn, U.; Romacker, M.
We consider the role of textual structures in medical texts. In particular, we examine the impact that the lacking recognition of text phenomena has on the validity of medical knowledge bases fed by a natural language understanding front-end. First, we review the results from an empirical study on a sample of medical texts, considering various forms of local coherence phenomena (anaphora and textual ellipses). We then discuss the representation bias emerging in the text knowledge base that is likely to occur when these phenomena are not dealt with--mainly the emergence of referentially incoherent and invalid representations. We then turn to a medical text understanding system designed to account for local text coherence. PMID:9357739
Michail N. Giannakos
Full Text Available Online video lectures have been considered an instructional media for various pedagogic approaches, such as the flipped classroom and open online courses. In comparison to other instructional media, online video affords the opportunity for recording student clickstream patterns within a video lecture. Video analytics within lecture videos may provide insights into student learning performance and inform the improvement of video-assisted teaching tactics. Nevertheless, video analytics are not accessible to learning stakeholders, such as researchers and educators, mainly because online video platforms do not broadly share the interactions of the users with their systems. For this purpose, we have designed an open-access video analytics system for use in a video-assisted course. In this paper, we present a longitudinal study, which provides valuable insights through the lens of the collected video analytics. In particular, we found that there is a relationship between video navigation (repeated views) and the level of cognition/thinking required for a specific video segment. Our results indicated that learning performance progress was slightly improved and stabilized after the third week of the video-assisted course. We also found that attitudes regarding easiness, usability, usefulness, and acceptance of this type of course remained at the same levels throughout the course. Finally, we triangulate analytics from diverse sources, discuss them, and provide the lessons learned for further development and refinement of video-assisted courses and practices.
Fluck, Juliane; Hofmann-Apitius, Martin
Scientific communication in biomedicine is, by and large, still text based. Text mining technologies for the automated extraction of useful biomedical information from unstructured text that can be directly used for systems biology modelling have been substantially improved over the past few years. In this review, we underline the importance of named entity recognition and relationship extraction as fundamental approaches that are relevant to systems biology. Furthermore, we emphasize the role of publicly organized scientific benchmarking challenges that reflect the current status of text-mining technology and are important in moving the entire field forward. Given further interdisciplinary development of systems biology-orientated ontologies and training corpora, we expect a steadily increasing impact of text-mining technology on systems biology in the future. Copyright © 2013 Elsevier Ltd. All rights reserved.
Full Text Available A road sign recognition system based on adaptive image pre-processing models using two fuzzy inference schemes has been proposed. The first fuzzy inference scheme checks the changes of the light illumination and rich red color of a frame image by the checking areas. The other checks the variance of the vehicle's speed and the angle of the steering wheel to select an adaptive size and position of the detection area. The Adaboost classifier was employed to detect the road sign candidates from an image and the support vector machine technique was employed to recognize the content of the road sign candidates. The prohibitory and warning road traffic signs are the processing targets in this research. The detection rate in the detection phase is 97.42%. In the recognition phase, the recognition rate is 93.04%. The total accuracy rate of the system is 92.47%. For video sequences, the best accuracy rate is 90.54%, and the average accuracy rate is 80.17%. The average computing time is 51.86 milliseconds per frame. The proposed system not only overcomes the problems of low illumination and rich red colors around road signs but also offers high detection rates and high computing performance.
Full Text Available It has been shown that integration of acoustic and visual information especially in noisy conditions yields improved speech recognition results. This raises the question of how to weight the two modalities in different noise conditions. Throughout this paper we develop a weighting process adaptive to various background noise situations. In the presented recognition system, audio and video data are combined following a Separate Integration (SI) architecture. A hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) system is used for the experiments. The neural networks were in all cases trained on clean data. Firstly, we evaluate the performance of different weighting schemes in a manually controlled recognition task with different types of noise. Next, we compare different criteria to estimate the reliability of the audio stream. Based on this, a mapping between the measurements and the free parameter of the fusion process is derived and its applicability is demonstrated. Finally, the possibilities and limitations of adaptive weighting are compared and discussed.
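The fusion step described above — a single free parameter weighting the audio stream against the video stream according to audio reliability — can be sketched as a convex combination of per-class scores. The scores, class count, and reliability values here are hypothetical:

```python
def fuse_streams(audio_scores, video_scores, audio_reliability):
    """Separate Integration fusion: combine per-class scores from the audio
    and video classifiers. `audio_reliability` in [0, 1] (e.g. derived from
    an SNR estimate) becomes the audio weight lambda."""
    lam = min(1.0, max(0.0, audio_reliability))
    return [lam * a + (1.0 - lam) * v for a, v in zip(audio_scores, video_scores)]

# Hypothetical per-class scores for three candidate words from each modality.
audio = [0.70, 0.20, 0.10]   # audio classifier favours class 0
video = [0.10, 0.30, 0.60]   # video classifier favours class 2

clean = fuse_streams(audio, video, audio_reliability=0.9)  # quiet room: trust audio
noisy = fuse_streams(audio, video, audio_reliability=0.2)  # heavy noise: trust video
```

Under clean conditions the fused decision follows the audio stream; under noise it shifts to the visual stream, which is the behaviour the adaptive weighting is meant to produce.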
Full Text Available Nowadays, editing technology has entered the digital age. Converting analog data to digital has become simpler as editing technology has been integrated into all aspects of society. Understanding the technique of converting analog data to digital is important in producing a video. To utilize this technology, an introduction to the equipment is fundamental for understanding its features. The next phase is the capturing process, which supports the preparation for editing from scene to scene; the result is a watchable video.
Full Text Available Video shot boundary detection is a fundamental problem in computer vision and is important for video analysis and video understanding. Existing boundary detection methods are typically effective only for certain types of video data and have relatively low generalization ability. We present a novel shot boundary detection algorithm based on video dynamic texture. First, two adjacent frames are read from a given video and normalized to the same size. Second, the frames are divided into sub-domains on the same grid, the average gradient direction of each sub-domain is calculated, and these directions form the dynamic texture. Finally, the dynamic textures of adjacent frames are compared. Experiments on different types of video data show that our method has high generalization ability and achieves higher average precision and average recall than comparable algorithms.
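The three steps above — per-block average gradient direction as a "dynamic texture", then a distance between the textures of adjacent frames — can be sketched as follows. The block size, threshold, and toy 4x4 frames are all hypothetical; the paper does not specify its exact parameters:

```python
import math

def dynamic_texture(frame, block=2):
    """Average gradient direction per block x block sub-domain of a 2D
    intensity frame (list of rows of numbers)."""
    h, w = len(frame), len(frame[0])
    texture = []
    for by in range(0, h - 1, block):
        for bx in range(0, w - 1, block):
            dirs = []
            for y in range(by, min(by + block, h - 1)):
                for x in range(bx, min(bx + block, w - 1)):
                    gx = frame[y][x + 1] - frame[y][x]   # horizontal gradient
                    gy = frame[y + 1][x] - frame[y][x]   # vertical gradient
                    dirs.append(math.atan2(gy, gx))      # gradient direction
            texture.append(sum(dirs) / len(dirs))
    return texture

def is_shot_boundary(f1, f2, threshold=0.5):
    """Compare the dynamic textures of two adjacent frames; a large mean
    angular difference suggests a shot boundary (threshold is hypothetical)."""
    t1, t2 = dynamic_texture(f1), dynamic_texture(f2)
    dist = sum(abs(a - b) for a, b in zip(t1, t2)) / len(t1)
    return dist > threshold

f_a = [[0, 10, 20, 30] for _ in range(4)]   # horizontal luminance ramp
f_b = [[10 * r] * 4 for r in range(4)]      # vertical luminance ramp
```

Two identical frames give zero texture distance, while the two orthogonal ramps differ by roughly pi/2 per block and trip the boundary test.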
Dewi Yunita Sari
Full Text Available Video is one form of digital evidence, including footage from handycams. In criminal cases, a video is usually manipulated to remove the evidence it contains, so forensic analysis is needed to detect the video's authenticity. In this study, videos were manipulated with cropping, zooming, rotation, and grayscale attacks in order to compare the original video recording with the tampered recording. The recordings were analyzed using the localization tampering method, a detection method that identifies the manipulated parts of a video by analyzing frames, computing histograms, and plotting histogram graphs. With localization tampering, the frame positions and durations of the tampered segments of the video can be determined.
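The histogram-comparison core of tampering localization can be sketched as follows. Frames are reduced to flat lists of grey-level pixels, and the chi-square distance and threshold are hypothetical choices, not taken from the paper:

```python
def histogram(frame, bins=8):
    """Grey-level histogram of a frame given as a flat list of 0-255 pixels."""
    h = [0] * bins
    for px in frame:
        h[min(px * bins // 256, bins - 1)] += 1
    return h

def chi_square(h1, h2):
    """Chi-square distance between two histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b)

def localize_tampering(original, suspect, threshold=1.0):
    """Flag frame indices whose histograms differ strongly between the
    original and the suspect recording (threshold is a hypothetical value)."""
    return [i for i, (f1, f2) in enumerate(zip(original, suspect))
            if chi_square(histogram(f1), histogram(f2)) > threshold]

# Hypothetical recordings: frame 1 of the suspect copy has been brightened.
orig = [[10] * 16, [10] * 16, [10] * 16]
tamp = [[10] * 16, [200] * 16, [10] * 16]
flagged = localize_tampering(orig, tamp)
```

Only the altered frame index is returned, which is exactly the "where and for how long" information the method aims to recover.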
Full Text Available With the development of heterogeneous networks and video coding standards, multiresolution video applications over networks become important. It is critical to ensure the service quality of the network for time-sensitive video services. Worldwide Interoperability for Microwave Access (WIMAX) is a good candidate for delivering video signals because through WIMAX the delivery quality based on the quality-of-service (QoS) setting can be guaranteed. The selection of suitable QoS parameters is, however, not trivial for service users. Instead, what a video service user really cares about is the video quality of presentation (QoP), which includes the video resolution, the fidelity, and the frame rate. In this paper, we present a quality control mechanism in multiresolution video coding structures over WIMAX networks and also investigate the relationship between QoP and QoS in end-to-end connections. Consequently, the video presentation quality can be simply mapped to the network requirements by a mapping table, and then the end-to-end QoS is achieved. We performed experiments with multiresolution MPEG coding over WIMAX networks. In addition to the QoP parameters, the video characteristics, such as the picture activity and the video mobility, also affect the QoS significantly.
Washburn, D. A.; Gulledge, J. P.; Rumbaugh, D. M.
Four rhesus monkeys (Macaca mulatta) were tested on joystick-based computer tasks in which they could choose to be reinforced either with pellets-only or with pellets + video. A variety of videotapes were used to reinforce task performance. The monkeys significantly preferred to be rewarded with a pellet and 10 s of a blank screen than a pellet plus 10 s of videotape. When they did choose to see videotaped images, however, they were significantly more likely to view video of themselves than video of their roommate or of unfamiliar conspecifics. These data support earlier findings of individual differences in preference for video reinforcement, and have clear implications for the study of face-recognition and self-recognition by nonhuman primates.
Olasimbo Ayodeji Arigbabu
Full Text Available Soft biometrics can be used as a prescreening filter, either by using a single trait or by combining several traits to aid the performance of recognition systems in an unobtrusive way. In many practical visual surveillance scenarios, facial information is difficult to capture effectively due to several varying challenges. However, from a distance the visual appearance of an object can be efficiently inferred, thereby providing the possibility of estimating body-related information. This paper presents an approach for estimating body-related soft biometrics; specifically, we propose a new approach based on body measurement and an artificial neural network for predicting the body weight of subjects, and we incorporate the existing single-view metrology technique for height estimation in videos with low frame rates. Our evaluation on 1120 frame sets of 80 subjects from a newly compiled dataset shows that the mentioned soft biometric information of human subjects can be adequately predicted from a set of frames.
Full Text Available ... of five videos was designed to help you learn more about Rheumatoid Arthritis (RA). You will learn how the diagnosis of RA is made, what ... and what other conditions are associated with RA. Learning more about your condition will allow you to ...
Full Text Available ... Johns Hopkins Stategies to Increase your Level of Physical Activity Role of Body Weight in Osteoarthritis Educational Videos ... Drug Information for Patients Arthritis Drug Information Sheets Benefits and Risks of Opioids in Arthritis ... website is intended for educational purposes only. Physicians and other health care professionals are encouraged to consult other sources ...
Full Text Available ... will allow you to take a more active role in your care. The information in these videos should not take the place of any advice you receive from your rheumatologist. Click A Link Below To Play Rheumatoid Arthritis: Symptoms and Diagnosis Rheumatoid Arthritis: What ...
Full Text Available ... are available, what is happening in the immune system and what other conditions are associated with RA. Learning more about your condition will allow you to take a more active role in your care. The information in these videos should not take the place ...
Manuel Calvelo Ríos
Full Text Available Video turns out to be an extremely useful tool for rural development. By rural development we mean the attempt to regulate the relations between countryside and city in terms more equitable for rural people. It is therefore a political decision.
Holte, Michael Boelstoft; Moeslund, Thomas B.
This paper presents a method for automatic recognition of human gestures. The method works with 3D image data from a range camera to achieve invariance to viewpoint. The recognition is based solely on motion from characteristic instances of the gestures. These instances are denoted 3D motion… as a gesture using a probabilistic edit distance method. The system has been trained on frontal images (0deg camera rotation) and tested on 240 video sequences from 0deg and 45deg. An overall recognition rate of 82.9% is achieved. The recognition rate is independent of the viewpoint, which shows that the method…
Full Text Available This essay is divided into two parts. The first one is a short description of the deficiencies of moral reflection, which seem to lead the discussion towards the concept of recognition. Charles Taylor and Axel Honneth, two of the protagonists of these debates, give very good reasons for turning the argument towards the issue of recognition, but they do not agree on its definition, on the way to recover the Hegelian thesis, or on how to approach the relationship between autonomy and recognition. The second part constitutes an analysis of the Hegelian conception of recognition, in order to highlight the essential link, rather than the rupture, between the notion of recognition and the conceptual model of free will or spirit.
Bourgonjon, Jeroen; Soetaert, Ronald
... by exploring a particular aspect of digitization that affects young people, namely video games. They explore the new social spaces which emerge in video game culture and how these spaces relate to community building and citizenship...
This article is an introduction to video screen capture. Basic information of two software programs, QuickTime for Mac and BlueBerry Flashback Express for PC, are also discussed. Practical applications for video screen capture are given.
Full Text Available Text mining deals with complex and unstructured texts. Usually a particular collection of texts specific to one or more domains is necessary. We have developed a customizable text classifier for users to mine the collection automatically. It derives from the sentence category of the HNC theory and corresponding techniques. It can start with a few texts, and it can adjust automatically or be adjusted by the user. The user can also control the number of domains chosen and decide the standard by which to choose the texts, based on demand and the abundance of materials. The performance of the classifier varies with the user's choice.
Full Text Available Preliminary data are reported from experiments in which Warrington's (1984) Recognition Memory Tests were given to patients with misidentification delusions, including the Capgras type, and to psychotic patients. The results showed a profound impairment in face recognition for most groups, especially those with the Capgras delusion. It was rare to find a patient whose score on the word test was anything but normal.
Full Text Available Our paper focuses on the graphical analysis domain. We propose an automatic image recognition technique. This approach consists of two main pattern recognition steps. First, it performs an image feature extraction operation on an input image set, using statistical dispersion features. Then, an unsupervised classification process is performed on the previously obtained graphical feature vectors. An automatic region-growing based clustering procedure is proposed and utilized in the classification stage.
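The two-stage pipeline above — dispersion-based feature vectors followed by region-growing-style unsupervised clustering — can be sketched with toy data. The specific features (standard deviation and range), the leader-style growing rule, and the distance threshold are illustrative assumptions, not the paper's exact procedure:

```python
def dispersion_features(values):
    """Statistical dispersion features of one image's intensity sample:
    (standard deviation, range)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return (var ** 0.5, max(values) - min(values))

def grow_clusters(vectors, tau):
    """Region-growing-style clustering: each vector joins the first cluster
    whose seed is within Euclidean distance tau, otherwise it seeds a new one."""
    seeds, labels = [], []
    for v in vectors:
        for i, s in enumerate(seeds):
            if sum((a - b) ** 2 for a, b in zip(v, s)) ** 0.5 <= tau:
                labels.append(i)
                break
        else:
            seeds.append(v)
            labels.append(len(seeds) - 1)
    return labels

# Two hypothetical groups of image samples: flat (low dispersion) vs. textured (high).
samples = [[10, 10, 11, 10], [12, 12, 12, 13], [0, 255, 0, 255], [10, 240, 20, 250]]
feats = [dispersion_features(s) for s in samples]
labels = grow_clusters(feats, tau=50.0)
```

The two flat samples collapse into one cluster and the two high-contrast samples into another, which is the behaviour the unsupervised classification stage relies on.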
Pasch, H. L.
An overview of video coding is presented. The aim is not to give a technical summary of possible coding techniques, but to address subjects related to video compression in general and to the transmission of compressed video in more detail. Bit rate reduction is in general possible by removing redundant information; removing information the eye does not use anyway; and reducing the quality of the video. The codecs which are used for reducing the bit rate can be divided into two groups: Constant Bit rate Codecs (CBCs), which keep the bit rate constant but vary the video quality; and Variable Bit rate Codecs (VBCs), which keep the video quality constant by varying the bit rate. VBCs can in general reach a higher video quality than CBCs using less bandwidth, but need a transmission system that allows the bandwidth of a connection to fluctuate in time. The current and the next generation of the PSTN do not allow this; ATM might. There are several factors which influence the quality of video: the bit error rate of the transmission channel, slip rate, packet loss rate/packet insertion rate, end-to-end delay, phase shift between voice and video, and bit rate. Based on the bit rate of the coded video, the following classification of coded video can be made: High Definition Television (HDTV); Broadcast Quality Television (BQTV); video conferencing; and video telephony. The properties of these classes are given. The video conferencing and video telephony equipment available now and in the next few years can be divided into three categories: conforming to the 1984 CCITT standard for video conferencing; conforming to the 1988 CCITT standard; and conforming to no standard.
Online videos are an increasingly important way technology is contributing to the improvement of physics teaching. Students and teachers have begun to rely on online videos to provide them with content knowledge and instructional strategies. Online audiences are expecting greater production value, and departments are sometimes requesting educators to post video pre-labs or to flip our classrooms. In this article, I share my advice on creating engaging physics videos.
Potter, Ray; Roberts, Deborah
This guide aims to provide an introduction to Desktop Video Conferencing. You may be familiar with video conferencing, where participants typically book a designated conference room and communicate with another group in a similar room on another site via a large screen display. Desktop video conferencing (DVC), as the name suggests, allows users to video conference from the comfort of their own office, workplace or home via a desktop/laptop Personal Computer. DVC provides live audio and visua...
47 CFR § 79.3 (2010-10-01), Closed Captioning and Video Description of Video Programming: Video description of video programming. (a) Definitions. For purposes of this section the following definitions shall apply: (1…
Full Text Available This article presents tips on how to use video in qualitative research. The author states that, though there many complex and powerful computer programs for working with video, the work done in qualitative research does not require those programs. For this work, simple editing software is sufficient. Also presented is an easy and efficient method of transcribing video clips.
Lovink, G.; Somers Miles, R.
Video Vortex Reader II is the Institute of Network Cultures' second collection of texts that critically explore the rapidly changing landscape of online video and its use. With the success of YouTube ('2 billion views per day') and the rise of other online video sharing platforms, the moving image
Full Text Available This paper presents an object occlusion detection algorithm using object depth information that is estimated by automatic camera calibration. The object occlusion problem is a major factor to degrade the performance of object tracking and recognition. To detect an object occlusion, the proposed algorithm consists of three steps: (i) automatic camera calibration using both moving objects and a background structure; (ii) object depth estimation; and (iii) detection of occluded regions. The proposed algorithm estimates the depth of the object without extra sensors but with a generic red, green and blue (RGB) camera. As a result, the proposed algorithm can be applied to improve the performance of object tracking and object recognition algorithms for video surveillance systems.
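Step (iii) — flagging occluded regions once per-object depths are known — can be sketched with axis-aligned bounding boxes: where two tracked boxes overlap, the object farther from the camera is the occluded one. The box coordinates, IDs, and depths below are hypothetical:

```python
def overlap(a, b):
    """Intersection rectangle of two (x1, y1, x2, y2) boxes, or None."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def occluded_regions(objects):
    """Given (object_id, box, depth) triples, report, for each overlapping
    pair, the object that is occluded: the one with the larger depth."""
    regions = []
    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            (id_a, box_a, d_a), (id_b, box_b, d_b) = objects[i], objects[j]
            region = overlap(box_a, box_b)
            if region is not None:
                regions.append((id_b if d_b > d_a else id_a, region))
    return regions

# Hypothetical tracked objects with depths estimated from camera calibration.
objs = [("person", (0, 0, 4, 4), 2.0), ("car", (2, 2, 8, 8), 5.0)]
res = occluded_regions(objs)
```

The nearer "person" box partially covers the farther "car", so the car is reported as occluded inside the intersection rectangle.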
Buggey, Tom; Ogle, Lindsey
Video self-modeling (VSM) first appeared on the psychology and education stage in the early 1970s. The practical applications of VSM were limited by lack of access to tools for editing video, which is necessary for almost all self-modeling videos. Thus, VSM remained in the research domain until the advent of camcorders and VCR/DVD players and,…
Otrel-Cass, Kathrin; Khalid, Md. Saifuddin
With an interest in learning that is set in collaborative situations, the data session presents excerpts from video data produced by two of fifteen students from a class of a 5th semester techno-anthropology course. Students used video cameras to capture the time they spent working with a scientist… video, nature of the interactional space, and material and spatial semiotics.
Epley, Hannah K.
There is a need for Extension professionals to show clientele the benefits of their program. This article shares how promotional videos are one way of reaching audiences online. An example is given on how a promotional video has been used and developed using iMovie software. Tips are offered for how professionals can create a promotional video and…
I Made Oka Widyantara
Full Text Available This paper aims to analyze an Internet-based streaming video service over communication media with variable bit rates. The proposed scheme is Dynamic Adaptive Streaming over HTTP (DASH), which runs over the Internet using the Hyper Text Transfer Protocol (HTTP). DASH technology allows a video to be segmented into several packages that will be streamed. The initial DASH stage compresses the video source to lower bit rates using the H.26 video codec. The compressed video is then segmented using MP4Box, which generates streaming packets of a specified duration. These packets are assembled into a streaming manifest in the Media Presentation Description (MPD) format defined by MPEG-DASH. The MPEG-DASH video stream runs on a platform with an integrated bitdash player. With this scheme, the video has several bit-rate variants, which gives rise to the concept of scalability of streaming video services on the client side. The main target of the mechanism is a smooth MPEG-DASH streaming video display on the client. The simulation results show that the scalable video streaming scheme based on MPEG-DASH is able to improve the quality of the image displayed on the client side, where video buffering can be made constant and smooth for the duration of video playback.
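The client-side scalability described above rests on throughput-based rate adaptation: the player measures its download throughput and picks the highest-bitrate representation from the MPD's ladder that fits. A minimal sketch — the representation names, bitrates, and safety factor are hypothetical, not values from the paper:

```python
def pick_representation(throughput_kbps, representations, safety=0.8):
    """Throughput-based DASH adaptation: choose the highest-bitrate
    representation whose bitrate fits within safety * measured throughput;
    fall back to the lowest representation when nothing fits."""
    affordable = [r for r in representations if r[1] <= safety * throughput_kbps]
    if affordable:
        return max(affordable, key=lambda r: r[1])
    return min(representations, key=lambda r: r[1])

# Hypothetical bit-rate ladder produced by compressing and segmenting the source.
ladder = [("240p", 400), ("480p", 1200), ("720p", 2500)]
choice_fast = pick_representation(4000, ladder)   # plenty of bandwidth
choice_slow = pick_representation(600, ladder)    # congested link
```

The safety factor keeps the chosen bitrate below the measured throughput, which is what lets the buffer stay steady during playback.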
Full Text Available While it has been established that using full-body motion to play active video games results in increased levels of energy expenditure, there is little information on the classification of human movement during active video game play in relation to fundamental movement skills. The aim of this study was to validate software utilising Kinect sensor motion capture technology to recognise fundamental movement skills (FMS) during active video game play. Two human assessors rated jumping and side-stepping, and these assessments were compared to the Kinect Action Recognition Tool (KART) to establish a level of agreement and to determine the number of movements completed during five minutes of active video game play, for 43 children (mean age 12 years 7 months ± 1 year 6 months). Inter-rater reliability between the two human raters was higher for the jump (r = 0.94, p < .01) than the sidestep (r = 0.87, p < .01), although both were excellent. Excellent reliability was also found between the human raters and the KART system for the jump (r = 0.84, p < .01), and moderate reliability for the sidestep (r = 0.6983, p < .01) during game play, demonstrating that both humans and KART had higher agreement for jumps than sidesteps in the game play condition. The results of the study provide confidence that the Kinect sensor can be used to count the number of jumps and sidesteps during five minutes of active video game play with a similar level of accuracy as human raters. However, in contrast to humans, the KART system required a fraction of the time to analyse and tabulate the results.
In the middle of the twentieth century, long before video games were even imagined as a mode of popular entertainment, religious theorist Mircea Eliade argued that the recognition of the "sacred...
Full Text Available Abstract Background Speaker detection is an important component of many human-computer interaction applications, such as multimedia indexing or ambient intelligent systems. This work addresses the problem of detecting the current speaker in audio-visual sequences. The detector requires only simple equipment, since a single camera and microphone meet its needs. Method A multimodal pattern recognition framework is proposed, with solutions provided for each step of the process, namely the feature generation and extraction steps, the classification, and the evaluation of the system performance. The decision is based on the estimation of the synchrony between the audio and the video signals. Prior to the classification, an information-theoretic framework is applied to extract optimized audio features using video information. The classification step is then defined through a hypothesis-testing framework in order to obtain confidence levels associated with the classifier outputs, thereby allowing an evaluation of the performance of the whole multimodal pattern recognition system. Results Through the hypothesis-testing approach, the classifier performance can be given as a ratio of detection to false-alarm probabilities. Above all, the hypothesis tests give means for measuring the efficiency of the whole pattern recognition process. In particular, the gain offered by the proposed feature extraction step can be evaluated. As a result, it is shown that introducing such a feature extraction step increases the ability of the classifier to produce good relative instance scores, and therefore the performance of the pattern recognition process. Conclusion The powerful capacities of hypothesis tests as an evaluation tool are exploited to assess the performance of a multimodal pattern recognition process. In particular, the advantage of performing or not performing a feature extraction step prior to the classification is evaluated. Although the proposed framework is
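The "ratio of detection to false-alarm probabilities" above can be illustrated with a minimal sketch: given synchrony scores observed when the person is the speaker (H1) and when they are not (H0), a decision threshold yields the two probabilities. The score values below are invented for illustration.

```python
def detection_rates(pos_scores, neg_scores, threshold):
    """Detection and false-alarm probabilities for a score threshold.

    pos_scores: synchrony scores when the person IS the speaker (H1).
    neg_scores: synchrony scores when the person is NOT the speaker (H0).
    """
    p_detect = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    p_false = sum(s >= threshold for s in neg_scores) / len(neg_scores)
    return p_detect, p_false

# Hypothetical audio-visual synchrony scores.
speaker = [0.9, 0.8, 0.75, 0.6, 0.85]
non_speaker = [0.2, 0.4, 0.55, 0.3, 0.1]

pd, pfa = detection_rates(speaker, non_speaker, threshold=0.5)
print(pd, pfa)  # the ratio pd/pfa summarises detector performance
```

Sweeping the threshold traces out the full detection/false-alarm trade-off, which is what the hypothesis-testing evaluation exploits.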
Full Text Available Abstract Scale-invariant feature transform (SIFT transforms a grayscale image into scale-invariant coordinates of local features that are invariant to image scale, rotation, and changing viewpoints. Because of its scale-invariant properties, SIFT has been successfully used for object recognition and content-based image retrieval. The biggest drawback of SIFT is that it uses only grayscale information and misses important visual information regarding color. In this paper, we present the development of a novel color feature extraction algorithm that addresses this problem, and we also propose a new clustering strategy using clustering ensembles for video shot detection. Based on Fibonacci lattice-quantization, we develop a novel color global scale-invariant feature transform (CGSIFT for better description of color contents in video frames for video shot detection. CGSIFT first quantizes a color image, representing it with a small number of color indices, and then uses SIFT to extract features from the quantized color index image. We also develop a new space description method using small image regions to represent global color features as the second step of CGSIFT. Clustering ensembles focusing on knowledge reuse are then applied to obtain better clustering results than using single clustering methods for video shot detection. Evaluation of the proposed feature extraction algorithm and the new clustering strategy using clustering ensembles reveals very promising results for video shot detection.
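The clustering-ensemble step can be illustrated with a co-association (evidence-accumulation) sketch: several base clusterings each vote on whether two frames belong together, and the averaged votes form a consensus similarity matrix that is more robust than any single clustering. The base labelings below are invented for illustration; the paper's actual ensemble construction may differ.

```python
def co_association(labelings):
    """Average co-association matrix from several base clusterings.

    labelings: one label list per base clustering; labelings[k][j] is
    the cluster id assigned to frame j by base clustering k.
    Entry m[i][j] is the fraction of clusterings grouping i and j together.
    """
    n = len(labelings[0])
    m = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += 1.0 / len(labelings)
    return m

# Three hypothetical base clusterings of five video frames.
base = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0],
]
consensus = co_association(base)
print(consensus[0][1], consensus[0][2])
```

Thresholding or re-clustering the consensus matrix then yields the final shot boundaries, reusing the knowledge contained in all base partitions.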
Hickman, Simon J
Internet video sharing sites allow the free dissemination of educational material. This study investigated the quality and educational content of videos of eye movement disorders posted on such sites. Educational neurological eye movement videos were identified by entering the titles of the eye movement abnormalities into the search boxes of the video sharing sites; suggested links were also followed from each video. The number of views, likes, and dislikes for each video was recorded. The videos were then rated for their picture and sound quality. Their educational value was assessed according to whether the video included a description of the eye movement abnormality, the anatomical location of the lesion (if appropriate), and the underlying diagnosis. Three hundred fifty-four such videos were found on YouTube and Vimeo, with a mean of 6,443 views per video (range, 1-195,957). One hundred nineteen (33.6%) had no form of commentary about the eye movement disorder shown apart from the title. Forty-seven (13.3%) contained errors in the title or in the text. Eighty (22.6%) had excellent educational value, describing the eye movement abnormality, the anatomical location of the lesion, and the underlying diagnosis; of these, 30 also had good picture and sound quality. The videos with excellent educational value had a mean of 9.84 "likes" per video compared with 2.37 for those without a commentary (P ...). Videos with excellent educational value and good picture and sound quality had a mean of 10.23 "likes" per video (P = 0.004 vs videos with no commentary). There was no significant difference in the mean number of "dislikes" between videos that had no commentary or contained errors and those with excellent educational value. A large number of eye movement videos are freely available on these sites; however, due to the lack of peer review, a significant number have poor educational value because they have no commentary or contain errors. The number of "likes
Full Text Available The purpose of this article is to show how learning design and scaffolding can be used to create a framework for student-produced video for examinations in higher education. The article takes as its starting point the problem that educational institutions must handle and coordinate teaching within both the subject domain and the media domain, and must ensure a balance between subject-specific and media-specific approaches. Distributing the task across several teaching resources requires more coordination, but it avoids the problem of demanding dual expertise of teachers for media productions. Based on the Larnaca Declaration's perspectives on learning design and primarily on Jerome Bruner's principles of scaffolding, a model is assembled for supporting video production by students in higher education. By applying this model to teaching sessions and courses, subject teachers and media teachers gain a tool to focus and coordinate their efforts toward the goal of students producing and using video for examinations.
No milestone has proven as elusive as the always-approaching "year of the LAN," but the "year of the scanner" might claim the silver medal. Desktop scanners have been around almost as long as personal computers. And everyone thinks they are used for obvious desktop-publishing and business tasks like scanning business documents, magazine articles and other pages, and translating those words into files your computer understands. But, until now, the reality fell far short of the promise. Because it's true that scanners deliver an accurate image of the page to your computer, but the software to recognize this text has been woefully disappointing. Old optical-character recognition (OCR) software recognized such a limited range of pages as to be virtually useless to real users. (For example, one OCR vendor specified 12-point Courier font from an IBM Selectric typewriter: the same font in 10-point, or from a Diablo printer, was unrecognizable!) Computer dealers have told me the chasm between OCR expectations and reality is so broad and deep that nine out of ten prospects leave their stores in disgust when they learn the limitations. And this is a very important, very unfortunate gap. Because the promise of recognition -- what people want it to do -- carries with it tremendous improvements in our productivity and ability to get tons of written documents into our computers where we can do real work with it. The good news is that a revolutionary new development effort has led to the new technology of "page recognition," which actually does deliver the promise we've always wanted from OCR. I'm sure every reader appreciates the breakthrough represented by the laser printer and page-makeup software, a combination so powerful it created new reasons for buying a computer. A similar breakthrough is happening right now in page recognition: the Macintosh (and, I must admit, other personal computers) equipped with a moderately priced scanner and OmniPage software (from Caere
Belonging to the wider academic field of computer vision, video analytics has aroused a phenomenal surge of interest since the current millennium. Video analytics is intended to solve the problem of the incapability of exploiting video streams in real time for the purpose of detection or anticipation. It involves analyzing the videos using algorithms that detect and track objects of interest over time and that indicate the presence of events or suspect behavior involving these objects.The aims of this book are to highlight the operational attempts of video analytics, to identify possi
There has been a phenomenal growth in video applications over the past few years. An accurate traffic model of Variable Bit Rate (VBR) video is necessary for performance evaluation of a network design and for generating synthetic traffic that can be used for benchmarking a network. A large number of models for VBR video traffic have been proposed in the literature for different types of video in the past 20 years. Here, the authors have classified and surveyed these models and have also evaluated the models for H.264 AVC and MVC encoded video and discussed their findings.
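One of the simplest model families covered by such surveys is an autoregressive model of per-frame bit rate; the sketch below generates a synthetic VBR trace from an AR(1) process. The choice of AR(1) and all parameter values here are illustrative assumptions, not results from the survey.

```python
import random

def ar1_vbr_trace(n_frames, mean_rate, phi=0.9, sigma=0.1, seed=42):
    """Synthetic per-frame bit rates (bps) from an AR(1) process.

    x[t] = mean + phi * (x[t-1] - mean) + Gaussian noise, with rates
    clamped at zero since a bit rate cannot be negative. phi controls
    frame-to-frame correlation; sigma scales the noise to the mean.
    """
    rng = random.Random(seed)   # seeded for reproducible benchmarking
    x = mean_rate
    trace = []
    for _ in range(n_frames):
        x = mean_rate + phi * (x - mean_rate) + rng.gauss(0.0, sigma * mean_rate)
        trace.append(max(x, 0.0))
    return trace

trace = ar1_vbr_trace(1000, mean_rate=2_000_000)
print(len(trace), min(trace) >= 0.0)
```

Traces like this can be fed into a network simulator as benchmark traffic; richer models from the literature add scene-change and GOP structure on top of this correlation.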
Yamada, Kaho; Yoshida, Takeshi; Sumi, Kazuhiko; Habe, Hitoshi; Mitsugami, Ikuhisa
Recently, dense trajectories have been shown to be a successful video representation for action recognition, demonstrating state-of-the-art results on a variety of datasets. However, when these trajectories are applied to gesture recognition, recognizing similar and fine-grained motions is problematic. In this paper, we propose a new method in which dense trajectories are calculated in segmented regions around detected human body parts. Spatial segmentation is achieved by body part detection. Temporal segmentation is performed over a fixed number of video frames. The proposed method removes background video noise and can recognize similar and fine-grained motions. Only a few video datasets are available for gesture classification; therefore, we constructed a new gesture dataset and evaluated the proposed method on it. The experimental results show that the proposed method outperforms the original dense trajectories.
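The fixed-length temporal segmentation step can be sketched as splitting the frame sequence into windows of a fixed number of frames; dense trajectories would then be computed per body-part region within each window. The window length and the frame list below are illustrative, not the paper's settings.

```python
def temporal_segments(frames, seg_len):
    """Split a frame sequence into fixed-length temporal segments.

    The final segment is kept even if it is shorter than seg_len.
    """
    return [frames[i:i + seg_len] for i in range(0, len(frames), seg_len)]

# Ten frame indices split into windows of four frames each.
print(temporal_segments(list(range(10)), 4))
```

Keeping segments short limits each trajectory to one sub-motion, which is what lets the method separate similar, fine-grained gestures.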