Swapnil Vitthal Tathe
Full Text Available Advancement in computer vision technology and availability of video capturing devices such as surveillance cameras has evoked new video processing applications. The research in video face recognition is mostly biased towards law enforcement applications. Applications involves human recognition based on face and iris, human computer interaction, behavior analysis, video surveillance etc. This paper presents face tracking framework that is capable of face detection using Haar features, recognition using Gabor feature extraction, matching using correlation score and tracking using Kalman filter. The method has good recognition rate for real-life videos and robust performance to changes due to illumination, environmental factors, scale, pose and orientations.
Most biometric systems employed for human recognition require physical contact with, or close proximity to, a cooperative subject. Far more challenging is the ability to reliably recognize individuals at a distance, when viewed from an arbitrary angle under real-world environmental conditions. Gait and face data are the two biometrics that can be most easily captured from a distance using a video camera. This comprehensive and logically organized text/reference addresses the fundamental problems associated with gait and face-based human recognition, from color and infrared video data that are
Yeung, Serena; Fathi, Alireza; Fei-Fei, Li
In this paper we present VideoSET, a method for Video Summary Evaluation through Text that can evaluate how well a video summary is able to retain the semantic information contained in its original video. We observe that semantics is most easily expressed in words, and develop a text-based approach for the evaluation. Given a video summary, a text representation of the video summary is first generated, and an NLP-based metric is then used to measure its semantic distance to ground-truth text ...
Zhou, Saohua; Krüger, Volker; Chellappa, Rama
Recognition of human faces using a gallery of still or video images and a probe set of videos is systematically investigated using a probabilistic framework. In still-to-video recognition, where the gallery consists of still images, a time series state space model is proposed to fuse temporal...... of the identity variable produces the recognition result. The model formulation is very general and it allows a variety of image representations and transformations. Experimental results using videos collected by NIST/USF and CMU illustrate the effectiveness of this approach for both still-to-video and video-to-video...... information in a probe video, which simultaneously characterizes the kinematics and identity using a motion vector and an identity variable, respectively. The joint posterior distribution of the motion vector and the identity variable is estimated at each time instant and then propagated to the next time...
Craciunescu, Razvan; Mihovska, Albena Dimitrova; Kyriazakos, Sofoklis
Gesture recognition is a topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms. Gestures can originate from any bodily motion or state but commonly originate from the face or hand. Current research focus includes on the emotion...... recognition from the face and hand gesture recognition. Gesture recognition enables humans to communicate with the machine and interact naturally without any mechanical devices. This paper investigates the possibility to use non-audio/video sensors in order to design a low-cost gesture recognition device...
Full Text Available The video recognition technology is applied to the landslide emergency remote monitoring system. The trajectories of the landslide are identified by this system in this paper. The system of geological disaster monitoring is applied synthetically to realize the analysis of landslide monitoring data and the combination of video recognition technology. Landslide video monitoring system will video image information, time point, network signal strength, power supply through the 4G network transmission to the server. The data is comprehensively analysed though the remote man-machine interface to conduct to achieve the threshold or manual control to determine the front-end video surveillance system. The system is used to identify the target landslide video for intelligent identification. The algorithm is embedded in the intelligent analysis module, and the video frame is identified, detected, analysed, filtered, and morphological treatment. The algorithm based on artificial intelligence and pattern recognition is used to mark the target landslide in the video screen and confirm whether the landslide is normal. The landslide video monitoring system realizes the remote monitoring and control of the mobile side, and provides a quick and easy monitoring technology.
Wang, Xiaoyang; Ji, Qiang
Current video event recognition research remains largely target-centered. For real-world surveillance videos, targetcentered event recognition faces great challenges due to large intra-class target variation, limited image resolution, and poor detection and tracking results. To mitigate these challenges, we introduced a context-augmented video event recognition approach. Specifically, we explicitly capture different types of contexts from three levels including image level, semantic level, and prior level. At the image level, we introduce two types of contextual features including the appearance context features and interaction context features to capture the appearance of context objects and their interactions with the target objects. At the semantic level, we propose a deep model based on deep Boltzmann machine to learn event object representations and their interactions. At the prior level, we utilize two types of prior-level contexts including scene priming and dynamic cueing. Finally, we introduce a hierarchical context model that systematically integrates the contextual information at different levels. Through the hierarchical context model, contexts at different levels jointly contribute to the event recognition. We evaluate the hierarchical context model for event recognition on benchmark surveillance video datasets. Results show that incorporating contexts in each level can improve event recognition performance, and jointly integrating three levels of contexts through our hierarchical model achieves the best performance.
Full Text Available This paper describes a mobile device which tries to give the blind or visually impaired access to text information. Three key technologies are required for this system: text detection, optical character recognition, and speech synthesis. Blind users and the mobile environment imply two strong constraints. First, pictures will be taken without control on camera settings and a priori information on text (font or size and background. The second issue is to link several techniques together with an optimal compromise between computational constraints and recognition efficiency. We will present the overall description of the system from text detection to OCR error correction.
Full Text Available This review article surveys extensively the current progresses made toward video-based human activity recognition. Three aspects for human activity recognition are addressed including core technology, human activity recognition systems, and applications from low-level to high-level representation. In the core technology, three critical processing stages are thoroughly discussed mainly: human object segmentation, feature extraction and representation, activity detection and classification algorithms. In the human activity recognition systems, three main types are mentioned, including single person activity recognition, multiple people interaction and crowd behavior, and abnormal activity recognition. Finally the domains of applications are discussed in detail, specifically, on surveillance environments, entertainment environments and healthcare systems. Our survey, which aims to provide a comprehensive state-of-the-art review of the field, also addresses several challenges associated with these systems and applications. Moreover, in this survey, various applications are discussed in great detail, specifically, a survey on the applications in healthcare monitoring systems.
Demir, Defne; Skouteris, Helen
The overall aim of the experiment reported here was to establish whether self-recognition in live video can be facilitated when live video training is provided to children aged 2-2.5 years. While the majority of children failed the test of live self-recognition prior to video training, more than half exhibited live self-recognition post video…
A novel, template-based method for face recognition is presented. The goals of the proposed method are to integrate multiple observations for improved robustness and to provide auxiliary confidence data for subsequent use in an automated video surveillance system. The proposed framework consists of a parallel system of classifiers, referred to as observers, where each observer is trained on one face region. The observer outputs are combined to yield the final recognition result. Three of the four confounding factors-expression, illumination, and decoration-are specifically addressed in this paper. The extension of the proposed approach to address the fourth confounding factor-pose-is straightforward and well supported in previous work. A further contribution of the proposed approach is the computation of a revealing confidence measure. This confidence measure will aid the subsequent application of the proposed method to video surveillance scenarios. Results are reported for a database comprising 676 images of 160 subjects under a variety of challenging circumstances. These results indicate significant performance improvements over previous methods and demonstrate the usefulness of the confidence data
Full Text Available This paper presents an innovative SIFT-based method for rigid video object recognition (hereafter called RVO-SIFT. Just like what happens in the vision system of human being, this method makes the object recognition and feature updating process organically unify together, using both trajectory and feature matching, and thereby it can learn new features not only in the training stage but also in the recognition stage, which can improve greatly the completeness of the video object’s features automatically and, in turn, increases the ratio of correct recognition drastically. The experimental results on real video sequences demonstrate its surprising robustness and efficiency.
Full Text Available The paper describes a system of hand gesture recognition by image processing for human robot interaction. The recognition and interpretation of the hand postures acquired through a video camera allow the control of the robotic arm activity: motion - translation and rotation in 3D - and tightening/releasing the clamp. A gesture dictionary was defined and heuristic algorithms for recognition were developed and tested. The system can be used for academic and industrial purposes, especially for those activities where the movements of the robotic arm were not previously scheduled, for training the robot easier than using a remote control. Besides the gesture dictionary, the novelty of the paper consists in a new technique for detecting the relative positions of the fingers in order to recognize the various hand postures, and in the achievement of a robust system for controlling robots by postures of the hands.
Salway, Andrew; Graham, Mike; Tomadaki, Eleftheria; Xu, Yan
The ongoing TIWO project is investigating the synthesis of language technologies, like information extraction and corpus-based text analysis, video data modeling and knowledge representation. The aim is to develop a computational account of how video and text can be integrated by representations of narrative in multimedia systems. The multimedia domain is that of film and audio description – an emerging text type that is produced specifically to be informative about the events and objects dep...
Ukhanova, Ann; Støttrup-Andersen, Jesper; Forchhammer, Søren
Definition of video quality requirements for video surveillance poses new questions in the area of quality assessment. This paper presents a quality assessment experiment for an automatic license plate recognition scenario. We explore the influence of the compression by H.264/AVC and H.265/HEVC s...... recognition in our study has a behavior similar to human recognition, allowing the use of the same mathematical models. We furthermore propose an application of one of the models for video surveillance systems......Definition of video quality requirements for video surveillance poses new questions in the area of quality assessment. This paper presents a quality assessment experiment for an automatic license plate recognition scenario. We explore the influence of the compression by H.264/AVC and H.265/HEVC...... standards on the recognition performance. We compare logarithmic and logistic functions for quality modeling. Our results show that a logistic function can better describe the dependence of recognition performance on the quality for both compression standards. We observe that automatic license plate...
Full Text Available In this paper we propose a new approach for facial micro expressions recognition. For this purpose the Eulerian Video Magnification (EVM method is used to retrieve the subtle motions of the face. The results of this method are obtained as in the magnified images sequence. In this study the numerical tests are performed on two databases: Spontaneous Micro expression (SMIC and Category and Sourcing Managers Executive (CASME. We evaluate our proposed method in two phases using the eigenface method. In phase 1 we recognize the type of a micro expression, for example emotional versus unemotional in SMIC database. Phase 2 classifies the recognized micro expression as negative versus positive in SMIC database and happiness versus disgust in CASME database. The results show that the eigenface method by the EVM method for the retrieval of subtle motions of the face increases the performance of micro expression recognition. Moreover, the proposed approach is more accurate and promising than the previous works in micro expressions recognition.
Video networks is an emerging interdisciplinary field with significant and exciting scientific and technological challenges. It has great promise in solving many real-world problems and enabling a broad range of applications, including smart homes, video surveillance, environment and traffic monitoring, elderly care, intelligent environments, and entertainment in public and private spaces. This paper provides an overview of the design of a wireless video network as an experimental environment, camera selection, hand-off and control, anomaly detection. It addresses challenging questions for individual identification using gait and face at a distance and present new techniques and their comparison for robust identification.
Kobla, Vikrant; DeMenthon, Daniel; Doermann, David S.
Automated classification of digital video is emerging as an important piece of the puzzle in the design of content management systems for digital libraries. The ability to classify videos into various classes such as sports, news, movies, or documentaries, increases the efficiency of indexing, browsing, and retrieval of video in large databases. In this paper, we discuss the extraction of features that enable identification of sports videos directly from the compressed domain of MPEG video. These features include detecting the presence of action replays, determining the amount of scene text in vide, and calculating various statistics on camera and/or object motion. The features are derived from the macroblock, motion,and bit-rate information that is readily accessible from MPEG video with very minimal decoding, leading to substantial gains in processing speeds. Full-decoding of selective frames is required only for text analysis. A decision tree classifier built using these features is able to identify sports clips with an accuracy of about 93 percent.
Kimura, Marcia L. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Erikson, Rebecca L. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Lombardo, Nicholas J. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
The Pacific Northwest National Laboratory (PNNL) will produce a non-cooperative (i.e. not posing for the camera) facial recognition video data set for research purposes to evaluate and enhance facial recognition systems technology. The aggregate data set consists of 1) videos capturing PNNL role players and public volunteers in three key operational settings, 2) photographs of the role players for enrolling in an evaluation database, and 3) ground truth data that documents when the role player is within various camera fields of view. PNNL will deliver the aggregate data set to DHS who may then choose to make it available to other government agencies interested in evaluating and enhancing facial recognition systems. The three operational settings that will be the focus of the video collection effort include: 1) unidirectional crowd flow 2) bi-directional crowd flow, and 3) linear and/or serpentine queues.
Zhang, De-xin; An, Peng; Zhang, Hao-xiang
In this paper, we propose a video searching system that utilizes face recognition as searching indexing feature. As the applications of video cameras have great increase in recent years, face recognition makes a perfect fit for searching targeted individuals within the vast amount of video data. However, the performance of such searching depends on the quality of face images recorded in the video signals. Since the surveillance video cameras record videos without fixed postures for the object, face occlusion is very common in everyday video. The proposed system builds a model for occluded faces using fuzzy principal component analysis (FPCA), and reconstructs the human faces with the available information. Experimental results show that the system has very high efficiency in processing the real life videos, and it is very robust to various kinds of face occlusions. Hence it can relieve people reviewers from the front of the monitors and greatly enhances the efficiency as well. The proposed system has been installed and applied in various environments and has already demonstrated its power by helping solving real cases.
Full Text Available Text-independent speaker recognition systems such as those based on Gaussian mixture models (GMMs do not include time sequence information (TSI within the model itself. The level of importance of TSI in speaker recognition is an interesting question and one addressed in this paper. Recent works has shown that the utilisation of higher-level information such as idiolect, pronunciation, and prosodics can be useful in reducing speaker recognition error rates. In accordance with these developments, the aim of this paper is to show that as more data becomes available, the basic GMM can be enhanced by utilising TSI, even in a text-independent mode. This paper presents experimental work incorporating TSI into the conventional GMM. The resulting system, known as the segmental mixture model (SMM, embeds dynamic time warping (DTW into a GMM framework. Results are presented on the 2000-speaker SpeechDat Welsh database which show improved speaker recognition performance with the SMM.
Full Text Available Human action recognition is an increasingly important research topic in the fields of video sensing, analysis and understanding. Caused by unconstrained sensing conditions, there exist large intra-class variations and inter-class ambiguities in realistic videos, which hinder the improvement of recognition performance for recent vision-based action recognition systems. In this paper, we propose a generalized pyramid matching kernel (GPMK for recognizing human actions in realistic videos, based on a multi-channel “bag of words” representation constructed from local spatial-temporal features of video clips. As an extension to the spatial-temporal pyramid matching (STPM kernel, the GPMK leverages heterogeneous visual cues in multiple feature descriptor types and spatial-temporal grid granularity levels, to build a valid similarity metric between two video clips for kernel-based classification. Instead of the predefined and fixed weights used in STPM, we present a simple, yet effective, method to compute adaptive channel weights of GPMK based on the kernel target alignment from training data. It incorporates prior knowledge and the data-driven information of different channels in a principled way. The experimental results on three challenging video datasets (i.e., Hollywood2, Youtube and HMDB51 validate the superiority of our GPMK w.r.t. the traditional STPM kernel for realistic human action recognition and outperform the state-of-the-art results in the literature.
Haque, Mohammad Ahsanul; Nasrollahi, Kamal; Moeslund, Thomas B.
Different biometric traits such as face appearance and heartbeat signal from Electrocardiogram (ECG)/Phonocardiogram (PCG) are widely used in the human identity recognition. Recent advances in facial video based measurement of cardio-physiological parameters such as heartbeat rate, respiratory rate......, and blood volume pressure provide the possibility of extracting heartbeat signal from facial video instead of using obtrusive ECG or PCG sensors in the body. This paper proposes the Heartbeat Signal from Facial Video (HSFV) as a new biometric trait for human identity recognition, for the first time...... to the best of our knowledge. Feature extraction from the HSFV is accomplished by employing Radon transform on a waterfall model of the replicated HSFV. The pairwise Minkowski distances are obtained from the Radon image as the features. The authentication is accomplished by a decision tree based supervised...
Full Text Available In Geographical Information Systems, geo-coding is used for the task of mapping from implicitly geo-referenced data to explicitly geo-referenced coordinates. At present, an enormous amount of implicitly geo-referenced information is hidden in unstructured text, e.g., Wikipedia, social data and news. Toponym recognition is the foundation of mining this useful geo-referenced information by identifying words as toponyms in text. In this paper, we propose an adapted toponym recognition approach based on deep belief network (DBN by exploring two key issues: word representation and model interpretation. A Skip-Gram model is used in the word representation process to represent words with contextual information that are ignored by current word representation models. We then determine the core hyper-parameters of the DBN model by illustrating the relationship between the performance and the hyper-parameters, e.g., vector dimensionality, DBN structures and probability thresholds. The experiments evaluate the performance of the Skip-Gram model implemented by the Word2Vec open-source tool, determine stable hyper-parameters and compare our approach with a conditional random field (CRF based approach. The experimental results show that the DBN model outperforms the CRF model with smaller corpus. When the corpus size is large enough, their statistical metrics become approaching. However, their recognition results express differences and complementarity on different kinds of toponyms. More importantly, combining their results can directly improve the performance of toponym recognition relative to their individual performances. It seems that the scale of the corpus has an obvious effect on the performance of toponym recognition. Generally, there is no adequate tagged corpus on specific toponym recognition tasks, especially in the era of Big Data. In conclusion, we believe that the DBN-based approach is a promising and powerful method to extract geo
Huo, Hongwen; Feng, Jufu
We present a novel online face recognition approach for video stream in this paper. Our method includes two stages: pre-training and online training. In the pre-training phase, our method observes interactions, collects batches of input data, and attempts to estimate their distributions (Box-Cox transformation is adopted here to normalize rough estimates). In the online training phase, our method incrementally improves classifiers' knowledge of the face space and updates it continuously with incremental eigenspace analysis. The performance achieved by our method shows its great potential in video stream processing.
Chernov, Timofey S.; Razumnuy, Nikita P.; Kozharinov, Alexander S.; Nikolaev, Dmitry P.; Arlazarov, Vladimir V.
Recognition and machine vision systems have long been widely used in many disciplines to automate various processes of life and industry. Input images of optical recognition systems can be subjected to a large number of different distortions, especially in uncontrolled or natural shooting conditions, which leads to unpredictable results of recognition systems, making it impossible to assess their reliability. For this reason, it is necessary to perform quality control of the input data of recognition systems, which is facilitated by modern progress in the field of image quality evaluation. In this paper, we investigate the approach to designing optical recognition systems with built-in input image quality estimation modules and feedback, for which the necessary definitions are introduced and a model for describing such systems is constructed. The efficiency of this approach is illustrated by the example of solving the problem of selecting the best frames for recognition in a video stream for a system with limited resources. Experimental results are presented for the system for identity documents recognition, showing a significant increase in the accuracy and speed of the system under simulated conditions of automatic camera focusing, leading to blurring of frames.
Full Text Available Event recognition is the most fundamental and critical task in event-based natural language processing systems. Existing event recognition methods based on rules and shallow neural networks have certain limitations. For example, extracting features using methods based on rules is difficult; methods based on shallow neural networks converge too quickly to a local minimum, resulting in low recognition precision. To address these problems, we propose the Chinese emergency event recognition model based on deep learning (CEERM. Firstly, we use a word segmentation system to segment sentences. According to event elements labeled in the CEC 2.0 corpus, we classify words into five categories: trigger words, participants, objects, time and location. Each word is vectorized according to the following six feature layers: part of speech, dependency grammar, length, location, distance between trigger word and core word and trigger word frequency. We obtain deep semantic features of words by training a feature vector set using a deep belief network (DBN, then analyze those features in order to identify trigger words by means of a back propagation neural network. Extensive testing shows that the CEERM achieves excellent recognition performance, with a maximum F-measure value of 85.17%. Moreover, we propose the dynamic-supervised DBN, which adds supervised fine-tuning to a restricted Boltzmann machine layer by monitoring its training performance. Test analysis reveals that the new DBN improves recognition performance and effectively controls the training time. Although the F-measure increases to 88.11%, the training time increases by only 25.35%.
Su, Yu-Chuan; Chiu, Tzu-Hsuan; Yeh, Chun-Yen; Huang, Hsin-Fu; Hsu, Winston H.
Unconstrained video recognition and Deep Convolution Network (DCN) are two active topics in computer vision recently. In this work, we apply DCNs as frame-based recognizers for video recognition. Our preliminary studies, however, show that video corpora with complete ground truth are usually not large and diverse enough to learn a robust model. The networks trained directly on the video data set suffer from significant overfitting and have poor recognition rate on the test set. The same lack-...
Liu, Zengjian; Yang, Ming; Wang, Xiaolong; Chen, Qingcai; Tang, Buzhou; Wang, Zhe; Xu, Hua
Entity recognition is one of the most primary steps for text analysis and has long attracted considerable attention from researchers. In the clinical domain, various types of entities, such as clinical entities and protected health information (PHI), widely exist in clinical texts. Recognizing these entities has become a hot topic in clinical natural language processing (NLP), and a large number of traditional machine learning methods, such as support vector machine and conditional random field, have been deployed to recognize entities from clinical texts in the past few years. In recent years, recurrent neural network (RNN), one of deep learning methods that has shown great potential on many problems including named entity recognition, also has been gradually used for entity recognition from clinical texts. In this paper, we comprehensively investigate the performance of LSTM (long-short term memory), a representative variant of RNN, on clinical entity recognition and protected health information recognition. The LSTM model consists of three layers: input layer - generates representation of each word of a sentence; LSTM layer - outputs another word representation sequence that captures the context information of each word in this sentence; Inference layer - makes tagging decisions according to the output of LSTM layer, that is, outputting a label sequence. Experiments conducted on corpora of the 2010, 2012 and 2014 i2b2 NLP challenges show that LSTM achieves highest micro-average F1-scores of 85.81% on the 2010 i2b2 medical concept extraction, 92.29% on the 2012 i2b2 clinical event detection, and 94.37% on the 2014 i2b2 de-identification, which is considerably competitive with other state-of-the-art systems. LSTM that requires no hand-crafted feature has great potential on entity recognition from clinical texts. It outperforms traditional machine learning methods that suffer from fussy feature engineering. A possible future direction is how to integrate knowledge
Bowman, Elizabeth K.; Turek, Matt; Tunison, Paul; Porter, Reed; Thomas, Steve; Gintautas, Vadas; Shargo, Peter; Lin, Jessica; Li, Qingzhe; Gao, Yifeng; Li, Xiaosheng; Mittu, Ranjeev; Rosé, Carolyn Penstein; Maki, Keith; Bogart, Chris; Choudhari, Samrihdi Shree
Today's warfighters operate in a highly dynamic and uncertain world, and face many competing demands. Asymmetric warfare and the new focus on small, agile forces has altered the framework by which time critical information is digested and acted upon by decision makers. Finding and integrating decision-relevant information is increasingly difficult in data-dense environments. In this new information environment, agile data algorithms, machine learning software, and threat alert mechanisms must be developed to automatically create alerts and drive quick response. Yet these advanced technologies must be balanced with awareness of the underlying context to accurately interpret machine-processed indicators and warnings and recommendations. One promising approach to this challenge brings together information retrieval strategies from text, video, and imagery. In this paper, we describe a technology demonstration that represents two years of tri-service research seeking to meld text and video for enhanced content awareness. The demonstration used multisource data to find an intelligence solution to a problem using a common dataset. Three technology highlights from this effort include 1) Incorporation of external sources of context into imagery normalcy modeling and anomaly detection capabilities, 2) Automated discovery and monitoring of targeted users from social media text, regardless of language, and 3) The concurrent use of text and imagery to characterize behaviour using the concept of kinematic and text motifs to detect novel and anomalous patterns. Our demonstration provided a technology baseline for exploiting heterogeneous data sources to deliver timely and accurate synopses of data that contribute to a dynamic and comprehensive worldview.
Full Text Available Sparse coding is an emerging method that has been successfully applied to both robust object tracking and recognition in the vision literature. In this paper, we propose to explore a sparse coding-based approach toward joint object tracking-and-recognition and explore its potential in the analysis of forward-looking infrared (FLIR video to support nighttime machine vision systems. A key technical contribution of this work is to unify existing sparse coding-based approaches toward tracking and recognition under the same framework, so that they can benefit from each other in a closed-loop. On the one hand, tracking the same object through temporal frames allows us to achieve improved recognition performance through dynamical updating of template/dictionary and combining multiple recognition results; on the other hand, the recognition of individual objects facilitates the tracking of multiple objects (i.e., walking pedestrians, especially in the presence of occlusion within a crowded environment. We report experimental results on both the CASIAPedestrian Database and our own collected FLIR video database to demonstrate the effectiveness of the proposed joint tracking-and-recognition approach.
Whitman, Lucy S.; Lewis, Colin; Oakley, John P.
Atmospheric scattering causes significant degradation in the quality of video images, particularly when imaging over long distances. The principle problem is the reduction in contrast due to scattered light. It is known that when the scattering particles are not too large compared with the imaging wavelength (i.e. Mie scattering) then high spatial resolution information may be contained within a low-contrast image. Unfortunately this information is not easily perceived by a human observer, particularly when using a standard video monitor. A secondary problem is the difficulty of achieving a sharp focus since automatic focus techniques tend to fail in such conditions. Recently several commercial colour video processing systems have become available. These systems use various techniques to improve image quality in low contrast conditions whilst retaining colour content. These systems produce improvements in subjective image quality in some situations, particularly in conditions of haze and light fog. There is also some evidence that video enhancement leads to improved ATR performance when used as a pre-processing stage. Psychological literature indicates that low contrast levels generally lead to a reduction in the performance of human observers in carrying out simple visual tasks. The aim of this paper is to present the results of an empirical study on object recognition in adverse viewing conditions. The chosen visual task was vehicle number plate recognition at long ranges (500 m and beyond). Two different commercial video enhancement systems are evaluated using the same protocol. The results show an increase in effective range with some differences between the different enhancement systems.
Full Text Available In this paper, a novel approach for identifying normal and obscene videos is proposed. In order to classify different episodes of a video independently and discard the need to process all frames, first, key frames are extracted and skin regions are detected for groups of video frames starting with key frames. In the second step, three different features including 1- structural features based on single frame information, 2- features based on spatiotemporal volume and 3-motion-based features, are extracted for each episode of video. The PCA-LDA method is then applied to reduce the size of structural features and select more distinctive features. For the final step, we use fuzzy or a Weighted Support Vector Machine (WSVM classifier to identify video episodes. We also employ a multilayer Kohonen network as an initial clustering algorithm to increase the ability to discriminate between the extracted features into two classes of videos. Features based on motion and periodicity characteristics increase the efficiency of the proposed algorithm in videos with bad illumination and skin colour variation. The proposed method is evaluated using 1100 videos in different environmental and illumination conditions. The experimental results show a correct recognition rate of 94.2% for the proposed algorithm.
Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve
With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed. It is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, the object's contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, the Bayes theory is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those by either of the individual classifiers, and our framework can be adapted to different categories of Web pages.
Tanja S H Wingenbach
Full Text Available There has been much research on sex differences in the ability to recognise facial expressions of emotions, with results generally showing a female advantage in reading emotional expressions from the face. However, most of the research to date has used static images and/or 'extreme' examples of facial expressions. Therefore, little is known about how expression intensity and dynamic stimuli might affect the commonly reported female advantage in facial emotion recognition. The current study investigated sex differences in accuracy of response (Hu; unbiased hit rates and response latencies for emotion recognition using short video stimuli (1sec of 10 different facial emotion expressions (anger, disgust, fear, sadness, surprise, happiness, contempt, pride, embarrassment, neutral across three variations in the intensity of the emotional expression (low, intermediate, high in an adolescent and adult sample (N = 111; 51 male, 60 female aged between 16 and 45 (M = 22.2, SD = 5.7. Overall, females showed more accurate facial emotion recognition compared to males and were faster in correctly recognising facial emotions. The female advantage in reading expressions from the faces of others was unaffected by expression intensity levels and emotion categories used in the study. The effects were specific to recognition of emotions, as males and females did not differ in the recognition of neutral faces. Together, the results showed a robust sex difference favouring females in facial emotion recognition using video stimuli of a wide range of emotions and expression intensity variations.
Singha, Joyeeta; Das, Karen
Sign Language Recognition has emerged as one of the important area of research in Computer Vision. The difficulty faced by the researchers is that the instances of signs vary with both motion and appearance. Thus, in this paper a novel approach for recognizing various alphabets of Indian Sign Language is proposed where continuous video sequences of the signs have been considered. The proposed system comprises of three stages: Preprocessing stage, Feature Extraction and Classification. Preprocessing stage includes skin filtering, histogram matching. Eigen values and Eigen Vectors were considered for feature extraction stage and finally Eigen value weighted Euclidean distance is used to recognize the sign. It deals with bare hands, thus allowing the user to interact with the system in natural way. We have considered 24 different alphabets in the video sequences and attained a success rate of 96.25%.
Chen, Jun; Xiao, Yang; Cao, Zhiguo; Fang, Zhiwen
Different video modal for human action recognition has becoming a highly promising trend in the video analysis. In this paper, we propose a method for human action recognition from RGB video to Depth video using domain adaptation, where we use learned feature from RGB videos to do action recognition for depth videos. More specifically, we make three steps for solving this problem in this paper. First, different from image, video is more complex as it has both spatial and temporal information, in order to better encode this information, dynamic image method is used to represent each RGB or Depth video to one image, based on this, most methods for extracting feature in image can be used in video. Secondly, as video can be represented as image, so standard CNN model can be used for training and testing for videos, beside, CNN model can be also used for feature extracting as its powerful feature expressing ability. Thirdly, as RGB videos and Depth videos are belong to two different domains, in order to make two different feature domains has more similarity, domain adaptation is firstly used for solving this problem between RGB and Depth video, based on this, the learned feature from RGB video model can be directly used for Depth video classification. We evaluate the proposed method on one complex RGB-D action dataset (NTU RGB-D), and our method can have more than 2% accuracy improvement using domain adaptation from RGB to Depth action recognition.
Yi, Chucai; Tian, Yingli
Text characters and strings in natural scene can provide valuable information for many applications. Extracting text directly from natural scene images or videos is a challenging task because of diverse text patterns and variant background interferences. This paper proposes a method of scene text recognition from detected text regions. In text detection, our previously proposed algorithms are applied to obtain text regions from scene image. First, we design a discriminative character descriptor by combining several state-of-the-art feature detectors and descriptors. Second, we model character structure at each character class by designing stroke configuration maps. Our algorithm design is compatible with the application of scene text extraction in smart mobile devices. An Android-based demo system is developed to show the effectiveness of our proposed method on scene text information extraction from nearby objects. The demo system also provides us some insight into algorithm design and performance improvement of scene text extraction. The evaluation results on benchmark data sets demonstrate that our proposed scheme of text recognition is comparable with the best existing methods.
Full Text Available Biomedical Text Mining targets the Extraction of significant information from biomedical archives. Bio TM encompasses Information Retrieval (IR and Information Extraction (IE. The Information Retrieval will retrieve the relevant Biomedical Literature documents from the various Repositories like PubMed, MedLine etc., based on a search query. The IR Process ends up with the generation of corpus with the relevant document retrieved from the Publication databases based on the query. The IE task includes the process of Preprocessing of the document, Named Entity Recognition (NER from the documents and Relationship Extraction. This process includes Natural Language Processing, Data Mining techniques and machine Language algorithm. The preprocessing task includes tokenization, stop word Removal, shallow parsing, and Parts-Of-Speech tagging. NER phase involves recognition of well-defined objects such as genes, proteins or cell-lines etc. This process leads to the next phase that is extraction of relationships (IE. The work was based on machine learning algorithm Conditional Random Field (CRF.
Riad I. Hammoud
Full Text Available We describe two advanced video analysis techniques, including video-indexed by voice annotations (VIVA and multi-media indexing and explorer (MINER. VIVA utilizes analyst call-outs (ACOs in the form of chat messages (voice-to-text to associate labels with video target tracks, to designate spatial-temporal activity boundaries and to augment video tracking in challenging scenarios. Challenging scenarios include low-resolution sensors, moving targets and target trajectories obscured by natural and man-made clutter. MINER includes: (1 a fusion of graphical track and text data using probabilistic methods; (2 an activity pattern learning framework to support querying an index of activities of interest (AOIs and targets of interest (TOIs by movement type and geolocation; and (3 a user interface to support streaming multi-intelligence data processing. We also present an activity pattern learning framework that uses the multi-source associated data as training to index a large archive of full-motion videos (FMV. VIVA and MINER examples are demonstrated for wide aerial/overhead imagery over common data sets affording an improvement in tracking from video data alone, leading to 84% detection with modest misdetection/false alarm results due to the complexity of the scenario. The novel use of ACOs and chat Sensors 2014, 14 19844 messages in video tracking paves the way for user interaction, correction and preparation of situation awareness reports.
Hammoud, Riad I; Sahin, Cem S; Blasch, Erik P; Rhodes, Bradley J; Wang, Tao
We describe two advanced video analysis techniques, including video-indexed by voice annotations (VIVA) and multi-media indexing and explorer (MINER). VIVA utilizes analyst call-outs (ACOs) in the form of chat messages (voice-to-text) to associate labels with video target tracks, to designate spatial-temporal activity boundaries and to augment video tracking in challenging scenarios. Challenging scenarios include low-resolution sensors, moving targets and target trajectories obscured by natural and man-made clutter. MINER includes: (1) a fusion of graphical track and text data using probabilistic methods; (2) an activity pattern learning framework to support querying an index of activities of interest (AOIs) and targets of interest (TOIs) by movement type and geolocation; and (3) a user interface to support streaming multi-intelligence data processing. We also present an activity pattern learning framework that uses the multi-source associated data as training to index a large archive of full-motion videos (FMV). VIVA and MINER examples are demonstrated for wide aerial/overhead imagery over common data sets affording an improvement in tracking from video data alone, leading to 84% detection with modest misdetection/false alarm results due to the complexity of the scenario. The novel use of ACOs and chat Sensors 2014, 14 19844 messages in video tracking paves the way for user interaction, correction and preparation of situation awareness reports.
Full Text Available Abstract We present results of a study into the performance of a variety of different image transform-based feature types for speaker-independent visual speech recognition of isolated digits. This includes the first reported use of features extracted using a discrete curvelet transform. The study will show a comparison of some methods for selecting features of each feature type and show the relative benefits of both static and dynamic visual features. The performance of the features will be tested on both clean video data and also video data corrupted in a variety of ways to assess each feature type's robustness to potential real-world conditions. One of the test conditions involves a novel form of video corruption we call jitter which simulates camera and/or head movement during recording.
Full Text Available We present results of a study into the performance of a variety of different image transform-based feature types for speaker-independent visual speech recognition of isolated digits. This includes the first reported use of features extracted using a discrete curvelet transform. The study will show a comparison of some methods for selecting features of each feature type and show the relative benefits of both static and dynamic visual features. The performance of the features will be tested on both clean video data and also video data corrupted in a variety of ways to assess each feature type's robustness to potential real-world conditions. One of the test conditions involves a novel form of video corruption we call jitter which simulates camera and/or head movement during recording.
Dhakal, Shanti; Rahnemoonfar, Maryam
Measuring water quality of bays, estuaries, and gulfs is a complicated and time-consuming process. YSI Sonde is an instrument used to measure water quality parameters such as pH, temperature, salinity, and dissolved oxygen. This instrument is taken to water bodies in a boat trip and researchers note down different parameters displayed by the instrument's display monitor. In this project, a mobile application is developed for Android platform that allows a user to take a picture of the YSI Sonde monitor, extract text from the image and store it in a file on the phone. The image captured by the application is first processed to remove perspective distortion. Probabilistic Hough line transform is used to identify lines in the image and the corner of the image is then obtained by determining the intersection of the detected horizontal and vertical lines. The image is warped using the perspective transformation matrix, obtained from the corner points of the source image and the destination image, hence, removing the perspective distortion. Mathematical morphology operation, black-hat is used to correct the shading of the image. The image is binarized using Otsu's binarization technique and is then passed to the Optical Character Recognition (OCR) software for character recognition. The extracted information is stored in a file on the phone and can be retrieved later for analysis. The algorithm was tested on 60 different images of YSI Sonde with different perspective features and shading. Experimental results, in comparison to ground-truth results, demonstrate the effectiveness of the proposed method.
Suddendorf, Thomas; Simcock, Gabrielle; Nielsen, Mark
Three experiments (N = 123) investigated the development of live-video self-recognition using the traditional mark test. In Experiment 1, 24-, 30- and 36-month-old children saw a live video image of equal size and orientation as a control group saw in a mirror. The video version of the test was more difficult than the mirror version with only the…
Zhang, Yajun; Liu, Zongtian; Zhou, Wen
Event recognition is the most fundamental and critical task in event-based natural language processing systems. Existing event recognition methods based on rules and shallow neural networks have certain limitations. For example, extracting features using methods based on rules is difficult; methods based on shallow neural networks converge too quickly to a local minimum, resulting in low recognition precision. To address these problems, we propose the Chinese emergency event recognition model based on deep learning (CEERM). Firstly, we use a word segmentation system to segment sentences. According to event elements labeled in the CEC 2.0 corpus, we classify words into five categories: trigger words, participants, objects, time and location. Each word is vectorized according to the following six feature layers: part of speech, dependency grammar, length, location, distance between trigger word and core word and trigger word frequency. We obtain deep semantic features of words by training a feature vector set using a deep belief network (DBN), then analyze those features in order to identify trigger words by means of a back propagation neural network. Extensive testing shows that the CEERM achieves excellent recognition performance, with a maximum F-measure value of 85.17%. Moreover, we propose the dynamic-supervised DBN, which adds supervised fine-tuning to a restricted Boltzmann machine layer by monitoring its training performance. Test analysis reveals that the new DBN improves recognition performance and effectively controls the training time. Although the F-measure increases to 88.11%, the training time increases by only 25.35%.
Full Text Available Gait is a unique perceptible biometric feature at larger distances, and the gait representation approach plays a key role in a video sensor-based gait recognition system. Class Energy Image is one of the most important gait representation methods based on appearance, which has received lots of attentions. In this paper, we reviewed the expressions and meanings of various Class Energy Image approaches, and analyzed the information in the Class Energy Images. Furthermore, the effectiveness and robustness of these approaches were compared on the benchmark gait databases. We outlined the research challenges and provided promising future directions for the field. To the best of our knowledge, this is the first review that focuses on Class Energy Image. It can provide a useful reference in the literature of video sensor-based gait representation approach.
Mitchell, D; Morrow, Philip J; Nugent, Chris D
Activity recognition is used in a wide range of applications including healthcare and security. In a smart environment activity recognition can be used to monitor and support the activities of a user. There have been a range of methods used in activity recognition including sensor-based approaches, vision-based approaches and ontological approaches. This paper presents a novel approach to activity recognition in a smart home environment which combines sensor and video data through an ontological framework. The ontology describes the relationships and interactions between activities, the user, objects, sensors and video data.
Wang, Chao; Wang, Yunhong; Zhang, Zhaoxiang
This paper addresses the problem of tracking and recognizing faces via incremental local sparse representation. First a robust face tracking algorithm is proposed via employing local sparse appearance and covariance pooling method. In the following face recognition stage, with the employment of a novel template update strategy, which combines incremental subspace learning, our recognition algorithm adapts the template to appearance changes and reduces the influence of occlusion and illumination variation. This leads to a robust video-based face tracking and recognition with desirable performance. In the experiments, we test the quality of face recognition in real-world noisy videos on YouTube database, which includes 47 celebrities. Our proposed method produces a high face recognition rate at 95% of all videos. The proposed face tracking and recognition algorithms are also tested on a set of noisy videos under heavy occlusion and illumination variation. The tracking results on challenging benchmark videos demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods. In the case of the challenging dataset in which faces undergo occlusion and illumination variation, and tracking and recognition experiments under significant pose variation on the University of California, San Diego (Honda/UCSD) database, our proposed method also consistently demonstrates a high recognition rate.
Yadav, Aman; Phillips, Michael M.; Lundeberg, Mary A.; Koehler, Matthew J.; Hilden, Katherine; Dirkin, Kathryn H.
In this investigation we assessed whether different formats of media (video, text, and video + text) influenced participants' engagement, cognitive processing and recall of non-fiction cases of people diagnosed with HIV/AIDS. For each of the cases used in the study, we designed three informationally-equivalent versions: video, text, and video +…
Madaan, Nishtha; Mehta, Sameep; Saxena, Mayank; Aggarwal, Aditi; Agrawaal, Taneea S; Malhotra, Vrinda
In past few years, several data-sets have been released for text and images. We present an approach to create the data-set for use in detecting and removing gender bias from text. We also include a set of challenges we have faced while creating this corpora. In this work, we have worked with movie data from Wikipedia plots and movie trailers from YouTube. Our Bollywood Movie corpus contains 4000 movies extracted from Wikipedia and 880 trailers extracted from YouTube which were released from 1...
Kavallakis, George; Vidakis, Nikolaos; Triantafyllidis, Georgios
This paper presents a scheme of creating an emotion index of cover song music video clips by recognizing and classifying facial expressions of the artist in the video. More specifically, it fuses effective and robust algorithms which are employed for expression recognition, along with the use...... of a neural network system using the features extracted by the SIFT algorithm. Also we support the need of this fusion of different expression recognition algorithms, because of the way that emotions are linked to facial expressions in music video clips....
Full Text Available Both static features and motion features have shown promising performance in human activities recognition task. However, the information included in these features is insufficient for complex human activities. In this paper, we propose extracting relational information of static features and motion features for human activities recognition. The videos are represented by a classical Bag-of-Word (BoW model which is useful in many works. To get a compact and discriminative codebook with small dimension, we employ the divisive algorithm based on KL-divergence to reconstruct the codebook. After that, to further capture strong relational information, we construct a bipartite graph to model the relationship between words of different feature set. Then we use a k-way partition to create a new codebook in which similar words are getting together. With this new codebook, videos can be represented by a new BoW vector with strong relational information. Moreover, we propose a method to compute new clusters from the divisive algorithm’s projective function. We test our work on the several datasets and obtain very promising results.
Lin, Weiyao; Sun, Ming-Ting; Poovendran, Radha; Zhang, Zhengyou
This paper presents a novel approach for automatic recognition of human activities for video surveillance applications. We propose to represent an activity by a combination of category components, and demonstrate that this approach offers flexibility to add new activities to the system and an ability to deal with the problem of building models for activities lacking training data. For improving the recognition accuracy, a Confident-Frame- based Recognition algorithm is also proposed, where th...
Diaz, Ruth L; Wong, Ulric; Hodgins, David C; Chiu, Carina G; Goghari, Vina M
Violent video game playing has been associated with both positive and negative effects on cognition. We examined whether playing two or more hours of violent video games a day, compared to not playing video games, was associated with a different pattern of recognition of five facial emotions, while controlling for general perceptual and cognitive differences that might also occur. Undergraduate students were categorized as violent video game players (n = 83) or non-gamers (n = 69) and completed a facial recognition task, consisting of an emotion recognition condition and a control condition of gender recognition. Additionally, participants completed questionnaires assessing their video game and media consumption, aggression, and mood. Violent video game players recognized fearful faces both more accurately and quickly and disgusted faces less accurately than non-gamers. Desensitization to violence, constant exposure to fear and anxiety during game playing, and the habituation to unpleasant stimuli, are possible mechanisms that could explain these results. Future research should evaluate the effects of violent video game playing on emotion processing and social cognition more broadly. © 2015 Wiley Periodicals, Inc.
Kinnunen, Tomi; Sahidullah, Md; Kukanov, Ivan
Text-dependent automatic speaker verification naturally calls for the simultaneous verification of speaker identity and spoken content. These two tasks can be achieved with automatic speaker verification (ASV) and utterance verification (UV) technologies. While both have been addressed previously...
Full Text Available An automatic recognition framework for human facial expressions from a monocular video with an uncalibrated camera is proposed. The expression characteristics are first acquired from a kind of deformable template, similar to a facial muscle distribution. After associated regularization, the time sequences from the trait changes in space-time under complete expressional production are then arranged line by line in a matrix. Next, the matrix dimensionality is reduced by a method of manifold learning of neighborhood-preserving embedding. Finally, the refined matrix containing the expression trait information is recognized by a classifier that integrates the hidden conditional random field (HCRF and support vector machine (SVM. In an experiment using the Cohn–Kanade database, the proposed method showed a comparatively higher recognition rate than the individual HCRF or SVM methods in direct recognition from two-dimensional human face traits. Moreover, the proposed method was shown to be more robust than the typical Kotsia method because the former contains more structural characteristics of the data to be classified in space-time
Jai Prakash Verma; Smita Agrawal; Bankim Patel; Atul Patel
All types of machine automated systems are generating large amount of data in different forms like statistical, text, audio, video, sensor, and bio-metric data that emerges the term Big Data. In this paper we are discussing issues, challenges, and application of these types of Big Data with the consideration of big data dimensions. Here we are discussing social media data analytics, content based analytics, text data analytics, audio, and video data analytics their issues and expected applica...
Hawkins, Spencer D; Barilla, Steven; Williford, Phillip Williford M; Feldman, Steven R; Pearce, Daniel J
We developed dermatology patient education videos and a post-operative text message service that could be accessed universally via web based applications. A secondary outcome of the study was to assess patient opinions of text-messages, email, and video in the health care setting which is reported here. An investigator-blinded, randomized, controlled intervention was evaluated in 90 nonmelanoma MMS patients at Wake Forest Baptist Dermatology. Patients were randomized 1:1:1:1 for exposure to: 1) videos with text messages, 2) videos only, 3) text messages-only, or 4) standard of care. Assessment measures were obtained by the use of REDCap survey questions during the follow up visit. 1) 67% would like to receive an email with information about the procedure beforehand 2) 98% of patients reported they would like other doctors to use educational videos as a form of patient education 3) 88% of our patients think it is appropriate for physicians to communicate to patients via text message in certain situations. Nearly all patients desired physicians to use text-messages and video in their practice and the majority of patients preferred to receive an email with information about their procedure beforehand.
Full Text Available Recent advancements in depth video sensors technologies have made human activity recognition (HAR realizable for elderly monitoring applications. Although conventional HAR utilizes RGB video sensors, HAR could be greatly improved with depth video sensors which produce depth or distance information. In this paper, a depth-based life logging HAR system is designed to recognize the daily activities of elderly people and turn these environments into an intelligent living space. Initially, a depth imaging sensor is used to capture depth silhouettes. Based on these silhouettes, human skeletons with joint information are produced which are further used for activity recognition and generating their life logs. The life-logging system is divided into two processes. Firstly, the training system includes data collection using a depth camera, feature extraction and training for each activity via Hidden Markov Models. Secondly, after training, the recognition engine starts to recognize the learned activities and produces life logs. The system was evaluated using life logging features against principal component and independent component features and achieved satisfactory recognition rates against the conventional approaches. Experiments conducted on the smart indoor activity datasets and the MSRDailyActivity3D dataset show promising results. The proposed system is directly applicable to any elderly monitoring system, such as monitoring healthcare problems for elderly people, or examining the indoor activities of people at home, office or hospital.
Full Text Available In the 21st century, our life is strongly affected by the information technology. Educational technology has been rapidly improved by the development of audiovisual tools. Teachers may choose a number of different types of resources for teaching purposes, including videos and movies. Therefore, this study is aimed at evaluating animated narrative videos from YouTube for the teaching narrative text and identifying potential factors which influence the quality of educational videos. The videos were examined by using assessment rubric to see the quality and suitability of animated narrative videos which might be used in the teaching narrative text. The rubric was adapted from Prince Edward Island (PEI Department of Education: Evaluation and Selection of Learning Resources. It consists of four criteria, content, structure, instructional design, and technical design In addition, the study presents critical awareness of how these aspects can be interpreted to measure animated narrative videos and at the same time the engagement of the teachers in exploring animated narrative videos used in classroom.
Nasrollahi, Kamal; Moeslund, Thomas B.
Face recognition systems are very sensitive to the quality and resolution of their input face images. This makes such systems unreliable when working with long surveillance video sequences without employing some selection and enhancement algorithms. On the other hand, processing all the frames...... of such video sequences by any enhancement or even face recognition algorithm is demanding. Thus, there is a need for a mechanism to summarize the input video sequence to a set of key-frames and then applying an enhancement algorithm to this subset. This paper presents a system doing exactly this. The system...... uses face quality assessment to select the key-frames and a hybrid super-resolution to enhance the face image quality. The suggested system that employs a linear associator face recognizer to evaluate the enhanced results has been tested on real surveillance video sequences and the experimental results...
Full Text Available Steen Vigh Buch,1 Frederik Philip Treschow,2 Jesper Brink Svendsen,3 Bjarne Skjødt Worm4 1Department of Vascular Surgery, Rigshospitalet, Copenhagen, Denmark; 2Department of Anesthesia and Intensive Care, Herlev Hospital, Copenhagen, Denmark; 3Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; 4Department of Anesthesia and Intensive Care, Bispebjerg Hospital, Copenhagen, Denmark Background and aims: This study investigated the effectiveness of two different levels of e-learning when teaching clinical skills to medical students. Materials and methods: Sixty medical students were included and randomized into two comparable groups. The groups were given either a video- or text/picture-based e-learning module and subsequently underwent both theoretical and practical examination. A follow-up test was performed 1 month later. Results: The students in the video group performed better than the illustrated text-based group in the practical examination, both in the primary test (P<0.001 and in the follow-up test (P<0.01. Regarding theoretical knowledge, no differences were found between the groups on the primary test, though the video group performed better on the follow-up test (P=0.04. Conclusion: Video-based e-learning is superior to illustrated text-based e-learning when teaching certain practical clinical skills. Keywords: e-learning, video versus text, medicine, clinical skills
Morillot, Olivier; Likforman-Sulem, Laurence; Grosicki, Emmanuèle
Many preprocessing techniques have been proposed for isolated word recognition. However, recently, recognition systems have dealt with text blocks and their compound text lines. In this paper, we propose a new preprocessing approach to efficiently correct baseline skew and fluctuations. Our approach is based on a sliding window within which the vertical position of the baseline is estimated. Segmentation of text lines into subparts is, thus, avoided. Experiments conducted on a large publicly available database (Rimes), with a BLSTM (bidirectional long short-term memory) recurrent neural network recognition system, show that our baseline correction approach highly improves performance.
Buch, Steen Vigh; Treschow, Frederik Philip; Svendsen, Jesper Brink; Worm, Bjarne Skjødt
This study investigated the effectiveness of two different levels of e-learning when teaching clinical skills to medical students. Sixty medical students were included and randomized into two comparable groups. The groups were given either a video- or text/picture-based e-learning module and subsequently underwent both theoretical and practical examination. A follow-up test was performed 1 month later. The students in the video group performed better than the illustrated text-based group in the practical examination, both in the primary test (Pvideo group performed better on the follow-up test (P=0.04). Video-based e-learning is superior to illustrated text-based e-learning when teaching certain practical clinical skills.
frames per- forms well for urban soundscapes but not for polyphonic music. In place of GMM, Lu et al.  adopted spectral clustering to generate...Aucouturier JJ, Defreville B, Pachet F (2007) The bag-of-frames approach to audio pattern recognition: a sufficientmodel for urban soundscapes but not
Full Text Available The task of human hand trajectory tracking and gesture trajectory recognition based on synchronized color and depth video is considered. Toward this end, in the facet of hand tracking, a joint observation model with the hand cues of skin saliency, motion and depth is integrated into particle filter in order to move particles to local peak in the likelihood. The proposed hand tracking method, namely, salient skin, motion, and depth based particle filter (SSMD-PF, is capable of improving the tracking accuracy considerably, in the context of the signer performing the gesture toward the camera device and in front of moving, cluttered backgrounds. In the facet of gesture recognition, a shape-order context descriptor on the basis of shape context is introduced, which can describe the gesture in spatiotemporal domain. The efficient shape-order context descriptor can reveal the shape relationship and embed gesture sequence order information into descriptor. Moreover, the shape-order context leads to a robust score for gesture invariant. Our approach is complemented with experimental results on the settings of the challenging hand-signed digits datasets and American sign language dataset, which corroborate the performance of the novel techniques.
Dissanayake, Cheryl; Shembrey, Joh; Suddendorf, Thomas
Two studies are reported which investigate delayed video self-recognition (DSR) in children with autistic disorder and Asperger's disorder relative to one another and to their typically developing peers. A secondary aim was to establish whether DSR ability is dependent on metarepresentational ability. Children's verbal and affective responses to…
Erlandson, Erik J.; Trenkle, John M.; Vogt, Robert C., III
Many text recognition systems recognize text imagery at the character level and assemble words from the recognized characters. An alternative approach is to recognize text imagery at the word level, without analyzing individual characters. This approach avoids the problem of individual character segmentation, and can overcome local errors in character recognition. A word-level recognition system for machine-printed Arabic text has been implemented. Arabic is a script language, and is therefore difficult to segment at the character level. Character segmentation has been avoided by recognizing text imagery of complete words. The Arabic recognition system computes a vector of image-morphological features on a query word image. This vector is matched against a precomputed database of vectors from a lexicon of Arabic words. Vectors from the database with the highest match score are returned as hypotheses for the unknown image. Several feature vectors may be stored for each word in the database. Database feature vectors generated using multiple fonts and noise models allow the system to be tuned to its input stream. Used in conjunction with database pruning techniques, this Arabic recognition system has obtained promising word recognition rates on low-quality multifont text imagery.
All PROVE-IT(FRiV) project reports are listed below. 1. E. Granger, P. Radtke , and D. Gorodnichy, “Survey of academic research and prototypes for face...recognition in video”, Border Technology Division, Division Report 2014-25 (TR). 2. D. Gorodnichy, E.Granger, and P. Radtke , “Survey of commercial...Gorodnichy, E. Choy, W. Khreich, P. Radtke , J. Bergeron, and D. Bissessar, “Results from evaluation of three commercial off-the-shelf face
Tanja S. H. Wingenbach
Full Text Available According to embodied cognition accounts, viewing others’ facial emotion can elicit the respective emotion representation in observers which entails simulations of sensory, motor, and contextual experiences. In line with that, published research found viewing others’ facial emotion to elicit automatic matched facial muscle activation, which was further found to facilitate emotion recognition. Perhaps making congruent facial muscle activity explicit produces an even greater recognition advantage. If there is conflicting sensory information, i.e., incongruent facial muscle activity, this might impede recognition. The effects of actively manipulating facial muscle activity on facial emotion recognition from videos were investigated across three experimental conditions: (a explicit imitation of viewed facial emotional expressions (stimulus-congruent condition, (b pen-holding with the lips (stimulus-incongruent condition, and (c passive viewing (control condition. It was hypothesised that (1 experimental condition (a and (b result in greater facial muscle activity than (c, (2 experimental condition (a increases emotion recognition accuracy from others’ faces compared to (c, (3 experimental condition (b lowers recognition accuracy for expressions with a salient facial feature in the lower, but not the upper face area, compared to (c. Participants (42 males, 42 females underwent a facial emotion recognition experiment (ADFES-BIV while electromyography (EMG was recorded from five facial muscle sites. The experimental conditions’ order was counter-balanced. Pen-holding caused stimulus-incongruent facial muscle activity for expressions with facial feature saliency in the lower face region, which reduced recognition of lower face region emotions. Explicit imitation caused stimulus-congruent facial muscle activity without modulating recognition. Methodological implications are discussed.
Freed, Erin; Long, Debra; Rodriguez, Tonantzin; Franks, Peter; Kravitz, Richard L; Jerant, Anthony
To compare the effects of two health information texts on patient recognition memory, a key aspect of comprehension. Randomized controlled trial (N=60), comparing the effects of experimental and control colorectal cancer (CRC) screening texts on recognition memory, measured using a statement recognition test, accounting for response bias (score range -0.91 to 5.34). The experimental text had a lower Flesch-Kincaid reading grade level (7.4 versus 9.6), was more focused on addressing screening barriers, and employed more comparative tables than the control text. Recognition memory was higher in the experimental group (2.54 versus 1.09, t=-3.63, P=0.001), including after adjustment for age, education, and health literacy (β=0.42, 95% CI: 0.17, 0.68, P=0.001), and in analyses limited to persons with college degrees (β=0.52, 95% CI: 0.18, 0.86, P=0.004) or no self-reported health literacy problems (β=0.39, 95% CI: 0.07, 0.71, P=0.02). An experimental CRC screening text improved recognition memory, including among patients with high education and self-assessed health literacy. CRC screening texts comparable to our experimental text may be warranted for all screening-eligible patients, if such texts improve screening uptake. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Abdous, M'hammed; He, Wu
Because of their capacity to sift through large amounts of data, text mining and data mining are enabling higher education institutions to reveal valuable patterns in students' learning behaviours without having to resort to traditional survey methods. In an effort to uncover live video streaming (LVS) students' technology related-problems and to…
Wingenbach, Tanja S H; Ashwin, Chris; Brosnan, Mark
There has been much research on sex differences in the ability to recognise facial expressions of emotions, with results generally showing a female advantage in reading emotional expressions from the face. However, most of the research to date has used static images and/or 'extreme' examples of facial expressions. Therefore, little is known about how expression intensity and dynamic stimuli might affect the commonly reported female advantage in facial emotion recognition. The current study investigated sex differences in accuracy of response (Hu; unbiased hit rates) and response latencies for emotion recognition using short video stimuli (1sec) of 10 different facial emotion expressions (anger, disgust, fear, sadness, surprise, happiness, contempt, pride, embarrassment, neutral) across three variations in the intensity of the emotional expression (low, intermediate, high) in an adolescent and adult sample (N = 111; 51 male, 60 female) aged between 16 and 45 (M = 22.2, SD = 5.7). Overall, females showed more accurate facial emotion recognition compared to males and were faster in correctly recognising facial emotions. The female advantage in reading expressions from the faces of others was unaffected by expression intensity levels and emotion categories used in the study. The effects were specific to recognition of emotions, as males and females did not differ in the recognition of neutral faces. Together, the results showed a robust sex difference favouring females in facial emotion recognition using video stimuli of a wide range of emotions and expression intensity variations.
There has been much research on sex differences in the ability to recognise facial expressions of emotions, with results generally showing a female advantage in reading emotional expressions from the face. However, most of the research to date has used static images and/or ‘extreme’ examples of facial expressions. Therefore, little is known about how expression intensity and dynamic stimuli might affect the commonly reported female advantage in facial emotion recognition. The current study investigated sex differences in accuracy of response (Hu; unbiased hit rates) and response latencies for emotion recognition using short video stimuli (1sec) of 10 different facial emotion expressions (anger, disgust, fear, sadness, surprise, happiness, contempt, pride, embarrassment, neutral) across three variations in the intensity of the emotional expression (low, intermediate, high) in an adolescent and adult sample (N = 111; 51 male, 60 female) aged between 16 and 45 (M = 22.2, SD = 5.7). Overall, females showed more accurate facial emotion recognition compared to males and were faster in correctly recognising facial emotions. The female advantage in reading expressions from the faces of others was unaffected by expression intensity levels and emotion categories used in the study. The effects were specific to recognition of emotions, as males and females did not differ in the recognition of neutral faces. Together, the results showed a robust sex difference favouring females in facial emotion recognition using video stimuli of a wide range of emotions and expression intensity variations. PMID:29293674
BjÃƒÂ¶rn Olof Hedin
Full Text Available This study examines how media files sent to mobile phones can be used to improve education at universities, and describes a prototype implement of such a system using standard components. To accomplish this, university students were equipped with mobile phones and software that allowed teachers to send text-based, audio-based and video-based messages to the students. Data was collected using questionnaires, focus groups and log files. The conclusions were that students preferred to have information and learning content sent as text, rather than audio or video. Text messages sent to phones should be no longer than 2000 characters. The most appreciated services were notifications of changes in course schedules, short lecture introductions and reminders. The prototype showed that this functionality is easy to implement using standard components.
Full Text Available The paper discusses Optical Character Recognition (OCR of historical texts of the 18th–20th century in the Romanian language using the Cyrillic script. We differ three epochs (approximately, the 18th, 19th, and 20th centuries, with different usage of the Cyrillic alphabet in Romanian and, correspondingly, different approach to OCR. We developed historical alphabets and sets of glyphs recognition templates specific for each epoch. The dictionaries in proper alphabets and orthographies were also created. In addition, virtual keyboards, fonts, transliteration utilities, etc. were developed. The resulting technology and toolset permit successful recognition of historical Romanian texts in the Cyrillic script. After transliteration to the modern Latin script we obtain no-barrier access to historical documents.
Callejas-Cuervo, Mauro; Martínez-Tejada, Laura Alejandra; Alarcón-Aldana, Andrea Catherine
Abstract Emotion recognition systems from physiological signals are innovative techniques that allow studying the behavior and reaction of an individual when exposed to information that may evoke emotional reactions through multimedia tools, for example, video games. This type of approach is used to identify the behavior of an individual in different fields, such as medicine, education, psychology, etc., in order to assess the effect that the content has on the individual that is interacting ...
Moghadam, Saeed Montazeri; Seyyedsalehi, Seyyed Ali
Nonlinear components extracted from deep structures of bottleneck neural networks exhibit a great ability to express input space in a low-dimensional manifold. Sharing and combining the components boost the capability of the neural networks to synthesize and interpolate new and imaginary data. This synthesis is possibly a simple model of imaginations in human brain where the components are expressed in a nonlinear low dimensional manifold. The current paper introduces a novel Dynamic Deep Bottleneck Neural Network to analyze and extract three main features of videos regarding the expression of emotions on the face. These main features are identity, emotion and expression intensity that are laid in three different sub-manifolds of one nonlinear general manifold. The proposed model enjoying the advantages of recurrent networks was used to analyze the sequence and dynamics of information in videos. It is noteworthy to mention that this model also has also the potential to synthesize new videos showing variations of one specific emotion on the face of unknown subjects. Experiments on discrimination and recognition ability of extracted components showed that the proposed model has an average of 97.77% accuracy in recognition of six prominent emotions (Fear, Surprise, Sadness, Anger, Disgust, and Happiness), and 78.17% accuracy in the recognition of intensity. The produced videos revealed variations from neutral to the apex of an emotion on the face of the unfamiliar test subject which is on average 0.8 similar to reference videos in the scale of the SSIM method. Copyright © 2018 Elsevier Ltd. All rights reserved.
Bu, Jiang; Lao, Song-Yan; Bai, Liang
Nowadays, different applications like automatic video indexing, keyword based video search and TV commercials can be developed by detecting and recognizing the billboard trademark. We propose a hierarchical solution for real-time billboard trademark recognition in various sports video, billboard frames are detected in the first level, fuzzy decision tree with easily-computing features are employed to accelerate the process, while in the second level, color and regional SIFT features are combined for the first time to describe the appearance of trademarks, and the shared nearest neighbor (SNN) clustering with x2 distance is utilized instead of traditional K-means clustering to construct the SIFT vocabulary, at last, Latent Semantic Analysis (LSA) based SIFT vocabulary matching is performed on the template trademark and the candidate regions in billboard frame. The preliminary experiments demonstrate the effectiveness of the hierarchical solution, and real time constraints are also met by our solution.
Yew, Chuu Tian; Suandi, Shahrel Azmin
In this paper, we present an applied learning-based color tone mapping technique for video surveillance system. This technique can be applied onto both color and grayscale surveillance images. The basic idea is to learn the color or intensity statistics from a training dataset of photorealistic images of the candidates appeared in the surveillance images, and remap the color or intensity of the input image so that the color or intensity statistics match those in the training dataset. It is well known that the difference in commercial surveillance cameras models, and signal processing chipsets used by different manufacturers will cause the color and intensity of the images to differ from one another, thus creating additional challenges for face recognition in video surveillance system. Using Multi-Class Support Vector Machines as the classifier on a publicly available video surveillance camera database, namely SCface database, this approach is validated and compared to the results of using holistic approach on grayscale images. The results show that this technique is suitable to improve the color or intensity quality of video surveillance system for face recognition.
Sriman, Bowornrat; Schomaker, Lambertus; De Marsico, Maria; Figueiredo, Mário; Fred, Ana
Natural urban scene images contain many problems for character recognition such as luminance noise, varying font styles or cluttered backgrounds. Detecting and recognizing text in a natural scene is a difficult problem. Several techniques have been proposed to overcome these problems. These are,
Stewart, Darryl; Seymour, Rowan; Pass, Adrian; Ming, Ji
This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments, where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements and can be used alongside many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances with corruption added in either/both the video and audio streams using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach and also compared to any fixed-weighted integration approach in both clean conditions or when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
Ozarslan, Suleyman; Eren, P. Erhan
Participatory sensing is an approach which allows mobile devices such as mobile phones to be used for data collection, analysis and sharing processes by individuals. Data collection is the first and most important part of a participatory sensing system, but it is time consuming for the participants. In this paper, we discuss automatic data collection approaches for reducing the time required for collection, and increasing the amount of collected data. In this context, we explore automated text recognition on images of store receipts which are captured by mobile phone cameras, and the correction of the recognized text. Accordingly, our first goal is to evaluate the performance of the Optical Character Recognition (OCR) method with respect to data collection from store receipt images. Images captured by mobile phones exhibit some typical problems, and common image processing methods cannot handle some of them. Consequently, the second goal is to address these types of problems through our proposed Knowledge Based Correction (KBC) method used in support of the OCR, and also to evaluate the KBC method with respect to the improvement on the accurate recognition rate. Results of the experiments show that the KBC method improves the accurate data recognition rate noticeably.
Pedersen, Kamilla; Moeller, Martin Holdgaard; Paltved, Charlotte
OBJECTIVES: The aim of this study was to explore medical students' learning experiences from the didactic teaching formats using either text-based patient cases or video-based patient cases with similar content. The authors explored how the two different patient case formats influenced students......' perceptions of psychiatric patients and students' reflections on meeting and communicating with psychiatric patients. METHODS: The authors conducted group interviews with 30 medical students who volunteered to participate in interviews and applied inductive thematic content analysis to the transcribed...
Hori, Takayuki; Ohya, Jun; Kurumisawa, Jun
This paper proposes a Tensor Decomposition Based method that can recognize an unknown person's action from a video sequence, where the unknown person is not included in the database (tensor) used for the recognition. The tensor consists of persons, actions and time-series image features. For the observed unknown person's action, one of the actions stored in the tensor is assumed. Using the motion signature obtained from the assumption, the unknown person's actions are synthesized. The actions of one of the persons in the tensor are replaced by the synthesized actions. Then, the core tensor for the replaced tensor is computed. This process is repeated for the actions and persons. For each iteration, the difference between the replaced and original core tensors is computed. The assumption that gives the minimal difference is the action recognition result. For the time-series image features to be stored in the tensor and to be extracted from the observed video sequence, the human body silhouette's contour shape based feature is used. To show the validity of our proposed method, our proposed method is experimentally compared with Nearest Neighbor rule and Principal Component analysis based method. Experiments using 33 persons' seven kinds of action show that our proposed method achieves better recognition accuracies for the seven actions than the other methods.
Nona Heydari Esfahani
Full Text Available In this paper, robust text-independent speaker recognition is taken into consideration. The proposed method performs on manual silence-removed utterances that are segmented into smaller speech units containing few phones and at least one vowel. The segments are basic units for long-term feature extraction. Sub-band entropy is directly extracted in each segment. A robust vowel detection method is then applied on each segment to separate a high energy vowel that is used as unit for pitch frequency and formant extraction. By applying a clustering technique, extracted short-term features namely MFCC coefficients are combined with long term features. Experiments using MLP classifier show that the average speaker accuracy recognition rate is 97.33% for clean speech and 61.33% in noisy environment for -2db SNR, that shows improvement compared to other conventional methods.
Kaewphan, Suwisa; Van Landeghem, Sofie; Ohta, Tomoko; Van de Peer, Yves; Ginter, Filip; Pyysalo, Sampo
Motivation: The recognition and normalization of cell line names in text is an important task in biomedical text mining research, facilitating for instance the identification of synthetically lethal genes from the literature. While several tools have previously been developed to address cell line recognition, it is unclear whether available systems can perform sufficiently well in realistic and broad-coverage applications such as extracting synthetically lethal genes from the cancer literature. In this study, we revisit the cell line name recognition task, evaluating both available systems and newly introduced methods on various resources to obtain a reliable tagger not tied to any specific subdomain. In support of this task, we introduce two text collections manually annotated for cell line names: the broad-coverage corpus Gellus and CLL, a focused target domain corpus. Results: We find that the best performance is achieved using NERsuite, a machine learning system based on Conditional Random Fields, trained on the Gellus corpus and supported with a dictionary of cell line names. The system achieves an F-score of 88.46% on the test set of Gellus and 85.98% on the independently annotated CLL corpus. It was further applied at large scale to 24 302 102 unannotated articles, resulting in the identification of 5 181 342 cell line mentions, normalized to 11 755 unique cell line database identifiers. Availability and implementation: The manually annotated datasets, the cell line dictionary, derived corpora, NERsuite models and the results of the large-scale run on unannotated texts are available under open licenses at http://turkunlp.github.io/Cell-line-recognition/. Contact: firstname.lastname@example.org PMID:26428294
Holte, Michael Boelstoft; Tran, Cuong; Trivedi, Mohan
approaches which have been proposed to comply with these requirements. We report a comparison of the most promising methods for multi-view human action recognition using two publicly available datasets: the INRIA Xmas Motion Acquisition Sequences (IXMAS) Multi-View Human Action Dataset, and the i3DPost Multi......–computer interaction (HCI), assisted living, gesture-based interactive games, intelligent driver assistance systems, movies, 3D TV and animation, physical therapy, autonomous mental development, smart environments, sport motion analysis, video surveillance, and video annotation. Next, we review and categorize recent......-View Human Action and Interaction Dataset. To compare the proposed methods, we give a qualitative assessment of methods which cannot be compared quantitatively, and analyze some prominent 3D pose estimation techniques for application, where not only the performed action needs to be identified but a more...
Feng, Jiao; Fang, Xiaojing; Li, Shoufeng; Wang, Yongjin
We present a method for distinguishing human face from high-emulation mask, which is increasingly used by criminals for activities such as stealing card numbers and passwords on ATM. Traditional facial recognition technique is difficult to detect such camouflaged criminals. In this paper, we use the high-resolution hyperspectral video capture system to detect high-emulation mask. A RGB camera is used for traditional facial recognition. A prism and a gray scale camera are used to capture spectral information of the observed face. Experiments show that mask made of silica gel has different spectral reflectance compared with the human skin. As multispectral image offers additional spectral information about physical characteristics, high-emulation mask can be easily recognized.
McIntosh, Lindsey G; Park, Sohee
Social impairment is a core feature of schizophrenia, present from the pre-morbid stage and predictive of outcome, but the etiology of this deficit remains poorly understood. Successful and adaptive social interactions depend on one's ability to make rapid and accurate judgments about others in real time. Our surprising ability to form accurate first impressions from brief exposures, known as "thin slices" of behavior has been studied very extensively in healthy participants. We sought to examine affect and social trait judgment from thin slices of static or video stimuli in order to investigate the ability of schizophrenic individuals to form reliable social impressions of others. 21 individuals with schizophrenia (SZ) and 20 matched healthy participants (HC) were asked to identify emotions and social traits for actors in standardized face stimuli as well as brief video clips. Sound was removed from videos to remove all verbal cues. Clinical symptoms in SZ and delusional ideation in both groups were measured. Results showed a general impairment in affect recognition for both types of stimuli in SZ. However, the two groups did not differ in the judgments of trustworthiness, approachability, attractiveness, and intelligence. Interestingly, in SZ, the severity of positive symptoms was correlated with higher ratings of attractiveness, trustworthiness, and approachability. Finally, increased delusional ideation in SZ was associated with a tendency to rate others as more trustworthy, while the opposite was true for HC. These findings suggest that complex social judgments in SZ are affected by symptomatology. Copyright © 2014 Elsevier B.V. All rights reserved.
Full Text Available Background: Attention of national and foreign researchers was focused so far on structural and semantic features of syntactic idioms. Automatic analysis of these peculiar units that are on the verge of syntax and phraseology still was not carried out in the scientific literature. This issue requires a theoretical understanding and practical implementation. Purpose: To create an algorithm of recognition of syntactic idioms with one- or two-term core component in the corpus of texts. Results: Based on the results of previous theoretical studies we highlighted a number of formal and statistical criteria that enable to distinguish syntactic idioms from other language units in the corpus of Ukrainian-language texts. The author developed a block diagram of syntactic idioms recognition, incorporating two branches constructed accordingly for the sentences with one-term and sentences with two-term core component. The first branch is based on the presence of word repeats (full words concurrence or presence of other word forms of the word and the list of core components determined on previous stages of the study (є, це, то, не, так; як; з/із/зі, між, над, серед; а, але, зате, однак, проте. The second branch was created for another type of syntactic idioms – one with a two-term core component. It takes into account the following properties of the analyzed units: the presence of combinations of service parts of speech, service parts of speech with pronoun or adverb, pronoun and adverb; compliance of words combinations with the register of the syntactic idioms core components currently comprising 92 structures; association measure of mutual information ≥9, etc. Discussion: Offered algorithm enables automatic identification of syntactic idioms in the corpus of texts and removal of contexts of their use, it can be used to improve the procedure of automatic text processing and creation of automated translation
El Moubtahij Hicham
Full Text Available This paper presents an analytical approach of an offline handwritten Arabic text recognition system. It is based on the Hidden Markov Models (HMM Toolkit (HTK without explicit segmentation. The first phase is preprocessing, where the data is introduced in the system after quality enhancements. Then, a set of characteristics (features of local densities and features statistics are extracted by using the technique of sliding windows. Subsequently, the resulting feature vectors are injected to the Hidden Markov Model Toolkit (HTK. The simple database âArabic-Numbersâ and IFN/ENIT are used to evaluate the performance of this system. Keywords: Hidden Markov Models (HMM Toolkit (HTK, Sliding windows
Dat Tien Nguyen
Full Text Available Extracting powerful image features plays an important role in computer vision systems. Many methods have previously been proposed to extract image features for various computer vision applications, such as the scale-invariant feature transform (SIFT, speed-up robust feature (SURF, local binary patterns (LBP, histogram of oriented gradients (HOG, and weighted HOG. Recently, the convolutional neural network (CNN method for image feature extraction and classification in computer vision has been used in various applications. In this research, we propose a new gender recognition method for recognizing males and females in observation scenes of surveillance systems based on feature extraction from visible-light and thermal camera videos through CNN. Experimental results confirm the superiority of our proposed method over state-of-the-art recognition methods for the gender recognition problem using human body images.
Liu, Xiaofeng; Ge, Yubin; Yang, Chao; Jia, Ping
Video-based facial expression recognition has become increasingly important for plenty of applications in the real world. Despite that numerous efforts have been made for the single sequence, how to balance the complex distribution of intra- and interclass variations well between sequences has remained a great difficulty in this area. We propose the adaptive (N+M)-tuplet clusters loss function and optimize it with the softmax loss simultaneously in the training phrase. The variations introduced by personal attributes are alleviated using the similarity measurements of multiple samples in the feature space with many fewer comparison times as conventional deep metric learning approaches, which enables the metric calculations for large data applications (e.g., videos). Both the spatial and temporal relations are well explored by a unified framework that consists of an Inception-ResNet network with long short term memory and the two fully connected layer branches structure. Our proposed method has been evaluated with three well-known databases, and the experimental results show that our method outperforms many state-of-the-art approaches.
Kortelainen, Jukka; Seppänen, Tapio
Emotions are fundamental for everyday life affecting our communication, learning, perception, and decision making. Including emotions into the human-computer interaction (HCI) could be seen as a significant step forward offering a great potential for developing advanced future technologies. While the electrical activity of the brain is affected by emotions, offers electroencephalogram (EEG) an interesting channel to improve the HCI. In this paper, the selection of subject-independent feature set for EEG-based emotion recognition is studied. We investigate the effect of different feature sets in classifying person's arousal and valence while watching videos with emotional content. The classification performance is optimized by applying a sequential forward floating search algorithm for feature selection. The best classification rate (65.1% for arousal and 63.0% for valence) is obtained with a feature set containing power spectral features from the frequency band of 1-32 Hz. The proposed approach substantially improves the classification rate reported in the literature. In future, further analysis of the video-induced EEG changes including the topographical differences in the spectral features is needed.
Tyner, Bryan C.; Fienup, Daniel M.
Graphing is socially significant for behavior analysts; however, graphing can be difficult to learn. Video modeling (VM) may be a useful instructional method but lacks evidence for effective teaching of computer skills. A between-groups design compared the effects of VM, text-based instruction, and no instruction on graphing performance.…
Jalal, Ahmad; Kamal, Shaharyar; Kim, Daijin
Recent advancements in depth video sensors technologies have made human activity recognition (HAR) realizable for elderly monitoring applications. Although conventional HAR utilizes RGB video sensors, HAR could be greatly improved with depth video sensors which produce depth or distance information. In this paper, a depth-based life logging HAR system is designed to recognize the daily activities of elderly people and turn these environments into an intelligent living space. Initially, a depth imaging sensor is used to capture depth silhouettes. Based on these silhouettes, human skeletons with joint information are produced which are further used for activity recognition and generating their life logs. The life-logging system is divided into two processes. Firstly, the training system includes data collection using a depth camera, feature extraction and training for each activity via Hidden Markov Models. Secondly, after training, the recognition engine starts to recognize the learned activities and produces life logs. The system was evaluated using life logging features against principal component and independent component features and achieved satisfactory recognition rates against the conventional approaches. Experiments conducted on the smart indoor activity datasets and the MSRDailyActivity3D dataset show promising results. The proposed system is directly applicable to any elderly monitoring system, such as monitoring healthcare problems for elderly people, or examining the indoor activities of people at home, office or hospital.
Huang, Y-M.; Liu, C-J.; Shadiev, Rustam; Shen, M-H.; Hwang, W-Y.
One major drawback of previous research on speech-to-text recognition (STR) is that most findings showing the effectiveness of STR for learning were based upon subjective evidence. Very few studies have used eye-tracking techniques to investigate visual attention of students on STR-generated text. Furthermore, not much attention was paid to…
Wingenbach, Tanja S H; Brosnan, Mark; Pfaltz, Monique C; Plichta, Michael M; Ashwin, Chris
According to embodied cognition accounts, viewing others' facial emotion can elicit the respective emotion representation in observers which entails simulations of sensory, motor, and contextual experiences. In line with that, published research found viewing others' facial emotion to elicit automatic matched facial muscle activation, which was further found to facilitate emotion recognition. Perhaps making congruent facial muscle activity explicit produces an even greater recognition advantage. If there is conflicting sensory information, i.e., incongruent facial muscle activity, this might impede recognition. The effects of actively manipulating facial muscle activity on facial emotion recognition from videos were investigated across three experimental conditions: (a) explicit imitation of viewed facial emotional expressions (stimulus-congruent condition), (b) pen-holding with the lips (stimulus-incongruent condition), and (c) passive viewing (control condition). It was hypothesised that (1) experimental condition (a) and (b) result in greater facial muscle activity than (c), (2) experimental condition (a) increases emotion recognition accuracy from others' faces compared to (c), (3) experimental condition (b) lowers recognition accuracy for expressions with a salient facial feature in the lower, but not the upper face area, compared to (c). Participants (42 males, 42 females) underwent a facial emotion recognition experiment (ADFES-BIV) while electromyography (EMG) was recorded from five facial muscle sites. The experimental conditions' order was counter-balanced. Pen-holding caused stimulus-incongruent facial muscle activity for expressions with facial feature saliency in the lower face region, which reduced recognition of lower face region emotions. Explicit imitation caused stimulus-congruent facial muscle activity without modulating recognition. Methodological implications are discussed.
Wingenbach, Tanja S. H.; Brosnan, Mark; Pfaltz, Monique C.; Plichta, Michael M.; Ashwin, Chris
According to embodied cognition accounts, viewing others’ facial emotion can elicit the respective emotion representation in observers which entails simulations of sensory, motor, and contextual experiences. In line with that, published research found viewing others’ facial emotion to elicit automatic matched facial muscle activation, which was further found to facilitate emotion recognition. Perhaps making congruent facial muscle activity explicit produces an even greater recognition advantage. If there is conflicting sensory information, i.e., incongruent facial muscle activity, this might impede recognition. The effects of actively manipulating facial muscle activity on facial emotion recognition from videos were investigated across three experimental conditions: (a) explicit imitation of viewed facial emotional expressions (stimulus-congruent condition), (b) pen-holding with the lips (stimulus-incongruent condition), and (c) passive viewing (control condition). It was hypothesised that (1) experimental condition (a) and (b) result in greater facial muscle activity than (c), (2) experimental condition (a) increases emotion recognition accuracy from others’ faces compared to (c), (3) experimental condition (b) lowers recognition accuracy for expressions with a salient facial feature in the lower, but not the upper face area, compared to (c). Participants (42 males, 42 females) underwent a facial emotion recognition experiment (ADFES-BIV) while electromyography (EMG) was recorded from five facial muscle sites. The experimental conditions’ order was counter-balanced. Pen-holding caused stimulus-incongruent facial muscle activity for expressions with facial feature saliency in the lower face region, which reduced recognition of lower face region emotions. Explicit imitation caused stimulus-congruent facial muscle activity without modulating recognition. Methodological implications are discussed. PMID:29928240
E.M. Van Mulligen (Erik M.); Z. Afzal (Zubair); S.A. Akhondi (Saber); D. Vo (Dang); J.A. Kors (Jan)
textabstractWe participated in task 2 of the CLEF eHealth 2016 chal-lenge. Two subtasks were addressed: entity recognition and normalization in a corpus of French drug labels and Medline titles, and ICD-10 coding of French death certificates. For both subtasks we used a dictionary-based approach.
Vích, Robert; Nouza, J.; Vondra, Martin
-, č. 5042 (2008), s. 136-148 ISSN 0302-9743 R&D Projects: GA AV ČR 1ET301710509; GA AV ČR 1QS108040569 Institutional research plan: CEZ:AV0Z20670512 Keywords : speech recognition * speech processing Subject RIV: JA - Electronics ; Optoelectronics, Electrical Engineering
A key element to online learning is the ability to create a sense of presence to improve learning outcomes. This quasi-experimental study evaluated the impact of interactive video communication versus text-based feedback and found a significant difference between the 2 groups related to teaching, social, and cognitive presence. Recommendations to enhance presence should focus on providing timely feedback, interactive learning experiences, and opportunities for students to establish relationships with peers and faculty.
Pedersen, Kamilla; Moeller, Martin Holdgaard; Paltved, Charlotte; Mors, Ole; Ringsted, Charlotte; Morcke, Anne Mette
The aim of this study was to explore medical students' learning experiences from the didactic teaching formats using either text-based patient cases or video-based patient cases with similar content. The authors explored how the two different patient case formats influenced students' perceptions of psychiatric patients and students' reflections on meeting and communicating with psychiatric patients. The authors conducted group interviews with 30 medical students who volunteered to participate in interviews and applied inductive thematic content analysis to the transcribed interviews. Students taught with text-based patient cases emphasized excitement and drama towards the personal clinical narratives presented by the teachers during the course, but never referred to the patient cases. Authority and boundary setting were regarded as important in managing patients. Students taught with video-based patient cases, in contrast, often referred to the patient cases when highlighting new insights, including the importance of patient perspectives when communicating with patients. The format of patient cases included in teaching may have a substantial impact on students' patient-centeredness. Video-based patient cases are probably more effective than text-based patient cases in fostering patient-centered perspectives in medical students. Teachers sharing stories from their own clinical experiences stimulates both engagement and excitement, but may also provoke unintended stigma and influence an authoritative approach in medical students towards managing patients in clinical psychiatry.
Tyner, Bryan C; Fienup, Daniel M
Graphing is socially significant for behavior analysts; however, graphing can be difficult to learn. Video modeling (VM) may be a useful instructional method but lacks evidence for effective teaching of computer skills. A between-groups design compared the effects of VM, text-based instruction, and no instruction on graphing performance. Participants who used VM constructed graphs significantly faster and with fewer errors than those who used text-based instruction or no instruction. Implications for instruction are discussed. © Society for the Experimental Analysis of Behavior.
Full Text Available Increase in number of elderly people who are living independently needs especial care in the form of healthcare monitoring systems. Recent advancements in depth video technologies have made human activity recognition (HAR realizable for elderly healthcare applications. In this paper, a depth video-based novel method for HAR is presented using robust multi-features and embedded Hidden Markov Models (HMMs to recognize daily life activities of elderly people living alone in indoor environment such as smart homes. In the proposed HAR framework, initially, depth maps are analyzed by temporal motion identification method to segment human silhouettes from noisy background and compute depth silhouette area for each activity to track human movements in a scene. Several representative features, including invariant, multi-view differentiation and spatiotemporal body joints features were fused together to explore gradient orientation change, intensity differentiation, temporal variation and local motion of specific body parts. Then, these features are processed by the dynamics of their respective class and learned, modeled, trained and recognized with specific embedded HMM having active feature values. Furthermore, we construct a new online human activity dataset by a depth sensor to evaluate the proposed features. Our experiments on three depth datasets demonstrated that the proposed multi-features are efficient and robust over the state of the art features for human action and activity recognition.
Full Text Available Textual information embedded in multimedia can provide a vital tool for indexing and retrieval. A lot of work is done in the field of text localization and detection because of its very fundamental importance. One of the biggest challenges of text detection is to deal with variation in font sizes and image resolution. This problem gets elevated due to the undersegmentation or oversegmentation of the regions in an image. The paper addresses this problem by proposing a solution using novel fuzzy-based method. This paper advocates postprocessing segmentation method that can solve the problem of variation in text sizes and image resolution. The methodology is tested on ICDAR 2011 Robust Reading Challenge dataset which amply proves the strength of the recommended method.
Imran, Ali Shariq; Moreno Celleri, Alejandro Manuel; Cheikh, Faouzi Alaya
The usage of non-scripted lecture videos as a part of learning material is becoming an everyday activity in most of higher education institutions due to the growing interest in flexible and blended education. Generally these videos are delivered as part of Learning Objects (LO) through various
Private online language tutoring is growing in popularity. An important prerequisite for development of effective pedagogies in this context is a good understanding of how different modalities can be combined. This study provides a detailed account of how several experienced private online teachers use text chat in their Skype-based English…
Saqer, Haneen; de Visser, Ewart; Strohl, Jonathan; Parasuraman, Raja
The proliferation of portable communication and entertainment devices has introduced new dangers to the driving environment, particularly for young and inexperienced drivers. Graduate students from George Mason University illustrate a powerful, practical, and cost-effective program that has been successful in educating these drivers on the dangers of texting while driving, which can easily be adapted and implemented in other communities.
Mahadeo, Nitin Kumar
This dissertation focuses on iris biometrics. Although the iris is the most accurate biometric, its adoption has been relatively slow. Conventional iris recognition systems utilize still eye images captured in ideal environments and require highly constrained subject presentation. A drop in recognition performance is observed when these constraints are removed as the quality of the data acquired is affected by heterogeneous factors. For iris recognition to be widely adopted, it can therefore ...
series, and movie trailers . We observe these professional videos are typically semantically dissimilar to the event videos which we are interested in...a list of keywords from Wikipedia, which provides an extensive index of celebrity, TV series and movie names1. We exclude the videos whose...Swimming 0.520 0.489 0.691 0.764 Biking 0.324 0.307 0.420 0.561 Graduation 0.083 0.058 0.135 0.121 Birthday 0.149 0.216 0.187 0.257 Wedding reception
We present the design, and analyze the performance of a multi-stage natural language processing system employing named entity recognition, Bayesian statistics, and rule logic to identify and characterize heart disease risk factor events in diabetic patients over time. The system was originally developed for the 2014 i2b2 Challenges in Natural Language in Clinical Data. The system's strengths included a high level of accuracy for identifying named entities associated with heart disease risk factor events. The system's primary weakness was due to inaccuracies when characterizing the attributes of some events. For example, determining the relative time of an event with respect to the record date, whether an event is attributable to the patient's history or the patient's family history, and differentiating between current and prior smoking status. We believe these inaccuracies were due in large part to the lack of an effective approach for integrating context into our event detection model. To address these inaccuracies, we explore the addition of a distributional semantic model for characterizing contextual evidence of heart disease risk factor events. Using this semantic model, we raise our initial 2014 i2b2 Challenges in Natural Language of Clinical data F1 score of 0.838 to 0.890 and increased precision by 10.3% without use of any lexicons that might bias our results. Copyright © 2015 Elsevier Inc. All rights reserved.
Skeppstedt, Maria; Kvist, Maria; Nilsson, Gunnar H; Dalianis, Hercules
Automatic recognition of clinical entities in the narrative text of health records is useful for constructing applications for documentation of patient care, as well as for secondary usage in the form of medical knowledge extraction. There are a number of named entity recognition studies on English clinical text, but less work has been carried out on clinical text in other languages. This study was performed on Swedish health records, and focused on four entities that are highly relevant for constructing a patient overview and for medical hypothesis generation, namely the entities: Disorder, Finding, Pharmaceutical Drug and Body Structure. The study had two aims: to explore how well named entity recognition methods previously applied to English clinical text perform on similar texts written in Swedish; and to evaluate whether it is meaningful to divide the more general category Medical Problem, which has been used in a number of previous studies, into the two more granular entities, Disorder and Finding. Clinical notes from a Swedish internal medicine emergency unit were annotated for the four selected entity categories, and the inter-annotator agreement between two pairs of annotators was measured, resulting in an average F-score of 0.79 for Disorder, 0.66 for Finding, 0.90 for Pharmaceutical Drug and 0.80 for Body Structure. A subset of the developed corpus was thereafter used for finding suitable features for training a conditional random fields model. Finally, a new model was trained on this subset, using the best features and settings, and its ability to generalise to held-out data was evaluated. This final model obtained an F-score of 0.81 for Disorder, 0.69 for Finding, 0.88 for Pharmaceutical Drug, 0.85 for Body Structure and 0.78 for the combined category Disorder+Finding. The obtained results, which are in line with or slightly lower than those for similar studies on English clinical text, many of them conducted using a larger training data set, show that
Full Text Available Micro-expressions play an essential part in understanding non-verbal communication and deceit detection. They are involuntary, brief facial movements that are shown when a person is trying to conceal something. Automatic analysis of micro-expression is challenging due to their low amplitude and to their short duration (they occur as fast as 1/15 to 1/25 of a second. We propose a fully micro-expression analysis system consisting of a high-speed image acquisition setup and a software framework which can detect the frames when the micro-expressions occurred as well as determine the type of the emerged expression. The detection and classification methods use fast and simple motion descriptors based on absolute image differences. The recognition module it only involves the computation of several 2D Gaussian probabilities. The software framework was tested on two publicly available high speed micro-expression databases and the whole system was used to acquire new data. The experiments we performed show that our solution outperforms state of the art works which use more complex and computationally intensive descriptors.
Chen, Yukun; Lasko, Thomas A; Mei, Qiaozhu; Denny, Joshua C; Xu, Hua
Named entity recognition (NER), a sequential labeling task, is one of the fundamental tasks for building clinical natural language processing (NLP) systems. Machine learning (ML) based approaches can achieve good performance, but they often require large amounts of annotated samples, which are expensive to build due to the requirement of domain experts in annotation. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize the annotation cost while maximizing the performance of ML-based models. In this study, our goal was to develop and evaluate both existing and new AL methods for a clinical NER task to identify concepts of medical problems, treatments, and lab tests from the clinical notes. Using the annotated NER corpus from the 2010 i2b2/VA NLP challenge that contained 349 clinical documents with 20,423 unique sentences, we simulated AL experiments using a number of existing and novel algorithms in three different categories including uncertainty-based, diversity-based, and baseline sampling strategies. They were compared with the passive learning that uses random sampling. Learning curves that plot performance of the NER model against the estimated annotation cost (based on number of sentences or words in the training set) were generated to evaluate different active learning and the passive learning methods and the area under the learning curve (ALC) score was computed. Based on the learning curves of F-measure vs. number of sentences, uncertainty sampling algorithms outperformed all other methods in ALC. Most diversity-based methods also performed better than random sampling in ALC. To achieve an F-measure of 0.80, the best method based on uncertainty sampling could save 66% annotations in sentences, as compared to random sampling. For the learning curves of F-measure vs. number of words, uncertainty sampling methods again outperformed all other methods in ALC. To achieve 0.80 in F-measure, in comparison to random
Atick, Joseph J.; Griffin, Paul M.; Redlich, A. N.
Recent advances in image and pattern recognition technology- -especially face recognition--are leading to the development of a new generation of information systems of great value to the law enforcement community. With these systems it is now possible to pool and manage vast amounts of biometric intelligence such as face and finger print records and conduct computerized searches on them. We review one of the enabling technologies underlying these systems: the FaceIt face recognition engine; and discuss three applications that illustrate its benefits as a problem-solving technology and an efficient and cost effective investigative tool.
Walthouwer, Michel Jean Louis; Oenema, Anke; Lechner, Lilian; de Vries, Hein
Many Web-based computer-tailored interventions are characterized by high dropout rates, which limit their potential impact. This study had 4 aims: (1) examining if the use of a Web-based computer-tailored obesity prevention intervention can be increased by using videos as the delivery format, (2) examining if the delivery of intervention content via participants' preferred delivery format can increase intervention use, (3) examining if intervention effects are moderated by intervention use and matching or mismatching intervention delivery format preference, (4) and identifying which sociodemographic factors and intervention appreciation variables predict intervention use. Data were used from a randomized controlled study into the efficacy of a video and text version of a Web-based computer-tailored obesity prevention intervention consisting of a baseline measurement and a 6-month follow-up measurement. The intervention consisted of 6 weekly sessions and could be used for 3 months. ANCOVAs were conducted to assess differences in use between the video and text version and between participants allocated to a matching and mismatching intervention delivery format. Potential moderation by intervention use and matching/mismatching delivery format on self-reported body mass index (BMI), physical activity, and energy intake was examined using regression analyses with interaction terms. Finally, regression analysis was performed to assess determinants of intervention use. In total, 1419 participants completed the baseline questionnaire (follow-up response=71.53%, 1015/1419). Intervention use declined rapidly over time; the first 2 intervention sessions were completed by approximately half of the participants and only 10.9% (104/956) of the study population completed all 6 sessions of the intervention. There were no significant differences in use between the video and text version. Intervention use was significantly higher among participants who were allocated to an
Yu, Jun; Wang, Zeng-Fu
A multiple inputs-driven realistic facial animation system based on 3-D virtual head for human-machine interface is proposed. The system can be driven independently by video, text, and speech, thus can interact with humans through diverse interfaces. The combination of parameterized model and muscular model is used to obtain a tradeoff between computational efficiency and high realism of 3-D facial animation. The online appearance model is used to track 3-D facial motion from video in the framework of particle filtering, and multiple measurements, i.e., pixel color value of input image and Gabor wavelet coefficient of illumination ratio image, are infused to reduce the influence of lighting and person dependence for the construction of online appearance model. The tri-phone model is used to reduce the computational consumption of visual co-articulation in speech synchronized viseme synthesis without sacrificing any performance. The objective and subjective experiments show that the system is suitable for human-machine interaction.
Full Text Available There is a large number of studies on how to promote students’ cognitive processes and learning achievements through various learning activities supported by advanced learning technologies. However, not many of them focus on applying the knowledge that students learn in school to solve authentic daily life problems. This study aims to propose a cognitive diffusion model called User-oriented Context-to-Text Recognition for Learning (U-CTRL to facilitate and improve students’ learning and cognitive processes from lower levels (i.e., Remember and Understand to higher levels (i.e., Apply and above through an innovative approach, called User-Oriented Context-to-Text Recognition for Learning (U-CTRL. With U-CTRL, students participate in learning activities in which they capture the learning context that can be scanned and recognized by a computer application as text. Furthermore, this study proposes the use of an innovative model, called Cognitive Diffusion Model, to investigate the diffusion and transition of students’ cognitive processes in different learning stages including pre-schooling, after-schooling, crossing the chasm, and higher cognitive processing. Finally, two cases are presented to demonstrate how the U-CTRL approach can be used to facilitate student cognition in their learning of English and Natural science.
Vision is only a part of a system that converts visual information into knowledge structures. These structures drive the vision process, resolving ambiguity and uncertainty via feedback, and provide image understanding, which is an interpretation of visual information in terms of these knowledge models. These mechanisms provide a reliable recognition if the object is occluded or cannot be recognized as a whole. It is hard to split the entire system apart, and reliable solutions to the target recognition problems are possible only within the solution of a more generic Image Understanding Problem. Brain reduces informational and computational complexities, using implicit symbolic coding of features, hierarchical compression, and selective processing of visual information. Biologically inspired Network-Symbolic representation, where both systematic structural/logical methods and neural/statistical methods are parts of a single mechanism, is the most feasible for such models. It converts visual information into relational Network-Symbolic structures, avoiding artificial precise computations of 3-dimensional models. Network-Symbolic Transformations derive abstract structures, which allows for invariant recognition of an object as exemplar of a class. Active vision helps creating consistent models. Attention, separation of figure from ground and perceptual grouping are special kinds of network-symbolic transformations. Such Image/Video Understanding Systems will be reliably recognizing targets.
Schlipsing, Marc; Salmen, Jan; Tschentscher, Marc
are taken into account. Our contribution is twofold: (1) the deliberate use of machine learning and pattern recognition techniques allows us to achieve high classification accuracy in varying environments. We systematically evaluate combinations of image features and learning machines in the given online......Computer-aided sports analysis is demanded by coaches and the media. Image processing and machine learning techniques that allow for "live" recognition and tracking of players exist. But these methods are far from collecting and analyzing event data fully autonomously. To generate accurate results......, human interaction is required at different stages including system setup, calibration, supervision of classifier training, and resolution of tracking conflicts. Furthermore, the real-time constraints are challenging: in contrast to other object recognition and tracking applications, we cannot treat data...
Full Text Available The article examines the texts of political advertising video clips issued by the candidates for presidency in France during the campaign before the first round of elections in 2017. The mentioned examples of media texts are analysed from the compositional point of view as well as from that of the content particularities which are directly connected to the text structure. In general, the majority of the studied clips have a similar structure and consist of three parts: introduction, main part and conclusion. However, as a result of the research, a range of advantages marking well-structured videos was revealed. These include: addressing the voters and stating the speech topic clearly at the beginning of the clip, a relevant attention-grabbing opening phrase, consistency and clarity of the information presentation, appropriate use of additional video plots, conclusion at the end of the clip.
Ellaway, Rachel H; Round, Jonathan; Vaughan, Sophie; Poulton, Terry; Zary, Nabil
Background The impact of the use of video resources in primarily paper-based problem-based learning (PBL) settings has been widely explored. Although it can provide many benefits, the use of video can also hamper the critical thinking of learners in contexts where learners are developing clinical reasoning. However, the use of video has not been explored in the context of interactive virtual patients for PBL. Objective A pilot study was conducted to explore how undergraduate medical students interpreted and evaluated information from video- and text-based materials presented in the context of a branched interactive online virtual patient designed for PBL. The goal was to inform the development and use of virtual patients for PBL and to inform future research in this area. Methods An existing virtual patient for PBL was adapted for use in video and provided as an intervention to students in the transition year of the undergraduate medicine course at St George’s, University of London. Survey instruments were used to capture student and PBL tutor experiences and perceptions of the intervention, and a formative review meeting was run with PBL tutors. Descriptive statistics were generated for the structured responses and a thematic analysis was used to identify emergent themes in the unstructured responses. Results Analysis of student responses (n=119) and tutor comments (n=18) yielded 8 distinct themes relating to the perceived educational efficacy of information presented in video and text formats in a PBL context. Although some students found some characteristics of the videos beneficial, when asked to express a preference for video or text the majority of those that responded to the question (65%, 65/100) expressed a preference for text. Student responses indicated that the use of video slowed the pace of PBL and impeded students’ ability to review and critically appraise the presented information. Conclusions Our findings suggest that text was perceived to be a
Woodham, Luke A; Ellaway, Rachel H; Round, Jonathan; Vaughan, Sophie; Poulton, Terry; Zary, Nabil
The impact of the use of video resources in primarily paper-based problem-based learning (PBL) settings has been widely explored. Although it can provide many benefits, the use of video can also hamper the critical thinking of learners in contexts where learners are developing clinical reasoning. However, the use of video has not been explored in the context of interactive virtual patients for PBL. A pilot study was conducted to explore how undergraduate medical students interpreted and evaluated information from video- and text-based materials presented in the context of a branched interactive online virtual patient designed for PBL. The goal was to inform the development and use of virtual patients for PBL and to inform future research in this area. An existing virtual patient for PBL was adapted for use in video and provided as an intervention to students in the transition year of the undergraduate medicine course at St George's, University of London. Survey instruments were used to capture student and PBL tutor experiences and perceptions of the intervention, and a formative review meeting was run with PBL tutors. Descriptive statistics were generated for the structured responses and a thematic analysis was used to identify emergent themes in the unstructured responses. Analysis of student responses (n=119) and tutor comments (n=18) yielded 8 distinct themes relating to the perceived educational efficacy of information presented in video and text formats in a PBL context. Although some students found some characteristics of the videos beneficial, when asked to express a preference for video or text the majority of those that responded to the question (65%, 65/100) expressed a preference for text. Student responses indicated that the use of video slowed the pace of PBL and impeded students' ability to review and critically appraise the presented information. Our findings suggest that text was perceived to be a better source of information than video in virtual
Onofri, L.; Soda, P.; Pechenizkiy, M.; Iannello, G.
Human activity recognition has gained an increasing relevance in computer vision and it can be tackled with either non-hierarchical or hierarchical approaches. The former, also known as single-layered approaches, are those that represent and recognize human activities directly from the extracted
Kroll, Christine; von der Werth, Monika; Leuck, Holger; Stahl, Christoph; Schertler, Klaus
For Intelligence, Surveillance, Reconnaissance (ISR) missions of manned and unmanned air systems typical electrooptical payloads provide high-definition video data which has to be exploited with respect to relevant ground targets in real-time by automatic/assisted target recognition software. Airbus Defence and Space is developing required technologies for real-time sensor exploitation since years and has combined the latest advances of Deep Convolutional Neural Networks (CNN) with a proprietary high-speed Support Vector Machine (SVM) learning method into a powerful object recognition system with impressive results on relevant high-definition video scenes compared to conventional target recognition approaches. This paper describes the principal requirements for real-time target recognition in high-definition video for ISR missions and the Airbus approach of combining an invariant feature extraction using pre-trained CNNs and the high-speed training and classification ability of a novel frequency-domain SVM training method. The frequency-domain approach allows for a highly optimized implementation for General Purpose Computation on a Graphics Processing Unit (GPGPU) and also an efficient training of large training samples. The selected CNN which is pre-trained only once on domain-extrinsic data reveals a highly invariant feature extraction. This allows for a significantly reduced adaptation and training of the target recognition method for new target classes and mission scenarios. A comprehensive training and test dataset was defined and prepared using relevant high-definition airborne video sequences. The assessment concept is explained and performance results are given using the established precision-recall diagrams, average precision and runtime figures on representative test data. A comparison to legacy target recognition approaches shows the impressive performance increase by the proposed CNN+SVM machine-learning approach and the capability of real-time high
benefits of using ei- ther a CNN or a temporal model like an RNN or Long Short Term Memory ( LSTM ) network individually, very few works [14, 15] have...subjects. In our experiments, we focus on predicting the valence score using just the video modality. Also, since the test set labels were not readily...correlation coefficient, σ2x and σ 2 y are the variance of the predicted and ground truth values re- spectively and µx and µy are their means
Kei Long Cheung
Full Text Available Computer-tailored programs may help to prevent overweight and obesity, which are worldwide public health problems. This study investigated (1 the 12-month effectiveness of a video- and text-based computer-tailored intervention on energy intake, physical activity, and body mass index (BMI, and (2 the role of educational level in intervention effects. A randomized controlled trial in The Netherlands was conducted, in which adults were allocated to a video-based condition, text-based condition, or control condition, with baseline, 6 months, and 12 months follow-up. Outcome variables were self-reported BMI, physical activity, and energy intake. Mixed-effects modelling was used to investigate intervention effects and potential interaction effects. Compared to the control group, the video intervention group was effective regarding energy intake after 6 months (least squares means (LSM difference = −205.40, p = 0.00 and 12 months (LSM difference = −128.14, p = 0.03. Only video intervention resulted in lower average daily energy intake after one year (d = 0.12. Educational role and BMI did not seem to interact with this effect. No intervention effects on BMI and physical activity were found. The video computer-tailored intervention was effective on energy intake after one year. This effect was not dependent on educational levels or BMI categories, suggesting that video tailoring can be effective for a broad range of risk groups and may be preferred over text tailoring.
Shuangbao Paul Wang
Full Text Available In this paper, we present a novel system, inVideo, for automatically indexing and searching videos based on the keywords spoken in the audio track and the visual content of the video frames. Using the highly efficient video indexing engine we developed, inVideo is able to analyze videos using machine learning and pattern recognition without the need for initial viewing by a human. The time-stamped commenting and tagging features refine the accuracy of search results. The cloud-based implementation makes it possible to conduct elastic search, augmented search, and data analytics. Our research shows that inVideo presents an efficient tool in processing and analyzing videos and increasing interactions in video-based online learning environment. Data from a cybersecurity program with more than 500 students show that applying inVideo to current video material, interactions between student-student and student-faculty increased significantly across 24 sections program-wide.
Cheung, Kei Long; Schwabe, Inga; Walthouwer, Michel J. L.; Oenema, Anke; de Vries, Hein
Computer-tailored programs may help to prevent overweight and obesity, which are worldwide public health problems. This study investigated (1) the 12-month effectiveness of a video- and text-based computer-tailored intervention on energy intake, physical activity, and body mass index (BMI), and (2) the role of educational level in intervention effects. A randomized controlled trial in The Netherlands was conducted, in which adults were allocated to a video-based condition, text-based condition, or control condition, with baseline, 6 months, and 12 months follow-up. Outcome variables were self-reported BMI, physical activity, and energy intake. Mixed-effects modelling was used to investigate intervention effects and potential interaction effects. Compared to the control group, the video intervention group was effective regarding energy intake after 6 months (least squares means (LSM) difference = −205.40, p = 0.00) and 12 months (LSM difference = −128.14, p = 0.03). Only video intervention resulted in lower average daily energy intake after one year (d = 0.12). Educational role and BMI did not seem to interact with this effect. No intervention effects on BMI and physical activity were found. The video computer-tailored intervention was effective on energy intake after one year. This effect was not dependent on educational levels or BMI categories, suggesting that video tailoring can be effective for a broad range of risk groups and may be preferred over text tailoring. PMID:29065545
Borreo, Alessandro; Onofri, Leonardo; Soda, Paolo
Public datasets played a key role in the increasing level of interest that vision-based human action recognition has attracted in last years. While the production of such datasets has been influenced by the variability introduced by various actors performing the actions, the different modalities of interactions with the environment introduced by the variation of the scenes around the actors has been scarcely took into account. As a consequence, public datasets do not provide a proper test-bed for recognition algorithms that aim at achieving high accuracy, irrespective of the environment where actions are performed. This is all the more so, when systems are designed to recognize activities of daily living (ADL), which are characterized by a high level of human-environment interaction. For that reason, we present in this manuscript the MEA dataset, a new multi-environment ADL dataset, which permitted us to show how the change of scenario can affect the performances of state-of-the-art approaches for action recognition.
Kliemann, Dorit; Rosenblau, Gabriela; Bölte, Sven; Heekeren, Hauke R.; Dziobek, Isabel
Recognizing others' emotional states is crucial for effective social interaction. While most facial emotion recognition tasks use explicit prompts that trigger consciously controlled processing, emotional faces are almost exclusively processed implicitly in real life. Recent attempts in social cognition suggest a dual process perspective, whereby explicit and implicit processes largely operate independently. However, due to differences in methodology the direct comparison of implicit and explicit social cognition has remained a challenge. Here, we introduce a new tool to comparably measure implicit and explicit processing aspects comprising basic and complex emotions in facial expressions. We developed two video-based tasks with similar answer formats to assess performance in respective facial emotion recognition processes: Face Puzzle, implicit and explicit. To assess the tasks' sensitivity to atypical social cognition and to infer interrelationship patterns between explicit and implicit processes in typical and atypical development, we included healthy adults (NT, n = 24) and adults with autism spectrum disorder (ASD, n = 24). Item analyses yielded good reliability of the new tasks. Group-specific results indicated sensitivity to subtle social impairments in high-functioning ASD. Correlation analyses with established implicit and explicit socio-cognitive measures were further in favor of the tasks' external validity. Between group comparisons provide first hints of differential relations between implicit and explicit aspects of facial emotion recognition processes in healthy compared to ASD participants. In addition, an increased magnitude of between group differences in the implicit task was found for a speed-accuracy composite measure. The new Face Puzzle tool thus provides two new tasks to separately assess explicit and implicit social functioning, for instance, to measure subtle impairments as well as potential improvements due to social cognitive
In this article, I shall examine the cognitive, heuristic and theoretical functions of the concept of recognition. To evaluate both the explanatory power and the limitations of a sociological concept, the theory construction must be analysed and its actual productivity for sociological theory mus...
Hung, Yu-Wan; Higgins, Steve
This study investigates the different learning opportunities enabled by text-based and video-based synchronous computer-mediated communication (SCMC) from an interactionist perspective. Six Chinese-speaking learners of English and six English-speaking learners of Chinese were paired up as tandem (reciprocal) learning dyads. Each dyad participated…
Walsh-Buhi, Eric R.; Helmy, Hannah; Harsch, Kristin; Rella, Natalie; Godcharles, Cheryl; Ogunrunde, Adejoke; Lopez Castillo, Humberto
Objective: This paper reports on a pilot study evaluating the feasibility and acceptability of a text- and mobile video-based intervention to educate women and men attending college about non-daily contraception, with a particular focus on long-acting reversible contraception (LARC). A secondary objective is to describe the process of intervention…
fifin naili rizkiyah
Full Text Available Abstract: This research is aimed at finding out how Process-Genre Based Approach strategy with YouTube Videos as the media are employed to improve the students� ability in writing hortatory exposition texts. This study uses collaborative classroom action research design following the procedures namely planning, implementing, observing, and reflecting. The procedures of carrying out the strategy are: (1 relating several issues/ cases to the students� background knowledge and introducing the generic structures and linguistic features of hortatory exposition text as the BKoF stage, (2 analyzing the generic structure and the language features used in the text and getting model on how to write a hortatory exposition text by using the YouTube Video as the MoT stage, (3 writing a hortatory exposition text collaboratively in a small group and in pairs through process writing as the JCoT stage, and (4 writing a hortatory exposition text individually as the ICoT stage. The result shows that the use of Process-Genre Based Approach and YouTube Videos can improve the students� ability in writing hortatory exposition texts. The percentage of the students achieving the score above the minimum passing grade (70 had improved from only 15.8% (3 out of 19 students in the preliminary study to 100% (22 students in the Cycle 1. Besides, the score of each aspect; content, organization, vocabulary, grammar, and mechanics also improved. � Key Words: writing ability, hortatory exposition text, process-genre based approach, youtube video
Nguyen, Dat Tien; Kim, Ki Wan; Hong, Hyung Gil; Koo, Ja Hyung; Kim, Min Cheol; Park, Kang Ryoung
Extracting powerful image features plays an important role in computer vision systems. Many methods have previously been proposed to extract image features for various computer vision applications, such as the scale-invariant feature transform (SIFT), speed-up robust feature (SURF), local binary patterns (LBP), histogram of oriented gradients (HOG), and weighted HOG. Recently, the convolutional neural network (CNN) method for image feature extraction and classification in computer vision has been used in various applications. In this research, we propose a new gender recognition method for recognizing males and females in observation scenes of surveillance systems based on feature extraction from visible-light and thermal camera videos through CNN. Experimental results confirm the superiority of our proposed method over state-of-the-art recognition methods for the gender recognition problem using human body images. PMID:28335510
Nguyen, Dat Tien; Kim, Ki Wan; Hong, Hyung Gil; Koo, Ja Hyung; Kim, Min Cheol; Park, Kang Ryoung
Extracting powerful image features plays an important role in computer vision systems. Many methods have previously been proposed to extract image features for various computer vision applications, such as the scale-invariant feature transform (SIFT), speed-up robust feature (SURF), local binary patterns (LBP), histogram of oriented gradients (HOG), and weighted HOG. Recently, the convolutional neural network (CNN) method for image feature extraction and classification in computer vision has been used in various applications. In this research, we propose a new gender recognition method for recognizing males and females in observation scenes of surveillance systems based on feature extraction from visible-light and thermal camera videos through CNN. Experimental results confirm the superiority of our proposed method over state-of-the-art recognition methods for the gender recognition problem using human body images.
Gur, Michal; Nir, Vered; Teleshov, Anna; Bar-Yoseph, Ronen; Manor, Eynav; Diab, Gizelle; Bentur, Lea
Background Poor communications between cystic fibrosis (CF) patients and health-care providers may result in gaps in knowledge and misconceptions about medication usage, and can lead to poor adherence. We aimed to assess the feasibility of using WhatsApp and Skype to improve communications. Methods This single-centre pilot study included CF patients who were older than eight years of age assigned to two groups: one without intervention (control group), and one with intervention. Each patient from the intervention group received Skype-based online video chats and WhatsApp messages from members of the multidisciplinary CF team. CF questionnaires, revised (CFQ-R) scores, knowledge and adherence based on CF My Way and patients satisfaction were evaluated before and after three months. Feasibility was assessed by session attendance, acceptability and satisfaction survey. Descriptive analysis and paired and non-paired t-tests were used as applicable. Results Eighteen patients were recruited to this feasibility study (nine in each group). Each intervention group participant had between four and six Skype video chats and received 22-45 WhatsApp messages. In this small study, CFQ-R scores, knowledge, adherence and patient satisfaction were similar in both groups before and after the three-month intervention. Conclusions A telehealth-based approach, using Skype video chats and WhatsApp messages, was feasible and acceptable in this pilot study. A larger and longer multi-centre study is warranted to examine the efficacy of these interventions to improve knowledge, adherence and communication.
Pedersen, Kamilla; Holdgaard, Martin Møller; Paltved, Charlotte
' perceptions of psychiatric patients and students' reflections on meeting and communicating with psychiatric patients. METHODS: The authors conducted group interviews with 30 medical students who volunteered to participate in interviews and applied inductive thematic content analysis to the transcribed....... Students taught with video-based patient cases, in contrast, often referred to the patient cases when highlighting new insights, including the importance of patient perspectives when communicating with patients. CONCLUSION: The format of patient cases included in teaching may have a substantial impact...... unintended stigma and influence an authoritative approach in medical students towards managing patients in clinical psychiatry....
Full Text Available Abstract Action recognition from video is a problem that has many important applications to human motion analysis. In real-world settings, the viewpoint of the camera cannot always be fixed relative to the subject, so view-invariant action recognition methods are needed. Previous view-invariant methods use multiple cameras in both the training and testing phases of action recognition or require storing many examples of a single action from multiple viewpoints. In this paper, we present a framework for learning a compact representation of primitive actions (e.g., walk, punch, kick, sit that can be used for video obtained from a single camera for simultaneous action recognition and viewpoint estimation. Using our method, which models the low-dimensional structure of these actions relative to viewpoint, we show recognition rates on a publicly available dataset previously only achieved using multiple simultaneous views.
Eka Bayu Pramanca
Full Text Available This research discusses about how two different techniques affect the students’ ability in descriptive text at SMP N 2 Metro. The objectives of this research are (1 to know the difference result of using YouTube Downloaded Video and Serial Pictures media toward students’ writing ability in descriptive text and (2 to know which one is more effective of students’ writing ability in descriptive text instruction between learning by using YouTube Downloaded Video and Serial Pictures media. The implemented method is quantitative research design in that both researchers use true experimental research design. In this research , experimental and control class pre-test and post test are conducted. It is carried out at the first grade of SMP N 2 Metro in academic year 2012/2013. The population in this research is 7 different classes with total number of 224 students. 2 classes of the total population are taken as the samples; VII.1 students in experimental class and VII.2 students in control class by using cluster random sampling technique. The instruments of the research are tests, treatment and post-test. The data analyzing procedure uses t-test and results the following output. The result of ttest is 3,96 and ttable is 2,06. It means that tcount > ttable with the criterion of ttest is Ha is accepted if tcount > ttable. So, there is any difference result of students’ writing ability using YouTube Downloaded Video and Serial Pictures Media. However; Youtube Downloaded Video media is more effective media than Serial Pictures media toward students’ writing ability. This research is consistent with the previous result of the studies and thus this technique is recommended to use in writing instruction especially in descriptive text in order that students may feel fun and enjoy during the learning process.
Full Text Available This paper proposes novel framework for facial expressions analysis using dynamic and static information in video sequences. First, based on incremental formulation, discriminative deformable face alignment method is adapted to locate facial points to correct in-plane head rotation and break up facial region from background. Then, spatial-temporal motion local binary pattern (LBP feature is extracted and integrated with Gabor multiorientation fusion histogram to give descriptors, which reflect static and dynamic texture information of facial expressions. Finally, a one-versus-one strategy based multiclass support vector machine (SVM classifier is applied to classify facial expressions. Experiments on Cohn-Kanade (CK + facial expression dataset illustrate that integrated framework outperforms methods using single descriptors. Compared with other state-of-the-art methods on CK+, MMI, and Oulu-CASIA VIS datasets, our proposed framework performs better.
Walthouwer, Michel Jean Louis; Oenema, Anke; Lechner, Lilian; de Vries, Hein
Web-based computer-tailored interventions often suffer from small effect sizes and high drop-out rates, particularly among people with a low level of education. Using videos as a delivery format can possibly improve the effects and attractiveness of these interventions The main aim of this study was to examine the effects of a video and text version of a Web-based computer-tailored obesity prevention intervention on dietary intake, physical activity, and body mass index (BMI) among Dutch adults. A second study aim was to examine differences in appreciation between the video and text version. The final study aim was to examine possible differences in intervention effects and appreciation per educational level. A three-armed randomized controlled trial was conducted with a baseline and 6 months follow-up measurement. The intervention consisted of six sessions, lasting about 15 minutes each. In the video version, the core tailored information was provided by means of videos. In the text version, the same tailored information was provided in text format. Outcome variables were self-reported and included BMI, physical activity, energy intake, and appreciation of the intervention. Multiple imputation was used to replace missing values. The effect analyses were carried out with multiple linear regression analyses and adjusted for confounders. The process evaluation data were analyzed with independent samples t tests. The baseline questionnaire was completed by 1419 participants and the 6 months follow-up measurement by 1015 participants (71.53%). No significant interaction effects of educational level were found on any of the outcome variables. Compared to the control condition, the video version resulted in lower BMI (B=-0.25, P=.049) and lower average daily energy intake from energy-dense food products (B=-175.58, PWeb-based computer-tailored obesity prevention intervention was the most effective intervention and most appreciated. Future research needs to examine if the
Shadiev, Rustam; Wu, Ting-Ting; Sun, Ai; Huang, Yueh-Min
In this study, 21 university students, who represented thirteen nationalities, participated in an online cross-cultural learning activity. The participants were engaged in interactions and exchanges carried out on Facebook® and Skype® platforms, and their multilingual communications were supported by speech-to-text recognition (STR) and…
Saha, Amal K
Banking is one of the most significant adopters of cutting-edge information technologies. Since its modern era beginning in the form of paper based accounting maintained in the branch, adoption of computerized system made it possible to centralize the processing in data centre and improve customer experience by making a more available and efficient system. The latest twist in this evolution is adoption of natural language processing and speech recognition in the user interface between the hum...
Mohd Asyraf Zulkifley
Full Text Available Event recognition is one of the most active research areas in video surveillance fields. Advancement in event recognition systems mainly aims to provide convenience, safety and an efficient lifestyle for humanity. A precise, accurate and robust approach is necessary to enable event recognition systems to respond to sudden changes in various uncontrolled environments, such as the case of an emergency, physical threat and a fire or bomb alert. The performance of sudden event recognition systems depends heavily on the accuracy of low level processing, like detection, recognition, tracking and machine learning algorithms. This survey aims to detect and characterize a sudden event, which is a subset of an abnormal event in several video surveillance applications. This paper discusses the following in detail: (1 the importance of a sudden event over a general anomalous event; (2 frameworks used in sudden event recognition; (3 the requirements and comparative studies of a sudden event recognition system and (4 various decision-making approaches for sudden event recognition. The advantages and drawbacks of using 3D images from multiple cameras for real-time application are also discussed. The paper concludes with suggestions for future research directions in sudden event recognition.
Yu, Yiqing; Liu, Huayong; Wang, Hongbin; Zhou, Dongru
In this paper, we propose content-based video retrieval, which is a kind of retrieval by its semantical contents. Because video data is composed of multimodal information streams such as video, auditory and textual streams, we describe a strategy of using multimodal analysis for automatic parsing sports video. The paper first defines the basic structure of sports video database system, and then introduces a new approach that integrates visual stream analysis, speech recognition, speech signal processing and text extraction to realize video retrieval. The experimental results for TV sports video of football games indicate that the multimodal analysis is effective for video retrieval by quickly browsing tree-like video clips or inputting keywords within predefined domain.
Tuna, Tayfun; Subhlok, Jaspal; Barker, Lecia; Shah, Shishir; Johnson, Olin; Hovey, Christopher
Videos of classroom lectures have proven to be a popular and versatile learning resource. A key shortcoming of the lecture video format is accessing the content of interest hidden in a video. This work meets this challenge with an advanced video framework featuring topical indexing, search, and captioning (ICS videos). Standard optical character recognition (OCR) technology was enhanced with image transformations for extraction of text from video frames to support indexing and search. The images and text on video frames is analyzed to divide lecture videos into topical segments. The ICS video player integrates indexing, search, and captioning in video playback providing instant access to the content of interest. This video framework has been used by more than 70 courses in a variety of STEM disciplines and assessed by more than 4000 students. Results presented from the surveys demonstrate the value of the videos as a learning resource and the role played by videos in a students learning process. Survey results also establish the value of indexing and search features in a video platform for education. This paper reports on the development and evaluation of ICS videos framework and over 5 years of usage experience in several STEM courses.
Full Text Available ... YouTube Videos » NEI YouTube Videos: Amblyopia Listen NEI YouTube Videos YouTube Videos Home Age-Related Macular Degeneration ... Retinopathy of Prematurity Science Spanish Videos Webinars NEI YouTube Videos: Amblyopia Embedded video for NEI YouTube Videos: ...
Zhu, Qile; Li, Xiaolin; Conesa, Ana; Pereira, Cécile
Best performing named entity recognition (NER) methods for biomedical literature are based on hand-crafted features or task-specific rules, which are costly to produce and difficult to generalize to other corpora. End-to-end neural networks achieve state-of-the-art performance without hand-crafted features and task-specific knowledge in non-biomedical NER tasks. However, in the biomedical domain, using the same architecture does not yield competitive performance compared with conventional machine learning models. We propose a novel end-to-end deep learning approach for biomedical NER tasks that leverages the local contexts based on n-gram character and word embeddings via Convolutional Neural Network (CNN). We call this approach GRAM-CNN. To automatically label a word, this method uses the local information around a word. Therefore, the GRAM-CNN method does not require any specific knowledge or feature engineering and can be theoretically applied to a wide range of existing NER problems. The GRAM-CNN approach was evaluated on three well-known biomedical datasets containing different BioNER entities. It obtained an F1-score of 87.26% on the Biocreative II dataset, 87.26% on the NCBI dataset and 72.57% on the JNLPBA dataset. Those results put GRAM-CNN in the lead of the biological NER methods. To the best of our knowledge, we are the first to apply CNN based structures to BioNER problems. The GRAM-CNN source code, datasets and pre-trained model are available online at: https://github.com/valdersoul/GRAM-CNN. email@example.com or firstname.lastname@example.org. Supplementary data are available at Bioinformatics online.
E. Lenczowski, BS
Conclusion: In this study, we found that a brief, plain-language video was effective at conveying understandable content to help subjects learn to identify common cancerous and benign skin growths while also teaching them strategies to protect against skin cancer.
Kardava, Irakli; Tadyszak, Krzysztof; Gulua, Nana; Jurga, Stefan
For more flexibility of environmental perception by artificial intelligence it is needed to exist the supporting software modules, which will be able to automate the creation of specific language syntax and to make a further analysis for relevant decisions based on semantic functions. According of our proposed approach, of which implementation it is possible to create the couples of formal rules of given sentences (in case of natural languages) or statements (in case of special languages) by helping of computer vision, speech recognition or editable text conversion system for further automatic improvement. In other words, we have developed an approach, by which it can be achieved to significantly improve the training process automation of artificial intelligence, which as a result will give us a higher level of self-developing skills independently from us (from users). At the base of our approach we have developed a software demo version, which includes the algorithm and software code for the entire above mentioned component's implementation (computer vision, speech recognition and editable text conversion system). The program has the ability to work in a multi - stream mode and simultaneously create a syntax based on receiving information from several sources.
Full text: Video surveillance is a very crucial component in safeguard and physical protection. Digital technology has revolutionized the surveillance scenario and brought in various new capabilities like better image quality, faster search and retrieval of video images, less storage space for recording, efficient transmission and storage of video, better protection of recorded video images, and easy remote accesses to live and recorded video etc. The basic safeguard requirement for verifiably uninterrupted surveillance has remained largely unchanged since its inception. However, changes to the inspection paradigm to admit automated review and remote monitoring have dramatically increased the demands on safeguard surveillance system. Today's safeguard systems can incorporate intelligent motion detection with very low rate of false alarm and less archiving volume, embedded image processing capability for object behavior and event based indexing, object recognition, efficient querying and report generation etc. It also demands cryptographically authenticating, encrypted, and highly compressed video data for efficient, secure, tamper indicating and transmission. In physical protection, intelligent on robust video motion detection, real time moving object detection and tracking from stationary and moving camera platform, multi-camera cooperative tracking, activity detection and recognition, human motion analysis etc. is going to play a key rote in perimeter security. Incorporation of front and video imagery exploitation tools like automatic number plate recognition, vehicle identification and classification, vehicle undercarriage inspection, face recognition, iris recognition and other biometric tools, gesture recognition etc. makes personnel and vehicle access control robust and foolproof. Innovative digital image enhancement techniques coupled with novel sensor design makes low cost, omni-directional vision capable, all weather, day night surveillance a reality
scores from the HMM based OCR engine. Finally, the terms P( WIX ) can be estimated separately from training data as a distribution of normalized widths...front-end component, e.g., a client computer having a graphical user interface and/ or a Web browser through which a user can interact with an
Full Text Available With the wide applications of vision based intelligent systems, image and video analysis technologies have attracted the attention of researchers in the computer vision field. In image and video analysis, human activity recognition is an important research direction. By interpreting and understanding human activity, we can recognize and predict the occurrence of crimes and help the police or other agencies react immediately. In the past, a large number of papers have been published on human activity recognition in video and image sequences. In this paper, we provide a comprehensive survey of the recent development of the techniques, including methods, systems, and quantitative evaluation towards the performance of human activity recognition.
Ankit Kumar Agrawal
Full Text Available Abstract The amount of images and videos being shared by the user is exponentially increasing but applications that perform video analytics is severely lacking or work on limited set of data. It is also challenging to perform analytics with less time complexity. Object recognition is the primary step in video analytics. We implement a robust method to extract objects from the data which is in unstructured format and cannot be processed directly by relational databases. In this study we present our report with results after performance evaluation and compare them with results of MATLAB.
Elbouz, Marwa; Alfalou, Ayman; Brosseau, Christian
Home automation is being implemented into more and more domiciles of the elderly and disabled in order to maintain their independence and safety. For that purpose, we propose and validate a surveillance video system, which detects various posture-based events. One of the novel points of this system is to use adapted Vander-Lugt correlator (VLC) and joint-transfer correlator (JTC) techniques to make decisions on the identity of a patient and his three-dimensional (3-D) positions in order to overcome the problem of crowd environment. We propose a fuzzy logic technique to get decisions on the subject's behavior. Our system is focused on the goals of accuracy, convenience, and cost, which in addition does not require any devices attached to the subject. The system permits one to study and model subject responses to behavioral change intervention because several levels of alarm can be incorporated according different situations considered. Our algorithm performs a fast 3-D recovery of the subject's head position by locating eyes within the face image and involves a model-based prediction and optical correlation techniques to guide the tracking procedure. The object detection is based on (hue, saturation, value) color space. The system also involves an adapted fuzzy logic control algorithm to make a decision based on information given to the system. Furthermore, the principles described here are applicable to a very wide range of situations and robust enough to be implementable in ongoing experiments.
Bachman, John A; Gyori, Benjamin M; Sorger, Peter K
For automated reading of scientific publications to extract useful information about molecular mechanisms it is critical that genes, proteins and other entities be correctly associated with uniform identifiers, a process known as named entity linking or "grounding." Correct grounding is essential for resolving relationships among mined information, curated interaction databases, and biological datasets. The accuracy of this process is largely dependent on the availability of machine-readable resources associating synonyms and abbreviations commonly found in biomedical literature with uniform identifiers. In a task involving automated reading of ∼215,000 articles using the REACH event extraction software we found that grounding was disproportionately inaccurate for multi-protein families (e.g., "AKT") and complexes with multiple subunits (e.g."NF- κB"). To address this problem we constructed FamPlex, a manually curated resource defining protein families and complexes as they are commonly encountered in biomedical text. In FamPlex the gene-level constituents of families and complexes are defined in a flexible format allowing for multi-level, hierarchical membership. To create FamPlex, text strings corresponding to entities were identified empirically from literature and linked manually to uniform identifiers; these identifiers were also mapped to equivalent entries in multiple related databases. FamPlex also includes curated prefix and suffix patterns that improve named entity recognition and event extraction. Evaluation of REACH extractions on a test corpus of ∼54,000 articles showed that FamPlex significantly increased grounding accuracy for families and complexes (from 15 to 71%). The hierarchical organization of entities in FamPlex also made it possible to integrate otherwise unconnected mechanistic information across families, subfamilies, and individual proteins. Applications of FamPlex to the TRIPS/DRUM reading system and the Biocreative VI Bioentity
Thonnat , Monique
International audience; Extracting automatically the semantics from visual data is a real challenge. We describe in this paper how recent work in cognitive vision leads to significative results in activity recognition for visualsurveillance and video monitoring. In particular we present work performed in the domain of video understanding in our PULSAR team at INRIA in Sophia Antipolis. Our main objective is to analyse in real-time video streams captured by static video cameras and to recogniz...
Full Text Available Video has become an interactive medium of communication in everyday life. The sheer volume of video makes it extremely difficult to browse through and find the required data. Hence extraction of key frames from the video which represents the abstract of the entire video becomes necessary. The aim of the video shot detection is to find the position of the shot boundaries, so that key frames can be selected from each shot for subsequent processing such as video summarization, indexing etc. For most of the surveillance applications like video summery, face recognition etc., the hardware (real time implementation of these algorithms becomes necessary. Here in this paper we present the architecture for simultaneous accessing of consecutive frames, which are then used for the implementation of various Video Shot Detection algorithms. We also present the real time implementation of three video shot detection algorithms using the above mentioned architecture on FPGA (Field Programmable Gate Arrays.
Full Text Available Recently face recognition is attracting much attention in the society of network multimedia information access. Areas such as network security, content indexing and retrieval, and video compression benefits from face recognition technology because "people" are the center of attention in a lot of video. Network access control via face recognition not only makes hackers virtually impossible to steal one's "password", but also increases the user-friendliness in human-computer interaction. Indexing and/or retrieving video data based on the appearances of particular persons will be useful for users such as news reporters, political scientists, and moviegoers. For the applications of videophone and teleconferencing, the assistance of face recognition also provides a more efficient coding scheme. In this paper, we give an introductory course of this new information processing technology. The paper shows the readers the generic framework for the face recognition system, and the variants that are frequently encountered by the face recognizer. Several famous face recognition algorithms, such as eigenfaces and neural networks, will also be explained.
Full Text Available The analysis of video acquired with a wearable camera is a challenge that multimedia community is facing with the proliferation of such sensors in various applications. In this paper, we focus on the problem of automatic visual place recognition in a weakly constrained environment, targeting the indexing of video streams by topological place recognition. We propose to combine several machine learning approaches in a time regularized framework for image-based place recognition indoors. The framework combines the power of multiple visual cues and integrates the temporal continuity information of video. We extend it with computationally efficient semisupervised method leveraging unlabeled video sequences for an improved indexing performance. The proposed approach was applied on challenging video corpora. Experiments on a public and a real-world video sequence databases show the gain brought by the different stages of the method.
Full Text Available The understanding of ecosystem dynamics in deep-sea areas is to date limited by technical constraints on sampling repetition. We have elaborated a morphometry-based protocol for automated video-image analysis where animal movement tracking (by frame subtraction is accompanied by species identification from animals’ outlines by Fourier Descriptors and Standard K-Nearest Neighbours methods. One-week footage from a permanent video-station located at 1,100 m depth in Sagami Bay (Central Japan was analysed. Out of 150,000 frames (1 per 4 s, a subset of 10.000 was analyzed by a trained operator to increase the efficiency of the automated procedure. Error estimation of the automated and trained operator procedure was computed as a measure of protocol performance. Three displacing species were identified as the most recurrent: Zoarcid fishes (eelpouts, red crabs (Paralomis multispina, and snails (Buccinum soyomaruae. Species identification with KNN thresholding produced better results in automated motion detection. Results were discussed assuming that the technological bottleneck is to date deeply conditioning the exploration of the deep-sea.
Full Text Available Abstract Background The computer-aided identification of specific gait patterns is an important issue in the assessment of Parkinson's disease (PD. In this study, a computer vision-based gait analysis approach is developed to assist the clinical assessments of PD with kernel-based principal component analysis (KPCA. Method Twelve PD patients and twelve healthy adults with no neurological history or motor disorders within the past six months were recruited and separated according to their "Non-PD", "Drug-On", and "Drug-Off" states. The participants were asked to wear light-colored clothing and perform three walking trials through a corridor decorated with a navy curtain at their natural pace. The participants' gait performance during the steady-state walking period was captured by a digital camera for gait analysis. The collected walking image frames were then transformed into binary silhouettes for noise reduction and compression. Using the developed KPCA-based method, the features within the binary silhouettes can be extracted to quantitatively determine the gait cycle time, stride length, walking velocity, and cadence. Results and Discussion The KPCA-based method uses a feature-extraction approach, which was verified to be more effective than traditional image area and principal component analysis (PCA approaches in classifying "Non-PD" controls and "Drug-Off/On" PD patients. Encouragingly, this method has a high accuracy rate, 80.51%, for recognizing different gaits. Quantitative gait parameters are obtained, and the power spectrums of the patients' gaits are analyzed. We show that that the slow and irregular actions of PD patients during walking tend to transfer some of the power from the main lobe frequency to a lower frequency band. Our results indicate the feasibility of using gait performance to evaluate the motor function of patients with PD. Conclusion This KPCA-based method requires only a digital camera and a decorated corridor setup
McClean, Clare M.
Reviews strengths and weaknesses of five optical character recognition (OCR) software packages used to digitize paper documents before publishing on the Internet. Outlines options available and stages of the conversion process. Describes the learning experience of Eurotext, a United Kingdom-based electronic libraries project (eLib). (PEN)
Full Text Available This paper presents a method of speech recognition by pattern recognition techniques. Learning consists in determining the unique characteristics of a word (cepstral coefficients by eliminating those characteristics that are different from one word to another. For learning and recognition, the system will build a dictionary of words by determining the characteristics of each word to be used in the recognition. Determining the characteristics of an audio signal consists in the following steps: noise removal, sampling it, applying Hamming window, switching to frequency domain through Fourier transform, calculating the magnitude spectrum, filtering data, determining cepstral coefficients.
Full Text Available ... Patient Webcasts / Rheumatoid Arthritis Educational Video Series Rheumatoid Arthritis Educational Video Series This series of five videos ... member of our patient care team. Managing Your Arthritis Managing Your Arthritis Managing Chronic Pain and Depression ...
Full Text Available ... YouTube Videos: Amblyopia Embedded video for NEI YouTube Videos: Amblyopia ... *PDF files require the free Adobe® Reader® software for viewing. This website is maintained by the ...
Full Text Available ... questions Clinical Studies Publications Catalog Photos and Images Spanish Language Information Grants and Funding Extramural Research Division ... Low Vision Refractive Errors Retinopathy of Prematurity Science Spanish Videos Webinars NEI YouTube Videos: Amblyopia Embedded video ...
Full Text Available ... for me? Find a Group Upcoming Events Video Library Photo Gallery One-on-One Support ANetwork Peer ... me? Find a group Back Upcoming events Video Library Photo Gallery One-on-One Support Back ANetwork ...
Full Text Available ... Doctor Find a Provider Meet the Team Blog Articles & Stories News Resources Links Videos Podcasts Webinars For ... Doctor Find a Provider Meet the Team Blog Articles & Stories News Provider Directory Donate Resources Links Videos ...
Full Text Available Purpose: The represented research results are aimed to improve theoretical basics of computer vision and artificial intelligence of dynamical system. Proposed approach of object detection and recognition is based on probabilistic fundamentals to ensure the required level of correct object recognition. Methods: Presented approach is grounded at probabilistic methods, statistical methods of probability density estimation and computer-based simulation at verification stage of development. Results: Proposed approach for object detection and recognition for video stream data processing has shown several advantages in comparison with existing methods due to its simple realization and small time of data processing. Presented results of experimental verification look plausible for object detection and recognition in video stream. Discussion: The approach can be implemented in dynamical system within changeable environment such as remotely piloted aircraft systems and can be a part of artificial intelligence in navigation and control systems.
Wang, Jiang; Wu, Ying
Action recognition technology has many real-world applications in human-computer interaction, surveillance, video retrieval, retirement home monitoring, and robotics. The commoditization of depth sensors has also opened up further applications that were not feasible before. This text focuses on feature representation and machine learning algorithms for action recognition from depth sensors. After presenting a comprehensive overview of the state of the art, the authors then provide in-depth descriptions of their recently developed feature representations and machine learning techniques, includi
Full Text Available We present a global overview of image- and video-processing-based methods to help the communication of hearing impaired people. Two directions of communication have to be considered: from a hearing person to a hearing impaired person and vice versa. In this paper, firstly, we describe sign language (SL and the cued speech (CS language which are two different languages used by the deaf community. Secondly, we present existing tools which employ SL and CS video processing and recognition for the automatic communication between deaf people and hearing people. Thirdly, we present the existing tools for reverse communication, from hearing people to deaf people that involve SL and CS video synthesis.
Full Text Available In this work, we address the use of object recognition techniques to annotate what is shown where in online video collections. These annotations are suitable to retrieve specific video scenes for object related text queries which is not possible with the manually generated metadata that is used by current portals. We are not the first to present object annotations that are generated with content-based analysis methods. However, the proposed framework possesses some outstanding features that offer good prospects for its application in real video portals. Firstly, it can be easily used as background module in any video environment. Secondly, it is not based on a fixed analysis chain but on an extensive recognition infrastructure that can be used with all kinds of visual features, matching and machine learning techniques. New recognition approaches can be integrated into this infrastructure with low development costs and a configuration of the used recognition approaches can be performed even on a running system. Thus, this framework might also benefit from future advances in computer vision. Thirdly, we present an automatic selection approach to support the use of different recognition strategies for different objects. Last but not least, visual analysis can be performed efficiently on distributed, multi-processor environments and a database schema is presented to store the resulting video annotations as well as the off-line generated low-level features in a compact form. We achieve promising results in an annotation case study and the instance search task of the TRECVID 2011 challenge.
Full Text Available The amount of digital video being shot, captured, and stored is growing at a rate faster than ever before. The large amount of stored video is not penetrable without efficient video indexing, retrieval, and browsing technology. Most prior work in the field can be roughly categorized into two classes. One class is based on image processing techniques, often called content-based image and video retrieval, in which video frames are indexed and searched for visual content. The other class is based on spoken document retrieval, which relies on automatic speech recognition and text queries. Both approaches have major limitations. In the first approach, semantic queries pose a great challenge, while the second, speech-based approach, does not support efficient video browsing. This paper describes a system where speech is used for efficient searching and visual data for efficient browsing, a combination that takes advantage of both approaches. A fully automatic indexing and retrieval system has been developed and tested. Automated speech recognition and phonetic speech indexing support text-to-speech queries. New browsable views are generated from the original video. A special synchronized browser allows instantaneous, context-preserving switching from one view to another. The system was successfully used to produce searchable-browsable video proceedings for three local conferences.
Proper Names and Named Entities Recognition in the Automatic Text Processing. Review of the book: Nouvel, D., Ehrmann, M., & Rosset, S. (2016. Named Entities for Computational Linguistics. London; Hoboken: ISTE Ltd; John Wiley & Sons, Inc., 2016.
Daria M. Golikova
Full Text Available The reviewed book by Damien Nouvel, Maud Ehrmann, and Sophie Rosset Named Entities for Computational Linguistics deals with automatic processing of texts, written in a natural language, and with named entities recognition, aimed at extracting most important information in these texts. The notion of named entities here extends to the entire set of linguistic units referring to an object. The researchers minutely consider the concept of named entities, juxtaposing this category to that of proper names and comparing their definitions, and describe all the stages of creation and implementation of automatic text annotation algorithms, as well as different ways of evaluating their performance quality. Proper names, in this context, are seen as a particular instance of named entities, one of the typical sources of reference to real objects to be electronically recognized in the text. The book provides a detailed overview and analysis of previous studies in the same field, based mainly on the English language data. It presents instruments and resources required to create and implement the algorithms in question, these may include typologies, knowledge or databases, and various types of corpora. Theoretical considerations, proposed by the authors, are supported by a significant number of exemplary cases, with algorithms operation principles presented in charts. The reviewed book gives quite a comprehensive picture of modern computational linguistic studies focused on named entities recognition and indicates some problems which are unresolved as yet.
Full Text Available ... search for current job openings visit HHS USAJobs Home >> NEI YouTube Videos >> NEI YouTube Videos: Amblyopia Listen NEI YouTube Videos YouTube Videos Home Age-Related Macular Degeneration Amblyopia Animations Blindness Cataract ...
Full Text Available ... Amaurosis Low Vision Refractive Errors Retinopathy of Prematurity Science Spanish Videos Webinars NEI YouTube Videos: Amblyopia Embedded video for NEI YouTube Videos: Amblyopia NEI Home Contact Us A-Z Site Map NEI on Social Media Information in Spanish (Información en español) Website, ...
Full Text Available ... search for current job openings visit HHS USAJobs Home » NEI YouTube Videos » NEI YouTube Videos: Amblyopia Listen NEI YouTube Videos YouTube Videos Home Age-Related Macular Degeneration Amblyopia Animations Blindness Cataract ...
This comprehensive and accessible text/reference presents an overview of the state of the art in video coding technology. Specifically, the book introduces the tools of the AVS2 standard, describing how AVS2 can help to achieve a significant improvement in coding efficiency for future video networks and applications by incorporating smarter coding tools such as scene video coding. Topics and features: introduces the basic concepts in video coding, and presents a short history of video coding technology and standards; reviews the coding framework, main coding tools, and syntax structure of AV
Full Text Available The discourses of crisis in the humanities is juxtaposed with an analysis of remix video practices to suggest that the cognitive and cultural engagement feared lost in the former appear with frequency and enthusiasm in the latter. Whether humanists focus on the deleterious effects of the digital or celebrate the digital humanities but resist a turn to computation, their anxieties turn to the disappearance of textual analysis, aesthetics, critique, and self-reflection. Remix video, as exemplified by mashups, trailer remixes, and vids, depends on these same competencies for the creation and circulation of its works. Remix video is not the answer to the crises of the humanities; rather, the recognition of a common set of practices, skills, and values underpinning scholars and video practitioners' work provides the basis for a coalitional approach: identification of shared opportunities to promote and engage potential participants in the modes of thinking and production that contend with complex cultural ideas.
Chernyshov Alexander V.
Full Text Available The article focuses on the origins of the song videos as TV and Internet-genre. In addition, it considers problems of screen images creation depending on the musical form and the text of a songs in connection with relevant principles of accent and phraseological video editing and filming techniques as well as with additional frames and sound elements.
Full Text Available A fast pedestrian recognition algorithm based on multisensor fusion is presented in this paper. Firstly, potential pedestrian locations are estimated by laser radar scanning in the world coordinates, and then their corresponding candidate regions in the image are located by camera calibration and the perspective mapping model. For avoiding time consuming in the training and recognition process caused by large numbers of feature vector dimensions, region of interest-based integral histograms of oriented gradients (ROI-IHOG feature extraction method is proposed later. A support vector machine (SVM classifier is trained by a novel pedestrian sample dataset which adapt to the urban road environment for online recognition. Finally, we test the validity of the proposed approach with several video sequences from realistic urban road scenarios. Reliable and timewise performances are shown based on our multisensor fusing method.
by centering x and y flow values around 128 and multiplying by a scalar such that flow values fall between 0 and 255. We also calculate the flow magni...as for MPII- MD. 4.2. Evaluation Metrics Quantitative evaluation of the models are performed us- ing the METEOR  metric which was originally pro...candidate refer- ence sentences. METEOR compares exact token matches, stemmed tokens, paraphrase matches, as well as semanti- cally similar matches
Full Text Available Most security systems, with their transmission bandwidth and computing power both being sufficient, emphasize their automatic recognition techniques. However, in some situations such as baby monitors and intruder avoidance by mobile sensors, the decision function sometimes can be shifted to the concerned human to reduce the transmission and computation cost. We therefore propose a binary video compression method in low resolution to achieve a low cost mobile video communication for inexpensive camera sensors. Shape compensation as proposed in this communication successfully replaces the standard Discrete Cosine Transformation (DCT after motion compensation.
Full Text Available As academics we study, research and teach audiovisual media, yet rarely disseminate and mediate through it. Today, developments in production technologies have enabled academic researchers to create videos and mediate audiovisually. In academia it is taken for granted that everyone can write a text. Is it now time to assume that everyone can make a video essay? Using the online journal of academic videos Audiovisual Thinking and the videos published in it as a case study, this article seeks to reflect on the emergence and legacy of academic audiovisual dissemination. Anchoring academic video and audiovisual dissemination of knowledge in two critical traditions, documentary theory and semiotics, we will argue that academic video is in fact already present in a variety of academic disciplines, and that academic audiovisual essays are bringing trends and developments that have long been part of academic discourse to their logical conclusion.
Full Text Available ... Videos for Educators Search English Español Special Needs: Planning for Adulthood (Video) KidsHealth / For Parents / Special Needs: Planning for Adulthood (Video) Print Young adults with special ...
Full Text Available ... Staying Safe Videos for Educators Search English Español Special Needs: Planning for Adulthood (Video) KidsHealth / For Parents / Special Needs: Planning for Adulthood (Video) Print Young adults with ...
This book provides a unique view of human activity recognition, especially fine-grained human activity structure learning, human-interaction recognition, RGB-D data based action recognition, temporal decomposition, and causality learning in unconstrained human activity videos. The techniques discussed give readers tools that provide a significant improvement over existing methodologies of video content understanding by taking advantage of activity recognition. It links multiple popular research fields in computer vision, machine learning, human-centered computing, human-computer interaction, image classification, and pattern recognition. In addition, the book includes several key chapters covering multiple emerging topics in the field. Contributed by top experts and practitioners, the chapters present key topics from different angles and blend both methodology and application, composing a solid overview of the human activity recognition techniques. .
Full Text Available ... will allow you to take a more active role in your care. The information in these videos ... Stategies to Increase your Level of Physical Activity Role of Body Weight in Osteoarthritis Educational Videos for ...
Full Text Available ... Eye Disease Dilated Eye Exam Dry Eye For Kids Glaucoma Healthy Vision Tips Leber Congenital Amaurosis Low Vision Refractive Errors Retinopathy of Prematurity Science Spanish Videos Webinars NEI YouTube Videos: Amblyopia Embedded ...
Full Text Available ... of Body Weight in Osteoarthritis Educational Videos for Patients Rheumatoid Arthritis Educational Video Series Psoriatic Arthritis 101 ... Patient to an Adult Rheumatologist Drug Information for Patients Arthritis Drug Information Sheets Benefits and Risks of ...
Gleue, Alan D.; Depcik, Chris; Peltier, Ted
Last school year, I had a web link emailed to me entitled "A Dashboard Physics Lesson." The link, created and posted by Dale Basier on his "Lab Out Loud" blog, illustrates video of a car's speedometer synchronized with video of the road. These two separate video streams are compiled into one video that students can watch and analyze. After seeing…
Sullivan, Helen W; O'Donoghue, Amie C; Gard Read, Jennifer; Amoozegar, Jacqueline B; Aikin, Kathryn J; Rupert, Douglas J
Direct-to-consumer (DTC) promotion of prescription drugs can affect consumer behaviors and health outcomes, and Internet drug promotion is growing rapidly. Branded drug websites often capitalize on the multimedia capabilities of the Internet by using videos to emphasize drug benefits and characteristics. However, it is unknown how such videos affect consumer processing of drug information. This study aimed to examine how videos on prescription drug websites, and the inclusion of risk information in those videos, influence consumer knowledge and perceptions. We conducted an experimental study in which online panel participants with acid reflux (n=1070) or high blood pressure (n=1055) were randomly assigned to view 1 of the 10 fictitious prescription drug websites and complete a short questionnaire. On each website, we manipulated the type of video (patient testimonial, mechanism of action animation, or none) and whether the video mentioned drug risks. Participants who viewed any video were less likely to recognize drug risks presented only in the website text (P≤.01). Including risk information in videos increased participants' recognition of the risks presented in the videos (P≤.01). However, in some cases, including risk information in videos decreased participants' recognition of the risks not presented in the videos (ie, risks presented in text only; P≤.04). Participants who viewed a video without drug risk information thought that the website placed more emphasis on benefits, compared with participants who viewed the video with drug risk information (P≤.01). Compared with participants who viewed a video without drug risk information, participants who viewed a video with drug risk information thought that the drug was less effective in the high blood pressure sample (P=.03) and thought that risks were more serious in the acid reflux sample (P=.01). There were no significant differences between risk and nonrisk video conditions on other perception
Bornoe, Nis; Barkhuus, Louise
Microblogging is a recently popular phenomenon and with the increasing trend for video cameras to be built into mobile phones, a new type of microblogging has entered the arena of electronic communication: video microblogging. In this study we examine video microblogging, which is the broadcasting...... of short videos. A series of semi-structured interviews offers an understanding of why and how video microblogging is used and what the users post and broadcast....
Full Text Available ... videos from Veterans Health Administration Veterans Crisis Line -- After the Call see more videos from Veterans Health ... videos from Veterans Health Administration Talking About It Matters see more videos from Veterans Health Administration Stand ...
Full Text Available Image recognition is a technology which can be used in various applications such as medical image recognition systems, security, defense video tracking, and factory automation. In this paper we present a novel pipelined architecture of an adaptive integrated Artificial Neural Network for image recognition. In our proposed work we have combined the feature of spiking neuron concept with ANN to achieve the efficient architecture for image recognition. The set of training images are trained by ANN and target output has been identified. Real time videos are captured and then converted into frames for testing purpose and the image were recognized. The machine can operate at up to 40 frames/sec using images acquired from the camera. The system has been implemented on XC3S400 SPARTAN-3 Field Programmable Gate Arrays.
This international bestseller and essential reference is the "bible" for digital video engineers and programmers worldwide. This is by far the most informative analog and digital video reference available, includes the hottest new trends and cutting-edge developments in the field. Video Demystified, Fourth Edition is a "one stop" reference guide for the various digital video technologies. The fourth edition is completely updated with all new chapters on MPEG-4, H.264, SDTV/HDTV, ATSC/DVB, and Streaming Video (Video over DSL, Ethernet, etc.), as well as discussions of the latest standards throughout. The accompanying CD-ROM is updated to include a unique set of video test files in the newest formats. *This essential reference is the "bible" for digital video engineers and programmers worldwide *Contains all new chapters on MPEG-4, H.264, SDTV/HDTV, ATSC/DVB, and Streaming Video *Completely revised with all the latest and most up-to-date industry standards.
Habibian, A.; Snoek, C.G.M.
Representing videos using vocabularies composed of concept detectors appears promising for generic event recognition. While many have recently shown the benefits of concept vocabularies for recognition, studying the characteristics of a universal concept vocabulary suited for representing events is
Länsitie, Janne; Stevenson, Blair; Männistö, Riku; Karjalainen, Tommi; Karjalainen, Asko
The short film is an introduction to the concept of video pedagogy. The five categories of video pedagogy further elaborate how videos can be used as a part of instruction and learning process. Most pedagogical videos represent more than one category. A video itself doesn’t necessarily define the category – the ways in which the video is used as a part of pedagogical script are more defining factors. What five categories did you find? Did you agree with the categories, or are more...
Full Text Available Abstract A new noncooperative iris recognition method is proposed. In this method, the iris features are extracted using a Gabor descriptor. The feature extraction and comparison are scale, deformation, rotation, and contrast-invariant. It works with off-angle and low-resolution iris images. The Gabor wavelet is incorporated with scale-invariant feature transformation (SIFT for feature extraction to better extract the iris features. Both the phase and magnitude of the Gabor wavelet outputs were used in a novel way for local feature point description. Two feature region maps were designed to locally and globally register the feature points and each subregion in the map is locally adjusted to the dilation/contraction/deformation. We also developed a video-based non-cooperative iris recognition system by integrating video-based non-cooperative segmentation, segmentation evaluation, and score fusion units. The proposed method shows good performance for frontal and off-angle iris matching. Video-based recognition methods can improve non-cooperative iris recognition accuracy.
Full Text Available A new noncooperative iris recognition method is proposed. In this method, the iris features are extracted using a Gabor descriptor. The feature extraction and comparison are scale, deformation, rotation, and contrast-invariant. It works with off-angle and low-resolution iris images. The Gabor wavelet is incorporated with scale-invariant feature transformation (SIFT for feature extraction to better extract the iris features. Both the phase and magnitude of the Gabor wavelet outputs were used in a novel way for local feature point description. Two feature region maps were designed to locally and globally register the feature points and each subregion in the map is locally adjusted to the dilation/contraction/deformation. We also developed a video-based non-cooperative iris recognition system by integrating video-based non-cooperative segmentation, segmentation evaluation, and score fusion units. The proposed method shows good performance for frontal and off-angle iris matching. Video-based recognition methods can improve non-cooperative iris recognition accuracy.
Full Text Available Video surveillance plays a vital role in maintaining the social security although, until now, large uncertainty still exists in danger understanding and recognition, which can be partly attributed to intractable environment changes in the backgrounds. This article presents a brain-inspired computing of attention value of surrounding environment changes (EC with a processes-based cognition model by introducing a ratio value λ of EC-implications within considered periods. Theoretical models for computation of warning level of EC-implications to the universal video recognition efficiency (quantified as time cost of implication-ratio variations from λk to λk+1, k=1,2,… are further established. Imbedding proposed models into the online algorithms is suggested as a future research priority towards precision security for critical applications and, furthermore, schemes for a practical implementation of such integration are also preliminarily discussed.
Full Text Available ... Corner / Patient Webcasts / Rheumatoid Arthritis Educational Video Series Rheumatoid Arthritis Educational Video Series This series of five videos ... Your Arthritis Managing Chronic Pain and Depression in Arthritis Nutrition & Rheumatoid Arthritis Arthritis and Health-related Quality of Life ...
Mark J. P. Wolf
Full Text Available This essay asks how religion and theological ideas might be made manifest in video games, and particularly the creation of video games as a religious activity, looking at contemplative experiences in video games, and the creation and world-building of game worlds as a form of Tolkienian subcreation, which itself leads to contemplation regarding the creation of worlds.
Full Text Available ... Nonmotor Symptoms of Parkinson's Disease Expert Briefings: Gait, Balance and Falls in Parkinson's Disease Expert Briefings: Coping ... Library is an extensive collection of books, fact sheets, videos, podcasts, and more. To get started, use ...
Full Text Available ... Facts What is acoustic neuroma? Diagnosing Symptoms Side Effects Keywords World Language Videos Questions to ask Choosing ... Surgery What is acoustic neuroma Diagnosing Symptoms Side effects Question To Ask Treatment Options Back Overview Observation ...
Full Text Available ... 8211 info@ANAUSA.org About ANA Mission, Vision & Values Leadership & Staff Annual Reports Shop ANA Home Learn Educational Video English English Arabic Catalan Chinese (Simplified) Chinese ( ...
Full Text Available ... Care Disease Types FAQ Handout for Patients and Families Is It Right for You How to Get ... For the Media For Clinicians For Policymakers For Family Caregivers Glossary Menu In this section Links Videos ...
Full Text Available ... patient kit Treatment Options Overview Observation Radiation Surgery What is acoustic neuroma Diagnosing ... Back Community Patient Stories Share Your Story Video Stories Caregivers Milestones Gallery Submit Your Milestone Team ANA Volunteer ...
Full Text Available ... Support Groups Is a support group for me? Find a Group Upcoming Events Video Library Photo Gallery ... Support ANetwork Peer Support Program Community Connections Overview Find a Meeting Host a Meeting Volunteer Become a ...
Full Text Available ... Mission, Vision & Values Shop ANA Leadership & Staff Annual Reports Acoustic Neuroma Association 600 Peachtree Parkway Suite 108 ... About ANA Mission, Vision & Values Leadership & Staff Annual Reports Shop ANA Home Learn Educational Video English English ...
Full Text Available ... Search Search What Is It Definition Pediatric Palliative Care Disease Types FAQ Handout for Patients and Families ... For Family Caregivers Glossary Resources Browse our palliative care resources below: Links Videos Podcasts Webinars For the ...
Full Text Available ... About ANA Mission, Vision & Values Shop ANA Leadership & Staff Annual Reports Acoustic Neuroma Association 600 Peachtree Parkway ... ANAUSA.org About ANA Mission, Vision & Values Leadership & Staff Annual Reports Shop ANA Home Learn Educational Video ...
Full Text Available ... info@ANAUSA.org About ANA Mission, Vision & Values Leadership & Staff Annual Reports Shop ANA Home Learn Educational Video English English Arabic Catalan Chinese (Simplified) Chinese ( ...
Full Text Available ... to your Doctor Find a Provider Meet the Team Blog Articles & Stories News Resources Links Videos Podcasts ... to your Doctor Find a Provider Meet the Team Blog Articles & Stories News Provider Directory Donate Resources ...
Full Text Available ... Click to learn more... LOGIN CALENDAR DONATE NEWS Home Learn Back Learn about acoustic neuroma AN Facts ... Vision & Values Leadership & Staff Annual Reports Shop ANA Home Learn Educational Video English English Arabic Catalan Chinese ( ...
Full Text Available ... the Media For Clinicians For Policymakers For Family Caregivers Glossary Menu In this section Links Videos Podcasts ... the Media For Clinicians For Policymakers For Family Caregivers Glossary Resources Browse our palliative care resources below: ...
Full Text Available Gait is one of the few biometrics that can be measured at a distance, and is hence useful for passive surveillance as well as biometric applications. Gait recognition research is still at its infancy, however, and we have yet to solve the fundamental issue of finding gait features which at once have sufficient discrimination power and can be extracted robustly and accurately from low-resolution video. This paper describes a novel gait recognition technique based on the image self-similarity of a walking person. We contend that the similarity plot encodes a projection of gait dynamics. It is also correspondence-free, robust to segmentation noise, and works well with low-resolution video. The method is tested on multiple data sets of varying sizes and degrees of difficulty. Performance is best for fronto-parallel viewpoints, whereby a recognition rate of 98% is achieved for a data set of 6 people, and 70% for a data set of 54 people.
Full Text Available Recognizing human activities from video sequences or still images is a challenging task due to problems such as background clutter, partial occlusion, changes in scale, viewpoint, lighting, and appearance. Many applications, including video surveillance systems, human-computer interaction, and robotics for human behavior characterization, require a multiple activity recognition system. In this work, we provide a detailed review of recent and state-of-the-art research advances in the field of human activity classification. We propose a categorization of human activity methodologies and discuss their advantages and limitations. In particular, we divide human activity classification methods into two large categories according to whether they use data from different modalities or not. Then, each of these categories is further analyzed into sub-categories, which reflect how they model human activities and what type of activities they are interested in. Moreover, we provide a comprehensive analysis of the existing, publicly available human activity classification datasets and examine the requirements for an ideal human activity recognition dataset. Finally, we report the characteristics of future research directions and present some open issues on human activity recognition.
Habibian, Amirhossein; Mensink, Thomas; Snoek, Cees G. M.
This paper aims for event recognition when video examples are scarce or even completely absent. The key in such a challenging setting is a semantic video representation. Rather than building the representation from individual attribute detectors and their annotations, we propose to learn the entire representation from freely available web videos and their descriptions using an embedding between video features and term vectors. In our proposed embedding, which we call VideoStory, the correlati...
Habibian, A.; Mensink, T.; Snoek, C.G.M.
This paper aims for event recognition when video examples are scarce or even completely absent. The key in such a challenging setting is a semantic video representation. Rather than building the representation from individual attribute detectors and their annotations, we propose to learn the entire representation from freely available web videos and their descriptions using an embedding between video features and term vectors. In our proposed embedding, which we call Video2vec, the correlatio...
Habash, Nizar; Olive, Joseph; Christianson, Caitlin; McCary, John
Machine translation (MT) from text, the topic of this chapter, is perhaps the heart of the GALE project. Beyond being a well defined application that stands on its own, MT from text is the link between the automatic speech recognition component and the distillation component. The focus of MT in GALE is on translating from Arabic or Chinese to English. The three languages represent a wide range of linguistic diversity and make the GALE MT task rather challenging and exciting.
Mounîm A. El-Yacoubi
Full Text Available We present an autonomous assistive robotic system for human activity recognition from video sequences. Due to the large variability inherent to video capture from a non-fixed robot (as opposed to a fixed camera, as well as the robot's limited computing resources, implementation has been guided by robustness to this variability and by memory and computing speed efficiency. To accommodate motion speed variability across users, we encode motion using dense interest point trajectories. Our recognition model harnesses the dense interest point bag-of-words representation through an intersection kernel-based SVM that better accommodates the large intra-class variability stemming from a robot operating in different locations and conditions. To contextually assess the engine as implemented in the robot, we compare it with the most recent approaches of human action recognition performed on public datasets (non-robot-based, including a novel approach of our own that is based on a two-layer SVM-hidden conditional random field sequential recognition model. The latter's performance is among the best within the recent state of the art. We show that our robot-based recognition engine, while less accurate than the sequential model, nonetheless shows good performances, especially given the adverse test conditions of the robot, relative to those of a fixed camera.
刘明; 张裕鼎; 张立春
以某中学一年级221名学生为被试，考察了不同类型及不同声压水平的背景音乐对中学生说明文文本信息再认过程的影响。实验结果表明：（1）音乐类型主效应显著，声压水平主效应不显著，两者交互作用显著，高音条件下不同音乐类型产生显著的再认成绩差异，而低音条件下未产生显著性差异。（2）高音条件下，不同类型的背景音乐对中学生的说明文文本信息再认成绩产生不同程度的影响。与无音乐环境相比，古典音乐对成绩产生显著的促进作用；流行音乐含中文歌词、流行音乐含日语歌词两个水平均产生显著的干扰作用；流行音乐不含歌词对再认无显著影响；此外，流行音乐含中文歌词干扰作用最大，但与流行音乐含日文歌词相比差异不显著。%We investigate the effects of different types of background music on recognition of the expository text information. The 221 participants are from 5 classes of Grade 1 of one middle school, with the likely same level of the Chinese course. The results of the experiments are as follows: (1) The main effect of the type of background music is significant while that of sound pressure level is not, and the interactive effect is significant: on the condition of high sound pressure level, the type of background music significant effect on the recognition score, but it has no effect on condition of the low sound pressure level. (2) On the condition of high sound pressure level, different types of background music take significantly different effect on the recognition score of the expository text information: comparing with no sound level, the classic music take significant facilitation effect, both the pop music with Chinese lyric and with Japanese lyric produce significant interference effect, and the pop music without lyric produces no effect. Besides, the interference effect taken by the pop music with Chinese is the
Moezzi, Saied; Katkere, Arun L.; Jain, Ramesh C.
Interactive video and television viewers should have the power to control their viewing position. To make this a reality, we introduce the concept of Immersive Video, which employs computer vision and computer graphics technologies to provide remote users a sense of complete immersion when viewing an event. Immersive Video uses multiple videos of an event, captured from different perspectives, to generate a full 3D digital video of that event. That is accomplished by assimilating important information from each video stream into a comprehensive, dynamic, 3D model of the environment. Using this 3D digital video, interactive viewers can then move around the remote environment and observe the events taking place from any desired perspective. Our Immersive Video System currently provides interactive viewing and `walkthrus' of staged karate demonstrations, basketball games, dance performances, and typical campus scenes. In its full realization, Immersive Video will be a paradigm shift in visual communication which will revolutionize television and video media, and become an integral part of future telepresence and virtual reality systems.
Trybula, Walter J.
Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…
This is the first text to provide a unified and self-contained introduction to visual pattern recognition and machine learning. It is useful as a general introduction to artifical intelligence and knowledge engineering, and no previous knowledge of pattern recognition or machine learning is necessary. Basic for various pattern recognition and machine learning methods. Translated from Japanese, the book also features chapter exercises, keywords, and summaries.
This thesis is based on a detailed analysis of various topics related to the question of whether video games can be art. In the first place it analyzes the current academic discussion on this subject and confronts different opinions of both supporters and objectors of the idea, that video games can be a full-fledged art form. The second point of this paper is to analyze the properties, that are inherent to video games, in order to find the reason, why cultural elite considers video games as i...
Wu, Zhe; Singh, Bharat; Davis, Larry S.; Subrahmanian, V. S.
We present a system for covert automated deception detection in real-life courtroom trial videos. We study the importance of different modalities like vision, audio and text for this task. On the vision side, our system uses classifiers trained on low level video features which predict human micro-expressions. We show that predictions of high-level micro-expressions can be used as features for deception prediction. Surprisingly, IDT (Improved Dense Trajectory) features which have been widely ...
Full Text Available This article deals with a recognition system using an algorithm based on the Principal Component Analysis (PCA technique. The recognition system consists only of a PC and an integrated video camera. The algorithm is developed in MATLAB language and calculates the eigenfaces considered as features of the face. The PCA technique is based on the matching between the facial test image and the training prototype vectors. The mathcing score between the facial test image and the training prototype vectors is calculated between their coefficient vectors. If the matching is high, we have the best recognition. The results of the algorithm based on the PCA technique are very good, even if the person looks from one side at the video camera.
Maria J. Santofimia
Full Text Available Smart Spaces, Ambient Intelligence, and Ambient Assisted Living are environmental paradigms that strongly depend on their capability to recognize human actions. While most solutions rest on sensor value interpretations and video analysis applications, few have realized the importance of incorporating common-sense capabilities to support the recognition process. Unfortunately, human action recognition cannot be successfully accomplished by only analyzing body postures. On the contrary, this task should be supported by profound knowledge of human agency nature and its tight connection to the reasons and motivations that explain it. The combination of this knowledge and the knowledge about how the world works is essential for recognizing and understanding human actions without committing common-senseless mistakes. This work demonstrates the impact that episodic reasoning has in improving the accuracy of a computer vision system for human action recognition. This work also presents formalization, implementation, and evaluation details of the knowledge model that supports the episodic reasoning.
Full Text Available Face recognition systems are now being used in many applications such as border crossings, banks, and mobile payments. The wide scale deployment of facial recognition systems has attracted intensive attention to the reliability of face biometrics against spoof attacks, where a photo, a video, or a 3D mask of a genuine user’s face can be used to gain illegitimate access to facilities or services. Though several face antispoofing or liveness detection methods (which determine at the time of capture whether a face is live or spoof have been proposed, the issue is still unsolved due to difficulty in finding discriminative and computationally inexpensive features and methods for spoof attacks. In addition, existing techniques use whole face image or complete video for liveness detection. However, often certain face regions (video frames are redundant or correspond to the clutter in the image (video, thus leading generally to low performances. Therefore, we propose seven novel methods to find discriminative image patches, which we define as regions that are salient, instrumental, and class-specific. Four well-known classifiers, namely, support vector machine (SVM, Naive-Bayes, Quadratic Discriminant Analysis (QDA, and Ensemble, are then used to distinguish between genuine and spoof faces using a voting based scheme. Experimental analysis on two publicly available databases (Idiap REPLAY-ATTACK and CASIA-FASD shows promising results compared to existing works.
Dette kapitel har fokus på metodiske problemstillinger, der opstår i forhold til at bruge (digital) video i forbindelse med forskningskommunikation, ikke mindst online. Video har længe været benyttet i forskningen til dataindsamling og forskningskommunikation. Med digitaliseringen og internettet ...
Pattern recognition is a scientific discipline that is becoming increasingly important in the age of automation and information handling and retrieval. Patter Recognition, 2e covers the entire spectrum of pattern recognition applications, from image analysis to speech recognition and communications. This book presents cutting-edge material on neural networks, - a set of linked microprocessors that can form associations and uses pattern recognition to ""learn"" -and enhances student motivation by approaching pattern recognition from the designer's point of view. A direct result of more than 10
Krüger, Volker; Zhou, Shaohua; Chellappa, Rama
to all vision techniques that intend to extract visual information out of a low snr. image. It is exactly a strength of cognitive systems that they are able to cope with non-ideal situations. In this chapter we will present a techniques that allows to integrate visual information over time and we...
Battiato, Sebastiano; Farinella, Giovanni
Computer vision is the science and technology of making machines that see. It is concerned with the theory, design and implementation of algorithms that can automatically process visual data to recognize objects, track and recover their shape and spatial layout. The International Computer Vision Summer School - ICVSS was established in 2007 to provide both an objective and clear overview and an in-depth analysis of the state-of-the-art research in Computer Vision. The courses are delivered by world renowned experts in the field, from both academia and industry, and cover both theoretical and practical aspects of real Computer Vision problems. The school is organized every year by University of Cambridge (Computer Vision and Robotics Group) and University of Catania (Image Processing Lab). Different topics are covered each year.This edited volume contains a selection of articles covering some of the talks and tutorials held during the last editions of the school. The chapters provide an in-depth overview o...
Yang, Su; Zhu, Qing
The goal of sign language recognition (SLR) is to translate the sign language into text, and provide a convenient tool for the communication between the deaf-mute and the ordinary. In this paper, we formulate an appropriate model based on convolutional neural network (CNN) combined with Long Short-Term Memory (LSTM) network, in order to accomplish the continuous recognition work. With the strong ability of CNN, the information of pictures captured from Chinese sign language (CSL) videos can be learned and transformed into vector. Since the video can be regarded as an ordered sequence of frames, LSTM model is employed to connect with the fully-connected layer of CNN. As a recurrent neural network (RNN), it is suitable for sequence learning tasks with the capability of recognizing patterns defined by temporal distance. Compared with traditional RNN, LSTM has performed better on storing and accessing information. We evaluate this method on our self-built dataset including 40 daily vocabularies. The experimental results show that the recognition method with CNN-LSTM can achieve a high recognition rate with small training sets, which will meet the needs of real-time SLR system.
Nortvig, Anne Mette; Sørensen, Birgitte Holm
This project’s aim was to support and facilitate master’s students’ preparation and collaboration by making video podcasts of short lectures available on YouTube prior to students’ first face-to-face seminar. The empirical material stems from group interviews, from statistical data created through...... YouTube analytics and from surveys answered by students after the seminar. The project sought to explore how video podcasts support learning and reflection online and how students use and reflect on the integration of online activities in the videos. Findings showed that students engaged actively...
Funk, Jeanne B
The video game industry insists that it is doing everything possible to provide information about the content of games so that parents can make informed choices; however, surveys indicate that ratings may not reflect consumer views of the nature of the content. This article describes some of the currently popular video games, as well as developments that are on the horizon, and discusses the status of research on the positive and negative impacts of playing video games. Recommendations are made to help parents ensure that children play games that are consistent with their values.
Full Text Available ... v/K5u3sb-Dbkc Watch additional videos about getting help. Behind the Scenes see more videos from Veterans Health Administration Be There: Help Save a Life see more videos from Veterans ...
Full Text Available ... for help. Bittersweet More Videos from Veterans Health Administration Embedded YouTube video: https://www.youtube.com/v/ ... the Scenes see more videos from Veterans Health Administration Be There: Help Save a Life see more ...
Full Text Available ... more videos from Veterans Health Administration Lost: The Power of One Connection see more videos from Veterans Health Administration The Power of 1 PSA see more videos from Veterans ...
Full Text Available ... out for help. Bittersweet More Videos from Veterans Health Administration Embedded YouTube video: https://www.youtube.com/ ... Behind the Scenes see more videos from Veterans Health Administration Be There: Help Save a Life see ...
Full Text Available ... Videos Experiencing Celiac Disease What is Celiac Disease Diet Information At ... Us Celiac Disease Program | Videos Boston Children's Hospital will teach you and your family about a ...
DeWitt, Iain D. J.
Although spoken word recognition is more fundamental to human communication than text recognition, knowledge of word-processing in auditory cortex is comparatively impoverished. This dissertation synthesizes current models of auditory cortex, models of cortical pattern recognition, models of single-word reading, results in phonetics and results in…
The Video Comparator is a comparative gage that uses electronic images from two sources, a standard and an unknown. Two matched video cameras are used to obtain the electronic images. The video signals are mixed and displayed on a single video receiver (CRT). The video system is manufactured by ITP of Chatsworth, CA and is a Tele-Microscope II, Model 148. One of the cameras is mounted on a toolmaker's microscope stand and produces a 250X image of a cast. The other camera is mounted on a stand and produces an image of a 250X template. The two video images are mixed in a control box provided by ITP and displayed on a CRT. The template or the cast can be moved to align the desired features. Vertical reference lines are provided on the CRT, and a feature on the cast can be aligned with a line on the CRT screen. The stage containing the casts can be moved using a Boeckleler micrometer equipped with a digital readout, and a second feature aligned with the reference line and the distance moved obtained from the digital display
Full Text Available Recognizing human actions in videos is an active topic with broad commercial potentials. Most of the existing action recognition methods are supposed to have the same camera view during both training and testing. And thus performances of these single-view approaches may be severely influenced by the camera movement and variation of viewpoints. In this paper, we address the above problem by utilizing videos simultaneously recorded from multiple views. To this end, we propose a learning framework based on multitask random forest to exploit a discriminative mid-level representation for videos from multiple cameras. In the first step, subvolumes of continuous human-centered figures are extracted from original videos. In the next step, spatiotemporal cuboids sampled from these subvolumes are characterized by multiple low-level descriptors. Then a set of multitask random forests are built upon multiview cuboids sampled at adjacent positions and construct an integrated mid-level representation for multiview subvolumes of one action. Finally, a random forest classifier is employed to predict the action category in terms of the learned representation. Experiments conducted on the multiview IXMAS action dataset illustrate that the proposed method can effectively recognize human actions depicted in multiview videos.
Full Text Available This paper presents a method for recognizing human actions from a single query action video. We propose an action recognition scheme based on the ordinal measure of accumulated motion, which is robust to variations of appearances. To this end, we first define the accumulated motion image (AMI using image differences. Then the AMI of the query action video is resized to a subimage by intensity averaging and a rank matrix is generated by ordering the sample values in the sub-image. By computing the distances from the rank matrix of the query action video to the rank matrices of all local windows in the target video, local windows close to the query action are detected as candidates. To find the best match among the candidates, their energy histograms, which are obtained by projecting AMI values in horizontal and vertical directions, respectively, are compared with those of the query action video. The proposed method does not require any preprocessing task such as learning and segmentation. To justify the efficiency and robustness of our approach, the experiments are conducted on various datasets.
Denikin Anton A.
Full Text Available The article considers the aesthetical and practical possibilities for sounds (sound design in video games and interactive applications. Outlines the key features of the game sound, such as simulation, representativeness, interactivity, immersion, randomization, and audio-visuality. The author defines the basic terminology in study of game audio, as well as identifies significant aesthetic differences between film sounds and sounds in video game projects. It is an attempt to determine the techniques of art analysis for the approaches in study of video games including aesthetics of their sounds. The article offers a range of research methods, considering the video game scoring as a contemporary creative practice.
Full Text Available In the past few years there has been an explosion in the use of digital video data. Many people have personal computers at home, and with the help of the Internet users can easily share video files on their computer. This makes possible the unauthorized use of digital media, and without adequate protection systems the authors and distributors have no means to prevent it.Digital watermarking techniques can help these systems to be more effective by embedding secret data right into the video stream. This makes minor changes in the frames of the video, but these changes are almost imperceptible to the human visual system. The embedded information can involve copyright data, access control etc. A robust watermark is resistant to various distortions of the video, so it cannot be removed without affecting the quality of the host medium. In this paper I propose a video watermarking scheme that fulfills the requirements of a robust watermark.
Full Text Available ... Patients from Johns Hopkins Stategies to Increase your Level of Physical Activity Role of Body Weight in Osteoarthritis Educational Videos for Patients Rheumatoid Arthritis Educational Video Series Psoriatic Arthritis 101 2010 E.S.C.A.P.E. Study Patient Update Transitioning the JRA ...
Full Text Available ... Amblyopia Listen NEI YouTube Videos YouTube Videos Home Age-Related Macular Degeneration Amblyopia Animations Blindness Cataract Convergence ... is maintained by the NEI Office of Science Communications, Public Liaison, and Education. Technical questions about this ...
.... Experiments have been completed comparing the effects of several types of facial motion on face recognition, the effects of face familiarity on recognition from video clips taken at a distance...
Full Text Available Previous research has been inconsistent on whether violent video games exert positive and/or negative effects on cognition. In particular, attentional bias in facial affect processing after violent video game exposure continues to be controversial. The aim of the present study was to investigate attentional bias in facial recognition after short term exposure to violent video games and to characterize the neural correlates of this effect. In order to accomplish this, participants were exposed to either neutral or violent video games for 25 min and then event-related potentials (ERPs were recorded during two emotional search tasks. The first search task assessed attentional facilitation, in which participants were required to identify an emotional face from a crowd of neutral faces. In contrast, the second task measured disengagement, in which participants were required to identify a neutral face from a crowd of emotional faces. Our results found a significant presence of the ERP component, N2pc, during the facilitation task; however, no differences were observed between the two video game groups. This finding does not support a link between attentional facilitation and violent video game exposure. Comparatively, during the disengagement task, N2pc responses were not observed when participants viewed happy faces following violent video game exposure; however, a weak N2pc response was observed after neutral video game exposure. These results provided only inconsistent support for the disengagement hypothesis, suggesting that participants found it difficult to separate a neutral face from a crowd of emotional faces.
Full Text Available ... Is Initiated After Diagnosis? CareMAP: When Is It Time to Get Help? Unconditional Love CareMAP: Rest and Sleep: ... CareMAP: Mealtime and Swallowing: Part 1 ... of books, fact sheets, videos, podcasts, and more. To get started, use the search feature or check ...
Full Text Available Video processing source code for algorithms and tools used in software media pipelines (e.g. image scalers, colour converters, etc.) The currently available source code is written in C++ with their associated libraries and DirectShow- Filters....
Full Text Available ... Provider Meet the Team Blog Articles & Stories News Provider Directory Donate Resources Links Videos Podcasts Webinars For the Media For Clinicians For Policymakers For Family Caregivers Glossary Sign Up for ... Us Provider Directory What Is Palliative Care Definition Disease Types ...
Full Text Available ... 30041 770-205-8211 info@ANAUSA.org The world’s #1 acoustic neuroma resource Click to learn more... ... is acoustic neuroma? Diagnosing Symptoms Side Effects Keywords World Language Videos Questions to ask Choosing a healthcare ...
George Nauck of ENCORE!!! invented and markets the Advanced Range Performance (ARPM) Video Golf System for measuring the result of a golf swing. After Nauck requested their assistance, Marshall Space Flight Center scientists suggested video and image processing/computing technology, and provided leads on commercial companies that dealt with the pertinent technologies. Nauck contracted with Applied Research Inc. to develop a prototype. The system employs an elevated camera, which sits behind the tee and follows the flight of the ball down range, catching the point of impact and subsequent roll. Instant replay of the video on a PC monitor at the tee allows measurement of the carry and roll. The unit measures distance and deviation from the target line, as well as distance from the target when one is selected. The information serves as an immediate basis for making adjustments or as a record of skill level progress for golfers.
Full Text Available Vehicle behavior recognition is an attractive research field which is useful for many computer vision and intelligent traffic analysis tasks. This paper presents an all-in-one behavior recognition framework for moving vehicles based on the latest deep learning techniques. Unlike traditional traffic analysis methods which rely on low-resolution videos captured by road cameras, we capture 4K ( 3840 × 2178 traffic videos at a busy road intersection of a modern megacity by flying a unmanned aerial vehicle (UAV during the rush hours. We then manually annotate locations and types of road vehicles. The proposed method consists of the following three steps: (1 vehicle detection and type recognition based on deep neural networks; (2 vehicle tracking by data association and vehicle trajectory modeling; (3 vehicle behavior recognition by nearest neighbor search and by bidirectional long short-term memory network, respectively. This paper also presents experimental results of the proposed framework in comparison with state-of-the-art approaches on the 4K testing traffic video, which demonstrated the effectiveness and superiority of the proposed method.
Full Text Available Video captioning refers to the task of generating a natural language sentence that explains the content of the input video clips. This study proposes a deep neural network model for effective video captioning. Apart from visual features, the proposed model learns additionally semantic features that describe the video content effectively. In our model, visual features of the input video are extracted using convolutional neural networks such as C3D and ResNet, while semantic features are obtained using recurrent neural networks such as LSTM. In addition, our model includes an attention-based caption generation network to generate the correct natural language captions based on the multimodal video feature sequences. Various experiments, conducted with the two large benchmark datasets, Microsoft Video Description (MSVD and Microsoft Research Video-to-Text (MSR-VTT, demonstrate the performance of the proposed model.
Full Text Available Currently most ophthalmic operating rooms are equipped with an analog video recording system [analog Charge Couple Device camera for video grabbing and a Video Cassette Recorder for recording]. We discuss the various advantages of a digital video capture device, its archiving capabilities and our experience during the transition from analog to digital video recording and archiving. The basic terminology and concepts related to analog and digital video, along with the choice of hardware, software and formats for archiving are discussed.
subscriptions Videos Photos Live Webcast Social Media Facebook @oasofficial Facebook Twitter @oas_official Audios Photos Social Media Facebook Twitter Newsletters Press and Communications Department Contact us at Rights Actions against Corruption C Children Civil Registry Civil Society Contact Us Culture Cyber
Barbosa, Patrícia Margarida Silva de Castro Neves
Dissertação de mestrado integrado em Engenharia Eletrónica Industrial e Computadores Human activity recognition algorithms have been studied actively from decades using a sequence of 2D and 3D images from a video surveillance. This new surveillance solutions and the areas of image processing and analysis have been receiving special attention and interest from the scientific community. Thus, it became possible to witness the appearance of new video compression techniques, the tr...
Full Text Available ... the special health problems and requirements of the blind.” News & Events Events Calendar NEI Press Releases News ... Videos Home Age-Related Macular Degeneration Amblyopia Animations Blindness Cataract Convergence Insufficiency Diabetic Eye Disease Dilated Eye ...
Full Text Available ... and what other conditions are associated with RA. Learning more about your condition will allow you to ... Arthritis Educational Video Series Psoriatic Arthritis 101 2010 E.S.C.A.P.E. Study Patient Update Transitioning ...
Full Text Available ... Diabetic Eye Disease Education Program Glaucoma Education Program Low Vision Education Program Hispanic/Latino Program Vision and Aging ... Kids Glaucoma Healthy Vision Tips Leber Congenital Amaurosis Low Vision Refractive Errors Retinopathy of Prematurity Science Spanish Videos ...
Full Text Available ... and Aging Program African American Program Training and Jobs Fellowships NEI Summer Intern Program Diversity In Vision ... DIVRO) Student Training Programs To search for current job openings visit HHS USAJobs Home >> NEI YouTube Videos >> ...
Full Text Available ... to take a more active role in your care. The information in these videos should not take ... She is a critical member of our patient care team. Managing Your Arthritis Managing Your Arthritis Managing ...
Full Text Available ... a more active role in your care. The information in these videos should not take the place of any advice you ... Management for Rheumatoid Arthritis Patients Rehabilitation of Older Adult ...
Full Text Available This paper presents a new approach for fall detection from partially-observed depth-map video sequences. The proposed approach utilizes the 3D skeletal joint positions obtained from the Microsoft Kinect sensor to build a view-invariant descriptor for human activity representation, called the motion-pose geometric descriptor (MPGD. Furthermore, we have developed a histogram-based representation (HBR based on the MPGD to construct a length-independent representation of the observed video subsequences. Using the constructed HBR, we formulate the fall detection problem as a posterior-maximization problem in which the posteriori probability for each observed video subsequence is estimated using a multi-class SVM (support vector machine classifier. Then, we combine the computed posteriori probabilities from all of the observed subsequences to obtain an overall class posteriori probability of the entire partially-observed depth-map video sequence. To evaluate the performance of the proposed approach, we have utilized the Kinect sensor to record a dataset of depth-map video sequences that simulates four fall-related activities of elderly people, including: walking, sitting, falling form standing and falling from sitting. Then, using the collected dataset, we have developed three evaluation scenarios based on the number of unobserved video subsequences in the testing videos, including: fully-observed video sequence scenario, single unobserved video subsequence of random lengths scenarios and two unobserved video subsequences of random lengths scenarios. Experimental results show that the proposed approach achieved an average recognition accuracy of 93 . 6 % , 77 . 6 % and 65 . 1 % , in recognizing the activities during the first, second and third evaluation scenario, respectively. These results demonstrate the feasibility of the proposed approach to detect falls from partially-observed videos.
In addition to its therapeutic benefits, minimally invasive surgery offers the potential for video recording of the operation. The videos may be archived and used later for reasons such as cognitive training, skills assessment, and workflow analysis. Methods from the major field of video content analysis and representation are increasingly applied in the surgical domain. In this paper, we review recent developments and analyze future directions in the field of content-based video analysis of surgical operations. The review was obtained from PubMed and Google Scholar search on combinations of the following keywords: 'surgery', 'video', 'phase', 'task', 'skills', 'event', 'shot', 'analysis', 'retrieval', 'detection', 'classification', and 'recognition'. The collected articles were categorized and reviewed based on the technical goal sought, type of surgery performed, and structure of the operation. A total of 81 articles were included. The publication activity is constantly increasing; more than 50% of these articles were published in the last 3 years. Significant research has been performed for video task detection and retrieval in eye surgery. In endoscopic surgery, the research activity is more diverse: gesture/task classification, skills assessment, tool type recognition, shot/event detection and retrieval. Recent works employ deep neural networks for phase and tool recognition as well as shot detection. Content-based video analysis of surgical operations is a rapidly expanding field. Several future prospects for research exist including, inter alia, shot boundary detection, keyframe extraction, video summarization, pattern discovery, and video annotation. The development of publicly available benchmark datasets to evaluate and compare task-specific algorithms is essential.
Full Text Available Multimodal signal analysis based on sophisticated sensors, efficient communicationsystems and fast parallel processing methods has a rapidly increasing range of multidisciplinaryapplications. The present paper is devoted to pattern recognition, machine learning, and the analysisof sleep stages in the detection of sleep disorders using polysomnography (PSG data, includingelectroencephalography (EEG, breathing (Flow, and electro-oculogram (EOG signals. The proposedmethod is based on the classification of selected features by a neural network system with sigmoidaland softmax transfer functions using Bayesian methods for the evaluation of the probabilities of theseparate classes. The application is devoted to the analysis of the sleep stages of 184 individualswith different diagnoses, using EEG and further PSG signals. Data analysis points to an averageincrease of the length of the Wake stage by 2.7% per 10 years and a decrease of the length of theRapid Eye Movement (REM stages by 0.8% per 10 years. The mean classification accuracy for givensets of records and single EEG and multimodal features is 88.7% ( standard deviation, STD: 2.1 and89.6% (STD:1.9, respectively. The proposed methods enable the use of adaptive learning processesfor the detection and classification of health disorders based on prior specialist experience andman–machine interaction.
Full Text Available ... Health Growth & Development Infections Diseases & Conditions Pregnancy & Baby Nutrition & Fitness Emotions & Behavior School & Family Life First Aid & Safety Doctors & Hospitals Videos ...
Full Text Available ... Parents site Sitio para padres General Health Growth & Development Infections Diseases ... Special Needs: Planning for Adulthood (Video) KidsHealth / For Parents / Special Needs: ...
Full Text Available ... Development Infections Diseases & Conditions Pregnancy & Baby Nutrition & Fitness Emotions & Behavior School & Family Life First Aid & Safety Doctors & Hospitals Videos Recipes ...
Full Text Available Video surveillance system senses and trails out all the threatening issues in the real time environment. It prevents from security threats with the help of visual devices which gather the information related to videos like CCTV’S and IP (Internet Protocol cameras. Video surveillance system has become a key for addressing problems in the public security. They are mostly deployed on the IP based network. So, all the possible security threats exist in the IP based application might also be the threats available for the reliable application which is available for video surveillance. In result, it may increase cybercrime, illegal video access, mishandling videos and so on. Hence, in this paper an intelligent model is used to propose security for video surveillance system which ensures safety and it provides secured access on video.
Full Text Available This paper presents about the transmission of Digital Video Broadcasting system with streaming video resolution 640x480 on different IQ rate and modulation. In the video transmission, distortion often occurs, so the received video has bad quality. Key frames selection algorithm is flexibel on a change of video, but on these methods, the temporal information of a video sequence is omitted. To minimize distortion between the original video and received video, we aimed at adding methodology using sequential distortion minimization algorithm. Its aim was to create a new video, better than original video without significant loss of content between the original video and received video, fixed sequentially. The reliability of video transmission was observed based on a constellation diagram, with the best result on IQ rate 2 Mhz and modulation 8 QAM. The best video transmission was also investigated using SEDIM (Sequential Distortion Minimization Method and without SEDIM. The experimental result showed that the PSNR (Peak Signal to Noise Ratio average of video transmission using SEDIM was an increase from 19,855 dB to 48,386 dB and SSIM (Structural Similarity average increase 10,49%. The experimental results and comparison of proposed method obtained a good performance. USRP board was used as RF front-end on 2,2 GHz.
Habibian, Amirhossein; Mensink, Thomas; Snoek, Cees G M
This paper aims for event recognition when video examples are scarce or even completely absent. The key in such a challenging setting is a semantic video representation. Rather than building the representation from individual attribute detectors and their annotations, we propose to learn the entire representation from freely available web videos and their descriptions using an embedding between video features and term vectors. In our proposed embedding, which we call Video2vec, the correlations between the words are utilized to learn a more effective representation by optimizing a joint objective balancing descriptiveness and predictability. We show how learning the Video2vec embedding using a multimodal predictability loss, including appearance, motion and audio features, results in a better predictable representation. We also propose an event specific variant of Video2vec to learn a more accurate representation for the words, which are indicative of the event, by introducing a term sensitive descriptiveness loss. Our experiments on three challenging collections of web videos from the NIST TRECVID Multimedia Event Detection and Columbia Consumer Videos datasets demonstrate: i) the advantages of Video2vec over representations using attributes or alternative embeddings, ii) the benefit of fusing video modalities by an embedding over common strategies, iii) the complementarity of term sensitive descriptiveness and multimodal predictability for event recognition. By its ability to improve predictability of present day audio-visual video features, while at the same time maximizing their semantic descriptiveness, Video2vec leads to state-of-the-art accuracy for both few- and zero-example recognition of events in video.
Ren, Huamin; Liu, Weifeng; Olsen, Søren Ingvor
Understanding behaviors is the core of video content analysis, which is highly related to two important applications: abnormal event detection and action recognition. Dictionary learning, as one of the mid-level representations, is an important step to process a video. It has achieved state...
Habibian, A.; Mensink, T.; Snoek, C.G.M.
This paper aims for event recognition when video examples are scarce or even completely absent. The key in such a challenging setting is a semantic video representation. Rather than building the representation from individual attribute detectors and their annotations, we propose to learn the entire
Tavormina, Maurilio Giuseppe Maria; Tavormina, Romina
The frequent and protracted use of video games with serious personal, family and social consequences is no longer just a pleasant pastime and could lead to mental and physical health problems. Although there is no official recognition of video game addiction on the Internet as a mild mental health disorder, further scientific research is needed.
Full Text Available in conjunction with the technical aspects of video display in browsers, when varying media formats are used. The <video> tag used in this work renders videos from two sources with different MIME types. Feeds from the video sources, namely YouTube and UCT...
Monwar, Md Maruf; Rezaei, Siamak
Facial expressions are a key index of emotion and the interpretation of such expressions of emotion is critical to everyday social functioning. In this paper, we present an efficient video analysis technique for recognition of a specific expression, pain, from human faces. We employ an automatic face detector which detects face from the stored video frame using skin color modeling technique. For pain recognition, location and shape features of the detected faces are computed. These features are then used as inputs to a support vector machine (SVM) for classification. We compare the results with neural network based and eigenimage based automatic pain recognition systems. The experiment results indicate that using support vector machine as classifier can certainly improve the performance of automatic pain recognition system.
Zhou, Xiang; Ross, Lars; Lehn-Schiøler, Tue
of audio to generate an artificial talking-face video and measured word recognition performance on simple monosyllabic words. RESULTS: When presenting words together with the artificial video we find that word recognition is improved over purely auditory presentation. The effect is significant (p......BACKGROUND: It is well known that under noisy conditions, viewing a speaker's articulatory movement aids the recognition of spoken words. Conventionally it is thought that the visual input disambiguates otherwise confusing auditory input. HYPOTHESIS: In contrast we hypothesize...... that it is the temporal synchronicity of the visual input that aids parsing of the auditory stream. More specifically, we expected that purely temporal information, which does not convey information such as place of articulation may facility word recognition. METHODS: To test this prediction we used temporal features...
Full Text Available The aim of this paper is to present video quality prediction models for objective non-intrusive, prediction of H.264 encoded video for all content types combining parameters both in the physical and application layer over Universal Mobile Telecommunication Systems (UMTS networks. In order to characterize the Quality of Service (QoS level, a learning model based on Adaptive Neural Fuzzy Inference System (ANFIS and a second model based on non-linear regression analysis is proposed to predict the video quality in terms of the Mean Opinion Score (MOS. The objective of the paper is two-fold. First, to find the impact of QoS parameters on end-to-end video quality for H.264 encoded video. Second, to develop learning models based on ANFIS and non-linear regression analysis to predict video quality over UMTS networks by considering the impact of radio link loss models. The loss models considered are 2-state Markov models. Both the models are trained with a combination of physical and application layer parameters and validated with unseen dataset. Preliminary results show that good prediction accuracy was obtained from both the models. The work should help in the development of a reference-free video prediction model and QoS control methods for video over UMTS networks.
Full Text Available The highly efficient and robust stitching of aerial video captured by unmanned aerial vehicles (UAVs is a challenging problem in the field of robot vision. Existing commercial image stitching systems have seen success with offline stitching tasks, but they cannot guarantee high-speed performance when dealing with online aerial video sequences. In this paper, we present a novel system which has an unique ability to stitch high-frame rate aerial video at a speed of 150 frames per second (FPS. In addition, rather than using a high-speed vision platform such as FPGA or CUDA, our system is running on a normal personal computer. To achieve this, after the careful comparison of the existing invariant features, we choose the FAST corner and binary descriptor for efficient feature extraction and representation, and present a spatial and temporal coherent filter to fuse the UAV motion information into the feature matching. The proposed filter can remove the majority of feature correspondence outliers and significantly increase the speed of robust feature matching by up to 20 times. To achieve a balance between robustness and efficiency, a dynamic key frame-based stitching framework is used to reduce the accumulation errors. Extensive experiments on challenging UAV datasets demonstrate that our approach can break through the speed limitation and generate an accurate stitching image for aerial video stitching tasks.
Full Text Available Video is a popular and a motivating potential medium in schools. Using video in the language classroom helps the language teachers in many different ways. Video, for instance, brings the outside world into the language classroom, providing the class with many different topics and reasons to talk about. It can provide comprehensible input to the learners through contextualised models of language use. It also offers good opportunities to introduce native English speech into the language classroom. Through this article I will try to show what the benefits of using video are and, at the end, I present an instrument to select and classify video materials.
Clintin P. Davis-Stober
Full Text Available The Recognition Heuristic (Gigerenzer and Goldstein, 1996; Goldstein and Gigerenzer, 2002 makes the counter-intuitive prediction that a decision maker utilizing less information may do as well as, or outperform, an idealized decision maker utilizing more information. We lay a theoretical foundation for the use of single-variable heuristics such as the Recognition Heuristic as an optimal decision strategy within a linear modeling framework. We identify conditions under which over-weighting a single predictor is a mini-max strategy among a class of a priori chosen weights based on decision heuristics with respect to a measure of statistical lack of fit we call ``risk''. These strategies, in turn, outperform standard multiple regression as long as the amount of data available is limited. We also show that, under related conditions, weighting only one variable and ignoring all others produces the same risk as ignoring the single variable and weighting all others. This approach has the advantage of generalizing beyond the original environment of the Recognition Heuristic to situations with more than two choice options, binary or continuous representations of recognition, and to other single variable heuristics. We analyze the structure of data used in some prior recognition tasks and find that it matches the sufficient conditions for optimality in our results. Rather than being a poor or adequate substitute for a compensatory model, the Recognition Heuristic closely approximates an optimal strategy when a decision maker has finite data about the world.
Maslow, Katie; Mezey, Mathy
Many hospital patients with dementia have no documented dementia diagnosis. In some cases, this is because they have never been diagnosed. Recognition of Dementia in Hospitalized Older Adults proposes several approaches that hospital nurses can use to increase recognition of dementia. This article describes the Try This approaches, how to implement them, and how to incorporate them into a hospital's current admission procedures. For a free online video demonstrating the use of these approaches, go to http://links.lww.com/A216.
de Jong, Franciska M.G.; Gauvain, Jean-Luc; den Hartog, Jurgen; den Hartog, Jeremy; Netter, Klaus
This paper describes the Olive project which aims to support automated indexing of video material by use of human language technologies. Olive is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which serve as the
VanDeventer, Stephanie S.; White, James A.
Investigates the display of expert behavior by seven outstanding video game-playing children ages 10 and 11. Analyzes observation and debriefing transcripts for evidence of self-monitoring, pattern recognition, principled decision making, qualitative thinking, and superior memory, and discusses implications for educators regarding the development…
Abstract Retrieval video has been used to search a video based on the query entered by user which were text and image. This system could increase the searching ability on video browsing and expected to reduce the video’s retrieval time. The research purposes were designing and creating a software application of retrieval video based on the text and image on the video. The index process for the text is tokenizing, filtering (stopword, stemming. The results of stemming to saved in the text index table. Index process for the image is to create an image color histogram and compute the mean and standard deviation at each primary color red, green and blue (RGB of each image. The results of feature extraction is stored in the image table The process of video retrieval using the query text, images or both. To text query system to process the text query by looking at the text index tables. If there is a text query on the index table system will display information of the video according to the text query. To image query system to process the image query by finding the value of the feature extraction means red, green means, means blue, red standard deviation, standard deviation and standard deviation of blue green. If the value of the six features extracted query image on the index table image will display the video information system according to the query image. To query text and query images, the system will display the video information if the query text and query images have a relationship that is query text and query image has the same film title. Keywords— video, index, retrieval, text, image
Full Text Available According to the recognition heuristic (RH theory, decisions follow the recognition principle: Given a high validity of the recognition cue, people should prefer recognized choice options compared to unrecognized ones. Assuming that the memory strength of choice options is strongly correlated with both the choice criterion and recognition judgments, the RH is a reasonable strategy that approximates optimal decisions with a minimum of cognitive effort (Davis-Stober, Dana, and Budescu, 2010. However, theories of recognition memory are not generally compatible with this assumption. For example, some threshold models of recognition presume that recognition judgments can arise from two types of cognitive states: (1 certainty states in which judgments are almost perfectly correlated with memory strength and (2 uncertainty states in which recognition judgments reflect guessing rather than differences in memory strength. We report an experiment designed to test the prediction that the RH applies to certainty states only. Our results show that memory states rather than recognition judgments affect use of recognition information in binary decisions.
Full Text Available A novel motion-adaptive deinterlacing algorithm with edge-pattern recognition and hybrid motion detection is introduced. The great variety of video contents makes the processing of assorted motion, edges, textures, and the combination of them very difficult with a single algorithm. The edge-pattern recognition algorithm introduced in this paper exhibits the flexibility in processing both textures and edges which need to be separately accomplished by line average and edge-based line average before. Moreover, predicting the neighboring pixels for pattern analysis and interpolation further enhances the adaptability of the edge-pattern recognition unit when motion detection is incorporated. Our hybrid motion detection features accurate detection of fast and slow motion in interlaced video and also the motion with edges. Using only three fields for detection also renders higher temporal correlation for interpolation. The better performance of our deinterlacing algorithm with higher content-adaptability and less memory cost than the state-of-the-art 4-field motion detection algorithms can be seen from the subjective and objective experimental results of the CIF and PAL video sequences.
Full Text Available Abstract A novel motion-adaptive deinterlacing algorithm with edge-pattern recognition and hybrid motion detection is introduced. The great variety of video contents makes the processing of assorted motion, edges, textures, and the combination of them very difficult with a single algorithm. The edge-pattern recognition algorithm introduced in this paper exhibits the flexibility in processing both textures and edges which need to be separately accomplished by line average and edge-based line average before. Moreover, predicting the neighboring pixels for pattern analysis and interpolation further enhances the adaptability of the edge-pattern recognition unit when motion detection is incorporated. Our hybrid motion detection features accurate detection of fast and slow motion in interlaced video and also the motion with edges. Using only three fields for detection also renders higher temporal correlation for interpolation. The better performance of our deinterlacing algorithm with higher content-adaptability and less memory cost than the state-of-the-art 4-field motion detection algorithms can be seen from the subjective and objective experimental results of the CIF and PAL video sequences.
Full Text Available The quality of the smartphone’s camera enables us to capture high quality pictures at a high resolution, so we can perform different types of recognition on these images. Face detection is one of these types of recognition that is very common in our society. We use it every day on Facebook to tag friends in our pictures. It is also used in video games alongside Kinect concept, or in security to allow the access to private places only to authorized persons. These are just some examples of using facial recognition, because in modern society, detection and facial recognition tend to surround us everywhere. The aim of this article is to create an appli-cation for smartphones that can recognize human faces. The main goal of this application is to grant access to certain areas or rooms only to certain authorized persons. For example, we can speak here of hospitals or educational institutions where there are rooms where only certain employees can enter. Of course, this type of application can cover a wide range of uses, such as helping people suffering from Alzheimer's to recognize the people they loved, to fill gaps persons who can’t remember the names of their relatives or for example to automatically capture the face of our own children when they smile.
Full Text Available This paper aims to enhance the bag of features in order to improve the accuracy of human activity recognition. In this paper, human activity recognition process consists of four stages: local space time features detection, feature description, bag of features representation, and SVMs classification. The k-means step in the bag of features is enhanced by applying three levels of clustering: clustering per video, clustering per action class, and clustering for the final code book. The experimental results show that the proposed method of enhancement reduces the time and memory requirements, and enables the use of all training data in the k-means clustering algorithm. The evaluation of accuracy of action classification on two popular datasets (KTH and Weizmann has been performed. In addition, the proposed method improves the human activity recognition accuracy by 5.57% on the KTH dataset using the same detector, descriptor, and classifier.
Lassoued , Imen; Zagrouba , Ezzedine; Chahir , Youssef
International audience; Action recognition in video and still image is one of the most challenging research topics in pattern recognition and computer vision. This paper proposes a new method for video action classification based on 3D Zernike moments. These last ones aim to capturing both structural and temporal information of a time varying sequence. The originality of this approach consists to represent actions in video sequences by a three-dimension shape obtained from different silhouett...
Since their first inception, automatic reading systems have evolved substantially, yet the recognition of handwriting remains an open research problem due to its substantial variation in appearance. With the introduction of Markovian models to the field, a promising modeling and recognition paradigm was established for automatic handwriting recognition. However, no standard procedures for building Markov model-based recognizers have yet been established. This text provides a comprehensive overview of the application of Markov models in the field of handwriting recognition, covering both hidden
Careers Â» Inclusion in the Workplace - Text Version Inclusion in the Workplace - Text Version This is the text version for the Inclusion: Leading by Example video. I'm Martin Keller. I'm the NREL of the laboratory. Another very important element in inclusion is diversity. Because if we have a
Full Text Available Video game accessibility may not seem of significance to some, and it may sound trivial to anyone who does not play video games. This assumption is false. With the digitalization of our culture, video games are an ever increasing part of our life. They contribute to peer to peer interactions, education, music and the arts. A video game can be created by hundreds of musicians and artists, and they can have production budgets that exceed modern blockbuster films. Inaccessible video games are analogous to movie theaters without closed captioning or accessible facilities. The movement to have accessible video games is small, unorganized and misdirected. Just like the other battles to make society accessible were accomplished through legislation and law, the battle for video game accessibility must be focused toward the law and not the market.
Mølgaard, Lasse Lohilahti; Jørgensen, Kasper Winther
Speaker recognition is basically divided into speaker identification and speaker verification. Verification is the task of automatically determining if a person really is the person he or she claims to be. This technology can be used as a biometric feature for verifying the identity of a person...
Full Text Available Background: Psychogenic tremor is the most common psychogenic movement disorder. It has characteristic clinical features that can help distinguish it from other tremor disorders. There is no diagnostic gold standard and the diagnosis is based primarily on clinical history and examination. Despite proposed diagnostic criteria, the diagnosis of psychogenic tremor can be challenging. While there are numerous studies evaluating psychogenic tremor in the literature, there are no publications that provide a video/visual guide that demonstrate the clinical characteristics of psychogenic tremor. Educating clinicians about psychogenic tremor will hopefully lead to earlier diagnosis and treatment. Methods: We selected videos from the database at the Parkinson's Disease Center and Movement Disorders Clinic at Baylor College of Medicine that illustrate classic findings supporting the diagnosis of psychogenic tremor.Results: We include 10 clinical vignettes with accompanying videos that highlight characteristic clinical signs of psychogenic tremor including distractibility, variability, entrainability, suggestibility, and coherence.Discussion: Psychogenic tremor should be considered in the differential diagnosis of patients presenting with tremor, particularly if it is of abrupt onset, intermittent, variable and not congruous with organic tremor. The diagnosis of psychogenic tremor, however, should not be simply based on exclusion of organic tremor, such as essential, parkinsonian, or cerebellar tremor, but on positive criteria demonstrating characteristic features. Early recognition and management are critical for good long-term outcome.
The DARPA Airborne Video Surveillance (AVS) program was established to develop and promote technologies to make airborne video more useful, providing capabilities that achieve a UAV force multiplier...
Full Text Available Multi-view action recognition has gained a great interest in video surveillance, human computer interaction, and multimedia retrieval, where multiple cameras of different types are deployed to provide a complementary field of views. Fusion of multiple camera views evidently leads to more robust decisions on both tracking multiple targets and analysing complex human activities, especially where there are occlusions. In this paper, we incorporate the marginalised stacked denoising autoencoders (mSDA algorithm to further improve the bag of words (BoWs representation in terms of robustness and usefulness for multi-view action recognition. The resulting representations are fed into three simple fusion strategies as well as a multiple kernel learning algorithm at the classification stage. Based on the internal evaluation, the codebook size of BoWs and the number of layers of mSDA may not significantly affect recognition performance. According to results on three multi-view benchmark datasets, the proposed framework improves recognition performance across all three datasets and outputs record recognition performance, beating the state-of-art algorithms in the literature. It is also capable of performing real-time action recognition at a frame rate ranging from 33 to 45, which could be further improved by using more powerful machines in future applications.
Full Text Available This article is devoted to the integration of cloud technology functions into 3D IP video surveil-lance systems in order to conduct further video Analytics, incoming real-time data, as well as stored video materials on the server in the «cloud». The main attention is devoted to «cloud technologies» usage optimizing the process of recognition of the desired object by increasing the criteria of flexibility and scalability of the system. Transferring image load from the client to the cloud server, to the virtual part of the system. The development of the issues considered in the article in terms of data analysis, which will significantly improve the effectiveness of the implementation of special tasks facing special units.
We currently live in a world filled with videos. There are videos on YouTube, feature movies and even videos recorded with our own cameras and smartphones. These videos present an excellent opportunity to not only explore physical concepts, but also inspire others to investigate physics ideas. With video analysis, we can explore the fantasy world in science-fiction films. We can also look at online videos to determine if they are genuine or fake. Video analysis can be used in the introductory physics lab and it can even be used to explore the make-believe physics embedded in video games. This book covers the basic ideas behind video analysis along with the fundamental physics principles used in video analysis. The book also includes several examples of the unique situations in which video analysis can be used.
Unattended video surveillance systems are particularly vulnerable to the substitution of false video images into the cable that connects the camera to the video recorder. New technology has made it practical to insert a solid state video memory into the video cable, freeze a video image from the camera, and hold this image as long as desired. Various techniques, such as line supervision and sync detection, have been used to detect video cable tampering. The video authentication technique described in this paper uses the actual video image from the camera as the basis for detecting any image substitution made during the transmission of the video image to the recorder. The technique, designed for unattended video systems, can be used for any video transmission system where a two-way digital data link can be established. The technique uses similar microprocessor circuitry at the video camera and at the video recorder to select sample points in the video image for comparison. The gray scale value of these points is compared at the recorder controller and if the values agree within limits, the image is authenticated. If a significantly different image was substituted, the comparison would fail at a number of points and the video image would not be authenticated. The video authentication system can run as a stand-alone system or at the request of another system
Full Text Available Lip movement of speaker is very informative for many application of speech signal processing such as multi-modal speech recognition and password authentication without speech signal. However, in collecting multi-modal speech information, we need a video camera, large amount of memory, video interface, and high speed processor to extract lip movement in real time. Such a system tends to be expensive and large. This is one reasons of preventing the use of multi-modal speech processing. In this study, we have developed a simple infrared lip movement sensor mounted on a headset, and made it possible to acquire lip movement by PDA, mobile phone, and notebook PC. The sensor consists of an infrared LED and an infrared photo transistor, and measures the lip movement by the reflected light from the mouth region. From experiment, we achieved 66% successfully word recognition rate only by lip movement features. This experimental result shows that our developed sensor can be utilized as a tool for multi-modal speech processing by combining a microphone mounted on the headset.
Han, Yi; Wang, Guoyin; Yang, Yong; He, Kun
Human emotions could be expressed by many bio-symbols. Speech and facial expression are two of them. They are both regarded as emotional information which is playing an important role in human-computer interaction. Based on our previous studies on emotion recognition, an audiovisual emotion recognition system is developed and represented in this paper. The system is designed for real-time practice, and is guaranteed by some integrated modules. These modules include speech enhancement for eliminating noises, rapid face detection for locating face from background image, example based shape learning for facial feature alignment, and optical flow based tracking algorithm for facial feature tracking. It is known that irrelevant features and high dimensionality of the data can hurt the performance of classifier. Rough set-based feature selection is a good method for dimension reduction. So 13 speech features out of 37 ones and 10 facial features out of 33 ones are selected to represent emotional information, and 52 audiovisual features are selected due to the synchronization when speech and video fused together. The experiment results have demonstrated that this system performs well in real-time practice and has high recognition rate. Our results also show that the work in multimodules fused recognition will become the trend of emotion recognition in the future.
just looking at a clock and doing it on a rhythmic basis or even just outside of the standard, you analysis accuracy is difficult. And this is essentially the quality of what you're finding and how you that process. All right. Lesson number four. Asset analysis is complicated. This is essentially knowing
tell an audience - let's say you are doing a Ted X talk. You've been invited. And these people are kind , was at a reception the same day of his graduation. I think it's 1965, it's Berkeley. He comes out, and impact to the grid. Erfan Ibrahim: Wonderful. So I'll entertain two questions for the audience, and then
applicability of artificial intelligence to search for cybersecurity gaps in our existing SKATA networks. Second primarily renewable that all back each other up; that are all highly intelligent, artificial intelligence we have in cyber security, digital technologies, artificial intelligence. We think that that would
grid integration, continuous code improvement, fuel cell vehicle operation, and renewable hydrogen Systems Integration Facility or ESIF. Research projects including H2FIRST, component testing, hydrogen
Loktev Alexey Alexeevich
Full Text Available Comprehensive distributed safety, control, and monitoring systems applied by companies and organizations of different ownership structure play a substantial role in the present-day society. Video surveillance elements that ensure image processing and decision making in automated or automatic modes are the essential components of new systems. This paper covers the modeling of video surveillance systems installed in buildings, and the algorithm, or pattern, of video camera placement with due account for nearly all characteristics of buildings, detection and recognition facilities, and cameras themselves. This algorithm will be subsequently implemented as a user application. The project contemplates a comprehensive approach to the automatic placement of cameras that take account of their mutual positioning and compatibility of tasks. The project objective is to develop the principal elements of the algorithm of recognition of a moving object to be detected by several cameras. The image obtained by different cameras will be processed. Parameters of motion are to be identified to develop a table of possible options of routes. The implementation of the recognition algorithm represents an independent research project to be covered by a different article. This project consists in the assessment of the degree of complexity of an algorithm of camera placement designated for identification of cases of inaccurate algorithm implementation, as well as in the formulation of supplementary requirements and input data by means of intercrossing sectors covered by neighbouring cameras. The project also contemplates identification of potential problems in the course of development of a physical security and monitoring system at the stage of the project design development and testing. The camera placement algorithm has been implemented as a software application that has already been pilot tested on buildings and inside premises that have irregular dimensions. The
Full Text Available ... from Veterans Health Administration The Power of 1 PSA see more videos from Veterans Health Administration Commitments PSA see more videos from Veterans Health Administration The ...
Full Text Available ... Resources Spread the Word Videos Homeless Resources Additional Information Make the Connection Get Help When To Call ... Suicide Spread the Word Videos Homeless Resources Additional Information Make the Connection Resource Locator If you or ...
Full Text Available ... videos from Veterans Health Administration Talking About It Matters see more videos from Veterans Health Administration Stand ... Health Administration I am A Veteran Family/Friend Active Duty/Reserve and Guard Signs of Crisis Identifying ...
Full Text Available The phacoemulsification surgery is one of the most advanced surgeries to treat cataract. However, the conventional surgeries are always with low automatic level of operation and over reliance on the ability of surgeons. Alternatively, one imaginative scene is to use video processing and pattern recognition technologies to automatically detect the cataract grade and intelligently control the release of the ultrasonic energy while operating. Unlike cataract grading in the diagnosis system with static images, complicated background, unexpected noise, and varied information are always introduced in dynamic videos of the surgery. Here we develop a VidEo-Based Intelligent Recognitionand Decision (VEBIRD system, which breaks new ground by providing a generic framework for automatically tracking the operation process and classifying the cataract grade in microscope videos of the phacoemulsification cataract surgery. VEBIRD comprises a robust eye (iris detector with randomized Hough transform to precisely locate the eye in the noise background, an effective probe tracker with Tracking-Learning-Detection to thereafter track the operation probe in the dynamic process, and an intelligent decider with discriminative learning to finally recognize the cataract grade in the complicated video. Experiments with a variety of real microscope videos of phacoemulsification verify VEBIRD’s effectiveness.
Sánchez Bocanegra, Carlos Luis
Rare Disease Video Portal (RD Video) is a portal web where contains videos from Youtube including all details from 12 channels of Youtube. Rare Disease Video Portal (RD Video) es un portal web que contiene los vídeos de Youtube incluyendo todos los detalles de 12 canales de Youtube. Rare Disease Video Portal (RD Video) és un portal web que conté els vídeos de Youtube i que inclou tots els detalls de 12 Canals de Youtube.
E. G. Zaytseva
Full Text Available The method of determination of the sharpness depth borders was improved for contemporary video technology. The computer programme for determination of corresponding video recording parameters was created.
Robert C. Lorenz
Full Text Available Video games contain elaborate reinforcement and reward schedules that have the potential to maximize motivation. Neuroimaging studies suggest that video games might have an influence on the reward system. However, it is not clear whether reward-related properties represent a precondition, which biases an individual towards playing video games, or if these changes are the result of playing video games. Therefore, we conducted a longitudinal study to explore reward-related functional predictors in relation to video gaming experience as well as functional changes in the brain in response to video game training.Fifty healthy participants were randomly assigned to a video game training (TG or control group (CG. Before and after training/control period, functional magnetic resonance imaging (fMRI was conducted using a non-video game related reward task.At pretest, both groups showed strongest activation in ventral striatum (VS during reward anticipation. At posttest, the TG showed very similar VS activity compared to pretest. In the CG, the VS activity was significantly attenuated.This longitudinal study revealed that video game training may preserve reward responsiveness in the ventral striatum in a retest situation over time. We suggest that video games are able to keep striatal responses to reward flexible, a mechanism which might be of critical value for applications such as therapeutic cognitive training.
Davies, Florence; Greene, Terry
This paper describes Directed Activities Related to Text (DART), procedures that were developed and are used in the Reading for Learning Project at the University of Nottingham (England) to enhance learning from texts and that fall into two broad categories: (1) text analysis procedures, which require students to engage in some form of analysis of…
Büyükbayrak, Hakan; Yanikoglu, Berrin; Erçil, Aytül
We describe a system for recognizing online, handwritten mathematical expressions. The system is designed with a user-interface for writing scientific articles, supporting the recognition of basic mathematical expressions as well as integrals, summations, matrices etc. A feed-forward neural network recognizes symbols which are assumed to be single-stroke and a recursive algorithm parses the expression by combining neural network output and the structure of the expression. Preliminary results show that writer-dependent recognition rates are very high (99.8%) while writer-independent symbol recognition rates are lower (75%). The interface associated with the proposed system integrates the built-in recognition capabilities of the Microsoft's Tablet PC API for recognizing textual input and supports conversion of hand-drawn figures into PNG format. This enables the user to enter text, mathematics and draw figures in a single interface. After recognition, all output is combined into one LATEX code and compiled into a PDF file.
Full Text Available Recognizing the actions of others from visual stimuli is a crucial aspect of human perception that allows individuals to respond to social cues. Humans are able to discriminate between similar actions despite transformations, like changes in viewpoint or actor, that substantially alter the visual appearance of a scene. This ability to generalize across complex transformations is a hallmark of human visual intelligence. Advances in understanding action recognition at the neural level have not always translated into precise accounts of the computational principles underlying what representations of action sequences are constructed by human visual cortex. Here we test the hypothesis that invariant action discrimination might fill this gap. Recently, the study of artificial systems for static object perception has produced models, Convolutional Neural Networks (CNNs, that achieve human level performance in complex discriminative tasks. Within this class, architectures that better support invariant object recognition also produce image representations that better match those implied by human and primate neural data. However, whether these models produce representations of action sequences that support recognition across complex transformations and closely follow neural representations of actions remains unknown. Here we show that spatiotemporal CNNs accurately categorize video stimuli into action classes, and that deliberate model modifications that improve performance on an invariant action recognition task lead to data representations that better match human neural recordings. Our results support our hypothesis that performance on invariant discrimination dictates the neural representations of actions computed in the brain. These results broaden the scope of the invariant recognition framework for understanding visual intelligence from perception of inanimate objects and faces in static images to the study of human perception of action sequences.
Full Text Available ... Answers (Q&A) Staying Safe Videos for Educators Search English Español Special Needs: Planning for Adulthood (Video) ... Nondiscrimination Visit the Nemours Web site. Note: All information on KidsHealth® is for educational purposes only. For ...
Full Text Available Based on video recordings of the movement of the patients with epilepsy, this paper proposed a human action recognition scheme to detect distinct motion patterns and to distinguish the normal status from the abnormal status of epileptic patients. The scheme first extracts local features and holistic features, which are complementary to each other. Afterwards, a support vector machine is applied to classification. Based on the experimental results, this scheme obtains a satisfactory classification result and provides a fundamental analysis towards the human-robot interaction with socially assistive robots in caring the patients with epilepsy (or other patients with brain disorders in order to protect them from injury.
Full Text Available Designing an effective and high performance network requires an accurate characterization and modeling of network traffic. The modeling of video frame sizes is normally applied in simulation studies and mathematical analysis and generating streams for testing and compliance purposes. Besides, video traffic assumed as a major source of multimedia traffic in future heterogeneous network. Therefore, the statistical distribution of video data can be used as the inputs for performance modeling of networks. The finding of this paper comprises the theoretical definition of distribution which seems to be relevant to the video trace in terms of its statistical properties and finds the best distribution using both the graphical method and the hypothesis test. The data set used in this article consists of layered video traces generating from Scalable Video Codec (SVC video compression technique of three different movies.
R. Narasimhan (Krishtel eMaging) 1461 1996 Oct 15 13:05:22
a prerequisite for directional code measurement (Chen & Lee 1992). The handwritten PSL ... shows the directional code and an example to illustrate the coding method for a word formed by the concatenation .... Primitive selection. 5.1 Corner ...
Full Text Available Deficits in social cognition including facial affect recognition and their detrimental effects on functional outcome are well established in schizophrenia. Structured training can have substantial effects on social cognitive measures including facial affect recognition. Elucidating training effects on cortical mechanisms involved in facial affect recognition may identify causes of dysfunctional facial affect recognition in schizophrenia and foster remediation strategies. In the present study, 57 schizophrenia patients were randomly assigned to (a computer-based facial affect training that focused on affect discrimination and working memory in 20 daily 1-hour sessions, (b similarly intense, targeted cognitive training on auditory-verbal discrimination and working memory, or (c treatment as usual. Neuromagnetic activity was measured before and after training during a dynamic facial affect recognition task (5 s videos showing human faces gradually changing from neutral to fear or to happy expressions. Effects on 10–13 Hz (alpha power during the transition from neutral to emotional expressions were assessed via MEG based on previous findings that alpha power increase is related to facial affect recognition and is smaller in schizophrenia than in healthy subjects. Targeted affect training improved overt performance on the training tasks. Moreover, alpha power increase during the dynamic facial affect recognition task was larger after affect training than after treatment-as-usual, though similar to that after targeted perceptual–cognitive training, indicating somewhat nonspecific benefits. Alpha power modulation was unrelated to general neuropsychological test performance, which improved in all groups. Results suggest that specific neural processes supporting facial affect recognition, evident in oscillatory phenomena, are modifiable. This should be considered when developing remediation strategies targeting social cognition in schizophrenia.
Asif Ali Laghari
Full Text Available Video sharing on social clouds is popular among the users around the world. High-Definition (HD videos have big file size so the storing in cloud storage and streaming of videos with high quality from cloud to the client are a big problem for service providers. Social clouds compress the videos to save storage and stream over slow networks to provide quality of service (QoS. Compression of video decreases the quality compared to original video and parameters are changed during the online play as well as after download. Degradation of video quality due to compression decreases the quality of experience (QoE level of end users. To assess the QoE of video compression, we conducted subjective (QoE experiments by uploading, sharing, and playing videos from social clouds. Three popular social clouds, Facebook, Tumblr, and Twitter, were selected to upload and play videos online for users. The QoE was recorded by using questionnaire given to users to provide their experience about the video quality they perceive. Results show that Facebook and Twitter compressed HD videos more as compared to other clouds. However, Facebook gives a better quality of compressed videos compared to Twitter. Therefore, users assigned low ratings for Twitter for online video quality compared to Tumblr that provided high-quality online play of videos with less compression.
It has recently been found that during recognition memory tests participants’ pupils dilate more when they view old items compared to novel items. This thesis sought to replicate this novel ‘‘Pupil Old/New Effect’’ (PONE) and to determine its relationship to implicit and explicit mnemonic processes, the veracity of participants’ responses, and the analogous Event-Related Potential (ERP) old/new effect. Across 9 experiments, pupil-size was measured with a video-based eye-tracker during a varie...
Full Text Available The objective of this paper is to perform the innovation design for improving the recognition of a captured QR code image with blur through the Pillbox filter analysis. QR code images can be captured by digital video cameras. Many factors contribute to QR code decoding failure, such as the low quality of the image. Focus is an important factor that affects the quality of the image. This study discusses the out-of-focus QR code image and aims to improve the recognition of the contents in the QR code image. Many studies have used the pillbox filter (circular averaging filter method to simulate an out-of-focus image. This method is also used in this investigation to improve the recognition of a captured QR code image. A blurred QR code image is separated into nine levels. In the experiment, four different quantitative approaches are used to reconstruct and decode an out-of-focus QR code image. These nine reconstructed QR code images using methods are then compared. The final experimental results indicate improvements in identification.
Full Text Available Text mining deals with complex and unstructured texts. Usually a particular collection of texts that is specified to one or more domains is necessary. We have developed a customizable text classifier for users to mine the collection automatically. It derives from the sentence category of the HNC theory and corresponding techniques. It can start with a few texts, and it can adjust automatically or be adjusted by user. The user can also control the number of domains chosen and decide the standard with which to choose the texts based on demand and abundance of materials. The performance of the classifier varies with the user's choice.
Michail N. Giannakos
Full Text Available Online video lectures have been considered an instructional media for various pedagogic approaches, such as the flipped classroom and open online courses. In comparison to other instructional media, online video affords the opportunity for recording student clickstream patterns within a video lecture. Video analytics within lecture videos may provide insights into student learning performance and inform the improvement of video-assisted teaching tactics. Nevertheless, video analytics are not accessible to learning stakeholders, such as researchers and educators, mainly because online video platforms do not broadly share the interactions of the users with their systems. For this purpose, we have designed an open-access video analytics system for use in a video-assisted course. In this paper, we present a longitudinal study, which provides valuable insights through the lens of the collected video analytics. In particular, we found that there is a relationship between video navigation (repeated views and the level of cognition/thinking required for a specific video segment. Our results indicated that learning performance progress was slightly improved and stabilized after the third week of the video-assisted course. We also found that attitudes regarding easiness, usability, usefulness, and acceptance of this type of course remained at the same levels throughout the course. Finally, we triangulate analytics from diverse sources, discuss them, and provide the lessons learned for further development and refinement of video-assisted courses and practices.
Full Text Available ... treatments are available, what is happening in the immune system and what other conditions are associated with RA. Learning more about your condition will allow you to take a more active role in your care. The information in these videos ...
Full Text Available ... Program Vision and Aging Program African American Program Training and Jobs Fellowships NEI Summer Intern Program Diversity In Vision Research & Ophthalmology (DIVRO) Student Training Programs To search for current job openings visit HHS USAJobs Home >> NEI YouTube Videos >> ...
Full Text Available ... and Aging Program African American Program Training and Jobs Fellowships NEI Summer Intern Program Diversity In Vision ... DIVRO) Student Training Programs To search for current job openings visit HHS USAJobs Home » NEI YouTube Videos » ...
Full Text Available ... Macular Degeneration Amblyopia Animations Blindness Cataract Convergence Insufficiency Diabetic Eye Disease Dilated Eye Exam Dry Eye For Kids Glaucoma Healthy Vision Tips Leber Congenital Amaurosis Low Vision Refractive Errors Retinopathy of Prematurity Science Spanish Videos Webinars NEI YouTube ...
Manuel Calvelo Ríos
Full Text Available El Video resulta ser una herramienta sumamente útil para el desarrollo rural. Entendemos por desarrollo rural el intento de regular las relaciones campo-ciudad en términos más equitativos para el hombre del campo. Es por tanto una decisión política.
Full Text Available ... Program Vision and Aging Program African American Program Training and Jobs Fellowships NEI Summer Intern Program Diversity In Vision Research & Ophthalmology (DIVRO) Student Training Programs To search for current job openings visit HHS USAJobs Home » NEI YouTube Videos » ...
Full Text Available In this paper, we propose a system to detect and track fingertips online and recognize Mandarin Phonetic Symbol (MPS for user-friendly Chinese input purposes. Using fingertips and cameras to replace pens and touch panels as input devices could reduce the cost and improve the ease-of-use and comfort of computer-human interface. In the proposed framework, particle filters with enhanced appearance models are applied for robust fingertip tracking. Afterwards, MPS combination recognition is performed on the tracked fingertip trajectories using Hidden Markov Models. In the proposed system, the fingertips of the users could be robustly tracked. Also, the challenges of entering, leaving and virtual strokes caused by video-based fingertip input can be overcome. Experimental results have shown the feasibility and effectiveness of the proposed work.
Olasimbo Ayodeji Arigbabu
Full Text Available Soft biometrics can be used as a prescreening filter, either by using single trait or by combining several traits to aid the performance of recognition systems in an unobtrusive way. In many practical visual surveillance scenarios, facial information becomes difficult to be effectively constructed due to several varying challenges. However, from distance the visual appearance of an object can be efficiently inferred, thereby providing the possibility of estimating body related information. This paper presents an approach for estimating body related soft biometrics; specifically we propose a new approach based on body measurement and artificial neural network for predicting body weight of subjects and incorporate the existing technique on single view metrology for height estimation in videos with low frame rate. Our evaluation on 1120 frame sets of 80 subjects from a newly compiled dataset shows that the mentioned soft biometric information of human subjects can be adequately predicted from set of frames.
Full Text Available With the development of heterogeneous networks and video coding standards, multiresolution video applications over networks become important. It is critical to ensure the service quality of the network for time-sensitive video services. Worldwide Interoperability for Microwave Access (WIMAX is a good candidate for delivering video signals because through WIMAX the delivery quality based on the quality-of-service (QoS setting can be guaranteed. The selection of suitable QoS parameters is, however, not trivial for service users. Instead, what a video service user really concerns with is the video quality of presentation (QoP which includes the video resolution, the fidelity, and the frame rate. In this paper, we present a quality control mechanism in multiresolution video coding structures over WIMAX networks and also investigate the relationship between QoP and QoS in end-to-end connections. Consequently, the video presentation quality can be simply mapped to the network requirements by a mapping table, and then the end-to-end QoS is achieved. We performed experiments with multiresolution MPEG coding over WIMAX networks. In addition to the QoP parameters, the video characteristics, such as, the picture activity and the video mobility, also affect the QoS significantly.
Ayala, Sandra M
Ten first grade students, participating in a Tier II response to intervention (RTI) reading program received an intervention of video self modeling to improve decoding skills and sight word recognition. The students were video recorded blending and segmenting decodable words, and reading sight words taken directly from their curriculum instruction. Individual videos were recorded and edited to show students successfully and accurately decoding words and practicing sight word recognition. Each...
Full Text Available Generally, the dynamic hand gestures are captured in continuous video sequences, and a gesture recognition system ought to extract the robust features automatically. This task involves the highly challenging spatio-temporal variations of dynamic hand gestures. The proposed method is based on two-level manifold classifiers including the trajectory classifiers in any time instants and the posture classifiers of sub-gestures in selected time instants. The trajectory classifiers contain skin detector, normalized skeleton representation of one or two hands, and motion history representing by motion vectors normalized through predetermined directions (8 and 16 in our case. Each dynamic gesture is separated into a set of sub-gestures in order to predict a trajectory and remove those samples of gestures, which do not satisfy to current trajectory. The posture classifiers involve the normalized skeleton representation of palm and fingers and relative finger positions using fingertips. The min-max criterion is used for trajectory recognition, and the decision tree technique was applied for posture recognition of sub-gestures. For experiments, a dataset “Multi-modal Gesture Recognition Challenge 2013: Dataset and Results” including 393 dynamic hand-gestures was chosen. The proposed method yielded 84–91% recognition accuracy, in average, for restricted set of dynamic gestures.
Heba Hamdy Ali
Full Text Available Depth Maps-based Human Activity Recognition is the process of categorizing depth sequences with a particular activity. In this problem, some applications represent robust solutions in domains such as surveillance system, computer vision applications, and video retrieval systems. The task is challenging due to variations inside one class and distinguishes between activities of various classes and video recording settings. In this study, we introduce a detailed study of current advances in the depth maps-based image representations and feature extraction process. Moreover, we discuss the state of art datasets and subsequent classification procedure. Also, a comparative study of some of the more popular depth-map approaches has provided in greater detail. The proposed methods are evaluated on three depth-based datasets “MSR Action 3D”, “MSR Hand Gesture”, and “MSR Daily Activity 3D”. Experimental results achieved 100%, 95.83%, and 96.55% respectively. While combining depth and color features on “RGBD-HuDaAct” Dataset, achieved 89.1%. Keywords: Activity recognition, Depth, Feature extraction, Video, Human body detection, Hand gesture
Lovink, G.; Somers Miles, R.
Video Vortex Reader II is the Institute of Network Cultures' second collection of texts that critically explore the rapidly changing landscape of online video and its use. With the success of YouTube ('2 billion views per day') and the rise of other online video sharing platforms, the moving image
Full Text Available This paper presents an object occlusion detection algorithm using object depth information that is estimated by automatic camera calibration. The object occlusion problem is a major factor to degrade the performance of object tracking and recognition. To detect an object occlusion, the proposed algorithm consists of three steps: (i automatic camera calibration using both moving objects and a background structure; (ii object depth estimation; and (iii detection of occluded regions. The proposed algorithm estimates the depth of the object without extra sensors but with a generic red, green and blue (RGB camera. As a result, the proposed algorithm can be applied to improve the performance of object tracking and object recognition algorithms for video surveillance systems.
Full Text Available Text normalization is an important technique in document image analysis and recognition. It consists of many preprocessing stages, which include slope correction, text padding, skew correction, and straight the writing line. In this side, text normalization has an important role in many procedures such as text segmentation, feature extraction and characters recognition. In the present article, a new method for text baseline detection, straightening, and slant correction for Arabic handwritten texts is proposed. The method comprises a set of sequential steps: first components segmentation is done followed by components text thinning; then, the direction features of the skeletons are extracted, and the candidate baseline regions are determined. After that, selection of the correct baseline region is done, and finally, the baselines of all components are aligned with the writing line. The experiments are conducted on IFN/ENIT benchmark Arabic dataset. The results show that the proposed method has a promising and encouraging performance.
Dan L. Lacrămă
Full Text Available This paper presents an original solution for flexible hand-written text division into rows. Unlike the standard procedure, the proposed method avoids the isolated characters extensions amputation and reduces the recognition error rate in the final stage.
Cued-recall and two-alternative, forced-choice recognition measures were used to evaluate subjects' retention of the specific wordings of studied texts. Results obtained after 10-minute and 24 hour retention intervals suggest that the studied wordings of texts are functional components of their memory representations. Theories that assume…
Recent improvements in processing power, storage space, and video codec development enable users now to playback video on their handheld devices in a reasonable quality. However, given the form factor restrictions of such a mobile device, screen size still remains a natural limit and - as the term "handheld" implies - always will be a critical resource. This is not only true for video but any data that is processed on such devices. For this reason, developers have come up with new and innovative ways to deal with large documents in such limited scenarios. For example, if you look at the iPhone, innovative techniques such as flicking have been introduced to skim large lists of text (e.g. hundreds of entries in your music collection). Automatically adapting the zoom level to, for example, the width of table cells when double tapping on the screen enables reasonable browsing of web pages that have originally been designed for large, desktop PC sized screens. A multi touch interface allows you to easily zoom in and out of large text documents and images using two fingers. In the next section, we will illustrate that advanced techniques to browse large video files have been developed in the past years, as well. However, if you look at state-of-the-art video players on mobile devices, normally just simple, VCR like controls are supported (at least at the time of this writing) that only allow users to just start, stop, and pause video playback. If supported at all, browsing and navigation functionality is often restricted to simple skipping of chapters via two single buttons for backward and forward navigation and a small and thus not very sensitive timeline slider.
I Made Oka Widyantara
Full Text Available This paper aims to analyze Internet-based streaming video service in the communication media with variable bit rates. The proposed scheme on Dynamic Adaptive Streaming over HTTP (DASH using the internet network that adapts to the protocol Hyper Text Transfer Protocol (HTTP. DASH technology allows a video in the video segmentation into several packages that will distreamingkan. DASH initial stage is to compress the video source to lower the bit rate video codec uses H.26. Video compressed further in the segmentation using MP4Box generates streaming packets with the specified duration. These packages are assembled into packets in a streaming media format Presentation Description (MPD or known as MPEG-DASH. Streaming video format MPEG-DASH run on a platform with the player bitdash teritegrasi bitcoin. With this scheme, the video will have several variants of the bit rates that gave rise to the concept of scalability of streaming video services on the client side. The main target of the mechanism is smooth the MPEG-DASH streaming video display on the client. The simulation results show that the scheme based scalable video streaming MPEG-DASH able to improve the quality of image display on the client side, where the procedure bufering videos can be made constant and fine for the duration of video views
Full Text Available While it has been established that using full body motion to play active video games results in increased levels of energy expenditure, there is little information on the classification of human movement during active video game play in relationship to fundamental movement skills. The aim of this study was to validate software utilising Kinect sensor motion capture technology to recognise fundamental movement skills (FMS, during active video game play. Two human assessors rated jumping and side-stepping and these assessments were compared to the Kinect Action Recognition Tool (KART, to establish a level of agreement and determine the number of movements completed during five minutes of active video game play, for 43 children (m = 12 years 7 months ± 1 year 6 months. During five minutes of active video game play, inter-rater reliability, when examining the two human raters, was found to be higher for the jump (r = 0.94, p < .01 than the sidestep (r = 0.87, p < .01, although both were excellent. Excellent reliability was also found between human raters and the KART system for the jump (r = 0.84, p, .01 and moderate reliability for sidestep (r = 0.6983, p < .01 during game play, demonstrating that both humans and KART had higher agreement for jumps than sidesteps in the game play condition. The results of the study provide confidence that the Kinect sensor can be used to count the number of jumps and sidestep during five minutes of active video game play with a similar level of accuracy as human raters. However, in contrast to humans, the KART system required a fraction of the time to analyse and tabulate the results.
Smith, Rachel Charlotte; Christensen, Kasper Skov; Iversen, Ole Sejer
We introduce Video Design Games to train educators in teaching design. The Video Design Game is a workshop format consisting of three rounds in which participants observe, reflect and generalize based on video snippets from their own practice. The paper reports on a Video Design Game workshop...... in which 25 educators as part of a digital fabrication and design program were able to critically reflect on their teaching practice....
Full Text Available ... Information At Home Shopping Cooking Gluten Free Baking School Eating Out Away From Home Emotional Adjustment Kids Speak Research and Innovation Contact Us Celiac Disease Program | Videos ...
Full Text Available ... Emotional Adjustment Kids Speak Research and Innovation Contact Us Celiac Disease Program | Videos Boston Children's Hospital will teach you and your family about a healthful celiac lifestyle. Education is key in making parents feel more at ...
Full Text Available ... Donate Pay Your Bill MyChildren's Patient Portal Celiac Disease Program Contact the Celiac Disease Program 1-617- ... Kids Speak Research and Innovation Contact Us Celiac Disease Program | Videos Boston Children's Hospital will teach you ...
Full Text Available ... Fitness Diseases & Conditions Infections Drugs & Alcohol School & ... and opportunities available to them. While you help your tween or teen plan for the future, watch this video series together to learn about everything ...
Full Text Available ... Disease Diet Information At Home Shopping Cooking Gluten Free Baking School Eating Out Away From Home Emotional Adjustment Kids Speak Research and Innovation Contact Us Celiac Disease Program | Videos ...
Full Text Available ... Videos Experiencing Celiac Disease What is Celiac Disease Diet Information At Home Shopping Cooking Gluten Free Baking ... What is Celiac Disease? : Diagnosis and treatment III. Diet Information : How to start and maintain a gluten- ...
Full Text Available [Skip to Content] for Parents Parents site Sitio para padres General Health Growth & Development Infections Diseases & Conditions Pregnancy & Baby Nutrition & Fitness Emotions & Behavior School & Family Life First Aid & Safety Doctors & Hospitals Videos Recipes for Kids ...
Full Text Available ... Diseases & Conditions Pregnancy & Baby Nutrition & Fitness Emotions & Behavior School & Family Life First Aid & Safety Doctors & Hospitals Videos ... Health Food & Fitness Diseases & Conditions Infections Drugs & Alcohol School & Jobs Sports Expert Answers (Q&A) Staying Safe ...
Full Text Available ... Group Patient Resources Gluten Free Cookbooks Gluten Free Recipes Videos Experiencing Celiac Disease What is Celiac Disease Diet Information At Home Shopping Cooking Gluten Free Baking School Eating Out Away From ...
Full Text Available ... them. While you help your tween or teen plan for the future, watch this video series together ... Care for Your Child With Special Needs Special Education: Getting Help for Your Child Words to Know ( ...
Full Text Available ... Food & Fitness Diseases & Conditions Infections Drugs & Alcohol School & Jobs Sports Expert Answers (Q&A) Staying Safe Videos ... everything from financial and health care benefits to employment and housing options. More on this topic for: ...
Full Text Available ... video series together to learn about everything from financial and health care benefits to employment and housing options. More on this topic for: Parents Financial Planning for Kids With Special Needs Giving Teens ...
Full Text Available ... Conditions Pregnancy & Baby Nutrition & Fitness Emotions & Behavior School & Family Life First Aid & Safety Doctors & Hospitals Videos Recipes ... Care for Your Child With Special Needs Special Education: Getting Help for Your Child Words to Know ( ...
Full Text Available ... Food & Fitness Diseases & Conditions Infections Drugs & Alcohol School & Jobs Sports Expert Answers (Q&A) Staying Safe Videos ... Care for Your Child With Special Needs Special Education: Getting Help for Your Child Words to Know ( ...
Full Text Available ... the future, watch this video series together to learn about everything from financial and health care benefits to employment ... Respite Care for Your Child With Special Needs Special Education: Getting Help for ...
Full Text Available ... Feelings Expert Answers Q&A Movies & More for Teens Teens site Sitio para adolescentes Body Mind Sexual Health ... to them. While you help your tween or teen plan for the future, watch this video series ...
Full Text Available Scale-invariant feature transform (SIFT transforms a grayscale image into scale-invariant coordinates of local features that are invariant to image scale, rotation, and changing viewpoints. Because of its scale-invariant properties, SIFT has been successfully used for object recognition and content-based image retrieval. The biggest drawback of SIFT is that it uses only grayscale information and misses important visual information regarding color. In this paper, we present the development of a novel color feature extraction algorithm that addresses this problem, and we also propose a new clustering strategy using clustering ensembles for video shot detection. Based on Fibonacci lattice-quantization, we develop a novel color global scale-invariant feature transform (CGSIFT for better description of color contents in video frames for video shot detection. CGSIFT first quantizes a color image, representing it with a small number of color indices, and then uses SIFT to extract features from the quantized color index image. We also develop a new space description method using small image regions to represent global color features as the second step of CGSIFT. Clustering ensembles focusing on knowledge reuse are then applied to obtain better clustering results than using single clustering methods for video shot detection. Evaluation of the proposed feature extraction algorithm and the new clustering strategy using clustering ensembles reveals very promising results for video shot detection.
Full Text Available Abstract Scale-invariant feature transform (SIFT transforms a grayscale image into scale-invariant coordinates of local features that are invariant to image scale, rotation, and changing viewpoints. Because of its scale-invariant properties, SIFT has been successfully used for object recognition and content-based image retrieval. The biggest drawback of SIFT is that it uses only grayscale information and misses important visual information regarding color. In this paper, we present the development of a novel color feature extraction algorithm that addresses this problem, and we also propose a new clustering strategy using clustering ensembles for video shot detection. Based on Fibonacci lattice-quantization, we develop a novel color global scale-invariant feature transform (CGSIFT for better description of color contents in video frames for video shot detection. CGSIFT first quantizes a color image, representing it with a small number of color indices, and then uses SIFT to extract features from the quantized color index image. We also develop a new space description method using small image regions to represent global color features as the second step of CGSIFT. Clustering ensembles focusing on knowledge reuse are then applied to obtain better clustering results than using single clustering methods for video shot detection. Evaluation of the proposed feature extraction algorithm and the new clustering strategy using clustering ensembles reveals very promising results for video shot detection.
Ducey, Richard V.
This report examines a growing submarket, the children's video marketplace, which comprises broadcast, cable, and video programming for children 2 to 11 years old. A description of the tremendous growth in the availability and distribution of children's programming is presented, the economics of the children's video marketplace are briefly…
Buggey, Tom; Ogle, Lindsey
Video self-modeling (VSM) first appeared on the psychology and education stage in the early 1970s. The practical applications of VSM were limited by lack of access to tools for editing video, which is necessary for almost all self-modeling videos. Thus, VSM remained in the research domain until the advent of camcorders and VCR/DVD players and,…
Qin, Guofeng; Wang, Xiaoguo; Wang, Li; Li, Yang; Li, Qiyan
Detection of vehicles plays an important role in the area of the modern intelligent traffic management. And the pattern recognition is a hot issue in the area of computer vision. An auto- recognition system in cooperative information platform is studied. In the cooperative platform, 3G wireless network, including GPS, GPRS (CDMA), Internet (Intranet), remote video monitor and M-DMB networks are integrated. The remote video information can be taken from the terminals and sent to the cooperative platform, then detected by the auto-recognition system. The images are pretreated and segmented, including feature extraction, template matching and pattern recognition. The system identifies different models and gets vehicular traffic statistics. Finally, the implementation of the system is introduced.
Full Text Available Computers and computerized machines have tremendously penetrated all aspects of our lives. This raises the importance of Human-Computer Interface (HCI. The common HCI techniques still rely on simple devices such as keyboard, mice, and joysticks, which are not enough to convoy the latest technology. Hand gesture has become one of the most important attractive alternatives to existing traditional HCI techniques. This paper proposes a new hand gesture detection system for Human-Computer Interaction using real-time video streaming. This is achieved by removing the background using average background algorithm and the 1$ algorithm for hand’s template matching. Then every hand gesture is translated to commands that can be used to control robot movements. The simulation results show that the proposed algorithm can achieve high detection rate and small recognition time under different light changes, scales, rotation, and background.
Sendhy Rachmat Wurdianarto
Full Text Available Perkembangan ilmu pada dunia komputer sangatlah pesat. Salah satu yang menandai hal ini adalah ilmu komputer telah merambah pada dunia biometrik. Arti biometrik sendiri adalah karakter-karakter manusia yang dapat digunakan untuk membedakan antara orang yang satu dengan yang lainnya. Salah satu pemanfaatan karakter / organ tubuh pada setiap manusia yang digunakan untuk identifikasi (pengenalan adalah dengan memanfaatkan wajah. Dari permasalahan diatas dalam pengenalan lebih tentang aplikasi Matlab pada Face Recognation menggunakan metode Euclidean Distance dan Canberra Distance. Model pengembangan aplikasi yang digunakan adalah model waterfall. Model waterfall beriisi rangkaian aktivitas proses yang disajikan dalam proses analisa kebutuhan, desain menggunakan UML (Unified Modeling Language, inputan objek gambar diproses menggunakan Euclidean Distance dan Canberra Distance. Kesimpulan yang dapat ditarik adalah aplikasi face Recognation menggunakan metode euclidean Distance dan Canverra Distance terdapat kelebihan dan kekurangan masing-masing. Untuk kedepannya aplikasi tersebut dapat dikembangkan dengan menggunakan objek berupa video ataupun objek lainnya. Kata kunci : Euclidean Distance, Face Recognition, Biometrik, Canberra Distance
No milestone has proven as elusive as the always-approaching "year of the LAN," but the "year of the scanner" might claim the silver medal. Desktop scanners have been around almost as long as personal computers. And everyone thinks they are used for obvious desktop-publishing and business tasks like scanning business documents, magazine articles and other pages, and translating those words into files your computer understands. But, until now, the reality fell far short of the promise. Because it's true that scanners deliver an accurate image of the page to your computer, but the software to recognize this text has been woefully disappointing. Old optical-character recognition (OCR) software recognized such a limited range of pages as to be virtually useless to real users. (For example, one OCR vendor specified 12-point Courier font from an IBM Selectric typewriter: the same font in 10-point, or from a Diablo printer, was unrecognizable!) Computer dealers have told me the chasm between OCR expectations and reality is so broad and deep that nine out of ten prospects leave their stores in disgust when they learn the limitations. And this is a very important, very unfortunate gap. Because the promise of recognition -- what people want it to do -- carries with it tremendous improvements in our productivity and ability to get tons of written documents into our computers where we can do real work with it. The good news is that a revolutionary new development effort has led to the new technology of "page recognition," which actually does deliver the promise we've always wanted from OCR. I'm sure every reader appreciates the breakthrough represented by the laser printer and page-makeup software, a combination so powerful it created new reasons for buying a computer. A similar breakthrough is happening right now in page recognition: the Macintosh (and, I must admit, other personal computers) equipped with a moderately priced scanner and OmniPage software (from Caere
Spencer, Brenda H.
Notes that a text map is an instructional approach designed to help students gain fluency in reading content area materials. Discusses how the goal is to teach students about the important features of the material and how the maps can be used to build new understandings. Presents the procedures for preparing and using a text map. (SG)
Full Text Available Texts in natural scenes carry rich semantic information, which can be used to assist a wide range of applications, such as object recognition, image/video retrieval, mapping/navigation, and human computer interaction. However, most existing systems are designed to detect and recognize horizontal (or near-horizontal texts. Due to the increasing popularity of mobile-computing devices and applications, detecting texts of varying orientations from natural images under less controlled conditions has become an important but challenging task. In this paper, we propose a new algorithm to detect texts of varying orientations. Our algorithm is based on a two-level classification scheme and two sets of features specially designed for capturing the intrinsic characteristics of texts. To better evaluate the proposed method and compare it with the competing algorithms, we generate a comprehensive dataset with various types of texts in diverse real-world scenes. We also propose a new evaluation protocol, which is more suitable for benchmarking algorithms for detecting texts in varying orientations. Experiments on benchmark datasets demonstrate that our system compares favorably with the state-of-the-art algorithms when handling horizontal texts and achieves significantly enhanced performance on variant texts in complex natural scenes.
There has been a phenomenal growth in video applications over the past few years. An accurate traffic model of Variable Bit Rate (VBR) video is necessary for performance evaluation of a network design and for generating synthetic traffic that can be used for benchmarking a network. A large number of models for VBR video traffic have been proposed in the literature for different types of video in the past 20 years. Here, the authors have classified and surveyed these models and have also evaluated the models for H.264 AVC and MVC encoded video and discussed their findings.
Belonging to the wider academic field of computer vision, video analytics has aroused a phenomenal surge of interest since the current millennium. Video analytics is intended to solve the problem of the incapability of exploiting video streams in real time for the purpose of detection or anticipation. It involves analyzing the videos using algorithms that detect and track objects of interest over time and that indicate the presence of events or suspect behavior involving these objects.The aims of this book are to highlight the operational attempts of video analytics, to identify possi
Full Text Available We build a set of human behavior recognition system based on the convolution neural network constructed for the specific human behavior in public places. Firstly, video of human behavior data set will be segmented into images, then we process the images by the method of background subtraction to extract moving foreground characters of body. Secondly, the training data sets are trained into the designed convolution neural network, and the depth learning network is constructed by stochastic gradient descent. Finally, the various behaviors of samples are classified and identified with the obtained network model, and the recognition results are compared with the current mainstream methods. The result show that the convolution neural network can study human behavior model automatically and identify human’s behaviors without any manually annotated trainings.
Full Text Available Building indoors reconstruction is an active research topic due to the importance of the wide range of applications to which they can be subjected, from architecture and furniture design, to movies and video games editing, or even crime scene investigation. Among the constructive elements defining the inside of a building, doors are important entities in applications like routing and navigation, and their automated recognition is advantageous e.g. in case of large multi-storey buildings with many office rooms. The inherent complexity of the automation of the recognition process is increased by the presence of clutter and occlusions, difficult to avoid in indoor scenes. In this work, we present a pipeline of techniques used for the reconstruction and interpretation of building interiors using information acquired in the form of point clouds and images. The methodology goes in depth with door detection and labelling as either opened, closed or furniture (false positive
In this paper, we develop a novel framework for action recognition in videos. The framework is based on automatically learning the discriminative trajectory groups that are relevant to an action. Different from previous approaches, our method does not require complex computation for graph matching or complex latent models to localize the parts. We model a video as a structured bag of trajectory groups with latent class variables. We model action recognition problem in a weakly supervised setting and learn discriminative trajectory groups by employing multiple instance learning (MIL) based Support Vector Machine (SVM) using pre-computed kernels. The kernels depend on the spatio-temporal relationship between the extracted trajectory groups and their associated features. We demonstrate both quantitatively and qualitatively that the classification performance of our proposed method is superior to baselines and several state-of-the-art approaches on three challenging standard benchmark datasets.
Jacco R. Taal
Full Text Available Wireless and Internet video applications are inherently subjected to bit errors and packet errors, respectively. This is especially so if constraints on the end-to-end compression and transmission latencies are imposed. Therefore, it is necessary to develop methods to optimize the video compression parameters and the rate allocation of these applications that take into account residual channel bit errors. In this paper, we study the behavior of a predictive (interframe video encoder and model the encoders behavior using only the statistics of the original input data and of the underlying channel prone to bit errors. The resulting data-driven behavior models are then used to carry out group-of-pictures partitioning and to control the rate of the video encoder in such a way that the overall quality of the decoded video with compression and channel errors is optimized.
The full-color guide to shooting great video with the Flip Video camera. The inexpensive Flip Video camera is currently one of the hottest must-have gadgets. It's portable and connects easily to any computer to transfer video you shoot onto your PC or Mac. Although the Flip Video camera comes with a quick-start guide, it lacks a how-to manual, and this full-color book fills that void! Packed with full-color screen shots throughout, Flip Video For Dummies shows you how to shoot the best possible footage in a variety of situations. You'll learn how to transfer video to your computer and then edi
Full Text Available This paper explores video game trailers, their various forms and the roles they play within video game industry and culture. It offers an overview of the current practice of video game trailer differentiation and proposes a new typology of video game trailers based on their relation to ludic and cinematic aspects of a video game, combining the theory of paratexts, video game performance framework, the interface effect concept, as well as the concept of transmedia storytelling. This typology reflects the historical evolution of a video game trailer and also takes into account current trends in the audiovisual paratexts of video games.
D. R. Marković
Full Text Available From the perspective of average viewer, high definition video streams such as HD (High Definition and UHD (Ultra HD are increasing their internet presence year over year. This is not surprising, having in mind expansion of HD streaming services, such as YouTube, Netflix etc. Therefore, high definition video streams are starting to challenge network resource allocation with their bandwidth requirements and statistical characteristics. Need for analysis and modeling of this demanding video traffic has essential importance for better quality of service and experience support. In this paper we use an easy-to-apply statistical model for prediction of 4K video traffic. Namely, seasonal autoregressive modeling is applied in prediction of 4K video traffic, encoded with HEVC (High Efficiency Video Coding. Analysis and modeling were performed within R programming environment using over 17.000 high definition video frames. It is shown that the proposed methodology provides good accuracy in high definition video traffic modeling.
Gîlcă, Gheorghe; Bîzdoacă, Nicu-George
This article deals with an emotion recognition system based on the fuzzy sets. Human faces are detected in images with the Viola - Jones algorithm and for its tracking in video sequences we used the Camshift algorithm. The detected human faces are transferred to the decisional fuzzy system, which is based on the variable fuzzyfication measurements of the face: eyebrow, eyelid and mouth. The system can easily determine the emotional state of a person.
Full Text Available ABSTRACT Designing fitness exercise tutorial level beginner as learning and promotion media for life gym was designed to provide guidelines of good movement in the fitness training sessions for beginners, especially the gym because life member will be distributed free of charge for new members sign up. For the process of editing video tutorial software and hardware needed adequate for smooth production. The results also depend on the ability of either constituent knowledge of a general nature and especially directing, editing, creativity, and the ability of hardware, software and technology / computer. Excess video guide allows members to understand the movement is good and right to avoid unwanted injury. Not only guides the movement are presented in this video project but also the member is given petuntuk diet and proper diet for target practice can be easily achieved. Excess video guide allows members to understand the movement is good and right to avoid unwanted injury. Not only guides the movement are presented in this video project but also the member is given guide of diet and proper diet for target practice can be easily achieved. The presence of video editing technology offers convenience to an agency to educate the public through video learning and served as media promotion of a service or related agency theme of the video.
Full Text Available Ambient Video is an emergent cultural phenomenon, with roots that go deeply into the history of experimental film and video art. Ambient Video, like Brian Eno's ambient music, is video that "must be as easy to ignore as notice" . This minimalist description conceals the formidable aesthetic challenge that faces this new form. Ambient video art works will hang on the walls of our living rooms, corporate offices, and public spaces. They will play in the background of our lives, living video paintings framed by the new generation of elegant, high-resolution flat-panel display units. However, they cannot command attention like a film or television show. They will patiently play in the background of our lives, yet they must always be ready to justify our attention in any given moment. In this capacity, ambient video works need to be equally proficient at rewarding a fleeting glance, a more direct look, or a longer contemplative gaze. This paper connects a series of threads that collectively illuminate the aesthetics of this emergent form: its history as a popular culture phenomenon, its more substantive artistic roots in avant-garde cinema and video art, its relationship to new technologies, the analysis of the viewer's conditions of reception, and the work of current artists who practice within this form.
Full Text Available Recognition of visual patterns is one of significant applications of Artificial Neural Networks, which partially emulate human thinking in the domain of artificial intelligence. In the paper, a simplified neural approach to recognition of visual patterns is portrayed and discussed. This paper is dedicated for investigators in visual patterns recognition, Artificial Neural Networking and related disciplines. The document describes also MemBrain application environment as a powerful and easy to use neural networks’ editor and simulator supporting ANN.
Titus Felix FURTUNA
Full Text Available In a system of speech recognition containing words, the recognition requires the comparison between the entry signal of the word and the various words of the dictionary. The problem can be solved efficiently by a dynamic comparison algorithm whose goal is to put in optimal correspondence the temporal scales of the two words. An algorithm of this type is Dynamic Time Warping. This paper presents two alternatives for implementation of the algorithm designed for recognition of the isolated words.
Thi, Tuan Hue; Wang, Li; Ye, Ning; Zhang, Jian; Maurer-Stroh, Sebastian; Cheng, Li
Vision-based surveillance and monitoring is a potential alternative for early detection of respiratory disease outbreaks in urban areas complementing molecular diagnostics and hospital and doctor visit-based alert systems. Visible actions representing typical flu-like symptoms include sneeze and cough that are associated with changing patterns of hand to head distances, among others. The technical difficulties lie in the high complexity and large variation of those actions as well as numerous similar background actions such as scratching head, cell phone use, eating, drinking and so on. In this paper, we make a first attempt at the challenging problem of recognizing flu-like symptoms from videos. Since there was no related dataset available, we created a new public health dataset for action recognition that includes two major flu-like symptom related actions (sneeze and cough) and a number of background actions. We also developed a suitable novel algorithm by introducing two types of Action Matching Kernels, where both types aim to integrate two aspects of local features, namely the space-time layout and the Bag-of-Words representations. In particular, we show that the Pyramid Match Kernel and Spatial Pyramid Matching are both special cases of our proposed kernels. Besides experimenting on standard testbed, the proposed algorithm is evaluated also on the new sneeze and cough set. Empirically, we observe that our approach achieves competitive performance compared to the state-of-the-arts, while recognition on the new public health dataset is shown to be a non-trivial task even with simple single person unobstructed view. Our sneeze and cough video dataset and newly developed action recognition algorithm is the first of its kind and aims to kick-start the field of action recognition of flu-like symptoms from videos. It will be challenging but necessary in future developments to consider more complex real-life scenario of detecting these actions simultaneously from
Lucas do Vale Manue
Full Text Available The use of computational vision plays an important role for security purposes, nowadays it is largely used and, day by day, is earning new resources due to the technological progress. The majority of current security systems are based in a monitoring where the analysis is made by the user's interpretation. To improve this process, it has been proposed models of analysis and recognition of personal characteristics, like height, skin tone, facial recognition, among others, in order to provide the search for recognitions more efficient. In this work, there are used different algorithms like HOG and SVM for identification and tracking of people, color spaces RGB, HSV and YCrCb, that are combined and used for the identification and extraction of characteristics like the type and color of clothes of each person for monitoring along the video.
Fujii, Yusaku; Takebe, Hiroaki; Tanaka, Hiroshi; Hotta, Yoshinobu
Document management systems have become important because of the growing popularity of electronic filing of documents and scanning of books, magazines, manuals, etc., through a scanner or a digital camera, for storage or reading on a PC or an electronic book. Text information acquired by optical character recognition (OCR) is usually added to the electronic documents for document retrieval. Since texts generated by OCR generally include character recognition errors, robust retrieval methods have been introduced to overcome this problem. In this paper, we propose a retrieval method that is robust against both character segmentation and recognition errors. In the proposed method, the insertion of noise characters and dropping of characters in the keyword retrieval enables robustness against character segmentation errors, and character substitution in the keyword of the recognition candidate for each character in OCR or any other character enables robustness against character recognition errors. The recall rate of the proposed method was 15% higher than that of the conventional method. However, the precision rate was 64% lower.
Full Text Available Behavioral studies of spoken word memory have shown that context congruency facilitates both word and source recognition, though the level at which context exerts its influence remains equivocal. We measured event-related potentials (ERPs while participants performed both types of recognition task with words spoken in four voices. Two voice parameters (i.e., gender and accent varied between speakers, with the possibility that none, one or two of these parameters was congruent between study and test. Results indicated that reinstating the study voice at test facilitated both word and source recognition, compared to similar or no context congruency at test. Behavioral effects were paralleled by two ERP modulations. First, in the word recognition test, the left parietal old/new effect showed a positive deflection reflective of context congruency between study and test words. Namely, the same speaker condition provided the most positive deflection of all correctly identified old words. In the source recognition test, a right frontal positivity was found for the same speaker condition compared to the different speaker conditions, regardless of response success. Taken together, the results of this study suggest that the benefit of context congruency is reflected behaviorally and in ERP modulations traditionally associated with recognition memory.
Gerald, Rex E. II; Sanchez, Jairo; Rathke, Jerome W.
A video toroid cavity imager for in situ measurement of electrochemical properties of an electrolytic material sample includes a cylindrical toroid cavity resonator containing the sample and employs NMR and video imaging for providing high-resolution spectral and visual information of molecular characteristics of the sample on a real-time basis. A large magnetic field is applied to the sample under controlled temperature and pressure conditions to simultaneously provide NMR spectroscopy and video imaging capabilities for investigating electrochemical transformations of materials or the evolution of long-range molecular aggregation during cooling of hydrocarbon melts. The video toroid cavity imager includes a miniature commercial video camera with an adjustable lens, a modified compression coin cell imager with a fiat circular principal detector element, and a sample mounted on a transparent circular glass disk, and provides NMR information as well as a video image of a sample, such as a polymer film, with micrometer resolution.
Is video becoming “the new black” in academia, if so, what are the challenges? The integration of video in research methodology (for collection, analysis) is well-known, but the use of “academic video” for dissemination is relatively new (Eriksson and Sørensen). The focus of this paper is academic......). In the video, I appear (along with other researchers) and two Danish film directors, and excerpts from their film. My challenges included how to edit the academic video and organize the collaborative effort. I consider video editing as a semiotic, transformative process of “reassembling” voices....... In the discussion, I review academic video in terms of relevance and implications for research practice. The theoretical background is social constructivist, combining social semiotics (Kress, van Leeuwen, McCloud), visual anthropology (Banks, Pink) and dialogic theory (Bakhtin). The Bakhtinian notion of “voices...
Ylirisku, Salu Pekka
Digital video for user-centered co-design is an emerging field of design, gaining increasing interest in both industry and academia. It merges the techniques and approaches of design ethnography, participatory design, interaction analysis, scenario-based design, and usability studies. This book covers the complete user-centered design project. It illustrates in detail how digital video can be utilized throughout the design process, from early user studies to making sense of video content and envisioning the future with video scenarios to provoking change with video artifacts. The text includes
Jolly Shah; Vikas Saxena
Multimedia data security is becoming important with the continuous increase of digital communications on internet. The encryption algorithms developed to secure text data are not suitable for multimedia application because of the large data size and real time constraint. In this paper, classification and description of various video encryption algorithms are presented. Analysis and Comparison of these algorithms with respect to various parameters like visual degradation, encryption ratio, spe...
One major reason for this shortcoming is the myopic and pedestrian conceptualization of literacy as a cognitive skill, which entails merely being able to read and write. Nevertheless, with greater recognition and extolment of cultural diversities, sensitivity to the socio-linguistic propensities of video literacy is beginning to ...
Willmott, Christopher J. R.
There is growing recognition that science is not conducted in a vacuum and that advances in the biosciences have ethical and social implications for the wider community. An exercise is described in which undergraduate students work in teams to produce short videos about the science and ethical dimensions of current developments in biomedicine.…
Full Text Available Brood parasitism frequently leads to a total loss of host fitness, which selects for the evolution of defensive traits in host species. Experimental studies have demonstrated that recognition and rejection of the parasite egg is the most common and efficient defence used by host species. Egg-recognition experiments have advanced our knowledge of the evolutionary and coevolutionary implications of egg recognition and rejection. However, our understanding of the proximate mechanisms underlying both processes remains poor. Egg rejection is a complex behavioural process consisting of three stages: egg recognition, the decision whether or not to reject the putative parasitic egg and the act of ejection itself. We have used the blackbird (Turdus merula as a model species to explore the relationship between egg recognition and the act of egg ejection. We have manipulated the two main characteristics of parasitic eggs affecting egg ejection in this grasp-ejector species: the degree of colour mimicry (mimetic and non-mimetic, which mainly affects the egg-recognition stage of the egg-rejection process and egg size (small, medium and large, which affects the decision to eject, while maintaining a control group of non-parasitized nests. The behaviour of the female when confronted with an experimental egg was filmed using a video camera. Our results show that egg touching is an indication of egg recognition and demonstrate that blackbirds recognized (i.e., touched non-mimetic experimental eggs significantly more than mimetic eggs. However, twenty per cent of the experimental eggs were touched but not subsequently ejected, which confirms that egg recognition does not necessarily mean egg ejection and that accepting parasitic eggs, at least sometimes, is the consequence of acceptance decisions. Regarding proximate mechanisms, our results show that the delay in egg ejection is not only due to recognition problems as usually suggested, given that experimental
Suddendorf, Thomas; Butler, David L
Visual self-recognition is often controversially cited as an indicator of self-awareness and assessed with the mirror-mark test. Great apes and humans, unlike small apes and monkeys, have repeatedly passed mirror tests, suggesting that the underlying brain processes are homologous and evolved 14-18 million years ago. However, neuroscientific, developmental, and clinical dissociations show that the medium used for self-recognition (mirror vs photograph vs video) significantly alters behavioral and brain responses, likely due to perceptual differences among the different media and prior experience. On the basis of this evidence and evolutionary considerations, we argue that the visual self-recognition skills evident in humans and great apes are a byproduct of a general capacity to collate representations, and need not index other aspects of self-awareness. Copyright © 2013 Elsevier Ltd. All rights reserved.
Achieve professional quality sound on a limited budget! Harness all new, Hollywood style audio techniques to bring your independent film and video productions to the next level.In Sound for Digital Video, Second Edition industry experts Tomlinson Holman and Arthur Baum give you the tools and knowledge to apply recent advances in audio capture, video recording, editing workflow, and mixing to your own film or video with stunning results. This fresh edition is chockfull of techniques, tricks, and workflow secrets that you can apply to your own projects from preproduction
Full Text Available BACKGROUND: Experimental psychology has only recently provided supporting evidence for Freud's and Janet's description of unconscious phenomena. Here, we aimed to assess whether specific abilities, such as personal psychodynamic experience, enhance the ability to recognize unconscious phenomena in peers - in other words, to better detect implicit knowledge related to individual self-experience. METHODOLOGY AND PRINCIPAL FINDINGS: First, we collected 14 videos from seven healthy adults who had experienced a sibling's cancer during childhood and seven matched controls. Subjects and controls were asked to give a 5-minute spontaneous free-associating speech following specific instructions created in order to activate a buffer zone between fantasy and reality. Then, 18 raters (three psychoanalysts, six medical students, three oncologists, three cognitive behavioral therapists and three individuals with the same experience of trauma were randomly shown the videos and asked to blindly classify them according to whether the speaker had a sibling with cancer using a Likert scale. Using a permutation test, we found a significant association between group and recognition score (ANOVA: p = .0006. Psychoanalysts were able to recognize, above chance levels, healthy adults who had experienced sibling cancer during childhood without explicit knowledge of this history (Power = 88%; p = .002. In contrast, medical students, oncologists, cognitive behavioral therapists and individuals who had the same history of a sibling's cancer were unable to do so. CONCLUSION: This experiment supports the view that implicit recognition of a subject's history depends on the rater's specific abilities. In the case of subjects who did have a sibling with cancer during childhood, psychoanalysts appear better able to recognize this particular history.
Jamie S. North
Full Text Available We used a novel approach to examine whether it is possible to improve the perceptual–cognitive skill of pattern recognition using a video-based training intervention. Moreover, we investigated whether any improvements in pattern recognition transfer to an improved ability to make anticipation judgments. Finally, we compared the relative effectiveness of verbal and visual guidance interventions compared to a group that merely viewed the same sequences without any intervention and a control group that only completed pre- and post-tests. We found a significant effect for time of testing. Participants were more sensitive in their ability to perceive patterns and distinguish between novel and familiar sequences at post- compared to pre-test. However, this improvement was not influenced by the nature of the intervention, despite some trends in the data. An analysis of anticipation accuracy showed no change from pre- to post-test following the pattern recognition training intervention, suggesting that the link between pattern perception and anticipation may not be strong. We present a series of recommendations for scientists and practitioners when employing training methods to improve pattern recognition and anticipation.
Full Text Available Facial micro-expression is a brief involuntary facial movement and can reveal the genuine emotion that people try to conceal. Traditional methods of spontaneous micro-expression recognition rely excessively on sophisticated hand-crafted feature design and the recognition rate is not high enough for its practical application. In this paper, we proposed a Dual Temporal Scale Convolutional Neural Network (DTSCNN for spontaneous micro-expressions recognition. The DTSCNN is a two-stream network. Different of stream of DTSCNN is used to adapt to different frame rate of micro-expression video clips. Each stream of DSTCNN consists of independent shallow network for avoiding the overfitting problem. Meanwhile, we fed the networks with optical-flow sequences to ensure that the shallow networks can further acquire higher-level features. Experimental results on spontaneous micro-expression databases (CASME I/II showed that our method can achieve a recognition rate almost 10% higher than what some state-of-the-art method can achieve.
Full Text Available Real-time image processing and object detection techniques have a great potential to be applied in digital assistive tools for the blind and visually impaired persons. In this paper, algorithm for crosswalk detection and walk light recognition is proposed with the main aim to help blind person when crossing the road. The proposed algorithm is optimized to work in real-time on portable devices using standard cameras. Images captured by camera are processed while person is moving and decision about detected crosswalk is provided as an output along with the information about walk light if one is present. Crosswalk detection method is based on multiresolution morphological image processing, while the walk light recognition is performed by proposed 6-stage algorithm. The main contributions of this paper are accurate crosswalk detection with small processing time due to multiresolution processing and the recognition of the walk lights covering only small amount of pixels in image. The experiment is conducted using images from video sequences captured in realistic situations on crossings. The results show 98.3% correct crosswalk detections and 89.5% correct walk lights recognition with average processing speed of about 16 frames per second.
Wade, P.; Courtney, A. R.
In science, the use of digital video to document phenomena, experiments and demonstrations has rapidly increased during the last decade. The use of digital video for science education also has become common with the wide availability of video over the internet. However, as with using any technology as a teaching tool, some questions should be asked: What science is being learned from watching a YouTube clip of a volcanic eruption or an informational video on hydroelectric power generation? What are student preferences (e.g. multimedia versus traditional mode of delivery) with regard to their learning? This study describes 1) the efficacy of watching digital video in the science classroom to enhance student learning, 2) student preferences of instruction with regard to multimedia versus traditional delivery modes, and 3) the use of creating digital video as a project-based educational strategy to enhance learning. Undergraduate non-science majors were the primary focus group in this study. Students were asked to view video segments and respond to a survey focused on what they learned from the segments. Additionally, they were asked about their preference for instruction (e.g. text only, lecture-PowerPoint style delivery, or multimedia-video). A majority of students indicated that well-made video, accompanied with scientific explanations or demonstration of the phenomena was most useful and preferred over text-only or lecture instruction for learning scientific information while video-only delivery with little or no explanation was deemed not very useful in learning science concepts. The use of student generated video projects as learning vehicles for the creators and other class members as viewers also will be discussed.
Jamal Ahmad Dargham
Full Text Available Face recognition is an important biometric method because of its potential applications in many fields, such as access control, surveillance, and human-computer interaction. In this paper, a face recognition system that fuses the outputs of three face recognition systems based on Gabor jets is presented. The first system uses the magnitude, the second uses the phase, and the third uses the phase-weighted magnitude of the jets. The jets are generated from facial landmarks selected using three selection methods. It was found out that fusing the facial features gives better recognition rate than either facial feature used individually regardless of the landmark selection method.
Risko, Victoria J.; Walker-Dalhouse, Doris
Students read multiple-genre texts such as graphic novels, poetry, brochures, digitized texts with videos, and informational and narrative texts. Features such as overlapping illustrations and implied cause-and-effect relationships can affect students' comprehension. Teaching with these texts and drawing attention to organizational features hold…
Moisés Henrique Ramos Pereira
Full Text Available This article addresses a multimodal approach to automatic emotion recognition in participants of TV newscasts (presenters, reporters, commentators and others able to assist the tension levels study in narratives of events in this television genre. The methodology applies state-of-the-art computational methods to process and analyze facial expressions, as well as speech signals. The proposed approach contributes to semiodiscoursive study of TV newscasts and their enunciative praxis, assisting, for example, the identification of the communication strategy of these programs. To evaluate the effectiveness of the proposed approach was applied it in a video related to a report displayed on a Brazilian TV newscast great popularity in the state of Minas Gerais. The experimental results are promising on the recognition of emotions on the facial expressions of tele journalists and are in accordance with the distribution of audiovisual indicators extracted over a TV newscast, demonstrating the potential of the approach to support the TV journalistic discourse analysis.This article addresses a multimodal approach to automatic emotion recognition in participants of TV newscasts (presenters, reporters, commentators and others able to assist the tension levels study in narratives of events in this television genre. The methodology applies state-of-the-art computational methods to process and analyze facial expressions, as well as speech signals. The proposed approach contributes to semiodiscoursive study of TV newscasts and their enunciative praxis, assisting, for example, the identification of the communication strategy of these programs. To evaluate the effectiveness of the proposed approach was applied it in a video related to a report displayed on a Brazilian TV newscast great popularity in the state of Minas Gerais. The experimental results are promising on the recognition of emotions on the facial expressions of tele journalists and are in accordance
Text-Fabric is a Python3 package for Text plus Annotations. It provides a data model, a text file format, and a binary format for (ancient) text plus (linguistic) annotations. The emphasis of this all is on: data processing; sharing data; and contributing modules. A defining characteristic is that
With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…
Riggs, Ken Roger
Discusses problems with marking free text, text that is either natural language or semigrammatical but unstructured, that prevent well-formed XML from marking text for readily available meaning. Proposes a solution to mark meaning in free text that is consistent with the intended simplicity of XML versus SGML. (Author/LRW)
Full Text Available Candidate smoke region segmentation is the key link of smoke video detection; an effective and prompt method of candidate smoke region segmentation plays a significant role in a smoke recognition system. However, the interference of heavy fog and smoke-color moving objects greatly degrades the recognition accuracy. In this paper, a novel method of candidate smoke region segmentation based on rough set theory is presented. First, Kalman filtering is used to update video background in order to exclude the interference of static smoke-color objects, such as blue sky. Second, in RGB color space smoke regions are segmented by defining the upper approximation, lower approximation, and roughness of smoke-color distribution. Finally, in HSV color space small smoke regions are merged by the definition of equivalence relation so as to distinguish smoke images from heavy fog images in terms of V component value variety from center to edge of smoke region. The experimental results on smoke region segmentation demonstrated the effectiveness and usefulness of the proposed scheme.
Haiam A. Abdul-Azim
Full Text Available Recognizing human actions in video sequences has been a challenging problem in the last few years due to its real-world applications. A lot of action representation approaches have been proposed to improve the action recognition performance. Despite the popularity of local features-based approaches together with “Bag-of-Words” model for action representation, it fails to capture adequate spatial or temporal relationships. In an attempt to overcome this problem, a trajectory-based local representation approaches have been proposed to capture the temporal information. This paper introduces an improvement of trajectory-based human action recognition approaches to capture discriminative temporal relationships. In our approach, we extract trajectories by tracking the detected spatio-temporal interest points named “cuboid features” with matching its SIFT descriptors over the consecutive frames. We, also, propose a linking and exploring method to obtain efficient trajectories for motion representation in realistic conditions. Then the volumes around the trajectories’ points are described to represent human actions based on the Bag-of-Words (BOW model. Finally, a support vector machine is used to classify human actions. The effectiveness of the proposed approach was evaluated on three popular datasets (KTH, Weizmann and UCF sports. Experimental results showed that the proposed approach yields considerable performance improvement over the state-of-the-art approaches.
Bulling, Andreas; Ward, Jamie A; Gellersen, Hans; Tröster, Gerhard
In this work, we investigate eye movement analysis as a new sensing modality for activity recognition. Eye movement data were recorded using an electrooculography (EOG) system. We first describe and evaluate algorithms for detecting three eye movement characteristics from EOG signals-saccades, fixations, and blinks-and propose a method for assessing repetitive patterns of eye movements. We then devise 90 different features based on these characteristics and select a subset of them using minimum redundancy maximum relevance (mRMR) feature selection. We validate the method using an eight participant study in an office environment using an example set of five activity classes: copying a text, reading a printed paper, taking handwritten notes, watching a video, and browsing the Web. We also include periods with no specific activity (the NULL class). Using a support vector machine (SVM) classifier and person-independent (leave-one-person-out) training, we obtain an average precision of 76.1 percent and recall of 70.5 percent over all classes and participants. The work demonstrates the promise of eye-based activity recognition (EAR) and opens up discussion on the wider applicability of EAR to other activities that are difficult, or even impossible, to detect using common sensing modalities.
Wei, Xian; Li, Yuanxiang; Shen, Hao; Chen, Fang; Kleinsteuber, Martin; Wang, Zhongfeng
Video representation is an important and challenging task in the computer vision community. In this paper, we consider the problem of modeling and classifying video sequences of dynamic scenes which could be modeled in a dynamic textures (DT) framework. At first, we assume that image frames of a moving scene can be modeled as a Markov random process. We propose a sparse coding framework, named joint video dictionary learning (JVDL), to model a video adaptively. By treating the sparse coefficients of image frames over a learned dictionary as the underlying "states", we learn an efficient and robust linear transition matrix between two adjacent frames of sparse events in time series. Hence, a dynamic scene sequence is represented by an appropriate transition matrix associated with a dictionary. In order to ensure the stability of JVDL, we impose several constraints on such transition matrix and dictionary. The developed framework is able to capture the dynamics of a moving scene by exploring both sparse properties and the temporal correlations of consecutive video frames. Moreover, such learned JVDL parameters can be used for various DT applications, such as DT synthesis and recognition. Experimental results demonstrate the strong competitiveness of the proposed JVDL approach in comparison with state-of-the-art video representation methods. Especially, it performs significantly better in dealing with DT synthesis and recognition on heavily corrupted data.
Provenzo, Eugene F., Jr.
Video games are neither neutral nor harmless but represent very specific social and symbolic constructs. Research on the social content of today's video games reveals that sex bias and gender stereotyping are widely evident throughout the Nintendo games. Violence and aggression also pervade the great majority of the games. (MLF)
van der Meij, Hans
This study investigates the effectiveness of a video tutorial for software training whose construction was based on a combination of insights from multimedia learning and Demonstration-Based Training. In the videos, a model of task performance was enhanced with instructional features that were
Legislative Liaison Small Business Programs Social Media State Websites Videos Featured Videos On Every Front 2:17 Always Ready, Always There National Guard Bureau Diversity and Inclusion Play Button 1:04 National Guard Bureau Diversity and Inclusion The ChalleNGe Ep.5 [Graduation] Play Button 3:51 The
James D. Ivory
Full Text Available Although there is a vast and useful body of quantitative social science research dealing with the social role and impact of video games, it is difficult to compare studies dealing with various dimensions of video games because they are informed by different perspectives and assumptions, employ different methodologies, and address different problems. Studies focusing on different social dimensions of video games can produce varied findings about games’ social function that are often difficult to reconcile— or even contradictory. Research is also often categorized by topic area, rendering a comprehensive view of video games’ social role across topic areas difficult. This interpretive review presents a novel typology of four identified approaches that categorize much of the quantitative social science video game research conducted to date: “video games as stimulus,” “video games as avocation,” “video games as skill,” and “video games as social environment.” This typology is useful because it provides an organizational structure within which the large and growing number of studies on video games can be categorized, guiding comparisons between studies on different research topics and aiding a more comprehensive understanding of video games’ social role. Categorizing the different approaches to video game research provides a useful heuristic for those critiquing and expanding that research, as well as an understandable entry point for scholars new to video game research. Further, and perhaps more importantly, the typology indicates when topics should be explored using different approaches than usual to shed new light on the topic areas. Lastly, the typology exposes the conceptual disconnects between the different approaches to video game research, allowing researchers to consider new ways to bridge gaps between the different approaches’ strengths and limitations with novel methods.
Full Text Available The paper presents an NP-video rendering system based on natural phenomena. It provides a simple nonphotorealistic video synthesis system in which user can obtain a flow-like stylization painting and infinite video scene. Firstly, based on anisotropic Kuwahara filtering in conjunction with line integral convolution, the phenomena video scene can be rendered to flow-like stylization painting. Secondly, the methods of frame division, patches synthesis, will be used to synthesize infinite playing video. According to selection examples from different natural video texture, our system can generate stylized of flow-like and infinite video scenes. The visual discontinuities between neighbor frames are decreased, and we also preserve feature and details of frames. This rendering system is easy and simple to implement.
... Corner / Patient Webcasts / Rheumatoid Arthritis Educational Video Series Rheumatoid Arthritis Educational Video Series This series of five videos ... Your Arthritis Managing Chronic Pain and Depression in Arthritis Nutrition & Rheumatoid Arthritis Arthritis and Health-related Quality of Life ...
Finnemann, Niels Ole
text can be defined by taking as point of departure the digital format in which everything is represented in the binary alphabet. While the notion of text, in most cases, lends itself to be independent of medium and embodiment, it is also often tacitly assumed that it is, in fact, modeled around...... the print medium, rather than written text or speech. In late 20th century, the notion of text was subject to increasing criticism as in the question raised within literary text theory: is there a text in this class? At the same time, the notion was expanded by including extra linguistic sign modalities...
Wang, Zhi; Zhu, Wenwu
This brief presents new architecture and strategies for distribution of social video content. A primary framework for socially-aware video delivery and a thorough overview of the possible approaches is provided. The book identifies the unique characteristics of socially-aware video access and social content propagation, revealing the design and integration of individual modules that are aimed at enhancing user experience in the social network context. The change in video content generation, propagation, and consumption for online social networks, has significantly challenged the traditional video delivery paradigm. Given the massive amount of user-generated content shared in online social networks, users are now engaged as active participants in the social ecosystem rather than as passive receivers of media content. This revolution is being driven further by the deep penetration of 3G/4G wireless networks and smart mobile devices that are seamlessly integrated with online social networking and media-sharing s...
have large influence on their own teaching, learning and curriculum. The programme offers streamed videos in combination with other learning resources. It is a concept which offers video as pure presentation - video lectures - but also as an instructional tool which gives the students the possibility...... to construct their knowledge, collaboration and communication. In its first years the programme has used Skype video communication for collaboration and communication within and between groups, group members and their facilitators. Also exams have been mediated with the help of Skype and have for all students......, examiners and external examiners been a challenge and opportunity and has brought new knowledge and experience. This paper brings results from a questionnaire focusing on how the students experience the video examination....
B2 • Filing: 2005/03/10 • Issued: 2009/02/17 • Assignee: George Mason intellectual Properties Inc. • Inventors: Fayin Li, Harry Wechsler The open set...surveillance system • Agency: United States Patent • Number: US 7,634,662 B2 • Filing: 2003/11/21 • Issued: 2009/12/15 • Assignee: David A. Monroe...7800 IH-10 West, #700, San Antonio, TX (US) 78230. • Inventors: David A. Monroe The basic embodiment of this patent is detailed in Fig. 9, where a
El reconocimiento de caras, junto con la identificación de las acciones y gestos humanos, es en la actualidad una de las aplicaciones informáticas, más exitosas de análisis automatizado del comportamiento humano. Durante los últimos diez años aproximadamente, se ha convertido en un área muy popular de la investigación en computer vision y ha recibido mucha atención por parte de las organizaciones internacionales (Thumos, ChaLearn, etc).  El sistema de reconocimiento facial es una aplicació...
van Rootseler, R.T.A.; Spreeuwers, Lieuwe Jan; Veldhuis, Raymond N.J.
The 3D Morphable Face Model (3DMM) has been used for over a decade for creating 3D models from single images of faces. This model is based on a PCA model of the 3D shape and texture generated from a limited number of 3D scans. The goal of fitting a 3DMM to an image is to find the model coefficients,
Kruithof, M.C.; Bouma, H.; Fischer, N.M.; Schutte, K.
Object recognition is important to understand the content of video and allow flexible querying in a large number of cameras, especially for security applications. Recent benchmarks show that deep convolutional neural networks are excellent approaches for object recognition. This paper describes an
Mirzaei, Maryam Sadat; Meshgi, Kourosh; Kawahara, Tatsuya
This study investigates the use of Automatic Speech Recognition (ASR) systems to epitomize second language (L2) listeners' problems in perception of TED talks. ASR-generated transcripts of videos often involve recognition errors, which may indicate difficult segments for L2 listeners. This paper aims to discover the root-causes of the ASR errors…
Zuo, F.; With, de P.H.N.; Ebrahimi, T.; Sikora, T.
In a home environment, video surveillance employing face detection and recognition is attractive for new applications. Facial feature (e.g. eyes and mouth) localization in the face is an essential task for face recognition because it constitutes an indispensable step for face geometry normalization.
Guo, Jian-xin; Zhao, Ji-chun; Gong, Jing; Chun, Yang
As 3G (3rd-generation) networks evolve worldwide, the rising demand for mobile video services and the enormous growth of video on the internet is creating major new revenue opportunities for mobile network operators and application developers. The text introduced a method of mobile video transmission based on J2ME, giving the method of video compressing, then describing the video compressing standard, and then describing the software design. The proposed mobile video method based on J2EE is a typical mobile multimedia application, which has a higher availability and a wide range of applications. The users can get the video through terminal devices such as phone.
... text. What's the Big Deal? The problem is multitasking. No matter how young and agile we are, ... on something other than the road. In fact, driving while texting (DWT) can be more dangerous than ...
Soleymani, Mohammad; Lichtenauer, Jeroen; Pun, Thierry; Pantic, Maja
MAHNOB-HCI is a multimodal database recorded in response to affective stimuli with the goal of emotion recognition and implicit tagging research. A multimodal setup was arranged for synchronized recording of face videos, audio signals, eye gaze data, and peripheral/central nervous system
Poppe, Ronald Walter
Vision-based human action recognition is the process of labeling image sequences with action labels. Robust solutions to this problem have applications in domains such as visual surveillance, video retrieval and human–computer interaction. The task is challenging due to variations in motion
Du, Yingzi; Arslanturk, Emrah; Zhou, Zhi; Belcher, Craig
In this paper, we propose a video-based noncooperative iris image segmentation scheme that incorporates a quality filter to quickly eliminate images without an eye, employs a coarse-to-fine segmentation scheme to improve the overall efficiency, uses a direct least squares fitting of ellipses method to model the deformed pupil and limbic boundaries, and develops a window gradient-based method to remove noise in the iris region. A remote iris acquisition system is set up to collect noncooperative iris video images. An objective method is used to quantitatively evaluate the accuracy of the segmentation results. The experimental results demonstrate the effectiveness of this method. The proposed method would make noncooperative iris recognition or iris surveillance possible.
In the thesis a coherent text is defined as a continuity of senses of the outcome of combining concepts and relations into a network composed of knowledge space centered around main topics. And the author maintains that in order to obtain the coherence of a target language text from a source text during the process of translation, a translator can…
Smith, Carolynn L; Taubert, Jessica; Weldon, Kimberly; Evans, Christopher S
Correctly directing social behaviour towards a specific individual requires an ability to discriminate between conspecifics. The mechanisms of individual recognition include phenotype matching and familiarity-based recognition. Communication-based recognition is a subset of familiarity-based recognition wherein the classification is based on behavioural or distinctive signalling properties. Male fowl (Gallus gallus) produce a visual display (tidbitting) upon finding food in the presence of a female. Females typically approach displaying males. However, males may tidbit without food. We used the distinctiveness of the visual display and the unreliability of some males to test for communication-based recognition in female fowl. We manipulated the prior experience of the hens with the males to create two classes of males: S(+) wherein the tidbitting signal was paired with a food reward to the female, and S (-) wherein the tidbitting signal occurred without food reward. We then conducted a sequential discrimination test with hens using a live video feed of a familiar male. The results of the discrimination tests revealed that hens discriminated between categories of males based on their signalling behaviour. These results suggest that fowl possess a communication-based recognition system. This is the first demonstration of live-to-video transfer of recognition in any species of bird. Copyright © 2016 Elsevier B.V. All rights reserved.
Jain, Anil K
This report describes research efforts towards developing algorithms for a robust face recognition system to overcome many of the limitations found in existing two-dimensional facial recognition systems...
Ismail Amin Ali
Full Text Available A compressed video bitstream can be partitioned according to the coding priority of the data, allowing prioritized wireless communication or selective dropping in a congested channel. Known as data partitioning in the H.264/Advanced Video Coding (AVC codec, this paper introduces a further sub-partition of one of the H.264/AVC codec’s three data-partitions. Results show a 5 dB improvement in Peak Signal-to-Noise Ratio (PSNR through this innovation. In particular, the data partition containing intra-coded residuals is sub-divided into data from: those macroblocks (MBs naturally intra-coded, and those MBs forcibly inserted for non-periodic intra-refresh. Interactive user-to-user video streaming can benefit, as then HTTP adaptive streaming is inappropriate and the High Efficiency Video Coding (HEVC codec is too energy demanding.
Full Text Available Big data makes cloud computing more and more popular in various fields. Video resources are very useful and important to education, security monitoring, and so on. However, issues of their huge volumes, complex data types, inefficient processing performance, weak security, and long times for loading pose challenges in video resource management. The Hadoop Distributed File System (HDFS is an open-source framework, which can provide cloud-based platforms and presents an opportunity for solving these problems. This paper presents video resource management architecture based on HDFS to provide a uniform framework and a five-layer model for standardizing the current various algorithms and applications. The architecture, basic model, and key algorithms are designed for turning video resources into a cloud computing environment. The design was tested by establishing a simulation system prototype.
Ruxandra Claudia CHIRCA (NEACȘU
Full Text Available In nowadays' world, technological assistance is no longer confined to its primary purpose of communication or informational support and the boundaries between real and virtual world are becoming increasingly harder to be defined. This is the world of digital natives, today's children, who grow up in a technology-brimming environment and who spend most of their time playing video games. Are these video games constructive in any way? Scientific studies state they are. Video games help children in setting their goals, provide constant feedback and offer immediate rewards, along with the opportunity to collaborate with other players. Furthermore, video games can generate strong emotional reactions, such as joy or fear, and they have a captivating story line, which reveals itself within a realm of elaborate graphics.
Full Text Available Many algorithms for temporal video partitioning rely on the analysis of uncompressed video features. Since the information relevant to the partitioning process can be extracted directly from the MPEG compressed stream, higher efficiency can be achieved utilizing information from the MPEG compressed domain. This paper introduces a real-time algorithm for scene change detection that analyses the statistics of the macroblock features extracted directly from the MPEG stream. A method for extraction of the continuous frame difference that transforms the 3D video stream into a 1D curve is presented. This transform is then further employed to extract temporal units within the analysed video sequence. Results of computer simulations are reported.
K.C. , Santosh; Wendling , Laurent
International audience; The chapter focuses on one of the key issues in document image processing i.e., graphical symbol recognition. Graphical symbol recognition is a sub-field of a larger research domain: pattern recognition. The chapter covers several approaches (i.e., statistical, structural and syntactic) and specially designed symbol recognition techniques inspired by real-world industrial problems. It, in general, contains research problems, state-of-the-art methods that convey basic s...
Heilbron, Fabian Caba; Niebles, Juan Carlos; Ghanem, Bernard
In many large-scale video analysis scenarios, one is interested in localizing and recognizing human activities that occur in short temporal intervals within long untrimmed videos. Current approaches for activity detection still struggle to handle large-scale video collections and the task remains relatively unexplored. This is in part due to the computational complexity of current action recognition approaches and the lack of a method that proposes fewer intervals in the video, where activity processing can be focused. In this paper, we introduce a proposal method that aims to recover temporal segments containing actions in untrimmed videos. Building on techniques for learning sparse dictionaries, we introduce a learning framework to represent and retrieve activity proposals. We demonstrate the capabilities of our method in not only producing high quality proposals but also in its efficiency. Finally, we show the positive impact our method has on recognition performance when it is used for action detection, while running at 10FPS.
Heilbron, Fabian Caba
In many large-scale video analysis scenarios, one is interested in localizing and recognizing human activities that occur in short temporal intervals within long untrimmed videos. Current approaches for activity detection still struggle to handle large-scale video collections and the task remains relatively unexplored. This is in part due to the computational complexity of current action recognition approaches and the lack of a method that proposes fewer intervals in the video, where activity processing can be focused. In this paper, we introduce a proposal method that aims to recover temporal segments containing actions in untrimmed videos. Building on techniques for learning sparse dictionaries, we introduce a learning framework to represent and retrieve activity proposals. We demonstrate the capabilities of our method in not only producing high quality proposals but also in its efficiency. Finally, we show the positive impact our method has on recognition performance when it is used for action detection, while running at 10FPS.
Full Text Available Figures are ubiquitous in biomedical full-text articles, and they represent important biomedical knowledge. However, the sheer volume of biomedical publications has made it necessary to develop computational approaches for accessing figures. Therefore, we are developing the Biomedical Figure Search engine (http://figuresearch.askHERMES.org to allow bioscientists to access figures efficiently. Since text frequently appears in figures, automatically extracting such text may assist the task of mining information from figures. Little research, however, has been conducted exploring text extraction from biomedical figures.We first evaluated an off-the-shelf Optical Character Recognition (OCR tool on its ability to extract text from figures appearing in biomedical full-text articles. We then developed a Figure Text Extraction Tool (FigTExT to improve the performance of the OCR tool for figure text extraction through the use of three innovative components: image preprocessing, character recognition, and text correction. We first developed image preprocessing to enhance image quality and to improve text localization. Then we adapted the off-the-shelf OCR tool on the improved text localization for character recognition. Finally, we developed and evaluated a novel text correction framework by taking advantage of figure-specific lexicons.The evaluation on 382 figures (9,643 figure texts in total randomly selected from PubMed Central full-text articles shows that FigTExT performed with 84% precision, 98% recall, and 90% F1-score for text localization and with 62.5% precision, 51.0% recall and 56.2% F1-score for figure text extraction. When limiting figure texts to those judged by domain experts to be important content, FigTExT performed with 87.3% precision, 68.8% recall, and 77% F1-score. FigTExT significantly improved the performance of the off-the-shelf OCR tool we used, which on its own performed with 36.6% precision, 19.3% recall, and 25.3% F1-score for
Full Text Available In this research work, we proposed a most effective noble approach for Human activity recognition in real-time environments. We recognize several distinct dynamic human activity actions using kinect. A 3D skeleton data is processed from real-time video gesture to sequence of frames and getter skeleton joints (Energy Joints, orientation, rotations of joint angles from selected setof frames. We are using joint angle and orientations, rotations information from Kinect therefore less computation required. However, after extracting the set of frames we implemented several classification techniques Principal Component Analysis (PCA with several distance based classifiers and Artificial Neural Network (ANN respectively with some variants for classify our all different gesture models. However, we conclude that use very less number of frame (10-15% for train our system efficiently from the entire set of gesture frames. Moreover, after successfully completion of our classification methods we clinch an excellent overall accuracy 94%, 96% and 98% respectively. We finally observe that our proposed system is more useful than comparing to other existing system, therefore our model is best suitable for real-time application such as in video games for player action/gesture recognition.
Muhammad Latif Anjum
Full Text Available We present a robust algorithm for complex human activity recognition for natural human-robot interaction. The algorithm is based on tracking the position of selected joints in human skeleton. For any given activity, only a few skeleton joints are involved in performing the activity, so a subset of joints contributing the most towards the activity is selected. Our approach of tracking a subset of skeleton joints (instead of tracking the whole skeleton is computationally efficient and provides better recognition accuracy. We have developed both manual and automatic approaches for the selection of these joints. The position of the selected joints is tracked for the duration of the activity and is used to construct feature vectors for each activity. Once the feature vectors have been constructed, we use a Support Vector Machines (SVM multiclass classifier for training and testing the algorithm. The algorithm has been tested on a purposely built dataset of depth videos recorded using Kinect camera. The dataset consists of 250 videos of 10 different activities being performed by different users. Experimental results show classification accuracy of 83% when tracking all skeleton joints, 95% when using manual selection of subset joints, and 89% when using automatic selection of subset joints.
Muhammad Hameed Siddiqi
Full Text Available Research in video based FER systems has exploded in the past decade. However, most of the previous methods work well when they are trained and tested on the same dataset. Illumination settings, image resolution, camera angle, and physical characteristics of the people differ from one dataset to another. Considering a single dataset keeps the variance, which results from differences, to a minimum. Having a robust FER system, which can work across several datasets, is thus highly desirable. The aim of this work is to design, implement, and validate such a system using different datasets. In this regard, the major contribution is made at the recognition module which uses the maximum entropy Markov model (MEMM for expression recognition. In this model, the states of the human expressions are modeled as the states of an MEMM, by considering the video-sensor observations as the observations of MEMM. A modified Viterbi is utilized to generate the most probable expression state sequence based on such observations. Lastly, an algorithm is designed which predicts the expression state from the generated state sequence. Performance is compared against several existing state-of-the-art FER systems on six publicly available datasets. A weighted average accuracy of 97% is achieved across all datasets.
Chang, Edward Y.; Wang, Yuan-Fang; Rodoplu, Volkan
This paper presents an event sensing paradigm for intelligent event-analysis in a wireless, ad hoc, multi-camera, video surveillance system. In particilar, we present statistical methods that we have developed to support three aspects of event sensing: 1) energy-efficient, resource-conserving, and robust sensor data fusion and analysis, 2) intelligent event modeling and recognition, and 3) rapid deployment, dynamic configuration, and continuous operation of the camera networks. We outline our preliminary results, and discuss future directions that research might take.
CERN video productions
"What's new @ CERN?", a new monthly video programme, will be broadcast on the Monday of every month on webcast.cern.ch. Aimed at the general public, the programme will cover the latest CERN news, with guests and explanatory features. Tune in on Monday 3 October at 4 pm (CET) to see the programme in English, and then at 4:20 pm (CET) for the French version. var flash_video_player=get_video_player_path(); insert_player_for_external('Video/Public/Movies/2011/CERN-MOVIE-2011-129/CERN-MOVIE-2011-129-0753-kbps-640x360-25-fps-audio-64-kbps-44-kHz-stereo', 'mms://mediastream.cern.ch/MediaArchive/Video/Public/Movies/2011/CERN-MOVIE-2011-129/CERN-MOVIE-2011-129-Multirate-200-to-753-kbps-640x360-25-fps.wmv', 'false', 480, 360, 'https://mediastream.cern.ch/MediaArchive/Video/Public/Movies/2011/CERN-MOVIE-2011-129/CERN-MOVIE-2011-129-posterframe-640x360-at-10-percent.jpg', '1383406', true, 'Video/Public/Movies/2011/CERN-MOVIE-2011-129/CERN-MOVIE-2011-129-0600-kbps-maxH-360-25-fps-...
Spampinato, Concetto; Palazzo, Simone; Giordano, Daniela
Video object segmentation can be considered as one of the most challenging computer vision problems. Indeed, so far, no existing solution is able to effectively deal with the peculiarities of real-world videos, especially in cases of articulated motion and object occlusions; limitations that appear more evident when we compare the performance of automated methods with the human one. However, manually segmenting objects in videos is largely impractical as it requires a lot of time and concentration. To address this problem, in this paper we propose an interactive video object segmentation method, which exploits, on one hand, the capability of humans to identify correctly objects in visual scenes, and on the other hand, the collective human brainpower to solve challenging and large-scale tasks. In particular, our method relies on a game with a purpose to collect human inputs on object locations, followed by an accurate segmentation phase achieved by optimizing an energy function encoding spatial and temporal constraints between object regions as well as human-provided location priors. Performance analysis carried out on complex video benchmarks, and exploiting data provided by over 60 users, demonstrated that our method shows a better trade-off between annotation times and segmentation accuracy than interactive video annotation and automated video object segmentation approaches.
Ratakonda, Krishna; Sezan, M. Ibrahim; Crinon, Regis J.
We address the problem of key-frame summarization of vide in the absence of any a priori information about its content. This is a common problem that is encountered in home videos. We propose a hierarchical key-frame summarization algorithm where a coarse-to-fine key-frame summary is generated. A hierarchical key-frame summary facilitates multi-level browsing where the user can quickly discover the content of the video by accessing its coarsest but most compact summary and then view a desired segment of the video with increasingly more detail. At the finest level, the summary is generated on the basis of color features of video frames, using an extension of a recently proposed key-frame extraction algorithm. The finest level key-frames are recursively clustered using a novel pairwise K-means clustering approach with temporal consecutiveness constraint. We also address summarization of MPEG-2 compressed video without fully decoding the bitstream. We also propose efficient mechanisms that facilitate decoding the video when the hierarchical summary is utilized in browsing and playback of video segments starting at selected key-frames.
Dańda, Jacek; Juszkiewicz, Krzysztof; Leszczuk, Mikołaj; Loziak, Krzysztof; Papir, Zdzisław; Sikora, Marek; Watza, Rafal
The paper discusses two implementation options for a Digital Video Library, a repository used for archiving, accessing, and browsing of video medical records. Two crucial issues to be decided on are a video compression format and a video streaming platform. The paper presents numerous decision factors that have to be taken into account. The compression formats being compared are DICOM as a format representative for medical applications, both MPEGs, and several new formats targeted for an IP networking. The comparison includes transmission rates supported, compression rates, and at least options for controlling a compression process. The second part of the paper presents the ISDN technique as a solution for provisioning of tele-consultation services between medical parties that are accessing resources uploaded to a digital video library. There are several backbone techniques (like corporate LANs/WANs, leased lines or even radio/satellite links) available, however, the availability of network resources for hospitals was the prevailing choice criterion pointing to ISDN solutions. Another way to provide access to the Digital Video Library is based on radio frequency domain solutions. The paper describes possibilities of both, wireless and cellular network's data transmission service to be used as a medical video server transport layer. For the cellular net-work based solution two communication techniques are used: Circuit Switched Data and Packet Switched Data.
Full Text Available Near Set Theory has various applications in the literature. In this paper, using the concept descriptive nearness, we show how to perform iris recognition. This process has a few algorithms given via Mathematica Script Language.
Full Text Available This case study was carried out in the English Education Department of State University of Malang. The aim of the study was to identify and describe the vocabulary in the reading text and to seek if the text is useful for reading skill development. A descriptive qualitative design was applied to obtain the data. For this purpose, some available computer programs were used to find the description of vocabulary in the texts. It was found that the 20 texts containing 7,945 words are dominated by low frequency words which account for 16.97% of the words in the texts. The high frequency words occurring in the texts were dominated by function words. In the case of word levels, it was found that the texts have very limited number of words from GSL (General Service List of English Words (West, 1953. The proportion of the first 1,000 words of GSL only accounts for 44.6%. The data also show that the texts contain too large proportion of words which are not in the three levels (the first 2,000 and UWL. These words account for 26.44% of the running words in the texts.Â It is believed that the constraints are due to the selection of the texts which are made of a series of short-unrelated texts. This kind of text is subject to the accumulation of low frequency words especially those of content words and limited of words from GSL. It could also defeat the development of students' reading skills and vocabulary enrichment.
This study investigated the associations between emotion recognition ability and autistic traits in a sample of non-clinical young adults. Two hundred and forty nine individuals took part in an emotion recognition test, which assessed recognition of 12 emotions portrayed by actors. Emotion portrayals were presented as short video clips, both with and without sound, and as sound only. Autistic traits were assessed using the Autism Spectrum Quotient (ASQ) questionnaire. Results showed that men ...
Panda, Rameswar; Roy-Chowdhury, Amit K.
Networks of vision sensors are deployed in many settings, ranging from security needs to disaster response to environmental monitoring. Many of these setups have hundreds of cameras and tens of thousands of hours of video. The difficulty of analyzing such a massive volume of video data is apparent whenever there is an incident that requires foraging through vast video archives to identify events of interest. As a result, video summarization, that automatically extract a brief yet informative summary of these videos, has attracted intense attention in the recent years. Much progress has been made in developing a variety of ways to summarize a single video in form of a key sequence or video skim. However, generating a summary from a set of videos captured in a multi-camera network still remains as a novel and largely under-addressed problem. In this paper, with the aim of summarizing videos in a camera network, we introduce a novel representative selection approach via joint embedding and capped l21-norm minimization. The objective function is two-fold. The first is to capture the structural relationships of data points in a camera network via an embedding, which helps in characterizing the outliers and also in extracting a diverse set of representatives. The second is to use a capped l21-norm to model the sparsity and to suppress the influence of data outliers in representative selection. We propose to jointly optimize both of the objectives, such that embedding can not only characterize the structure, but also indicate the requirements of sparse representative selection. Extensive experiments on standard multi-camera datasets well demonstrate the efficacy of our method over state-of-the-art methods.
Fuertes-Olivera, Pedro; Bergenholtz, Henning
Dictionaries for Text Production are information tools that are designed and constructed for helping users to produce (i.e. encode) texts, both oral and written texts. These can be broadly divided into two groups: (a) specialized text production dictionaries, i.e., dictionaries that only offer...... a small amount of lexicographic data, most or all of which are typically used in a production situation, e.g. synonym dictionaries, grammar and spelling dictionaries, collocation dictionaries, concept dictionaries such as the Longman Language Activator, which is advertised as the World’s First Production...... Dictionary; (b) general text production dictionaries, i.e., dictionaries that offer all or most of the lexicographic data that are typically used in a production situation. A review of existing production dictionaries reveals that there are many specialized text production dictionaries but only a few general...
Full Text Available A curated selection of remix videos that edit pop culture texts and recut them into new works that explore themes of gender and sexual representation, or create new LGBTQ narratives from the original source material.
Full Text Available ... About About the Veterans Crisis Line FAQs Veteran Suicide Spread the Word Videos Homeless Resources Additional Information ... About About the Veterans Crisis Line FAQs Veteran Suicide The Veterans Crisis Line text-messaging service does ...
Joshi, V.M.; Agashe, Alok; Bairi, B.R.
This report provides technical description regarding the Video Frame Processor (VFP) developed at Bhabha Atomic Research Centre. The instrument provides capture of video images available in CCIR format. Two memory planes each with a capacity of 512 x 512 x 8 bit data enable storage of two video image frames. The stored image can be processed on-line and on-line image subtraction can also be carried out for image comparisons. The VFP is a PC Add-on board and is I/O mapped within the host IBM PC/AT compatible computer. (author). 9 refs., 4 figs., 19 photographs
This book presents a complete pipeline forHDR image and video processing fromacquisition, through compression and quality evaluation, to display. At the HDR image and video acquisition stage specialized HDR sensors or multi-exposure techniques suitable for traditional cameras are discussed. Then, we present a practical solution for pixel values calibration in terms of photometric or radiometric quantities, which are required in some technically oriented applications. Also, we cover the problem of efficient image and video compression and encoding either for storage or transmission purposes, in
Lucas, Laurent; Loscos, Céline
While 3D vision has existed for many years, the use of 3D cameras and video-based modeling by the film industry has induced an explosion of interest for 3D acquisition technology, 3D content and 3D displays. As such, 3D video has become one of the new technology trends of this century.The chapters in this book cover a large spectrum of areas connected to 3D video, which are presented both theoretically and technologically, while taking into account both physiological and perceptual aspects. Stepping away from traditional 3D vision, the authors, all currently involved in these areas, provide th
Henningsen, Birgitte; Gundersen, Peter Bukovica; Hautopp, Heidi
This paper introduces to what we define as a collaborative video sketching process. This process links various sketching techniques with digital storytelling approaches and creative reflection processes in video productions. Traditionally, sketching has been used by designers across various...... findings: 1) They are based on a collaborative approach. 2) The sketches act as a mean to externalizing hypotheses and assumptions among the participants. Based on our analysis we present an overview of factors involved in collaborative video sketching and shows how the factors relate to steps, where...... the participants: shape, record, review and edit their work, leading the participants to new insights about their work....
Westerberg, Andreas Rytter; Schoenau-Fog, Henrik
they can use audio in video games. The conclusion of this study is that the current models' view of the diegetic spaces, used to categorize video game audio, is not t to categorize all sounds. This can however possibly be changed though a rethinking of how the player interprets audio.......This paper dives into the subject of video game audio and how it can be categorized in order to deliver a message to a player in the most precise way. A new categorization, with a new take on the diegetic spaces, can be used a tool of inspiration for sound- and game-designers to rethink how...
Djoni Haryadi Setiabudi
Full Text Available Technology development had given people the chance to capture their memorable moments in video format. A high quality digital video is a result of a good editing process. Which in turn, arise the new need of an editor application. In accordance to the problem, here the process of making a simple application for video editing needs. The application development use the programming techniques often applied in multimedia applications, especially video. First part of the application will begin with the video file compression and decompression, then we'll step into the editing part of the digital video file. Furthermore, the application also equipped with the facilities needed for the editing processes. The application made with Microsoft Visual C++ with DirectX technology, particularly DirectShow. The application provides basic facilities that will help the editing process of a digital video file. The application will produce an AVI format file after the editing process is finished. Through the testing process of this application shows the ability of this application to do the 'cut' and 'insert' of video files in AVI, MPEG, MPG and DAT formats. The 'cut' and 'insert' process only can be done in static order. Further, the aplication also provide the effects facility for transition process in each clip. Lastly, the process of saving the new edited video file in AVI format from the application. Abstract in Bahasa Indonesia : Perkembangan teknologi memberi kesempatan masyarakat untuk mengabadikan saat - saat yang penting menggunakan video. Pembentukan video digital yang baik membutuhkan proses editing yang baik pula. Untuk melakukan proses editing video digital dibutuhkan program editor. Berdasarkan permasalahan diatas maka pada penelitian ini dibuat prototipe editor sederhana untuk video digital. Pembuatan aplikasi memakai teknik pemrograman di bidang multimedia, khususnya video. Perencanaan dalam pembuatan aplikasi tersebut dimulai dengan pembentukan
A starter which teaches the basic tasks to be performed with Sublime Text with the necessary practical examples and screenshots. This book requires only basic knowledge of the Internet and basic familiarity with any one of the three major operating systems, Windows, Linux, or Mac OS X. However, as Sublime Text 2 is primarily a text editor for writing software, many of the topics discussed will be specifically relevant to software development. That being said, the Sublime Text 2 Starter is also suitable for someone without a programming background who may be looking to learn one of the tools of
Zheng, Jingjing; Jiang, Zhuolin; Chellappa, Rama
Discriminative appearance features are effective for recognizing actions in a fixed view, but may not generalize well to a new view. In this paper, we present two effective approaches to learn dictionaries for robust action recognition across views. In the first approach, we learn a set of view-specific dictionaries where each dictionary corresponds to one camera view. These dictionaries are learned simultaneously from the sets of correspondence videos taken at different views with the aim of encouraging each video in the set to have the same sparse representation. In the second approach, we additionally learn a common dictionary shared by different views to model view-shared features. This approach represents the videos in each view using a view-specific dictionary and the common dictionary. More importantly, it encourages the set of videos taken from the different views of the same action to have the similar sparse representations. The learned common dictionary not only has the capability to represent actions from unseen views, but also makes our approach effective in a semi-supervised setting where no correspondence videos exist and only a few labeled videos exist in the target view. The extensive experiments using three public datasets demonstrate that the proposed approach outperforms recently developed approaches for cross-view action recognition.
Peña, Raul; Ávila, Alfonso; Muñoz, David; Lavariega, Juan
The recognition of clinical manifestations in both video images and physiological-signal waveforms is an important aid to improve the safety and effectiveness in medical care. Physicians can rely on video-waveform (VW) observations to recognize difficult-to-spot signs and symptoms. The VW observations can also reduce the number of false positive incidents and expand the recognition coverage to abnormal health conditions. The synchronization between the video images and the physiological-signal waveforms is fundamental for the successful recognition of the clinical manifestations. The use of conventional equipment to synchronously acquire and display the video-waveform information involves complex tasks such as the video capture/compression, the acquisition/compression of each physiological signal, and the video-waveform synchronization based on timestamps. This paper introduces a data hiding technique capable of both enabling embedding channels and synchronously hiding samples of physiological signals into encoded video sequences. Our data hiding technique offers large data capacity and simplifies the complexity of the video-waveform acquisition and reproduction. The experimental results revealed successful embedding and full restoration of signal's samples. Our results also demonstrated a small distortion in the video objective quality, a small increment in bit-rate, and embedded cost savings of -2.6196% for high and medium motion video sequences.
Chiang, Jeffrey N.; Zhang, Yuhan; Lu, Thomas T.; Chao, Tien-Hsin
Automated Target Recognition (ATR) systems aim to automate target detection, recognition, and tracking. The current project applies a JPL ATR system to low-resolution sonar and camera videos taken from unmanned vehicles. These sonar images are inherently noisy and difficult to interpret, and pictures taken underwater are unreliable due to murkiness and inconsistent lighting. The ATR system breaks target recognition into three stages: 1) Videos of both sonar and camera footage are broken into frames and preprocessed to enhance images and detect Regions of Interest (ROIs). 2) Features are extracted from these ROIs in preparation for classification. 3) ROIs are classified as true or false positives using a standard Neural Network based on the extracted features. Several preprocessing, feature extraction, and training methods are tested and discussed in this paper.
Full Text Available The paper deals with utilization of common Virtual Path (VP for variable bit rate (VBR video service. Video service is one of the main services for broadband networks. Research is oriented to statistical properties of common and separate VPs. Separate VP means that for each VBR traffic source one VP will be allocated. Common VP means that for multiple VBR sources one common VP is allocated. VBR video traffic source is modeled by discrete Markov chain.
Full Text Available Biometric systems are getting more attention in the present era. Iris recognition is one of the most secure and authentic among the other biometrics and this field demands more authentic, reliable and fast algorithms to implement these biometric systems in real time. In this paper, an efficient localization technique is presented to identify pupil and iris boundaries using histogram of the iris image. Two small portions of iris have been used for polar transformation to reduce computational time and to increase the efficiency of the system. Wavelet transform is used for feature vector generation. Rotation of iris is compensated without shifts in the iris code. System is tested on Multimedia University Iris Database and results show that proposed system has encouraging performance.
Recognition and toleration are ways of relating to the diversity characteristic of multicultural societies. The article concerns the possible meanings of toleration and recognition, and the conflict that is often claimed to exist between these two approaches to diversity. Different forms...... or interpretations of recognition and toleration are considered, confusing and problematic uses of the terms are noted, and the compatibility of toleration and recognition is discussed. The article argues that there is a range of legitimate and importantly different conceptions of both toleration and recognition...
Full Text Available This paper seeks to provide an approach for subjective video quality assessment in the H.264/AVC standard. For this purpose a special software program for the subjective assessment of quality of all the tested video sequences is developed. It was developed in accordance with recommendation ITU-T P.910, since it is suitable for the testing of multimedia applications. The obtained results show that in the proposed selective intra prediction and optimized inter prediction algorithm there is a small difference in picture quality (signal-to-noise ratio between decoded original and modified video sequences.
A model for how text interpretation proceeds from what is pronounced, through what is said to what is comunicated, and definition of the concepts 'presupposition' and 'implicature'.......A model for how text interpretation proceeds from what is pronounced, through what is said to what is comunicated, and definition of the concepts 'presupposition' and 'implicature'....
Cejuela, Juan Miguel; Vinchurkar, Shrikant; Goldberg, Tatyana
trees and was trained and evaluated on a newly improved LocTextCorpus. Combined with an automatic named-entity recognizer, LocText achieved high precision (P = 86%±4). After completing development, we mined the latest research publications for three organisms: human (Homo sapiens), budding yeast...
To present background, principles, and procedures for a strategy for qualitative analysis called systematic text condensation and discuss this approach compared with related strategies.......To present background, principles, and procedures for a strategy for qualitative analysis called systematic text condensation and discuss this approach compared with related strategies....
A chemistry teacher describes the elements of the ideal chemistry textbook. The perfect text is focused and helps students draw a coherent whole out of the myriad fragments of information and interpretation. The text would show chemistry as the central science necessary for understanding other sciences and would also root chemistry firmly in the…
This is a review of the web resource 'Text 2 Mind Map' www.Text2MindMap.com. It covers what the resource is, and how it might be used in Library and education context, in particular for School Librarians.
Kotler, R. S.
File Comparator program IFCOMP, is text file comparator for IBM OS/VScompatable systems. IFCOMP accepts as input two text files and produces listing of differences in pseudo-update form. IFCOMP is very useful in monitoring changes made to software at the source code level.
Video has been a game-changer in how US forces are able to find, track and defeat its adversaries. With millions of minutes of video being generated from an increasing number of sensor platforms, the DOD has stated that the rapid increase in video is overwhelming their analysts. The manpower required to view and garner useable information from the flood of video is unaffordable, especially in light of current fiscal restraints. "Search" within full-motion video has traditionally relied on human tagging of content, and video metadata, to provision filtering and locate segments of interest, in the context of analyst query. Our approach utilizes a novel machine-vision based approach to index FMV, using object recognition & tracking, events and activities detection. This approach enables FMV exploitation in real-time, as well as a forensic look-back within archives. This approach can help get the most information out of video sensor collection, help focus the attention of overburdened analysts form connections in activity over time and conserve national fiscal resources in exploiting FMV.
Siti Murtiningsih, Joko Siswanto, M. Mukhtasar Syamsudin
Full Text Available Abstract: Education Problems of Video Games in the Perspective of Jean Baudrillard’s Theory of Simulacra. In the era of digital age, we are witnessing how video games penetrate children’s daily life and it is believed to have some impacts on their cognitive and affective processes. Referring to hermeneutical approach, the present library research seeks to answer the question whether video games create a real identity or simply forge false consciousness in children. In the first step, the data were collected from bibliographical sources that related to data. In the second step, the data were analyzed to examine the pedagogical-philosophical properties of the video games. The results indicate that video games change the way children view the world. Video games present the world as hiper-reality. Bu putting aside the negative values and maximizing the positive ones, the understanding of hiper-reality allows for the inculcation of children. Abstrak: Problem Pendidikan Video Games dalam Perspectif Teori Simulacra Jean Baudrillard. Permainan video games diyakini berdampak positif sekaligus negatif pada proses kognitif dan afektif anak.-anak. Terutama, video games berpengaruh pada proses internalisasi nilai-nilai dan pembentukan identitas mereka. Teknologi virtual yang disajikan oleh video games, seperti didekati oleh teori simulacra Jean Baudrillard, menyuguhkan jebakan akan realitas palsu. Melalui riset pustaka dengan metode "filsafat hermeneutis", dianalisis data untuk membangun refleksi filsafat pendidikan atas permainan video games itu. Hasil penelitian ini menyatakan video games menyuguhkan sebuah hiper-realitas dari simulasi realitas, atau simulacra dalam teori Jean Baudrillard. Simulacra adalah dunia yang terbentuk dari salinan realitas, yang menjadi acuan melebihi realitas asli. Disimpulkan bahwa video games menjadi semacam “ruang konseptual”, yang dibentuk oleh simulacra. Dengan mengenali hakikat hiper-realitas, video games dapat
Addresses the role of video games in the lives of adolescents. Considers the significance of video games both as cultural "texts" and organized social activities. Examines motivations, practical interests, and behavior of suppliers and consumers of these products. (CMG)
National Oceanic and Atmospheric Administration, Department of Commerce — Observers are required to take photos and/or videos of all incidentally caught sea turtles, marine mammals, seabirds and unusual or rare fish. On the first 3...