Suominen, Olli; Gotchev, Atanas
Capturing images at low light intensity while preserving the ambient light poses significant problems for the achievable image quality. Either the sensitivity of the sensor must be increased, filling the resulting image with noise, or the scene must be lit with artificial light, destroying the aesthetic quality of the image. While the issue has previously been tackled for still imagery using cross-bilateral filtering, the same problem exists when capturing video. We propose a method of illuminating the scene with a strobe light synchronized to every other frame captured by the camera, and merging the information from consecutive frames that alternate between high sensor gain and high-intensity lighting. The motion between frames is compensated using motion estimation based on block matching between the strobe-illuminated frames. Because every other frame is captured under uniform lighting conditions, conventional motion estimation methods can be used, circumventing the image registration challenges faced when fusing flash/non-flash pairs from non-stationary images. The results of the proposed method are shown to closely resemble those computed with the same filter from reference images captured at perfect camera alignment. The method can be applied to anything from a minimal set of three frames to video streams of arbitrary length; the only requirements are sufficiently accurate synchronization between the imaging device and the lighting unit, and the ability to switch states (sensor gain high/low, illumination on/off) fast enough.
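The abstract does not specify the matching criterion. As a minimal sketch, assuming grayscale frames stored as nested lists and the sum of absolute differences (SAD) as the cost, exhaustive block matching between two strobe-lit frames could look like the following. The names `sad` and `block_match` and the parameters `block` and `radius` are illustrative assumptions, not the authors' implementation; SAD is a reasonable choice here precisely because both frames share the same strobe illumination.

```python
def sad(ref, cur, ry, rx, cy, cx, block):
    """Sum of absolute differences between a block x block patch of `ref`
    at (ry, rx) and a patch of `cur` at (cy, cx)."""
    return sum(
        abs(ref[ry + i][rx + j] - cur[cy + i][cx + j])
        for i in range(block)
        for j in range(block)
    )


def block_match(ref, cur, y, x, block=4, radius=2):
    """Exhaustive search (hypothetical helper): return the offset (dy, dx)
    at which the block of `cur` at (y, x) best matches `ref`, within
    +/- radius pixels, using SAD as the matching cost."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = y + dy, x + dx
            # Skip candidate positions that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + block > h or rx + block > w:
                continue
            cost = sad(ref, cur, ry, rx, y, x, block)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]
```

The returned `(dy, dx)` is the offset at which the block from `cur` is found in `ref`. Practical motion estimators add sub-pixel refinement and faster search patterns; those are omitted here.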
Seo, Young-Ho; Lee, Yoon-Hyuk; Koo, Ja-Myung; Kim, Woo-Youl; Yoo, Ji-Sang; Kim, Dong-Wook
We propose a new system that generates digital holograms using natural color information. The system consists of a camera system for capturing images (object points) and software (S/W) for the required image processing. The camera system uses a vertical rig equipped with two depth and RGB cameras and a cold mirror, whose reflectance varies with wavelength, so that the images are obtained from the same viewpoint. The S/W is composed of engines for processing the captured images and engines for computer-generated hologram (CGH) computation, which generate digital holograms using general-purpose graphics processing units. Each algorithm was implemented in C/C++ and CUDA, and all engines were integrated as libraries in the LabVIEW environment. The proposed system can generate about 10 digital holographic frames per second using about 6K object points.
This work analyzed the perceptual attributes of natural dynamic audiovisual scenes. We presented thirty participants with 19 natural scenes in a similarity categorization task, followed by a semi-structured interview. The scenes were reproduced with an immersive audiovisual display. Natural scene perception has been studied mainly in unimodal settings, which have identified motion as one of the most salient attributes of visual scenes, and sound intensity along with pitch trajectories as salient attributes of auditory scenes. However, controlled laboratory experiments with natural multimodal stimuli are still scarce. Our results show that humans attend to similar perceptual attributes in natural scenes, and a two-dimensional perceptual map of the stimulus scenes and perceptual attributes was obtained. The exploratory results show the amount of movement, perceived noisiness, and eventfulness of the scene to be the most important perceptual attributes in naturalistically reproduced real-world urban environments. We found the scene gist properties openness and expansion to remain important factors in scenes with no salient auditory or visual events. We propose that the study of scene perception should move forward to better understand the processes behind multimodal scene processing in real-world environments. We publish our stimulus scenes as spherical video recordings and sound field recordings in a publicly available database.
Winterlich, Anthony; Denny, Patrick; Kilmartin, Liam; Glavin, Martin; Jones, Edward
We evaluate the effects of transmission artifacts such as JPEG compression and additive white Gaussian noise on the performance of a state-of-the-art pedestrian detection algorithm, which is based on integral channel features. Integral channel features combine the diversity of information obtained from multiple image channels with the computational efficiency of the Viola and Jones detection framework. We utilize "quality aware" spatial image statistics to blindly categorize distorted video frames by distortion type and level without the use of an explicit reference. We combine quality statistics with a multiclassifier detection framework for optimal pedestrian detection performance across varying image quality. Our detection method provides statistically significant improvements over current approaches based on single classifiers, on two large pedestrian databases containing a wide variety of artificially added distortion. The improvement in detection performance is further demonstrated on real video data captured from multiple cameras containing varying levels of sensor noise and compression. The results of our research have the potential to be used in real-time in-vehicle networks to improve pedestrian detection performance across a wide range of image and video quality.
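The efficiency inherited from the Viola and Jones framework rests on integral images (summed-area tables), which reduce the sum over any rectangle of a channel to four lookups. A minimal sketch for a single channel, assuming nested-list input; in integral channel features the same table would be built for each channel (for example gradient magnitude or color), and the function names here are hypothetical.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) summed-area table with a zero top row and
    left column, so ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum above this row plus the running sum along this row.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, y0, x0, y1, x1):
    """Sum of img over rows y0..y1-1 and columns x0..x1-1 (half-open),
    computed in constant time from four table lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
```

Because `box_sum` costs the same for any rectangle size, rectangular channel features can be evaluated at many scales and positions without re-summing pixels.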
Azizi, Elham; Abel, Larry A; Stainer, Matthew J
Action game playing has been associated with several improvements in visual attention tasks. However, it is not clear how such changes might influence the way we overtly select information from our visual world (i.e., eye movements). We examined whether action-video-game training changed eye movement behaviour in a series of visual search tasks including conjunctive search (relatively abstracted from natural behaviour), game-related search, and more naturalistic scene search. Forty nongamers were trained in either an action first-person shooter game or a card game (control) for 10 hours. As a further control, we recorded eye movements of 20 experienced action gamers on the same tasks. The results did not show any change in fixation duration or saccade amplitude, either from before to after training or between all nongamers (pretraining) and experienced action gamers. However, we observed a change in search strategy, reflected by a reduction in the vertical distribution of fixations for the game-related search task in the action-game-trained group. This might suggest learning of the likely distribution of targets. In other words, game training only improved participants' skill at searching game images for targets important to the game, with no indication of transfer to the more natural scene search. Taken together, these results suggest no modification in the overt allocation of attention. Either the skills that can be trained with action gaming are not powerful enough to influence information selection through eye movements, or action-game-learned skills are not used when deciding where to move the eyes.
Zhou, Wensheng; Shen, Ye; Vellaikal, Asha; Kuo, C.-C. Jay
Many multimedia applications, such as multimedia data management systems and communication systems, require efficient representation of multimedia content. Thus, semantic interpretation of video content has been a popular research area. Currently, most content-based video representation involves segmenting video based on key frames, which are generated using scene change detection techniques as well as camera/object motion. Video features can then be extracted from the key frames. However, most such research performs off-line video processing, in which the whole video scope is known a priori, allowing multiple scans of the stored video files during processing. In comparison, relatively little research has been done on on-line video processing, which is crucial in video communication applications such as on-line collaboration and news broadcasts. Our research investigates on-line, real-time scene change detection of multicast video over the Internet. Our on-line processing system is designed to meet the requirements of real-time video multicasting over the Internet and to utilize the successful video parsing techniques available today. The proposed algorithms extract key frames from video bitstreams sent through the MBone network, and the extracted key frames are multicast as annotations or metadata over a separate channel to assist in content filtering, such as that anticipated to be performed by on-line filtering proxies in the Internet. The performance of the proposed algorithms is demonstrated and discussed in this paper.
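The paper works on compressed bitstreams, and the abstract does not describe its detector. The sketch below instead assumes decoded 8-bit frames given as flat pixel lists and uses a common histogram-difference test, purely to illustrate the on-line constraint: each frame is examined once, in arrival order, with no look-ahead. The names and the `threshold` value are hypothetical.

```python
def histogram(frame, bins=16):
    """Normalized intensity histogram of a flat list of 8-bit pixel values."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // 256] += 1
    n = len(frame)
    return [c / n for c in counts]


def online_key_frames(frames, threshold=0.5, bins=16):
    """Yield the indices of frames that start a new scene.

    Works on any iterable (e.g. a live stream): only the previous frame's
    histogram is kept, so no second pass over the video is needed."""
    prev = None
    for i, frame in enumerate(frames):
        hist = histogram(frame, bins)
        # L1 distance between normalized histograms lies in [0, 2].
        if prev is None or sum(abs(a - b) for a, b in zip(hist, prev)) > threshold:
            yield i
        prev = hist
```

Because `online_key_frames` is a generator, each key frame index is available as soon as its frame arrives, which is what allows key frames to be forwarded as metadata while the stream is still running.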
Lewicki, Michael S; Olshausen, Bruno A; Surlykke, Annemarie
... that hinder further progress. Here we take the view that scene analysis is a universal problem solved by all animals, and that we can gain new insight by studying the problems that animals face in complex natural environments. In particular, the jumping spider, songbird, echolocating bat, and electric fish all exhibit behaviors that require robust solutions to scene analysis problems encountered in the natural environment. By examining the behaviors of these seemingly disparate animals, we emerge with a framework for studying scene analysis comprising four essential properties: (1) the ability to solve...
Hansen, Morten; Sørensen, Helge Bjarup Dissing; Birkemark, Christian M.
This paper concerns automatic video surveillance of outdoor scenes using a single camera. The first step in automatic interpretation of the video stream is activity detection based on background subtraction. Usually, this process generates a large number of false alarms in outdoor scenes due to, e.g., movement of thickets and changes in illumination. To reduce the number of false alarms, a Track Before Detect (TBD) approach is suggested. In this TBD implementation, all objects detected in the background subtraction process are followed over a number of frames. An alarm is given only if a detected object shows a pattern of movement consistent with predefined rules. The method is tested on a number of video sequences, and a substantial reduction in the number of false alarms is demonstrated.
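The abstract leaves the association method and the "predefined rules" unspecified. A minimal sketch, assuming greedy nearest-neighbour association and a hypothetical rule (an object must be tracked for at least `min_frames` frames and move at least `min_displacement` pixels overall), shows how following detections over several frames can suppress jittering vegetation and one-frame flicker. All names and thresholds are illustrative.

```python
import math


def track_before_detect(detections_per_frame, gate=5.0, min_frames=5,
                        min_displacement=10.0):
    """Follow detections over frames, then report only tracks that satisfy
    a hypothetical alarm rule.

    detections_per_frame: list of frames, each a list of (x, y) centroids
    from background subtraction. Returns the alarming tracks as lists of
    positions."""
    tracks = []  # each track: list of (x, y) positions
    for detections in detections_per_frame:
        unmatched = list(detections)
        for track in tracks:
            if not unmatched:
                break
            last = track[-1]
            # Greedy association: nearest unclaimed detection within the gate.
            nearest = min(unmatched, key=lambda d: math.dist(d, last))
            if math.dist(nearest, last) <= gate:
                track.append(nearest)
                unmatched.remove(nearest)
        # Every detection no existing track claimed starts a new track.
        tracks.extend([d] for d in unmatched)
    # Alarm rule: long enough and moved far enough overall.
    return [
        t for t in tracks
        if len(t) >= min_frames and math.dist(t[0], t[-1]) >= min_displacement
    ]
```

Swaying thicket tends to yield tracks that persist but barely move, while brief illumination changes tend to yield short-lived detections; the two conditions of the illustrative rule target those two false-alarm sources.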
When we explore a visual scene, our eyes make saccades to jump rapidly from one area to another and fixate regions of interest to extract useful information. While the role of fixational eye movements in vision has been widely studied, their random nature has been a hitherto neglected issue. Here we conducted two experiments to examine the Maxwellian nature of eye movements during fixation. In Experiment 1, eight participants freely viewed natural scenes displayed on a computer screen while their eye movements were recorded. For each participant, the probability density function (PDF) of eye movement amplitude during fixation obeyed the law established by Maxwell for describing molecule velocities in a gas. Only the mean amplitude of eye movements varied with expertise, being lower in experts than in novice participants. In Experiment 2, two participants underwent fixed-time free viewing of natural scenes and of their scrambled versions while their eye movements were recorded. Again, the PDF of eye movement amplitude during fixation obeyed Maxwell’s law for each participant and for each scene condition (normal or scrambled). The results suggest that eye fixation during natural scene perception describes a random motion regardless of top-down or bottom-up processes.
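For reference, one standard parameterization of the Maxwell distribution with scale a > 0 is f(r; a) = sqrt(2/pi) r^2 exp(-r^2 / (2 a^2)) / a^3 for r >= 0, with mean 2a sqrt(2/pi). The sketch below, with hypothetical helper names, evaluates this density and fits a by maximum likelihood (a^2 = sum(r^2) / (3n)). The fitting procedure is an assumption, since the abstract does not say how the PDFs were fitted; in these terms, a lower mean amplitude for experts corresponds to a smaller fitted scale.

```python
import math


def maxwell_pdf(r, a):
    """Maxwell density with scale a > 0; zero for negative amplitudes."""
    if r < 0:
        return 0.0
    return math.sqrt(2 / math.pi) * r * r * math.exp(-r * r / (2 * a * a)) / a**3


def fit_maxwell_scale(amplitudes):
    """Maximum-likelihood scale estimate: a^2 = sum(r^2) / (3 n)."""
    return math.sqrt(sum(r * r for r in amplitudes) / (3 * len(amplitudes)))


def maxwell_mean(a):
    """Mean of the Maxwell distribution: 2 a sqrt(2 / pi)."""
    return 2 * a * math.sqrt(2 / math.pi)
```

Comparing `maxwell_mean(fit_maxwell_scale(amplitudes))` across participant groups is one way to express the expertise effect within a single parametric family.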
In fixed video scenes, scene motion patterns can be very useful prior knowledge for pedestrian detection, which remains a challenge. This paper develops a new cascade pedestrian detection approach that uses an orthogonal scene motion pattern model in general-density video. To statistically model the pedestrian motion pattern, a probability grid overlaying the whole scene is set up to partition the scene into paths and holding areas. Features extracted from different pattern areas are classified by a group of specific strategies. Instead of a unitary classifier, the employed classifier is composed of two directional subclassifiers, trained respectively with different samples selected along two orthogonal directions. Because the negative images produced by detection-window scanning far outnumber the positive ones, the subclassifiers adopt the cascade AdaBoost technique to reduce the computation spent on negative images. The effectiveness of the proposed approach is demonstrated by static classification experiments and surveillance video experiments.
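The early-rejection idea behind cascade AdaBoost can be sketched as below. This is a generic illustration, not the paper's classifier: each stage is a weighted vote of weak learners, and evaluation stops at the first stage whose score falls below its threshold, which is why the many negative windows are cheap to discard. The stage format, feature names, and thresholds are hypothetical; each of the paper's two directional subclassifiers would be a cascade of this kind, and the abstract does not say how their outputs are combined.

```python
def cascade_classify(window, stages):
    """Evaluate a detection window against a cascade of boosted stages.

    Each stage is (weak_learners, stage_threshold), where weak_learners is
    a list of (alpha, h) pairs and h(window) returns +1 or -1. The window is
    rejected as soon as a stage's weighted vote falls below its threshold.
    Returns (is_positive, number_of_stages_evaluated)."""
    for depth, (weak_learners, threshold) in enumerate(stages, start=1):
        score = sum(alpha * h(window) for alpha, h in weak_learners)
        if score < threshold:
            return False, depth  # early rejection: later stages never run
    return True, len(stages)
```

In practice the first stages are kept small so that most background windows are rejected after only a few weak-learner evaluations.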
Michael S Lewicki
The problem of scene analysis has been studied in a number of different fields over the past decades. These studies have led to a number of important insights into problems of scene analysis, but not all of these insights are widely appreciated. Despite this progress, there are also critical shortcomings in current approaches that hinder further progress. Here we take the view that scene analysis is a universal problem solved by all animals, and that we can gain new insight by studying the problems that animals face in complex natural environments. In particular, the jumping spider, songbird, echolocating bat, and electric fish all exhibit behaviors that require robust solutions to scene analysis problems encountered in the natural environment. By examining the behaviors of these seemingly disparate animals, we emerge with a framework for studying scene analysis comprising four essential properties: (1) the ability to solve ill-posed problems, (2) the ability to integrate and store information across time and modality, (3) efficient recovery and representation of 3D scene structure, and (4) the use of optimal motor actions for acquiring information to progress toward behavioral goals.
Howard, Christina J; Gilchrist, Iain D; Troscianko, Tom; Behera, Ardhendu; Hogg, David C
Low-level stimulus salience and task relevance together determine the human fixation priority assigned to scene locations (Fecteau and Munoz in Trends Cogn Sci 10(8):382-390, 2006). However, surprisingly little is known about the contribution of task relevance to eye movements during real-world visual search, where stimuli are in constant motion and where the 'target' of the search is abstract and semantic in nature. Here, we investigate this issue while participants continuously search an array of four closed-circuit television (CCTV) screens for suspicious events. We recorded eye movements while participants watched real CCTV footage and moved a joystick to continuously indicate perceived suspiciousness. We find that when multiple areas of a display compete for attention, gaze is allocated according to the relative levels of reported suspiciousness. Furthermore, this measure of task relevance accounted for twice as much variance in gaze likelihood as did the amount of low-level visual change over time in the video stimuli.
Peng, Xiulian; Xu, Jizheng; Sullivan, Gary J.
Perspective motion is common in video content that is captured and compressed for applications such as cloud gaming and vehicle and aerial monitoring. Existing approaches based on an eight-parameter homography motion model cannot handle such content efficiently, owing either to low prediction accuracy or to excessive bit rate overhead. In this paper, we consider the camera motion model and scene structure in such content and propose a joint global and local homography motion coding approach for video with perspective motion. The camera motion is estimated by a computer vision approach, and the camera intrinsic and extrinsic parameters are coded globally at the frame level. The scene is modeled as piecewise planes, and three plane parameters are coded at the block level. Fast gradient-based approaches are employed to search for the plane parameters of each block region. In this way, improved prediction accuracy and low bit costs are achieved. Experimental results based on the HEVC test model show bit rate savings of up to 9.1% (at equal PSNR quality) on test video content with perspective motion. Test sequences for the example applications showed bit rate savings ranging from 3.7% to 9.1%.
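The abstract does not give the motion model explicitly. Under a standard pinhole model with the same intrinsic matrix K in both frames (an assumption here), a camera motion (R, t) and a scene plane n^T X = d in the reference camera frame induce the homography below between pixel coordinates. Writing v = n/d leaves exactly three per-block unknowns alongside the frame-level K, R, and t, which is one plausible reading of the "three plane parameters" coded at block level.

```latex
\begin{aligned}
\mathbf{X}' &= R\,\mathbf{X} + \mathbf{t},
  \qquad \mathbf{n}^{\top}\mathbf{X} = d
  \;\Longrightarrow\; \frac{\mathbf{n}^{\top}\mathbf{X}}{d} = 1, \\
\mathbf{X}' &= R\,\mathbf{X} + \mathbf{t}\,\frac{\mathbf{n}^{\top}\mathbf{X}}{d}
  = \bigl(R + \mathbf{t}\,\mathbf{v}^{\top}\bigr)\mathbf{X},
  \qquad \mathbf{v} = \frac{\mathbf{n}}{d} \in \mathbb{R}^{3}, \\
\tilde{\mathbf{x}}' &\simeq K\bigl(R + \mathbf{t}\,\mathbf{v}^{\top}\bigr)K^{-1}\,\tilde{\mathbf{x}},
  \qquad \tilde{\mathbf{x}} \simeq K\,\mathbf{X}.
\end{aligned}
```

Here the tilde marks homogeneous pixel coordinates and \simeq denotes equality up to scale. An unconstrained eight-parameter homography per block would instead require eight coded values.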
Niu, Feng; Goela, Naveen; Divakaran, Ajay; Abdel-Mottaleb, Mohamed
In this paper, we present a content-adaptive, audio-texture-based method to segment video into audio scenes. An audio scene is modeled as a semantically consistent chunk of audio data. Our algorithm is based on "semantic audio texture analysis." First, we train Gaussian mixture models (GMMs) for basic audio classes such as speech and music. We then define the semantic audio texture based on those classes. We study and present two types of scene changes: those corresponding to an overall audio texture change, and those corresponding to a special "transition marker" used by the content creator, such as a short stretch of music in a sitcom or silence in dramatic content. Unlike prior work using genre-specific heuristics, such as some methods for detecting commercials, we adaptively determine whether such transition markers are being used and, if so, which of the base classes serve as markers, without any prior knowledge of the content. Our experimental results show that the proposed audio scene segmentation works well across a wide variety of broadcast content genres.
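A minimal sketch of the "overall audio texture change" case, assuming the per-frame class labels have already been produced by the trained GMMs. Here a texture is taken to be the class distribution within a window, and boundaries are placed at local maxima of the L1 distance between the windows before and after each frame; this definition, the window length, and the threshold are assumptions, and the transition-marker case is not covered.

```python
from collections import Counter


def texture(labels, classes):
    """Fraction of frames assigned to each audio class within a window."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in classes]


def texture_changes(labels, classes, window=4, threshold=0.8):
    """Return frame indices where the class distribution of the `window`
    frames before the index differs most from that of the `window` frames
    after it (L1 distance in [0, 2], above `threshold`, local maximum)."""
    scores = {}
    for i in range(window, len(labels) - window + 1):
        before = texture(labels[i - window:i], classes)
        after = texture(labels[i:i + window], classes)
        scores[i] = sum(abs(a - b) for a, b in zip(before, after))
    # Keep only local maxima so that one change yields one boundary.
    return [
        i for i, s in scores.items()
        if s > threshold
        and s >= scores.get(i - 1, 0.0)
        and s > scores.get(i + 1, 0.0)
    ]
```

Without the local-maximum check, every index near a change whose windows partly overlap it would exceed the threshold, producing a cluster of boundaries instead of one.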
Anderson, Allison P; Mayer, Michael D; Fellows, Abigail M; Cowan, Devin R; Hegel, Mark T; Buckey, Jay C
Virtual reality (VR) can provide exposure to nature for those living in isolated confined environments. We evaluated VR-presented natural settings for reducing stress and improving mood. There were 18 participants (9 men, 9 women), ages 32 ± 12 yr, who viewed three 15-min 360° scenes (an indoor control, rural Ireland, and remote beaches). Subjects were mentally stressed with arithmetic before the scenes. Electrodermal activity (EDA) and heart rate variability measured psycho-physiological arousal. The Positive and Negative Affect Schedule and the 15-question Modified Reality Judgment and Presence Questionnaire (MRJPQ) measured mood and scene quality. Reductions in EDA from baseline were greater at the end of the natural scenes than at the end of the control scene (-0.59, -0.52, and 0.32 μS, respectively). The natural scenes reduced negative affect from baseline (-1.2 and -1.1 points), but the control scene did not (-0.4 points). MRJPQ scores for the control scene were lower than for both natural scenes (4.9, 6.7, and 6.5 points, respectively). Within the two natural scenes, the preferred scene reduced negative affect (-2.4 points) more than the second-choice scene (-1.8 points) and scored higher on the MRJPQ (6.8 vs. 6.4 points). Natural scene VR provided relaxation both objectively and subjectively, and scene preference had a significant effect on mood and perception of scene quality. VR may enable relaxation for people living in isolated confined environments, particularly when matched to personal preferences. Anderson AP, Mayer MD, Fellows AM, Cowan DR, Hegel MT, Buckey JC. Relaxation with immersive natural scenes presented using virtual reality. Aerosp Med Hum Perform. 2017; 88(6):520–526.
Meka, Abhimitra; Fox, Gereon; Zollhöfer, Michael; Richardt, Christian; Theobalt, Christian
We present a novel real-time approach for user-guided intrinsic decomposition of static scenes captured by an RGB-D sensor. In the first step, we acquire a three-dimensional representation of the scene using a dense volumetric reconstruction framework. The obtained reconstruction serves as a proxy to densely fuse reflectance estimates and to store user-provided constraints in three-dimensional space. User constraints, in the form of constant shading and reflectance strokes, can be placed directly on the real-world geometry using an intuitive touch-based interaction metaphor, or using interactive mouse strokes. Fusing the decomposition results and constraints in three-dimensional space allows for robust propagation of this information to novel views by re-projection. We leverage this information to improve on the decomposition quality of existing intrinsic video decomposition techniques by further constraining the ill-posed decomposition problem. In addition to improved decomposition quality, we show a variety of live augmented reality applications such as recoloring of objects, relighting of scenes and editing of material appearance.
Jung Han-Seung; Lee Young-Yoon; Lee Sang Uk
Watermarking for video sequences should consider additional attacks, such as frame averaging, frame-rate change, frame shuffling, or collusion attacks, in addition to those against still images. Also, since video is a sequence of analogous images, video watermarking is subject to interframe collusion. To cope with these attacks, we propose a scene-based temporal watermarking algorithm. In each scene, segmented by scene-change detection schemes, a watermark is embedded temporally to one-dimens...
Zuo, Xuguang; Yu, Lu; Yu, Hualong; Mao, Jue; Zhao, Yin
In movies and TV shows, it is common for several scenes to repeat alternately. Such videos are characterized by long-term temporal correlation, which can be exploited to improve video coding efficiency. However, in applications supporting random access (RA), a video is typically divided into a number of RA segments (RASs) by RA points (RAPs), and different RASs are coded independently. As a result, the long-term temporal correlation among RASs with similar scenes cannot be exploited. We present a scene-library-based video coding scheme for videos with repeated scenes. First, a compact scene library is built by clustering similar scenes and extracting representative frames from the video being encoded. Then, the video is coded using a layered scene-library-based coding structure in which the library frames serve as long-term reference frames. The scene library is not cleared at RAPs, so the long-term temporal correlation between RASs with similar scenes can be exploited. Furthermore, the RAP frames are coded as inter frames that reference only library frames, improving coding efficiency while maintaining the RA property. Experimental results show that the coding scheme can achieve significant coding gains over state-of-the-art methods.
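The abstract does not name the clustering method. As a stand-in, greedy leader clustering over per-frame feature vectors (for example normalized histograms) shows how a compact library of representative frames could be formed in a single pass; the function name, the L1 distance, and `threshold` are hypothetical.

```python
def build_scene_library(features, threshold=0.3):
    """Greedy leader clustering (a stand-in, not the paper's method).

    Each frame joins the first library entry whose representative feature
    lies within `threshold` (L1 distance); otherwise it becomes a new
    library frame. Returns (library, assignment): library lists the
    representative frame indices, and assignment maps every frame to a
    library slot."""
    library, assignment = [], []
    for i, f in enumerate(features):
        for slot, rep in enumerate(library):
            if sum(abs(a - b) for a, b in zip(f, features[rep])) <= threshold:
                assignment.append(slot)
                break
        else:
            # No existing scene is close enough: start a new library entry.
            library.append(i)
            assignment.append(len(library) - 1)
    return library, assignment
```

In the scheme described above, `library` would identify the frames to be coded as long-term references, and `assignment` indicates which library frame each frame (including a RAP frame) would reference.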
Shopovska, Ivana; Jovanov, Ljubomir; Goossens, Bart; Philips, Wilfried
High dynamic range (HDR) image generation from a number of differently exposed low dynamic range (LDR) images has been extensively explored in the past few decades, and as a result a large number of HDR synthesis methods have been proposed. Since HDR images are synthesized by combining well-exposed regions of the input images, one of the main challenges is dealing with camera or object motion. In this paper we propose a method for synthesizing HDR video from a single camera using multiple differently exposed video frames with circularly alternating exposure times. Potential applications of the system include driver assistance systems and autonomous vehicles, which involve significant camera and object movement, non-uniform and temporally varying illumination, and a requirement for real-time performance. To achieve these goals simultaneously, we propose an HDR synthesis approach based on weighted averaging of aligned radiance maps. The computational complexity of high-quality optical flow methods for motion compensation is still prohibitively high for real-time applications. Instead, we rely on more efficient global projective transformations to compensate for camera movement, while moving objects are detected by thresholding the differences between the transformed and brightness-adapted images in the set. To attain temporal consistency of the camera motion across consecutive HDR frames, the parameters of the perspective transformation are stabilized over time by means of computationally efficient temporal filtering. We evaluated our results on several reference HDR videos, on synthetic scenes, and using 14-bit raw images taken with a standard camera.
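A minimal sketch of weighted radiance averaging, assuming frames that are already aligned, a linear camera response (so radiance is proportional to pixel value divided by exposure time), and the common triangle ("hat") weighting. The paper's actual weights, alignment, and moving-object handling are not reproduced, and all names are hypothetical.

```python
def hat_weight(z, z_max=255):
    """Triangle weight: trust mid-range values, distrust near-black and
    saturated ones (both ends get weight 0)."""
    return min(z, z_max - z)


def merge_hdr(frames, exposure_times, z_max=255):
    """Weighted average of per-frame radiance estimates z / t for aligned,
    linear-response frames given as flat pixel lists of equal length."""
    merged = []
    for pixels in zip(*frames):
        estimates = [z / t for z, t in zip(pixels, exposure_times)]
        weights = [hat_weight(z, z_max) for z in pixels]
        total = sum(weights)
        if total == 0:
            # Every exposure is clipped at this pixel: fall back to a plain mean.
            merged.append(sum(estimates) / len(estimates))
        else:
            merged.append(sum(w * e for w, e in zip(weights, estimates)) / total)
    return merged
```

With circularly alternating exposures, the same merge would be applied to each sliding group of consecutive frames after warping them onto the current frame.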
Khan, L.; Israël, Menno; Petrushin, V.A.; van den Broek, Egon; van der Putten, Peter
This paper introduces a real-time automatic scene classifier for content-based video retrieval. In our envisioned approach, end users such as documentalists, rather than image processing experts, build classifiers interactively by simply indicating positive examples of a scene. Classification consists of a
Bernard Marius e’t Hart
The relation of selective attention to the understanding of natural scenes has been the subject of intense behavioral research and computational modeling, and gaze is often used as a proxy for such attention. The probability of an image region being fixated typically correlates with its contrast. However, this relation does not imply a causal role of contrast. Rather, contrast may relate to an object’s importance for a scene, which in turn drives attention. Here we operationalize importance as the probability that an observer names the object as characteristic of a scene. We modify the luminance contrast of either a frequently named (common/important) or a rarely named (rare/unimportant) object, track the observers’ eye movements during scene viewing, and ask them to provide keywords describing the scene immediately afterward. When no object is modified relative to the background, important objects draw more fixations than unimportant ones. Increases in contrast make an object more likely to be fixated, irrespective of whether it was important for the original scene, while decreases in contrast have little effect on fixations. Any contrast modification makes originally unimportant objects more important for the scene. Finally, important objects are fixated more centrally than unimportant objects, irrespective of contrast. Our data suggest a dissociation between object importance (relevance for the scene) and salience (relevance for attention). If an object obeys natural scene statistics, important objects are also salient. However, when natural scene statistics are violated, importance and salience are differentially affected. Object salience is modulated by the expectation about object properties (e.g., formed by context or gist), and importance by the violation of such expectations. In addition, the dependence of fixated locations within an object on the object’s importance suggests an analogy to the effects of word frequency on landing positions in reading.
Yao, Guangle; Lei, Tao; Zhong, Jiandan; Jiang, Ping; Jia, Wenwu
Background subtraction (BS) is one of the most commonly encountered tasks in video analysis and tracking systems. It distinguishes the foreground (moving objects) from the background in video sequences captured by static imaging sensors. Background subtraction in remote-scene infrared (IR) video is important in many fields. This paper provides a Remote Scene IR Dataset captured by our purpose-designed medium-wave infrared (MWIR) sensor. Each video sequence in this dataset is labeled with specific BS challenges, and pixel-wise foreground (FG) ground truth is provided for each frame. A series of experiments were conducted to evaluate BS algorithms on the proposed dataset. The overall performance of the BS algorithms and their processor/memory requirements were compared. Appropriate evaluation metrics and criteria were employed to evaluate the capability of each BS algorithm to handle the different kinds of BS challenges represented in the dataset. The results and conclusions in this paper provide valid references for developing new BS algorithms for remote-scene IR video sequences, and some of them apply not only to remote-scene or IR video but to background subtraction in general. The Remote Scene IR dataset and the foreground masks detected by each evaluated BS algorithm are available online: https://github.com/JerryYaoGl/BSEvaluationRemoteSceneIR. PMID:28837112
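For readers new to the task, the sketch below shows one of the simplest BS baselines, a selectively updated running-average background model, together with the pixel-wise F-measure often used to score masks against ground truth. Both are illustrative assumptions: the abstract names neither the evaluated algorithms nor the metrics, and the parameters `alpha` and `threshold` are hypothetical.

```python
def running_average_bs(frames, alpha=0.05, threshold=30):
    """Running-average background subtraction on flat lists of pixel values.
    Yields one binary foreground mask (1 = foreground) per frame."""
    background = [float(p) for p in frames[0]]
    for frame in frames:
        mask = [1 if abs(p - b) > threshold else 0 for p, b in zip(frame, background)]
        # Update the model only where the pixel looks like background,
        # so foreground objects are not absorbed into it too quickly.
        background = [
            b if m else (1 - alpha) * b + alpha * p
            for p, b, m in zip(frame, background, mask)
        ]
        yield mask


def f_measure(mask, ground_truth):
    """Pixel-wise F-measure (harmonic mean of precision and recall)."""
    tp = sum(1 for m, g in zip(mask, ground_truth) if m and g)
    fp = sum(1 for m, g in zip(mask, ground_truth) if m and not g)
    fn = sum(1 for m, g in zip(mask, ground_truth) if g and not m)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Per-frame F-measures computed against the provided ground-truth masks can then be averaged per challenge category to compare algorithms.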
Codispoti, Maurizio; De Cesarei, Andrea; Ferrari, Vera
Is color a critical factor when processing the emotional content of natural scenes? Under challenging perceptual conditions, such as when pictures are briefly presented, color might facilitate scene segmentation and/or function as a semantic cue via association with scene-relevant concepts (e.g., red and blood/injury). To clarify the influence of color on affective picture perception, we compared the late positive potentials (LPP) to color versus grayscale pictures, presented for very brief (24 ms) and longer (6 s) exposure durations. Results indicated that removing color information had no effect on the affective modulation of the LPP, regardless of exposure duration. These findings imply that the recognition of the emotional content of scenes, even when presented very briefly, does not critically rely on color information. Copyright © 2011 Society for Psychophysiological Research.
Talavera Martínez, Estefanía
Nowadays, there is an upsurge of interest in using lifelogging devices. Such devices generate huge amounts of image data; consequently, the need for automatic methods for analyzing and summarizing these data is drastically increasing. We present a new method for familiar scene recognition in
Nijboer, Tanja C W; Van Der Smagt, Maarten J; Van Zandvoort, Martine J E; De Haan, Edward H F
Scene recognition can be enhanced by appropriate colour information, yet the level of visual processing at which colour exerts its effects is still unclear. It has been suggested that colour supports low-level sensory processing, while others have claimed that colour information aids semantic categorization and recognition of objects and scenes. We investigated the effect of colour on scene recognition in a case of colour agnosia, M.A.H. In a scene identification task, participants had to name images of natural or non-natural scenes in six different formats. Irrespective of scene format, M.A.H. was much slower on the natural than on the non-natural scenes. As expected, neither M.A.H. nor control participants showed any difference in performance for the non-natural scenes. However, for the natural scenes, appropriate colour facilitated scene recognition in control participants (i.e., shorter reaction times), whereas M.A.H.'s performance did not differ across formats. Our data thus support the hypothesis that the effect of colour occurs at the level of learned associations.
Amano, Kinjiro; Uchikawa, Keiji; Kuriki, Ichiro
To study the characteristics of color memory for natural images, a memory-identification task was performed with differing color contrasts; three of the contrasts were defined by chromatic and luminance components of the image, and the others were defined with respect to the categorical colors. After observing a series of pictures successively, subjects identified the pictures using a confidence rating. Detection of increased contrasts tended to be harder than detection of decreased contrasts, suggesting that the chromaticness of pictures is enhanced in memory. Detecting changes within each color category was more difficult than across the categories. A multiple mechanism that processes color differences and categorical colors is briefly considered.
Our visual system has the ability to adapt to the color characteristics of the environment and maintain a stable color appearance. Many studies of chromatic adaptation and color constancy have suggested that adaptation mechanisms operate at different levels of visual processing. In the case of colorfulness perception, it has been shown that perception changes with adaptation to chromatic contrast modulation and to surrounding chromatic variance. However, it is still not clear how this perception changes in natural scenes and which levels of visual mechanisms contribute to it. Here, I will mainly present our recent work on colorfulness adaptation in natural images. In the experiment, we examined whether the colorfulness perception of an image was influenced by adaptation to natural images with different degrees of saturation. Natural and unnatural (shuffled or phase-scrambled) images were used as adapting and test images, and all combinations of adapting and test images were tested (e.g., the combination of natural adapting images and a shuffled test image). The results show that colorfulness perception was influenced by adaptation to the saturation of images: a test image appeared less colorful after adaptation to saturated images, and vice versa. The effect of colorfulness adaptation was strongest for the combination of natural adapting and natural test images. The fact that the naturalness of the spatial structure in an image affects the strength of the adaptation effect implies that the recognition of natural scenes may play an important role in the adaptation mechanism.
In the early stages of image analysis, visual cortex represents scenes as spatially organized maps of locally defined features (e.g., edge orientation). As image reconstruction unfolds and features are assembled into larger constructs, cortex attempts to recover semantic content for object recognition. It is conceivable that higher level representations may feed back onto early processes and retune their properties to align with the semantic structure projected by the scene; however, there is no clear evidence to either support or discard the applicability of this notion to the human visual system. Obtaining such evidence is challenging because low and higher level processes must be probed simultaneously within the same experimental paradigm. We developed a methodology that targets both levels of analysis by embedding low-level probes within natural scenes. Human observers were required to discriminate probe orientation while semantic interpretation of the scene was selectively disrupted via stimulus inversion or reversed playback. We characterized the orientation tuning properties of the perceptual process supporting probe discrimination; tuning was substantially reshaped by semantic manipulation, demonstrating that low-level feature detectors operate under partial control from higher level modules. The manner in which such control was exerted may be interpreted as a top-down predictive strategy whereby global semantic content guides and refines local image reconstruction. We exploit the novel information gained from these data to develop mechanistic accounts of unexplained phenomena such as the classic face inversion effect.
Tashlinskii, A. G.; Smirnov, P. V.; Tsaryov, M. G.
The paper considers the effectiveness of motion estimation in video using pixel-by-pixel recurrent algorithms. The algorithms use stochastic gradient descent to find the inter-frame shifts of all pixels of a frame. These shift vectors form a shift vector field. As the estimated parameters of the vectors, the paper studies their projections and their polar parameters. It considers two methods for estimating the shift vector field. The first method uses a stochastic gradient descent algorithm to process all nodes of the image sequentially, row by row. It processes each row bidirectionally, i.e., from left to right and from right to left. Subsequent joint processing of the two results compensates for the inertia of the recursive estimation. The second method uses the correlation between rows to increase processing efficiency. It processes rows one after another, changing direction after each row, and uses the obtained values to form the resulting estimate. The paper studies two criteria for forming this estimate: minimum of the gradient estimate and maximum of the correlation coefficient. The paper gives experimental results of pixel-by-pixel estimation for a video with a moving object and of estimating a moving object's trajectory using the shift vector field.
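The bidirectional row-wise recursion can be illustrated with a one-dimensional sketch. It assumes a purely horizontal shift within a single row, linear interpolation between pixels, a fixed step size `mu`, and plain averaging of the two directional passes as the "joint processing" step; the helper names `sgd_row_pass` and `bidirectional_shift` are hypothetical, and the paper's 2-D algorithm and fusion criteria may differ.

```python
import numpy as np

def sgd_row_pass(f1, f2, mu, reverse=False):
    """One recurrent pass along a row: per-pixel estimates of the shift d
    such that f2(x) ~= f1(x - d), updated by one SGD step per pixel."""
    n = len(f1)
    xs = np.arange(n, dtype=float)
    grad1 = np.gradient(f1)                      # spatial derivative of the reference row
    order = range(n - 1, -1, -1) if reverse else range(n)
    d = 0.0                                      # initial shift estimate
    est = np.empty(n)
    for x in order:
        pos = x - d                              # where pixel x of f2 came from in f1
        e = f2[x] - np.interp(pos, xs, f1)       # prediction error
        g = np.interp(pos, xs, grad1)            # de/dd = f1'(x - d)
        d -= mu * 2.0 * e * g                    # SGD step on the squared error e**2
        est[x] = d
    return est

def bidirectional_shift(f1, f2, mu=0.02):
    """Average the left-to-right and right-to-left passes (assumed fusion rule)."""
    lr = sgd_row_pass(f1, f2, mu)
    rl = sgd_row_pass(f1, f2, mu, reverse=True)
    return 0.5 * (lr + rl)

x = np.arange(200, dtype=float)
f1 = 30.0 * np.sin(0.1 * x)              # reference row (frame t)
f2 = 30.0 * np.sin(0.1 * (x - 1.5))      # same row in frame t+1, shifted right by 1.5 px
shifts = bidirectional_shift(f1, f2)
print(np.median(shifts[50:150]))
```

Each single pass lags behind the true shift at the start of its sweep (the "inertia" of the recursion); averaging with the opposite pass reduces that lag at both row ends.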
Petek, Rok; Jurc, Maja; Kalan, Janko; Batič, Franc
This master's thesis presents and describes modern methods of optical character recognition in natural scenes. Methods that achieve high classification results and are robust to illumination changes and geometric transformations were selected for the thesis. Our work is based on the implementation of three different feature extraction methods. The basic HOG method, which also underlies the other two methods, is one of the most popular feature extraction methods in object detection and character recognition...
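For orientation, a compact HOG sketch in NumPy follows. It is a simplified variant (hard assignment to orientation bins, no spatial interpolation, no clipping of block vectors), not the thesis implementation; the function names, the 8-pixel cell size, and the 9 unsigned orientation bins are illustrative choices.

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    """Magnitude-weighted histograms of unsigned gradient orientation, one per cell."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)                       # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation in [0, 180)
    bin_idx = np.minimum((ang // (180.0 / bins)).astype(int), bins - 1)
    rows, cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for i in range(rows):
        for j in range(cols):
            win = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            hist[i, j] = np.bincount(bin_idx[win].ravel(),
                                     weights=mag[win].ravel(), minlength=bins)
    return hist

def hog_descriptor(img, cell=8, bins=9, eps=1e-6):
    """Concatenate L2-normalized blocks of 2x2 cells, moved with a stride of one cell."""
    h = hog_cell_histograms(img, cell, bins)
    blocks = []
    for i in range(h.shape[0] - 1):
        for j in range(h.shape[1] - 1):
            v = h[i:i + 2, j:j + 2].ravel()
            blocks.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(blocks)

img = np.zeros((32, 32))
img[:, 16:] = 1.0               # a vertical edge: only horizontal gradients
hist = hog_cell_histograms(img)
desc = hog_descriptor(img)      # 3x3 blocks x (2x2 cells x 9 bins) = 324 values
```

Block normalization is what gives HOG much of its robustness to illumination changes: scaling the image intensity scales every histogram in a block by the same factor, which the L2 normalization cancels.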
Sebastian, Stephen; Abrams, Jared; Geisler, Wilson S
A fundamental everyday visual task is to detect target objects within a background scene. Using relatively simple stimuli, vision science has identified several major factors that affect detection thresholds, including the luminance of the background, the contrast of the background, the spatial similarity of the background to the target, and uncertainty due to random variations in the properties of the background and in the amplitude of the target. Here we use an experimental approach based on constrained sampling from multidimensional histograms of natural stimuli, together with a theoretical analysis based on signal detection theory, to discover how these factors affect detection in natural scenes. We sorted a large collection of natural image backgrounds into multidimensional histograms, where each bin corresponds to a particular luminance, contrast, and similarity. Detection thresholds were measured for a subset of bins spanning the space, where a natural background was randomly sampled from a bin on each trial. In low-uncertainty conditions, both the background bin and the amplitude of the target were fixed, and, in high-uncertainty conditions, they varied randomly on each trial. We found that thresholds increase approximately linearly along all three dimensions and that detection accuracy is unaffected by background bin and target amplitude uncertainty. The results are predicted from first principles by a normalized matched-template detector, where the dynamic normalizing gain factor follows directly from the statistical properties of the natural backgrounds. The results provide an explanation for classic laws of psychophysics and their underlying neural mechanisms.
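A minimal sketch of a normalized matched-template response follows. The gain is assumed here to be the product of background luminance `L`, RMS contrast `C`, and the template norm, which reduces to a normalized cross-correlation; the published detector also accounts for template similarity and derives its gain from natural-image statistics, which this sketch does not reproduce. The example checks only one consequence of such normalization: scaling the whole stimulus by a luminance factor leaves the response unchanged, so the target amplitude needed for a fixed response grows in proportion to luminance.

```python
import numpy as np

def normalized_template_response(patch, template, eps=1e-12):
    """Template-matching response divided by a gain set by the patch's
    luminance L and RMS contrast C (assumed normalization: L * C * ||t||)."""
    patch = np.asarray(patch, dtype=float)
    t = template - template.mean()                  # zero-mean template
    L = patch.mean()                                # background luminance
    C = patch.std() / (L + eps)                     # RMS contrast
    raw = float(np.sum(t * (patch - L)))            # matched-template response
    return raw / (L * C * np.linalg.norm(t) + eps)

rng = np.random.default_rng(1)
background = 100.0 + 20.0 * rng.standard_normal((16, 16))
yy, xx = np.mgrid[-8:8, -8:8]
template = np.exp(-(xx ** 2 + yy ** 2) / 8.0)       # hypothetical Gaussian-blob target
r_dim = normalized_template_response(background + 5.0 * template, template)
r_bright = normalized_template_response(3.0 * (background + 5.0 * template), template)
```

Because `r_dim` and `r_bright` agree, a detector thresholding this response needs three times the absolute target amplitude on a three-times-brighter background, i.e. a threshold that is linear in luminance.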
Oropesa Morales, Lester Arturo; Montoya Obeso, Abraham; Hernández García, Rosaura; Cocolán Almeda, Sara Ivonne; García Vázquez, Mireya Saraí; Benois-Pineau, Jenny; Zamudio Fuentes, Luis Miguel; Martinez Nuño, Jesús A.; Ramírez Acosta, Alejandro Alvaro
Multimedia content production and storage in repositories are now increasingly widespread practices. Indexing concepts for search in multimedia libraries are very useful for repository users. However, content-based retrieval and automatic video-tagging tools still lack consistency. Regardless of how these systems are implemented, it is vital to have a large number of videos whose concepts are tagged with ground truth (training and testing sets). This paper describes a novel methodology for making complex annotations on video resources with the ELAN software. Concepts related to Mexican nature are annotated, in a collaborative environment, on the High Level Features (HLF) development set of TRECVID 2014. Based on this set, each nature concept observed is tagged on each video shot using concepts from the TRECVID 2014 dataset. We also propose new concepts, such as tropical settings, urban scenes, actions, events, weather, and places, to name a few, as well as specific concepts that best describe video content of Mexican culture. We have been careful to tag the database with nature concepts and ground truth. It is evident that a collaborative environment is more suitable for annotating concepts related to ground truth and nature. As a result, a Mexican nature database was built. It also serves as the basis for training and testing sets to automatically classify new multimedia content of Mexican nature.
Fiori, Elisabetta; Galizia, Antonella; Danovaro, Emanuele; Clematis, Andrea; Bedrina, Tatiana; Parodi, Antonio
Forecasting severe storms and floods is one of the main challenges of the 21st century. Floods are the most dangerous meteorological hazard in the Mediterranean basins, due both to the number of people affected and to the relatively high frequency with which human activities and goods suffer damage and losses. Numerical simulations of extreme events over small basins such as the Mediterranean ones need very fine resolution in space and time and, as a consequence, considerable memory and computational power. Since the resources provided by the PRACE project can satisfy such requirements, the Super Computing of Extreme Natural Events (SCENE) project has been proposed. SCENE aims to provide an advanced understanding of the intrinsic predictability of severe precipitation processes and the associated predictive ability of high-resolution meteorological models, with a special focus on flash-flood-producing storms in regions of complex orography (e.g., the Mediterranean area), through the assessment of the role of both convective and microphysical processes. The meteorological model considered in the project is the Weather Research and Forecasting (WRF) model, a state-of-the-art mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research needs. Among all the parameterizations available in the WRF model, the WRF Single-Moment 6-Class Scheme and the Thompson microphysics scheme will be adopted for the numerical simulations, in combination with three different approaches for the treatment of convective processes: the explicit method, the Betts-Miller-Janjic scheme, and the Kain-Fritsch scheme. As for flash-flood-producing storms, the project considers the recent sequence of extreme events that occurred in the north-western portion of the Mediterranean Sea; some of these events are the so-called critical cases of the DRIHM project (www.drihm.eu), i.e. selected severe
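The planned design crosses two microphysics schemes with three treatments of convection, giving six simulation configurations per event. A small Python sketch can enumerate that matrix. The `mp_physics` and `cu_physics` codes follow the WRF namelist convention as commonly documented (6 = WSM6, 8 = Thompson; 0 = no cumulus parameterization, i.e. explicit convection, 1 = Kain-Fritsch, 2 = Betts-Miller-Janjic) and should be checked against the WRF version in use; the run names and helper functions are hypothetical.

```python
from itertools import product

# WRF &physics option codes as commonly documented; verify for the WRF version in use.
MICROPHYSICS = {"WSM6": 6, "Thompson": 8}
CONVECTION = {"explicit": 0, "KainFritsch": 1, "BettsMillerJanjic": 2}

def experiment_matrix():
    """All microphysics x convection combinations, one run per pair."""
    return [
        {"name": f"{mp_name}_{cu_name}", "mp_physics": mp, "cu_physics": cu}
        for (mp_name, mp), (cu_name, cu) in product(MICROPHYSICS.items(), CONVECTION.items())
    ]

def physics_namelist(run):
    """Render a single-domain &physics fragment for one run (all other options omitted)."""
    return (
        "&physics\n"
        f" mp_physics = {run['mp_physics']},\n"
        f" cu_physics = {run['cu_physics']},\n"
        "/\n"
    )

runs = experiment_matrix()
for run in runs:
    print(run["name"])
```

A real namelist needs one value per nested domain and many more physics options; the fragment only shows where the two varied settings live.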
Visual saliency is the perceptual quality that makes some items in visual scenes stand out from their immediate contexts. Visual saliency plays important roles in natural vision in that saliency can direct eye movements, deploy attention, and facilitate tasks like object detection and scene understanding. A central unsolved issue is: What features should be encoded in the early visual cortex for detecting salient features in natural scenes? To explore this important issue, we propose a hypothesis that visual saliency is based on efficient encoding of the probability distributions (PDs) of visual variables in specific contexts in natural scenes, referred to as context-mediated PDs in natural scenes. In this concept, computational units in the model of the early visual system do not act as feature detectors but rather as estimators of the context-mediated PDs of a full range of visual variables in natural scenes, which directly give rise to a measure of visual saliency of any input stimulus. To test this hypothesis, we developed a model of the context-mediated PDs in natural scenes using a modified algorithm for independent component analysis (ICA) and derived a measure of visual saliency based on these PDs estimated from a set of natural scenes. We demonstrated that visual saliency based on the context-mediated PDs in natural scenes effectively predicts human gaze in free viewing of both static and dynamic natural scenes. This study suggests that computation based on the context-mediated PDs of visual variables in natural scenes may underlie the neural mechanism in the early visual cortex for detecting salient features in natural scenes.
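The central idea, saliency as the self-information of local feature values under a distribution estimated from the surrounding context, can be sketched as follows. This is a simplification: plain image gradients stand in for the modified-ICA filters, each filter's distribution is estimated independently with a histogram over the same image (the assumed "context"), and the function name is hypothetical.

```python
import numpy as np

def self_information_saliency(features, bins=32):
    """features: (H, W, K) responses of K filters. Saliency = -sum_k log p_k(f_k),
    where each p_k is a histogram estimate over the whole image."""
    H, W, K = features.shape
    sal = np.zeros((H, W))
    for k in range(K):
        f = features[..., k].ravel()
        counts, edges = np.histogram(f, bins=bins)
        p = counts / counts.sum()
        idx = np.clip(np.digitize(f, edges[1:-1]), 0, bins - 1)  # same bins as np.histogram
        sal -= np.log(p[idx] + 1e-12).reshape(H, W)
    return sal

img = np.zeros((32, 32))
img[10:13, 10:13] = 1.0                  # a small odd patch on a uniform background
gy, gx = np.gradient(img)                # stand-ins for ICA filter responses
sal = self_information_saliency(np.stack([gx, gy], axis=-1))
peak = np.unravel_index(np.argmax(sal), sal.shape)
```

Feature values that are rare in the context receive low probability and therefore high saliency, so the peak lands on the odd patch rather than on the uniform background.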
Leech, Robert; Gygi, Brian; Aydelott, Jennifer; Dick, Frederic
In a non-linguistic analog of the "cocktail-party" scenario, informational and contextual factors were found to affect the recognition of everyday environmental sounds embedded in naturalistic auditory scenes. Short environmental sound targets were presented in a dichotic background scene composed of either a single stereo background scene or a composite background scene created by playing different background scenes to the different ears. The side of presentation, time of onset, and number of target sounds were varied across trials to increase the uncertainty for the participant. Half the sounds were contextually congruent with the background sound (i.e., consistent with the meaningful real-world sound environment represented in the auditory scene) and half were incongruent. The presence of a single competing background scene decreased identification accuracy, suggesting an informational masking effect. In tandem, there was a contextual pop-out effect, with contextually incongruent sounds identified more accurately. However, when targets were incongruent with the real-world context of the background scene, informational masking was reduced. Acoustic analyses suggested that this contextual pop-out effect was driven by a mixture of perceptual differences between the target and background, as well as by higher-level cognitive factors. These findings indicate that identification of environmental sounds in naturalistic backgrounds is an active process that requires integrating perceptual, attentional, and cognitive resources.
Hamel, Johanna; De Beukelaer, Sophie; Kraft, Antje; Ohl, Sven; Audebert, Heinrich J; Brandt, Stephan A
Diverse cognitive functions decline with increasing age, including the ability to process central and peripheral visual information in a laboratory testing situation (useful visual field of view). To investigate whether and how this influences activities of daily life, we studied age-related changes in visual exploratory behavior in a natural scene setting: a driving simulator paradigm of variable complexity was tested in subjects of varying ages with simultaneous eye- and head-movement recordings via a head-mounted camera. Detection and reaction times were also measured by visual fixation and manual reaction. We considered video computer game experience as a possible influence on performance. Data from 73 participants of varying ages, each driving two different courses, were analyzed. We analyzed the influence of route difficulty level, age, and eccentricity of test stimuli on oculomotor and driving behavior parameters. No significant age effects were found regarding saccadic parameters. In the older subjects, head movements increasingly contributed to gaze amplitude. More demanding courses and more peripheral stimulus locations induced longer reaction times in all age groups. Our study group showed no indication of an age-related deterioration of the functionally useful visual field of view. However, video-game-experienced subjects showed larger saccade amplitudes and a broader distribution of fixations on the screen. They reacted faster to peripheral objects, suggesting that they approached the task as a general detection task rather than perceiving driving as the central task. As the video-game-experienced population consisted of younger subjects, our study indicates that effects due to video game experience can easily be misinterpreted as age effects if not accounted for. We therefore view it as essential to consider video game experience in all testing methods using virtual media.
Brown, Daniel K; Barton, Jo L; Gladwell, Valerie F
A randomized crossover study explored whether viewing different scenes prior to a stressor altered autonomic function during recovery from the stressor. The two scenes depicted (a) natural environments (composed of trees, grass, and fields) or (b) built environments (composed of man-made, urban scenes lacking natural characteristics). Autonomic function was assessed using noninvasive heart rate variability techniques; in particular, time-domain analyses evaluated parasympathetic activity using the root mean square of successive differences (RMSSD). During stress, secondary cardiovascular markers (heart rate, systolic and diastolic blood pressure) showed significant increases from baseline, which did not differ between the two viewing conditions. Parasympathetic activity, however, was significantly higher during recovery from the stressor in the nature-viewing condition than in the built-environment condition (RMSSD: 50.0 ± 31.3 vs 34.8 ± 14.8 ms). Thus, viewing nature scenes prior to a stressor alters autonomic activity in the recovery period. The secondary aim was to examine autonomic function during viewing of the two scenes. The standard deviation of R-R intervals (SDRR), expressed as change from baseline, was greater during the first 5 min of viewing nature scenes than during viewing of built scenes. Overall, this suggests that nature can elicit improvements in the recovery process following a stressor.
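Both HRV indices are simple to compute from a series of R-R intervals. The sketch below assumes intervals in milliseconds, uses the sample standard deviation for SDRR, and omits the artifact and ectopic-beat correction that real recordings usually need; the function names and R-R values are made up for illustration.

```python
import numpy as np

def rmssd(rr_ms):
    """Root mean square of successive differences between R-R intervals (ms)."""
    diffs = np.diff(np.asarray(rr_ms, dtype=float))
    return float(np.sqrt(np.mean(diffs ** 2)))

def sdrr(rr_ms):
    """Standard deviation of R-R intervals (ms); sample SD (ddof=1) is assumed."""
    return float(np.std(np.asarray(rr_ms, dtype=float), ddof=1))

def change_from_baseline(metric, rr_condition, rr_baseline):
    """Express an HRV metric as the difference from its baseline value."""
    return metric(rr_condition) - metric(rr_baseline)

rr = [800, 810, 790, 800]      # made-up R-R intervals in ms
print(rmssd(rr), sdrr(rr))     # successive differences are +10, -20, +10
```

RMSSD depends only on beat-to-beat differences, which is why it is used as a marker of fast, parasympathetically mediated variability, whereas SDRR also reflects slower drifts over the whole recording window.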
Maitre, Matthieu; Guillemot, Christine; Morin, Luce
This paper addresses the problem of side information extraction for distributed coding of videos captured by a camera moving in a 3-D static environment. Examples of targeted applications are augmented reality, remote-controlled robots operating in hazardous environments, or remote exploration by drones. It explores the benefits of the structure-from-motion paradigm for distributed coding of this type of video content. Two interpolation methods constrained by the scene geometry, based either on block matching along epipolar lines or on 3-D mesh fitting, are first developed. These techniques rely on a robust algorithm for sub-pel matching of feature points, which leads to semi-dense correspondences between key frames. However, their rate-distortion (RD) performance is limited by misalignments between the side information and the actual Wyner-Ziv (WZ) frames, due to the assumption of linear motion between key frames. To cope with this problem, two feature point tracking techniques are introduced, which recover the camera parameters of the WZ frames. The first technique, in which the frames remain encoded separately, performs tracking at the decoder and leads to significant RD performance gains. The second technique further improves RD performance by allowing limited tracking at the encoder. As an additional benefit, statistics on tracks allow the encoder to adapt the key frame frequency to the video motion content.
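Block matching constrained to an epipolar line can be sketched as follows: for a pixel `x` in the first image, the corresponding point in the second image lies on the line `l' = F x`, so candidate blocks are taken only along that line. This sketch assumes a known fundamental matrix `F`, integer-pixel positions (the paper uses sub-pel matching), a sum-of-absolute-differences cost, and a search over the horizontal coordinate, so it rejects vertical epipolar lines; the function names are hypothetical.

```python
import numpy as np

def epipolar_line(F, pt):
    """Epipolar line l' = F x in the second image for pixel pt = (u, v) of the first."""
    return F @ np.array([pt[0], pt[1], 1.0])

def match_along_epipolar(img1, img2, F, pt, half=3, search=range(-8, 9)):
    """Integer-pixel block matching (SAD) restricted to the epipolar line of pt."""
    u, v = pt
    a, b, c = epipolar_line(F, pt)              # line: a*u' + b*v' + c = 0
    if abs(b) < 1e-12:
        raise ValueError("vertical epipolar line: not handled in this sketch")
    block = img1[v - half:v + half + 1, u - half:u + half + 1].astype(float)
    best, best_cost = None, np.inf
    for du in search:
        u2 = u + du
        v2 = int(round(-(a * u2 + c) / b))      # point on the line at column u2
        if v2 - half < 0 or u2 - half < 0:
            continue                            # negative indices would wrap around
        cand = img2[v2 - half:v2 + half + 1, u2 - half:u2 + half + 1]
        if cand.shape != block.shape:
            continue                            # candidate runs off the image
        cost = float(np.abs(cand - block).sum())
        if cost < best_cost:
            best, best_cost = (u2, v2), cost
    return best

rng = np.random.default_rng(0)
img1 = rng.random((40, 40))
img2 = np.roll(img1, 3, axis=1)                 # second view: content moved 3 px right
F_rectified = np.array([[0.0, 0.0, 0.0],        # fundamental matrix of a rectified pair:
                        [0.0, 0.0, -1.0],       # epipolar lines are image rows (v' = v)
                        [0.0, 1.0, 0.0]])
match = match_along_epipolar(img1, img2, F_rectified, (20, 20))
```

Restricting the search to the epipolar line turns a 2-D search into a 1-D one, which is where the scene-geometry constraint saves both computation and false matches.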
Liu, Shuoyan; Xu, De; Yang, Xu
This paper proposes the Extended Bag-of-Visterms (EBOV) to represent semantic scenes. In previous methods, most representations are bag-of-visterms (BOV), where visterms refer to quantized local texture information. Our new representation is built by introducing global texture information to extend the standard bag-of-visterms. In particular, we apply an adaptive weight to fuse the local and global information in order to provide a better visterm representation. Given these representations, scene classification can be performed with a pLSA (probabilistic Latent Semantic Analysis) model. Experimental results show that the appropriate use of global information improves scene classification performance compared with a BOV representation that only takes local information into account.
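One plausible reading of the weighted fusion is a concatenation of L1-normalized local and global visterm histograms, scaled by a weight `w` and by `1 - w` respectively. The paper chooses the weight adaptively and the rule is not given here, so `w` is simply a parameter in this hypothetical sketch; the fused vector would then be passed to a pLSA model (not shown).

```python
import numpy as np

def l1_normalize(h):
    """Scale a count histogram to sum to 1 (left unchanged if empty)."""
    h = np.asarray(h, dtype=float)
    s = h.sum()
    return h / s if s > 0 else h

def extended_bov(local_counts, global_counts, w):
    """Concatenate weighted, L1-normalized local and global visterm histograms."""
    return np.concatenate([w * l1_normalize(local_counts),
                           (1.0 - w) * l1_normalize(global_counts)])

v = extended_bov([2, 2], [1, 3], 0.7)
```

With weights `w` and `1 - w` on two normalized histograms, the fused vector still sums to 1, so it can be treated as a single word distribution by a topic model.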
Bindemann, Markus; Scheepers, Christoph; Ferguson, Heather J.; Burton, A. Mike
Person detection is an important prerequisite of social interaction, but is not well understood. Following suggestions that people in the visual field can capture a viewer's attention, this study examines the role of the face and the body for person detection in natural scenes. We observed that viewers tend first to look at the center of a scene,…
Wu, Esther X W; Gilani, Syed Omer; van Boxtel, Jeroen J A; Amihai, Ido; Chua, Fook Kee; Yen, Shih-Cheng
Previous studies have shown that saccade plans during natural scene viewing can be programmed in parallel. This evidence comes mainly from temporal indicators, i.e., fixation durations and latencies. In the current study, we asked whether eye movement positions recorded during scene viewing also reflect parallel programming of saccades. As participants viewed scenes in preparation for a memory task, their inspection of the scene was suddenly disrupted by a transition to another scene. We examined whether saccades after the transition were invariably directed immediately toward the center or were contingent on saccade onset times relative to the transition. The results, which showed a dissociation in eye movement behavior between two groups of saccades after the scene transition, supported the parallel programming account. Saccades with relatively long onset times (>100 ms) after the transition were directed immediately toward the center of the scene, probably to restart scene exploration. Saccades with short onset times (programming of saccades during scene viewing. Additionally, results from the analyses of intersaccadic intervals were also consistent with the parallel programming hypothesis.
Shimada, Satoshi; Azuma, Shouzou; Teranaka, Sayaka; Kojima, Akira; Majima, Yukie; Maekawa, Yasuko
We developed a system in which knowledge can be discovered and shared cooperatively within an organization, based on the SECI model of knowledge management. The system realizes three processes as follows. (1) A video that expresses a skill is segmented into a number of scenes according to its contents, and tacit knowledge is shared in each scene. (2) Tacit knowledge is extracted through a bulletin board linked to each scene. (3) Knowledge is acquired by repeatedly viewing the video scene together with comments that show the technical content to be practiced. We conducted experiments in which the system was used by nurses working in general hospitals. Experimental results show that practical nursing knacks can be collected using a bulletin board linked to video scenes. The results of this study confirmed the possibility of sensitively expressing the tacit knowledge of nurses' empirical nursing skills, using video images as a clue.
Bindemann, Markus; Lewis, Michael B
In this study, we examined whether the detection of frontal, ¾, and profile face views differs from their categorization as faces. In Experiment 1, we compared three tasks that required observers to determine the presence or absence of a face, but varied in the extents to which participants had to search for the faces in simple displays and in small or large scenes to make this decision. Performance was equivalent for all of the face views in simple displays and small scenes, but it was notably slower for profile views when this required the search for faces in extended scene displays. This search effect was confirmed in Experiment 2, in which we compared observers' eye movements with their response times to faces in visual scenes. These results demonstrate that the categorization of faces at fixation is dissociable from the detection of faces in space. Consequently, we suggest that face detection should be studied with extended visual displays, such as natural scenes.
Dong, Tianyang; Liu, Siyuan; Xia, Jiajia; Fan, Jing; Zhang, Ling
To automatically adapt to various hardware and software environments on different devices, this paper presents a time-critical adaptive approach for visualizing natural scenes. In this method, a simplified representation of a tree model is used for different devices. The best rendering scheme for a particular scene is selected by estimating the rendering time of trees based on their visual importance. This approach can therefore ensure the realism of natural scenes while maintaining a constant frame rate for their interactive display. To verify its effectiveness and flexibility, the method is applied on different devices, such as a desktop computer, a laptop, an iPad, and a smartphone. Applications show that the proposed method not only adapts very well to devices with different computing abilities and system resources but also achieves good visual realism and a constant frame rate for natural scenes.
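A time-critical selection of per-tree rendering schemes can be sketched as a greedy upgrade under a frame-time budget: every tree starts at the cheapest scheme, and the most important trees are upgraded first while the estimated total cost still fits. The scheme names, per-tree timings, and importance values below are hypothetical placeholders, not figures from the paper.

```python
from dataclasses import dataclass

@dataclass
class Tree:
    name: str
    importance: float    # e.g. projected screen size or distance weight (assumed)

# Hypothetical rendering schemes, cheapest first: (label, estimated render time in ms).
SCHEMES = [("billboard", 0.05), ("simplified_mesh", 0.4), ("full_mesh", 1.5)]

def select_schemes(trees, budget_ms):
    """Greedily upgrade trees, most important first, while the frame budget allows."""
    choice = {t.name: 0 for t in trees}
    spent = SCHEMES[0][1] * len(trees)
    if spent > budget_ms:
        raise ValueError("budget too small even for the cheapest scheme")
    ranked = sorted(trees, key=lambda t: t.importance, reverse=True)
    for level in range(1, len(SCHEMES)):
        extra = SCHEMES[level][1] - SCHEMES[level - 1][1]
        for t in ranked:
            if choice[t.name] == level - 1 and spent + extra <= budget_ms:
                choice[t.name] = level
                spent += extra
    return {name: SCHEMES[i][0] for name, i in choice.items()}, spent

trees = [Tree("a", 3.0), Tree("b", 2.0), Tree("c", 1.0)]
plan, spent = select_schemes(trees, budget_ms=2.5)
```

Upgrading level by level means that every tree gets a mid-quality representation before any single tree receives the most expensive one, which keeps the overall scene balanced under a tight budget.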
Le, Minh Tuan; Nguyen, Congdu; Yoon, Dae-Il; Jung, Eun Ku; Jia, Jie; Kim, Hae-Kwang
In this paper, we propose a method of 3D-graphics-to-video encoding and streaming that is embedded in a remote interactive 3D visualization system for rapidly representing a 3D scene on mobile devices without having to download it from the server. In particular, a 3D-graphics-to-video framework is presented that increases the visual quality of regions of interest (ROI) in the video by allocating more bits to the ROI during H.264 video encoding. The ROI are identified by projecting 3D objects onto a 2D plane during rasterization. The system allows users to navigate the 3D scene and interact with objects of interest to query their descriptions. We developed an adaptive media streaming server that can provide the client with a video stream adapted, in terms of object-based quality, to the user's preferences and to variations in network bandwidth. Results show that with ROI mode selection, the PSNR of the test sample changes only slightly while the visual quality of the objects increases noticeably.
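The ROI step can be illustrated by projecting an object's 3-D bounding box through a pinhole camera and lowering the quantization parameter (QP) of every 16x16 macroblock the projection covers. This is a hypothetical sketch: it produces a QP map only and does not call any real H.264 encoder API; it assumes the principal point at the image center and all box corners in front of the camera, and the QP offset of -6 is arbitrary.

```python
import numpy as np
from itertools import product

def project(points, f, cx, cy):
    """Pinhole projection of (N, 3) camera-space points (z > 0) to pixel coordinates."""
    p = np.asarray(points, dtype=float)
    return np.stack([f * p[:, 0] / p[:, 2] + cx, f * p[:, 1] / p[:, 2] + cy], axis=1)

def roi_qp_map(width, height, boxes_3d, f, base_qp=32, roi_delta=-6, mb=16):
    """Per-macroblock QP map with lower QP (more bits) where projected ROI boxes fall."""
    cols, rows = -(-width // mb), -(-height // mb)        # ceiling division
    qp = np.full((rows, cols), base_qp, dtype=int)
    upper = np.array([width - 1, height - 1], dtype=float)
    for corners in boxes_3d:
        uv = project(corners, f, width / 2.0, height / 2.0)
        u0, v0 = np.clip(uv.min(axis=0), 0.0, upper).astype(int)
        u1, v1 = np.clip(uv.max(axis=0), 0.0, upper).astype(int)
        qp[v0 // mb:v1 // mb + 1, u0 // mb:u1 // mb + 1] = base_qp + roi_delta
    return np.clip(qp, 0, 51)                             # H.264 QP range is 0..51

box = list(product([-0.5, 0.5], [-0.5, 0.5], [2.0, 2.5]))  # 8 corners of a 3-D box
qp = roi_qp_map(64, 48, [box], f=32.0)
```

In an actual encoder, such a map would feed a per-macroblock QP offset mechanism, if the encoder exposes one; the PSNR over the whole frame then changes little because only a few macroblocks receive the extra bits.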
Anderson, Nicola C.; Donk, Mieke
A change to an object in natural scenes attracts attention when it occurs during a fixation. However, when a change occurs during a saccade, and is masked by saccadic suppression, it typically does not capture the gaze in a bottom-up manner. In the present work, we investigated how the type and direction of salient changes to objects affect the prioritization and targeting of objects in natural scenes. We asked observers to look around a scene in preparation for a later memory test. After a period of time, an object in the scene was increased or decreased in salience either during a fixation (with a transient signal) or during a saccade (without transient signal), or it was not changed at all. Changes that were made during a fixation attracted the eyes both when the change involved an increase and a decrease in salience. However, changes that were made during a saccade only captured the eyes when the change was an increase in salience, relative to the baseline no-change condition. These results suggest that the prioritization of object changes can be influenced by the underlying salience of the changed object. In addition, object changes that occurred with a transient signal (which is itself a salient signal) resulted in more central object targeting. Taken together, our results suggest that salient signals in a natural scene are an important component in both object prioritization and targeting in natural scene viewing, insofar as they align with object locations. PMID:28222190
Adhikari, Srikar; Blaivas, Michael; Lyon, Matthew; Shiver, Stephen
Disaster management is a complex and difficult undertaking that may involve limited health care resources and evaluation of multiple victims. The objectives of this study were to evaluate the feasibility of real-time ultrasound video transmission from a simulated disaster triage location via commercially available video mobile phones and to assess the ability of emergency physicians to accurately interpret the transmitted video of Focused Assessment with Sonography for Trauma (FAST) ultrasound examinations. This was a prospective, observational study that took place at a simulated disaster scene put on for an Advanced Disaster Life Support (ADLS) course. The second component took place at a Level I academic urban emergency department (ED) with an annual census of 78,000. Nineteen subjects at a simulated disaster scene were scanned using a SonoSite Titan ultrasound system (Bothell, Washington USA). An off-the-shelf, basic, video-capable mobile phone was used to record each ultrasound examination and then immediately transmit the videos to another mobile phone approximately 170 miles away. The transmitted video was received by three emergency physicians with hospital credentialing in emergency ultrasound. Each FAST examination video was assessed for pathology, such as free fluid. The reviewers graded the image quality and documented their overall confidence regarding whether or not a complete and adequate examination was visualized. Spearman's rank correlation coefficient was used to examine the agreement between the reviewers and the sonologist who performed the ultrasound examinations. A total of 19 videos were transmitted. The median time for transmission of a video was 82.5 seconds (95% CI, 67.7 to 97.3 seconds). No video failed to transmit correctly on the first attempt. The image quality ratings for the three reviewers were 7.7, 7.5, and 7.4 on a 10-point Likert scale. There was moderate agreement between the reviewers and sonologist in image quality
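Spearman's rank correlation is the Pearson correlation computed on ranks, with tied values sharing their average rank. A self-contained sketch follows; the function names and the example ratings are illustrative, not study data, and `scipy.stats.spearmanr` provides a library implementation for cross-checking.

```python
import numpy as np

def average_ranks(x):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1                                  # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

reviewer = [7, 8, 6, 9, 7]       # made-up image-quality ratings
sonologist = [6, 8, 7, 9, 8]
print(spearman_rho(reviewer, sonologist))
```

Because only ranks enter the calculation, the coefficient suits ordinal ratings such as Likert-scale image-quality scores, where the spacing between scale points is not guaranteed to be equal.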
de Ridder, Huib; Blommaert, Frans J. J.; Fedorovskaya, Elena A.
The relation between perceptual image quality and naturalness was investigated by varying the colorfulness and hue of color images of natural scenes. These variations were created by digitizing the images, subsequently determining their color point distributions in the CIELUV color space, and finally multiplying either the chroma value or the hue angle of each pixel by a constant. During the chroma transformation, the lightness and hue angle of each pixel were kept constant; during the hue-angle transformation, the lightness and chroma were kept constant. Ten subjects rated quality and naturalness on numerical scales. The results show that both quality and naturalness deteriorate as soon as hues start to deviate from the ones in the original image. Chroma variation affected the impression of quality and naturalness to a lesser extent than did hue variation. In general, a linear relation was found between image quality and naturalness. For chroma variation, however, a small but systematic deviation could be observed. This deviation reflects the subjects' preference for more colorful but, at the same time, somewhat unnatural images.
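In CIELUV, chroma and hue angle are the polar coordinates of a pixel's (u*, v*) pair: C* = sqrt(u*² + v*²) and h = atan2(v*, u*). The two manipulations can therefore be sketched as below, assuming pixels are already in CIELUV and that the hue angle is measured from the +u* axis in [0, 2π); the abstract does not state these conventions, and the function names are hypothetical.

```python
import math


def scale_chroma(L, u, v, k):
    """Multiply chroma by k (k > 0); lightness L* and hue angle are unchanged."""
    # Scaling u* and v* by the same positive factor scales their radius (chroma)
    # but leaves their angle (hue) untouched.
    return L, k * u, k * v


def scale_hue_angle(L, u, v, k):
    """Multiply the hue angle by k; lightness L* and chroma are unchanged."""
    chroma = math.hypot(u, v)
    hue = math.atan2(v, u) % (2 * math.pi)  # hue angle in [0, 2*pi)
    new_hue = k * hue
    return L, chroma * math.cos(new_hue), chroma * math.sin(new_hue)
```

Working in polar form makes each manipulation touch exactly one coordinate, which is what holding lightness and the other attribute constant requires.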
The structure of the physical world projects images onto our eyes. However, those images are often poorly representative of environmental structure: well-defined boundaries within the eye may correspond to irrelevant features of the physical world, while critical features of the physical world may be nearly invisible at the retinal projection. The challenge for the visual cortex is to sort these two types of features according to their utility in ultimately reconstructing percepts and interpreting the constituents of the scene. We describe a novel paradigm that enabled us to selectively evaluate the relative role played by these two feature classes in signal reconstruction from corrupted images. Our measurements demonstrate that this process is quickly dominated by the inferred structure of the environment, and only minimally controlled by variations of raw image content. The inferential mechanism is spatially global and its impact on early visual cortex is fast. Furthermore, it retunes local visual processing for more efficient feature extraction without altering the intrinsic transduction noise. The basic properties of this process can be partially captured by a combination of small-scale circuit models and large-scale network architectures. Taken together, our results challenge compartmentalized notions of bottom-up/top-down perception and suggest instead that these two modes are best viewed as an integrated perceptual mechanism.
Keane, Tommy P.; Cahill, Nathan D.; Tarduno, John A.; Jacobs, Robert A.; Pelz, Jeff B.
Mobile eye-tracking provides a fairly unique opportunity to record and elucidate cognition in action. In our research, we are searching for patterns in, and distinctions between, the visual-search performance of experts and novices in the geosciences. Traveling to regions formed by various geological processes as part of an introductory field studies course in geology, we record the prima facie gaze patterns of experts and novices when they are asked to determine the modes of geological activity that have formed the scene-view presented to them. Recording eye video and scene video in natural settings generates complex imagery that requires advanced applications of computer vision research to generate registrations and mappings between the views of separate observers. By developing such mappings, we could then place many observers into a single mathematical space where we can spatio-temporally analyze inter- and intra-subject fixations, saccades, and head motions. While working towards perfecting these mappings, we developed an updated experiment setup that allowed us to statistically analyze intra-subject eye-movement events without the need for a common domain. Through such analyses we are finding statistical differences between novices and experts in these visual-search tasks. In the course of this research we have developed a unified, open-source software framework for processing, visualizing, and interacting with mobile eye-tracking and high-resolution panoramic imagery.
Ezaki, Nobuo; Bulacu, Marius; Schomaker, Lambert
We propose a system that reads the text encountered in natural scenes with the aim of providing assistance to visually impaired persons. This paper describes the system design and evaluates several character extraction methods. Automatic text recognition from natural images receives a
Durant, Szonya; Wall, Matthew B; Zanker, Johannes M
Optic flow is one of the most important sources of information for enabling human navigation through the world. A striking finding from single-cell studies in monkeys is the rapid saturation of response of MT/MST areas with the density of optic flow type motion information. These results are reflected psychophysically in human perception in the saturation of motion aftereffects. We began by comparing responses to natural optic flow scenes in human visual brain areas to responses to the same scenes with inverted contrast (photo negative). This changes scene familiarity while preserving local motion signals. This manipulation had no effect; however, the response was only correlated with the density of local motion (calculated by a motion correlation model) in V1, not in MT/MST. To further investigate this, we manipulated the visible proportion of natural dynamic scenes and found that areas MT and MST did not increase in response over a 16-fold increase in the amount of information presented, i.e., response had saturated. This makes sense in light of the sparseness of motion information in natural scenes, suggesting that the human brain is well adapted to exploit a small amount of dynamic signal and extract information important for survival.
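The abstract does not specify which motion correlation model was used. A common choice is the Hassenstein-Reichardt correlator, which multiplies the delayed signal at one location with the current signal at a neighboring location and subtracts the mirror-symmetric product. The sketch below applies such detectors to a 1D movie and reports a "density" as the fraction of detectors whose output exceeds a threshold; the pure frame delay (instead of a temporal filter), the threshold, and the density definition are all hypothetical simplifications.

```python
def local_motion_signals(frames, delay=1):
    """Opponent outputs of Hassenstein-Reichardt-style detectors on a 1D movie.

    frames[t][x] is the luminance at time t and position x. Each detector pairs
    positions x and x + 1: positive output means motion towards larger x,
    negative output means motion towards smaller x.
    """
    signals = []
    for t in range(delay, len(frames)):
        past, now = frames[t - delay], frames[t]
        for x in range(len(now) - 1):
            # delayed left * current right minus delayed right * current left
            signals.append(past[x] * now[x + 1] - past[x + 1] * now[x])
    return signals


def motion_density(frames, threshold=0.5, delay=1):
    """Fraction of detector outputs whose magnitude exceeds the threshold."""
    signals = local_motion_signals(frames, delay)
    if not signals:
        return 0.0
    return sum(abs(s) > threshold for s in signals) / len(signals)
```

A single bright dot stepping one position per frame drives only one detector per frame, so its density is low even though its motion signal is unambiguous; a static pattern gives zero everywhere.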
Stein, Timo; Peelen, Marius V
Humans are remarkably efficient in detecting highly familiar object categories in natural scenes, with evidence suggesting that such object detection can be performed in the (near) absence of attention. Here we systematically explored the influences of both spatial attention and category-based attention on the accuracy of object detection in natural scenes. Manipulating both types of attention additionally allowed for addressing how these factors interact: whether the requirement for spatial attention depends on the extent to which observers are prepared to detect a specific object category, that is, on category-based attention. The results showed that the detection of targets from one category (animals or vehicles) was better than the detection of targets from two categories (animals and vehicles), demonstrating the beneficial effect of category-based attention. This effect did not depend on the semantic congruency of the target object and the background scene, indicating that observers attended to visual features diagnostic of the foreground target objects from the cued category. Importantly, in three experiments the detection of objects in scenes presented in the periphery was significantly impaired when observers simultaneously performed an attentionally demanding task at fixation, showing that spatial attention affects natural scene perception. In all experiments, the effects of category-based attention and spatial attention on object detection performance were additive rather than interactive. Finally, neither spatial nor category-based attention influenced metacognitive ability for object detection performance. These findings demonstrate that efficient object detection in natural scenes is independently facilitated by spatial and category-based attention.
Moss, Cynthia F; Surlykke, Annemarie
Bats echolocating in the natural environment face the formidable task of sorting signals from multiple auditory objects, echoes from obstacles, prey, and the calls of conspecifics. Successful orientation in a complex environment depends on auditory information processing, along with adaptive vocal-motor behaviors and flight path control, which draw upon 3-D spatial perception, attention, and memory. This article reviews field and laboratory studies that document adaptive sonar behaviors of echolocating bats and point to the fundamental signal parameters they use to track and sort auditory objects...
Roberts, James A.
Even during periods of fixation our eyes undergo small amplitude movements. These movements are thought to be essential to the visual system because neural responses rapidly fade when images are stabilized on the retina. The considerable recent interest in fixational eye movements (FEMs) has thus far concentrated on idealized experimental conditions with artificial stimuli and restrained head movements, which are not necessarily a suitable model for natural vision. Natural dynamic stimuli, such as movies, offer the potential to move beyond restrictive experimental settings to probe the visual system with greater ecological validity. Here, we study FEMs recorded in humans during the unconstrained viewing of a dynamic and realistic visual environment, revealing that drift trajectories exhibit the properties of a random walk with memory. Drifts are correlated at short time scales such that the gaze position diverges from the initial fixation more quickly than would be expected for an uncorrelated random walk. We propose a simple model based on the premise that the eye tends to avoid retracing its recent steps to prevent photoreceptor adaptation. The model reproduces key features of the observed dynamics and enables estimation of parameters from data. Our findings show that FEM correlations thought to prevent perceptual fading exist even in highly dynamic real-world conditions.
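The abstract does not specify the authors' model beyond the premise that the eye avoids retracing its recent steps. A hypothetical stand-in with that property is a square-lattice walk that refuses to step onto any of its last `memory` positions, paired with the mean squared displacement (MSD) used to compare how quickly trajectories diverge from their start; `memory=0` gives an uncorrelated random walk for comparison. The lattice, the memory rule, and all names are illustrative assumptions.

```python
import random

STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def walk_with_memory(n_steps, memory, rng):
    """Square-lattice walk that avoids its last `memory` positions when possible.

    memory=0 gives an ordinary (uncorrelated) random walk. If every neighbor
    was visited recently, the walker falls back to an unrestricted step.
    """
    path = [(0, 0)]
    for _ in range(n_steps):
        x, y = path[-1]
        recent = set(path[max(0, len(path) - memory):])
        neighbors = [(x + dx, y + dy) for dx, dy in STEPS]
        fresh = [p for p in neighbors if p not in recent]
        path.append(rng.choice(fresh or neighbors))
    return path


def mean_squared_displacement(path, lag):
    """Average squared distance between positions `lag` steps apart."""
    pairs = list(zip(path, path[lag:]))
    return sum((x2 - x1) ** 2 + (y2 - y1) ** 2
               for (x1, y1), (x2, y2) in pairs) / len(pairs)
```

Averaging `mean_squared_displacement` over many seeded runs with a nonzero `memory` and with `memory=0` is one way to check whether a given memory rule produces faster-than-uncorrelated divergence at short lags.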
Gao, Xinbo; Gao, Fei; Tao, Dacheng; Li, Xuelong
Universal blind image quality assessment (IQA) metrics that can work for various distortions are of great importance for image processing systems, because in practice neither the ground truth nor the distortion type is always known. Existing state-of-the-art universal blind IQA algorithms are developed based on natural scene statistics (NSS). Although NSS-based metrics have obtained promising performance, they have some limitations: 1) they use either the Gaussian scale mixture model or the generalized Gaussian density to predict the non-Gaussian marginal distribution of wavelet, Gabor, or discrete cosine transform coefficients. The prediction error makes the extracted features unable to reflect the change in non-Gaussianity (NG) accurately. The existing algorithms use the joint statistical model and structural similarity to model the local dependency (LD). Although this LD essentially encodes the information redundancy in natural images, these models do not use information divergence to measure the LD. Although the exponential decay characteristic (EDC) represents the property of natural images that large/small wavelet coefficient magnitudes tend to be persistent across scales, which is highly correlated with image degradations, it has not been applied to the universal blind IQA metrics; and 2) all the universal blind IQA metrics use the same similarity measure for different features when learning the metric, even though these features have different properties. To address these problems, we propose to construct new universal blind quality indicators using all three types of NSS, i.e., the NG, LD, and EDC, and incorporating the heterogeneous property of multiple kernel learning (MKL). By analyzing how different distortions affect these statistical properties, we present two universal blind quality assessment models, NSS global scheme and NSS two-step scheme. In the proposed metrics: 1) we exploit the NG of natural images
Falomir, Zoe; Kluth, Thomas
The challenge of describing 3D real scenes is tackled in this paper using qualitative spatial descriptors. A key point to study is which qualitative descriptors to use and how these qualitative descriptors must be organized to produce a suitable cognitive explanation. In order to find answers, a survey test was carried out with human participants who openly described a scene containing some pieces of furniture. The data obtained in this survey were analysed, and, taking this into account, the QSn3D computational approach was developed, which uses an Xbox 360 Kinect to obtain 3D data from a real indoor scene. Object features are computed on these 3D data to identify objects in indoor scenes. The object orientation is computed, and qualitative spatial relations between the objects are extracted. These qualitative spatial relations are the input to a grammar which applies saliency rules obtained from the survey study and generates cognitive natural language descriptions of scenes. Moreover, these qualitative descriptors can be expressed as first-order logical facts in Prolog for further reasoning. Finally, a validation study is carried out to test whether the descriptions provided by the QSn3D approach are human readable. The obtained results show that their acceptability is higher than 82%.
Collet, Anne-Claire; Fize, Denis; VanRullen, Rufin
Rapid visual categorization is a crucial ability for survival of many animal species, including monkeys and humans. In real conditions, objects (either animate or inanimate) are never isolated but embedded in a complex background made of multiple elements. It has been shown in humans and monkeys that the contextual background can either enhance or impair object categorization, depending on context/object congruency (for example, an animal in a natural vs. man-made environment). Moreover, a scene is not only a collection of objects; it also has global physical features (i.e., phase and amplitude of Fourier spatial frequencies) which help define its gist. In our experiment, we aimed to explore and compare the contribution of the amplitude spectrum of scenes in the context-object congruency effect in monkeys and humans. We designed a rapid visual categorization task, Animal versus Non-Animal, using as contexts both real scene photographs and noisy backgrounds built from the amplitude spectrum of real scenes but with randomized phase spectrum. We showed that although the contextual congruency effect was comparable in both species when the context was a real scene, it differed when the foreground object was surrounded by a noisy background: in monkeys we found a similar congruency effect in both conditions, but in humans the congruency effect was absent (or even reversed) when the context was a noisy background. PMID:26207915
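Noisy backgrounds of this kind keep the amplitude spectrum of a real scene while randomizing its phase spectrum. The 1D sketch below shows the principle with a naive discrete Fourier transform: amplitudes are kept, phases are replaced by uniform random values, and conjugate symmetry is enforced so the result is real. The study worked on 2D images, where one would use a 2D FFT (for example `numpy.fft.fft2`); the uniform phase distribution and the function names here are assumptions.

```python
import cmath
import math
import random


def dft(signal):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def phase_randomize(signal, rng):
    """Return a real signal with the same amplitude spectrum but random phases."""
    n = len(signal)
    spectrum = dft(signal)
    out = [0j] * n
    out[0] = spectrum[0]  # the DC term (mean luminance) is kept unchanged
    for k in range(1, n // 2 + 1):
        if 2 * k == n:
            out[k] = spectrum[k]  # Nyquist term must stay real; keep it as is
            continue
        phase = rng.uniform(0.0, 2.0 * math.pi)
        out[k] = abs(spectrum[k]) * cmath.exp(1j * phase)
        out[n - k] = out[k].conjugate()  # conjugate symmetry gives a real result
    return [z.real for z in idft(out)]
```

Keeping the DC term fixes the mean luminance, so only the spatial arrangement changes while the power at each frequency is preserved.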
Foulsham, Tom; Barton, Jason J S; Kingstone, Alan; Dewhurst, Richard; Underwood, Geoffrey
Models of eye movement control in natural scenes often distinguish between stimulus-driven processes (which guide the eyes to visually salient regions) and those based on task and object knowledge (which depend on expectations or identification of objects and scene gist). In the present investigation, the eye movements of a patient with visual agnosia were recorded while she searched for objects within photographs of natural scenes and compared to those made by students and age-matched controls. Agnosia is assumed to disrupt the top-down knowledge available in this task, and so may increase the reliance on bottom-up cues. The patient's deficit in object recognition was seen in poor search performance and inefficient scanning. The low-level saliency of target objects had an effect on responses in visual agnosia, and the most salient region in the scene was more likely to be fixated by the patient than by controls. An analysis of model-predicted saliency at fixation locations indicated a closer match between fixations and low-level saliency in agnosia than in controls. These findings are discussed in relation to saliency-map models and the balance between high- and low-level factors in eye guidance.
Bohlin, Gustav; Göransson, Andreas; Höst, Gunnar E.; Tibell, Lena A. E.
Educational videos on the Internet comprise a vast and highly diverse source of information. Online search engines facilitate access to numerous videos claiming to explain natural selection, but little is known about the degree to which the video content matches key evolutionary content identified as important in evolution education research. In…
Schomaker, Judith; Walper, Daniel; Wittmann, Bianca C; Einhäuser, Wolfgang
In addition to low-level stimulus characteristics and current goals, our previous experience with stimuli can also guide attentional deployment. It remains unclear, however, whether such effects act independently or interact in guiding attention. In the current study, we presented natural scenes including everyday objects that differed in affective-motivational impact. In the first, free-viewing experiment, we presented visually matched triads of scenes that differed in one critical object; this object varied mainly in terms of motivational value, but also in terms of valence and arousal, as confirmed by ratings from a large set of observers. Treating motivation as a categorical factor, we found that it affected gaze. A linear-effect model showed that arousal, valence, and motivation predicted fixations above and beyond visual characteristics, like object size, eccentricity, or visual salience. In a second experiment, we experimentally investigated whether the effects of emotion and motivation could be modulated by visual salience. In a medium-salience condition, we presented the same unmodified scenes as in the first experiment. In a high-salience condition, we retained the saturation of the critical object in the scene and decreased the saturation of the background, and in a low-salience condition, we desaturated the critical object while retaining the original saturation of the background. We found that highly salient objects guided gaze, but still found additional additive effects of arousal, valence and motivation, confirming that higher-level factors can also guide attention, as measured by fixations towards objects in natural scenes. Copyright © 2017 Elsevier Ltd. All rights reserved.
Valsecchi, Matteo; Gegenfurtner, Karl R.
Binocular disparity is a fundamental dimension defining the input we receive from the visual world, along with luminance and chromaticity. In a memory task involving images of natural scenes we investigate whether binocular disparity enhances long-term visual memory. We found that forest images studied in the presence of disparity for relatively long times (7 s) were remembered better than those studied in 2D presentation. This enhancement was not evident for other categories of pictures, such as ima...
Kellner, Christian Johannes
In the retina of trichromatic primates, chromatic information is encoded in an opponent fashion and transmitted to the lateral geniculate nucleus (LGN) and visual cortex via parallel pathways. Chromatic selectivities of neurons in the LGN form two separate clusters, corresponding to two classes of cone opponency. In the visual cortex, however, the chromatic selectivities are more distributed, which is in accordance with a population code for colour. Previous studies of cone signals in natural scenes typically found opponent codes with chromatic selectivities corresponding to two directions in colour space. Here we investigated how the nonlinear spatiochromatic filtering in the retina influences the encoding of colour signals. Cone signals were derived from hyperspectral images of natural scenes and pre-processed by centre-surround filtering and rectification, resulting in parallel ON and OFF channels. Independent Component Analysis on these signals yielded a highly sparse code with basis functions that showed spatio-chromatic selectivities. In contrast to previous analyses of linear transformations of cone signals, chromatic selectivities were not restricted to two main chromatic axes, but were more continuously distributed in colour space, similar to the population code of colour in the early visual cortex. Our results indicate that spatiochromatic processing in the retina leads to a more distributed and more efficient code for natural scenes.
Kellner, Christian J; Wachtler, Thomas
In the retina of trichromatic primates, chromatic information is encoded in an opponent fashion and transmitted to the lateral geniculate nucleus (LGN) and visual cortex via parallel pathways. Chromatic selectivities of neurons in the LGN form two separate clusters, corresponding to two classes of cone opponency. In the visual cortex, however, the chromatic selectivities are more distributed, which is in accordance with a population code for color. Previous studies of cone signals in natural scenes typically found opponent codes with chromatic selectivities corresponding to two directions in color space. Here we investigated how the non-linear spatio-chromatic filtering in the retina influences the encoding of color signals. Cone signals were derived from hyper-spectral images of natural scenes and preprocessed by center-surround filtering and rectification, resulting in parallel ON and OFF channels. Independent Component Analysis (ICA) on these signals yielded a highly sparse code with basis functions that showed spatio-chromatic selectivities. In contrast to previous analyses of linear transformations of cone signals, chromatic selectivities were not restricted to two main chromatic axes, but were more continuously distributed in color space, similar to the population code of color in the early visual cortex. Our results indicate that spatio-chromatic processing in the retina leads to a more distributed and more efficient code for natural scenes.
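The preprocessing step, center-surround filtering followed by rectification into ON and OFF channels, can be sketched in 1D with a difference of Gaussians (DoG), a common model of center-surround receptive fields. The Gaussian widths, kernel radius, edge handling, and function names are hypothetical choices, not the authors' parameters; ICA would then be run on the rectified channels (for example with scikit-learn's `FastICA`).

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_same(signal, kernel):
    """1D convolution with clamped edges; output has the input's length."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - r, 0), n - 1)  # clamp indices at the borders
            acc += w * signal[idx]
        out.append(acc)
    return out


def on_off_channels(cone_signal, sigma_center=1.0, sigma_surround=3.0, radius=9):
    """Center-surround (DoG) filtering, then half-wave rectification into ON/OFF."""
    center = convolve_same(cone_signal, gaussian_kernel(sigma_center, radius))
    surround = convolve_same(cone_signal, gaussian_kernel(sigma_surround, radius))
    dog = [c - s for c, s in zip(center, surround)]
    on = [max(d, 0.0) for d in dog]    # responds where center exceeds surround
    off = [max(-d, 0.0) for d in dog]  # responds where surround exceeds center
    return on, off
```

Because both kernels are normalized, a uniform input produces (numerically) no response in either channel; only local contrast survives, split by sign into the two rectified channels.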
Sareen, Preeti; Ehinger, Krista A; Wolfe, Jeremy M
Change blindness has been a topic of interest in cognitive sciences for decades. Change detection experiments are frequently used for studying various research topics such as attention and perception. However, creating change detection stimuli is tedious and there is no open repository of such stimuli using natural scenes. We introduce the Change Blindness (CB) Database with object changes in 130 colored images of natural indoor scenes. The size and eccentricity are provided for all the changes as well as reaction time data from a baseline experiment. In addition, we have two specialized satellite databases that are subsets of the 130 images. In one set, changes are seen in rooms or in mirrors in those rooms (Mirror Change Database). In the other, changes occur in a room or out a window (Window Change Database). Both sets have controlled background, change size, and eccentricity. The CB Database is intended to provide researchers with a stimulus set of natural scenes with defined stimulus parameters that can be used for a wide range of experiments. The CB Database can be found at http://search.bwh.harvard.edu/new/CBDatabase.html .
Martens, M.H.; Rook, A.M.
This paper illustrates simple techniques to approach traffic safety issues and traffic accidents. By means of bird's-eye-view video recording, a detailed and accurate overview can be gathered of the type, number, and causation of conflicts and accidents that occur. This allows a good assessment of
Thompson, Catherine; Crundall, David
Three experiments explored the transference of visual scanning behaviour between two unrelated tasks. Participants first viewed letters presented horizontally, vertically, or as a random array. They then viewed still images (experiments 1 and 2) or video clips (experiment 3) of driving scenes, under varying task conditions. Despite having no relevance to the driving images, the layout of stimuli in the letter task influenced scanning behaviour in this subsequent task. In the still images, a vertical letter search increased vertical scanning, and in the dynamic clips, a horizontal letter search decreased vertical scanning. This indicated that (i) models of scanning behaviour should account for the influence of a preceding unrelated task; (ii) carry-over is modulated by demand in the current task; and (iii) in situations where particular scanning strategies are important for primary task performance (e.g., driving safety), secondary task information should be displayed in a manner likely to produce a congruent scanning strategy.
Hobbs, Jennifer A; Towal, R Blythe; Hartmann, Mitra J Z
Analysis of natural scene statistics has been a powerful approach for understanding neural coding in the auditory and visual systems. In the field of somatosensation, it has been more challenging to quantify the natural tactile scene, in part because somatosensory signals are so tightly linked to the animal's movements. The present work takes a step towards quantifying the natural tactile scene for the rat vibrissal system by simulating rat whisking motions to systematically investigate the probabilities of whisker-object contact in naturalistic environments. The simulations permit an exhaustive search through the complete space of possible contact patterns, thereby allowing for the characterization of the patterns that would most likely occur during long sequences of natural exploratory behavior. We specifically quantified the probabilities of 'concomitant contact', that is, given that a particular whisker makes contact with a surface during a whisk, what is the probability that each of the other whiskers will also make contact with the surface during that whisk? Probabilities of concomitant contact were quantified in simulations that assumed increasingly naturalistic conditions: first, the space of all possible head poses; second, the space of behaviorally preferred head poses as measured experimentally; and third, common head poses in environments such as cages and burrows. As environments became more naturalistic, the probability distributions shifted from exhibiting a 'row-wise' structure to a more diagonal structure. Results also reveal that the rat appears to use motor strategies (e.g. head pitches) that generate contact patterns that are particularly well suited to extract information in the presence of uncertainty. © 2015. Published by The Company of Biologists Ltd.
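The concomitant-contact probability is a conditional probability, P(whisker j contacts | whisker i contacts), and given a record of simulated whisks it can be estimated as an empirical frequency. A minimal sketch; the 0/1 input layout and the function name are assumptions, and the example data are made up.

```python
def concomitant_contact(contacts):
    """Estimate P[i][j] = P(whisker j contacts | whisker i contacts).

    contacts: list of whisks; each whisk is a list of 0/1 flags, one per whisker.
    A row is None for any whisker that never made contact.
    """
    n = len(contacts[0])
    P = []
    for i in range(n):
        with_i = [w for w in contacts if w[i]]  # whisks in which whisker i touched
        if not with_i:
            P.append(None)
            continue
        P.append([sum(w[j] for w in with_i) / len(with_i) for j in range(n)])
    return P
```

The diagonal is always 1 for whiskers that made contact, and the matrix is generally not symmetric, since P(j | i) and P(i | j) are normalized by different counts.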
Löw, Andreas; Bradley, Margaret M; Lang, Peter J
During rapid serial visual presentation (RSVP), the perceptual system is confronted with a rapidly changing array of sensory information demanding resolution. At rapid rates of presentation, previous studies have found an early (e.g., 150-280 ms) negativity over occipital sensors that is enhanced when emotional, as compared with neutral, pictures are viewed, suggesting facilitated perception. In the present study, we explored how picture composition and the presence of people in the image affect perceptual processing of pictures of natural scenes. Using RSVP, pictures that differed in perceptual composition (figure-ground or scenes), content (presence of people or not), and emotional content (emotionally arousing or neutral) were presented in a continuous stream for 330 ms each with no intertrial interval. In both subject and picture analyses, all three variables affected the amplitude of occipital negativity, with the greatest enhancement for figure-ground compositions (as compared with scenes), irrespective of content and emotional arousal, supporting an interpretation that ease of perceptual processing is associated with enhanced occipital negativity. Viewing emotional pictures prompted enhanced negativity only for pictures that depicted people, suggesting that specific features of emotionally arousing images are associated with facilitated perceptual processing, rather than all emotional content.
Segal, Irina Yonit; Giladi, Chen; Gedalin, Michael; Rucci, Michele; Ben-Tov, Mor; Kushinsky, Yam; Mokeichev, Alik; Segev, Ronen
Under natural viewing conditions the input to the retina is a complex spatiotemporal signal that depends on both the scene and the way the observer moves. It is commonly assumed that the retina processes this input signal efficiently by taking into account the statistics of the natural world. It has recently been argued that incessant microscopic eye movements contribute to this process by decorrelating the input to the retina. Here we tested this theory by measuring the responses of the salamander retina to stimuli replicating the natural input signals experienced by the retina in the presence and absence of fixational eye movements. Contrary to the predictions of classic theories of efficient encoding that do not take behavior into account, we show that the response characteristics of retinal ganglion cells are not sufficient in themselves to disrupt the broad correlations of natural scenes. Specifically, retinal ganglion cells exhibited strong and extensive spatial correlations in the absence of fixational eye movements. However, the levels of correlation in the neural responses dropped in the presence of fixational eye movements, resulting in effective decorrelation of the channels streaming information to the brain. These observations confirm the predictions that microscopic eye movements act to reduce correlations in retinal responses and contribute to visual information processing.
Manjaly, Anit V.
In the Text Information Extraction (TIE) process, text regions are localized and extracted from images. It is an active research problem in computer vision applications. Diversity in text arises from differences in size, style, orientation, and alignment of text, as well as low image contrast and complex backgrounds. The semantic information provided by an image can be used in different applications such as content-based image retrieval, signboard identification, etc. Text information extraction comprises text image classification, text detection, localization, segmentation, enhancement, and recognition. This paper briefly reviews various methods for localizing text in natural scene images.
Dubreu, Christine; Manzanera, Antoine; Bohain, Eric
As target tracking attracts more and more interest, the ability to reliably assess tracking algorithms under any conditions is becoming essential. The evaluation of such algorithms requires a database of sequences representative of the whole range of conditions in which the tracking system is likely to operate, together with the associated ground truth. However, building such a database from real sequences and collecting the associated ground truth is hardly feasible and very time-consuming. Therefore, synthetic sequences generated by complex and heavy simulation platforms are increasingly used to evaluate the performance of tracking algorithms. Some methods have also been proposed that use simple synthetic sequences generated without such complex simulation platforms. These sequences are generated from a finite number of discriminating parameters and are statistically representative of real sequences with respect to these parameters. They are very simple and not photorealistic, but can reliably be used to evaluate low-level tracking algorithms in any operating conditions. The aim of this paper is to assess the reliability of these non-photorealistic synthetic sequences for evaluating tracking systems on complex-textured objects, and to show how the number of parameters can be increased to synthesize more elaborate scenes and deal with more complex dynamics, including occlusions and three-dimensional deformations.
Schweinhart, April M; Essock, Edward A
Natural scenes tend to be biased in both scale (1/f) and orientation (H > V > O; horizontal > vertical > oblique), and the human visual system has similar biases that serve to partially 'undo' (i.e., whiten) the resultant representation. The present approach to investigating this relationship considers content in works of art, i.e., scenes produced for processing by the human visual system. We analyzed the content of images by a method that minimizes errors inherent in some prior analysis methods. In the first experiment museum paintings were considered by comparing the amplitude spectrum of landscape paintings, natural scene photos, portrait paintings, and photos of faces. In the second experiment we obtained photos of paintings at the time they were produced by local artists and compared structural content in matched photos which contained the same scenes that the artists had painted. Results show that artists produce paintings with both the 1/f bias of scale and the horizontal-effect bias of orientation (H > V > O). More importantly, results from both experiments show that artists overregularize the structure in their works: they impose the natural-scene horizontal effect at all structural scales and in all types of subject matter even though, in the real world, the pattern of anisotropy differs considerably across spatial scale and between faces and natural scenes. It appears that artists unconsciously overregularize the oriented structure in their works to make it conform more uniformly to the 'expected' canonical ideal.
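A 1/f bias means that spectral amplitude falls off roughly in inverse proportion to spatial frequency, so on log-log axes the amplitude spectrum is approximately a straight line with slope -1. A common way to summarize it is a least-squares fit of that slope, sketched below on hypothetical (frequency, amplitude) pairs; this is not necessarily the error-minimizing analysis the authors used, which the abstract does not describe, and the function name is hypothetical.

```python
import math


def loglog_slope(freqs, amplitudes):
    """Least-squares slope of log(amplitude) against log(frequency)."""
    xs = [math.log(f) for f in freqs]
    ys = [math.log(a) for a in amplitudes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    if den == 0:
        raise ValueError("need at least two distinct frequencies")
    return num / den
```

Fitting the slope separately for amplitudes sampled along horizontal, vertical, and oblique orientations is one way such a summary could be extended to the orientation biases discussed here.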
Romero, Javier; Luzón-González, Raúl; Nieves, Juan L; Hernández-Andrés, Javier
We analyzed the changes in the color of objects in natural scenes due to atmospheric scattering as the observation distance changes. Hook-shaped curves were found in the chromaticity diagram as the object moved from zero distance to long distances, where the object's chromaticity coordinates approached those of the horizon. This trend results from the combined effect of the attenuation of the direct light arriving at the observer from the object and the airlight added along its path. Atmospheric scattering reduces the object's visibility, which is measurable as a color difference between the object and the background (taken here to be the horizon). Focusing on color difference instead of luminance difference could produce different visibility values depending on the color tolerance used. We assessed cone-excitation ratio constancy for several objects at different distances. Affine relationships were obtained when an object's cone excitations at increasing distances were related to those at zero distance. These results could help to explain color constancy in natural scenes for objects at different distances, a phenomenon noted by several authors.
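A homogeneous-atmosphere (Koschmieder-style) model, assumed here rather than taken from the paper, reproduces two of the reported trends. Each channel seen at distance d is the object signal attenuated by t = exp(-βd) plus airlight (1 - t) times the horizon signal. So (i) as d grows the signal converges on the horizon, and (ii) at any fixed distance the excitations of all objects are an affine function of their zero-distance excitations, with slope t and intercept (1 - t) times the horizon value for that cone class. Giving each channel its own extinction coefficient β is a crude stand-in for wavelength-dependent scattering; with a single β the path in a chromaticity diagram would be a straight line rather than hook-shaped. All numbers below are hypothetical.

```python
import math


def apparent_signal(signal0, horizon, distance_km, beta_per_km):
    """Per-channel signal (e.g. L, M, S cone excitations) seen at a distance.

    Assumes a homogeneous atmosphere: the object's light is attenuated by
    t = exp(-beta * d) and airlight (1 - t) * horizon is added.
    beta_per_km gives one (hypothetical) extinction coefficient per channel.
    """
    out = []
    for s0, h, beta in zip(signal0, horizon, beta_per_km):
        t = math.exp(-beta * distance_km)       # direct-path transmittance
        out.append(t * s0 + (1.0 - t) * h)      # attenuated light + airlight
    return out


horizon = [0.80, 0.75, 0.90]   # hypothetical L, M, S of the horizon
beta = [0.10, 0.11, 0.16]      # hypothetical extinction, S scattered most
foliage = [0.30, 0.35, 0.10]   # hypothetical object at zero distance
for d in (0, 1, 5, 20, 100):
    print(d, [round(v, 3) for v in apparent_signal(foliage, horizon, d, beta)])
```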
Visual cortex analyzes images by first extracting relevant details (e.g., edges) via a large array of specialized detectors. The resulting edge map is then relayed to a processing pipeline whose final goal is to attribute meaning to the scene. As this process unfolds, does the global interpretation of the image affect how local feature detectors operate? We characterized the local properties of human edge detectors while manipulating the extent to which the statistical properties of the surrounding image conformed to those encountered in natural vision. Although some aspects of local processing were unaffected by contextual manipulations, we observed significant alterations in the operating characteristics of the detector that were solely attributable to a higher-level semantic interpretation of the scene, unrelated to lower-level aspects of image statistics. Our results suggest that it may be inaccurate to regard early feature detectors as operating outside the domain of higher-level vision; although there is validity in this approach, a full understanding of their properties requires the inclusion of knowledge-based effects specific to the statistical regularities found in the natural environment.
Bosworth, Rain G.; Bartlett, Marian Stewart; Dobkins, Karen R.
Several lines of evidence suggest that the image statistics of the environment shape visual abilities. To date, the image statistics of natural scenes and faces have been well characterized using Fourier analysis. We employed Fourier analysis to characterize images of signs in American Sign Language (ASL). These images are highly relevant to signers who rely on ASL for communication, and thus the image statistics of ASL might influence signers' visual abilities. Fourier analysis was conducted on 105 static images of signs, and these images were compared with analyses of 100 natural scene images and 100 face images. We obtained two metrics from our Fourier analysis: mean amplitude and entropy of the amplitude across the image set (which is a measure from information theory) as a function of spatial frequency and orientation. The results of our analyses revealed interesting differences in image statistics across the three different image sets, setting up the possibility that ASL experience may alter visual perception in predictable ways. In addition, for all image sets, the mean amplitude results were markedly different from the entropy results, which raises the interesting question of which aspect of an image set (mean amplitude or entropy of the amplitude) is better able to account for known visual abilities.
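The divergence between the two metrics can be made concrete. The sketch assumes amplitudes have already been pooled into spatial-frequency/orientation bins for each image; the mean is then taken across images per bin, and the entropy is the Shannon entropy (in bits) of a histogram of the same values. The 16-bin histogram estimator is an assumption, since the abstract does not say how entropy was estimated. In the usage example, two bins share the same mean amplitude yet have very different entropies.

```python
import numpy as np


def amplitude_entropy(values, n_bins=16):
    """Shannon entropy (bits) of one bin's amplitude values across an image set."""
    counts, _ = np.histogram(np.asarray(values, dtype=float), bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def per_bin_stats(amp_stack, n_bins=16):
    """amp_stack: array (n_images, n_freq, n_orient) of binned amplitudes.

    Returns (mean, entropy), each of shape (n_freq, n_orient).
    """
    stack = np.asarray(amp_stack, dtype=float)
    mean = stack.mean(axis=0)
    entropy = np.empty(stack.shape[1:])
    for idx in np.ndindex(*stack.shape[1:]):
        entropy[idx] = amplitude_entropy(stack[(slice(None),) + idx], n_bins)
    return mean, entropy


# Two orientation bins with the same mean amplitude (1.0) across 16 images:
stack = np.empty((16, 1, 2))
stack[:, 0, 0] = 1.0                         # identical in every image
stack[:, 0, 1] = np.linspace(0.0, 2.0, 16)   # spread evenly across images
mean, entropy = per_bin_stats(stack)
```

With only about 100 images per set, the entropy estimate depends noticeably on the number of histogram bins, which is worth keeping in mind when comparing sets.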
Turki, Houssem; Ben Halima, Mohamed; Alimi, Adel M.
Text detection in natural scenes is an important research task that remains challenging because of variations in text size, font, line orientation and illumination, weak characters, and complex image backgrounds. The contribution of our proposed method is to filter out complex backgrounds by combining three strategies: enhancing edge-candidate detection in the HSV color space; using MSER candidate detection to obtain masks applied in both the HSV color space and grayscale; and applying the Stroke Width Transform (SWT) with heuristic filtering. These strategies aim to maximize the recovery of candidate text-pixel zones and to distinguish text boxes from the rest of the image. Non-text components are filtered out by classifying character candidates with Support Vector Machines (SVM) using Histogram of Oriented Gradients (HOG) features. Finally, we apply bounding-box localization after a word-grouping stage in which false positives are eliminated using the geometrical properties of text blocks. The proposed method has been evaluated on the ICDAR 2013 scene text detection competition dataset, and the encouraging experimental results demonstrate its robustness.
Keefe, Bruce D; Wincenciak, Joanna; Jellema, Tjeerd; Ward, James W; Barraclough, Nick E
When observing another individual's actions, we can both recognize their actions and infer their beliefs concerning the physical and social environment. The extent to which visual adaptation influences action recognition and conceptually later stages of processing involved in deriving the belief state of the actor remains unknown. To explore this we used virtual reality (life-size photorealistic actors presented in stereoscopic three dimensions) to see how visual adaptation influences the perception of individuals in naturally unfolding social scenes at increasingly higher levels of action understanding. We presented scenes in which one actor picked up boxes (of varying number and weight), after which a second actor picked up a single box. Adaptation to the first actor's behavior systematically changed perception of the second actor. Aftereffects increased with the duration of the first actor's behavior, declined exponentially over time, and were independent of view direction. Inferences about the second actor's expectation of box weight were also distorted by adaptation to the first actor. Distortions in action recognition and actor expectations did not, however, extend across different actions, indicating that adaptation is not acting at an action-independent abstract level but rather at an action-dependent level. We conclude that although adaptation influences more complex inferences about belief states of individuals, this is likely to be a result of adaptation at an earlier action recognition stage rather than adaptation operating at a higher, more abstract level in mentalizing or simulation systems.
Yao, Angela Y J; Einhäuser, Wolfgang
Color has an unresolved role in natural scene recognition. Whereas rapid serial visual presentation paradigms typically find no advantage for colored over grayscale scenes, color seems to play a decisive role for recognition memory. The distinction between detection and memorization has not been addressed directly in one paradigm. Here we asked ten observers to detect animals in 2-s 20 Hz sequences. Each sequence consisted of two 1-s segments, one of grayscale images and one of colored; each segment contained one or no target, totaling zero, one, or two targets per sequence. In one-target sequences, hit rates were virtually the same for targets appearing in the first or second segment, as well as for grayscale and colored targets, though observers were more confident about detecting colored targets. In two-target sequences, observers preferentially reported the second of two identical targets, in comparison to categorically related (same-species animals) or unrelated (different-species animals) targets. Observers also showed a strong preference for reporting colored targets, though only when targets were of different species. Our findings suggest that color has little effect on detection, but is used in later stages of processing. We may speculate that color ensures preferential access to or retrieval from memory when distinct items must be rapidly remembered.
Valsecchi, Matteo; Gegenfurtner, Karl R.
Binocular disparity is a fundamental dimension defining the input we receive from the visual world, along with luminance and chromaticity. In a memory task involving images of natural scenes we investigate whether binocular disparity enhances long-term visual memory. We found that forest images studied in the presence of disparity for relatively long times (7s) were remembered better as compared to 2D presentation. This enhancement was not evident for other categories of pictures, such as images containing cars and houses, which are mostly identified by the presence of distinctive artifacts rather than by their spatial layout. Evidence from a further experiment indicates that observers do not retain a trace of stereo presentation in long-term memory. PMID:23166799
Ishida, Taiichiro; Ikeda, Mitsuo
We conducted an experiment in which subjects observed a picture of a natural scene while the picture was displaced according to the subject’s saccades. The threshold displacement ratio (the length of picture displacement divided by the length of the saccade) that still allowed subjects to perceive the picture as stable was measured. In experiment 1, the threshold ratio was measured for each of five pictures when the picture was displaced in the same direction as each saccade. In experiment 2, the picture was displaced in the same direction as, opposite to, or orthogonal to the movement of the eye, to examine the effect of relative displacement direction. The results showed that subjects perceived a picture as stable despite fairly large displacements during saccades; the threshold ratio depended on the picture and ranged from 18% to 26%. Displacement in the same direction as the eye movement was also found to be more detectable than displacement in the opposite direction.
Lovell, P. G.; Tolhurst, D. J.; Párraga, C. A.; Baddeley, R.; Leonards, U.; Troscianko, J.; Troscianko, T.
Illumination varies greatly both across parts of a natural scene and as a function of time, whereas the spectral reflectance function of surfaces remains more stable and is of much greater relevance when searching for specific targets. This study investigates the functional properties of postreceptoral opponent-channel responses, in particular regarding their stability against spatial and temporal variation in illumination. We studied images of natural scenes obtained in the UK and Uganda with digital cameras calibrated to produce estimated L-, M-, and S-cone responses of trichromatic primates (human) and birds (starling). For both primates and birds we calculated luminance and red-green opponent (RG) responses. We also calculated a primate blue-yellow-opponent (BY) response. The BY response varies with changes in illumination, both across time and across the image, making it less invariant. The RG response is much more stable than the BY response across such changes in illumination for primates, less so for birds. These differences between species are due to the greater separation of bird L and M cones in wavelength and the narrower bandwidth of the cone action spectra. This greater separation also produces a larger chromatic signal for a given change in spectral reflectance. Thus bird vision seems to suffer a greater degree of spatiotemporal "clutter" than primate vision, but also enhances differences between targets and background. Therefore, there may be a trade-off between the degree of chromatic clutter in a visual system and the degree of chromatic difference between a target and its background. Primate and bird visual systems have found different solutions to this trade-off.
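The opponent signals can be written as simple cone-ratio combinations. The formulation below (luminance L + M, RG = (L - M)/(L + M), BY = (S - (L + M)/2)/(S + (L + M)/2)) is one common choice and an assumption here; the study's exact definitions may differ. Because each chromatic signal is a ratio, a uniform change in light level cancels, and RG depends only on the L:M ratio. An illuminant change that mainly alters S-cone excitation therefore shifts BY but leaves RG untouched, which is one way to see why BY is the less stable channel. The bird analogue would use the bird's own L and M cone excitations.

```python
def opponent_signals(L, M, S):
    """Return (luminance, RG, BY) from cone excitations.

    One common ratio-based formulation, assumed here; definitions vary
    between studies.
    """
    lum = L + M
    rg = (L - M) / (L + M)
    by = (S - (L + M) / 2.0) / (S + (L + M) / 2.0)
    return lum, rg, by


base = opponent_signals(0.6, 0.5, 0.2)
brighter = opponent_signals(2 * 0.6, 2 * 0.5, 2 * 0.2)   # same light, twice as intense
bluer = opponent_signals(0.6, 0.5, 0.4)                  # only S excitation changes
```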
Habtegiorgis, Selam W.
Image skew is one of the prominent distortions that occur in optical elements, such as spectacle lenses. The present study evaluates adaptation to image skew in dynamic natural images. Moreover, the cortical levels involved in skew coding were probed using the retinal specificity of skew adaptation aftereffects. Left- and right-skewed natural image sequences were shown to observers as adapting stimuli. The point of subjective equality (PSE, i.e., the skew amplitude in simple geometrical patterns that is perceived to be unskewed) was used to quantify the aftereffect of each adapting skew direction. The PSE, measured in a two-alternative forced-choice paradigm, shifted toward the adapting skew direction. Furthermore, significant adaptation aftereffects were obtained not only at adapted but also at non-adapted retinal locations during fixation: skew adaptation information was partially transferred to non-adapted retinal locations. Thus, adaptation to skewed natural scenes induces coordinated plasticity in lower and higher cortical areas of the visual pathway.
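A PSE can be read off a two-alternative forced-choice psychometric function as the stimulus level at which each response is given half the time. The sketch below uses linear interpolation between tested skew levels, assuming the proportion of "skewed right" responses rises monotonically with skew; fitting a logistic or cumulative-Gaussian function would be the more usual choice. The skew levels and response proportions in the usage lines are invented for illustration, and the aftereffect is taken as the difference between post-adaptation and baseline PSEs.

```python
def pse_by_interpolation(levels, p_right):
    """Stimulus level where the proportion of 'skewed right' responses crosses 0.5.

    Linear interpolation between neighbouring tested levels; assumes the
    proportions rise with skew level.
    """
    pts = sorted(zip(levels, p_right))
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y0 != y1 and (y0 - 0.5) * (y1 - 0.5) <= 0:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("proportions never cross 0.5")


levels = [-4, -2, 0, 2, 4]                   # hypothetical skew levels
baseline = pse_by_interpolation(levels, [0.05, 0.2, 0.5, 0.8, 0.95])
adapted = pse_by_interpolation(levels, [0.02, 0.1, 0.3, 0.6, 0.9])
aftereffect = adapted - baseline             # PSE shift in skew units
```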
computer vision and natural language processing (NLP) and leveraging transformative advances in “deep” machine learning. Most prior work on NL-description of...generate descriptions of videos. 2.1 Background: Language and Vision Both natural language processing (NLP) and computer vision (CV) have made great strides...of work at the intersection of NLP and CV on topics like connecting words to pictures [8, 9, 22], describing images in natural language (NL) [30, 53
Naber, Marnix; Hilger, Maximilian; Einhäuser, Wolfgang
Humans process natural scenes rapidly and accurately. Low-level image features and emotional valence affect such processing but have mostly been studied in isolation. At which processing stage these factors operate and how they interact has remained largely unaddressed. Here, we briefly presented natural images and asked observers to report the presence or absence of an animal (detection), species of the detected animal (identification), and their confidence. In a second experiment, the same observers rated images with respect to their emotional affect and estimated their anxiety when imagining a real-life encounter with the depicted animal. We found that detection and identification improved with increasing image luminance, background contrast, animal saturation, and luminance plus color contrast between target and background. Surprisingly, animals associated with lower anxiety were detected faster and identified with higher confidence, and emotional affect was a better predictor of performance than anxiety. Pupil size correlated with detection, identification, and emotional valence judgments at different time points after image presentation. Remarkably, images of threatening animals induced smaller pupil sizes, and observers with higher mean anxiety ratings had smaller pupils on average. In sum, rapid visual processing depends on contrasts between target and background features rather than overall visual context, is negatively affected by anxiety, and finds its processing stages differentially reflected in the pupillary response.
Vasserman, Genadiy; Schneidman, Elad; Segev, Ronen
The visual system continually adjusts its sensitivity to the statistical properties of the environment through an adaptation process that starts in the retina. Colour perception and processing is commonly thought to occur mainly in high visual areas, and indeed most evidence for chromatic colour contrast adaptation comes from cortical studies. We show that colour contrast adaptation starts in the retina where ganglion cells adjust their responses to the spectral properties of the environment. We demonstrate that the ganglion cells match their responses to red-blue stimulus combinations according to the relative contrast of each of the input channels by rotating their functional response properties in colour space. Using measurements of the chromatic statistics of natural environments, we show that the retina balances inputs from the two (red and blue) stimulated colour channels, as would be expected from theoretical optimal behaviour. Our results suggest that colour is encoded in the retina based on the efficient processing of spectral information that matches spectral combinations in natural scenes on the colour processing level. PMID:24205373
One of the most important issues in the study of cognition is understanding which factors determine the internal representation of the external world. Previous literature has begun to highlight the impact of low-level sensory features (indexed by saliency maps) in driving attentional selection, hence increasing the probability that objects presented in complex, natural scenes are successfully encoded into working memory (WM) and then correctly remembered. Here we asked whether the probability of retrieving high-saliency objects modulates the overall contents of WM by decreasing the probability of retrieving other, lower-saliency objects. We presented pictures of natural scenes for 4 s. After a retention period of 8 s, we asked participants to verbally report as many objects/details as possible from the previous scenes. We then computed how many times the objects located at the peak of maximal or minimal saliency in the scene (as indexed by a saliency map; Itti et al., 1998) were recollected by participants. Results showed that maximal-saliency objects were recollected more often, and earlier in the stream of successfully reported items, than minimal-saliency objects. This indicates that bottom-up sensory salience increases recollection probability (more frequent reports) and facilitates access to memory representations at retrieval (earlier reports). Moreover, recollection of the maximal- (but not the minimal-) saliency objects predicted the overall number of successfully recollected objects: the higher the probability of having successfully reported the most salient object in the scene, the lower the number of recollected objects. These findings highlight that bottom-up sensory saliency modulates the current contents of WM during recollection of objects from natural scenes, most likely by reducing the resources available to encode and then retrieve other (lower-saliency) objects.
Although atrophy of the medial temporal lobe, including structures (hippocampus and parahippocampal cortex) that support scene perception and the binding of an object to its context, appears early in Alzheimer disease (AD), few studies have investigated scene perception in people with AD. We assessed the ability to find a target object within a natural scene in people with typical AD and in people with atypical AD (posterior cortical atrophy). Pairs of colored photographs were displayed left and right of fixation for one second. Participants were asked to categorize the target (an animal) either by moving their eyes toward the photograph containing the target (saccadic choice task) or by pressing a key corresponding to the location of the target (manual choice task), in separate blocks of trials. For both tasks, performance was compared in two conditions: with isolated objects and with objects in scenes. Patients with atypical AD were more impaired at detecting a target within a scene than people with typical AD, who exhibited a pattern of performance more similar to that of age-matched controls in terms of accuracy, saccade latencies and benefit from contextual information. People with atypical AD benefited less from contextual information in both the saccadic and the manual choice tasks, suggesting a higher sensitivity to crowding and deficits in figure/ground segregation in people with lesions in posterior areas of the brain.
Marx, Svenja; Hansen-Goos, Onno; Thrun, Michael; Einhäuser, Wolfgang
The exact function of color vision for natural-scene perception has remained puzzling. In rapid serial visual presentation (RSVP) tasks, categorically defined targets (e.g., animals) are typically detected slightly better in color than in grayscale stimuli. Here we test the effect of color on animal detection, recognition, and the attentional blink. We present color and grayscale RSVP sequences with up to two target images (animals) embedded. In some conditions, we modify either the hue or the intensity of each pixel. We confirm a benefit of color over grayscale images for animal detection over a range of stimulus onset asynchronies (SOAs), with improved hit rates from 50 to 120 ms and overall improved performance from 90 to 120 ms. For stimuli in which the hue is inverted, performance is similar to grayscale for small SOAs and indistinguishable from original color only for large SOAs. For subordinate category discrimination, color provides no additional benefit. Color and grayscale sequences show an attentional blink, but differences between color and grayscale are fully explained by single-target differences, ruling out the possibility that the color benefit is purely attentional. © 2014 ARVO.
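The per-pixel manipulations can be sketched with Python's standard `colorsys` module. Here hue inversion rotates hue by 180° in HLS space while keeping HLS lightness and saturation, and grayscale conversion uses Rec. 601 luma weights; both choices are assumptions, since the abstract does not name the colour space used. HLS lightness is not photometric luminance, so a stricter control would rotate hue in a luminance-preserving colour space.

```python
import colorsys


def invert_hue(rgb):
    """Rotate hue by 180 degrees, keeping HLS lightness and saturation.

    rgb: (r, g, b) floats in [0, 1].
    """
    h, l, s = colorsys.rgb_to_hls(*rgb)
    return colorsys.hls_to_rgb((h + 0.5) % 1.0, l, s)


def to_gray(rgb):
    """Rec. 601 luma: discards hue and saturation, keeps a luminance-like value."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


pixel = (0.2, 0.5, 0.8)
print(invert_hue(pixel), to_gray(pixel))
```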
The present study investigates the influence of goalkeeper displacement on goal-side selection in soccer penalty kicking. Facing a penalty situation, participants viewed photo-realistic images of a goalkeeper and a soccer goal. In the action selection task, they were asked to kick to the larger goal side, and in the perception task, they indicated the position of the goalkeeper on the goal line. To this end, the goalkeeper was depicted in a regular goalkeeping posture, standing either in the exact middle of the goal or displaced by different distances to the left or right of the goal’s center. Results showed that the goalkeeper’s position on the goal line systematically affected goal-side selection, even when participants were not aware of the displacement. These findings provide further support for the notion that implicit processing of the stimulus layout in natural scenes can affect action selection in complex environments, such as soccer penalty shooting.
Bohlin, Gustav; Göransson, Andreas; Höst, Gunnar E.; Tibell, Lena A. E.
Educational videos on the Internet comprise a vast and highly diverse source of information. Online search engines facilitate access to numerous videos claiming to explain natural selection, but little is known about the degree to which the video content matches key evolutionary content identified as important in evolution education research. In this study, we therefore analyzed the content of 60 videos accessed through the Internet, using a criteria catalog with 38 operationalized variables derived from research literature. The variables were sorted into four categories: (a) key concepts (e.g. limited resources and inherited variation), (b) threshold concepts (abstract concepts with a transforming and integrative function), (c) misconceptions (e.g. that evolution is driven by need), and (d) organismal context (e.g. animal or plant). The results indicate that some concepts are frequently communicated, and certain taxa are commonly used to illustrate concepts, while others are seldom included. In addition, evolutionary phenomena at small temporal and spatial scales, such as subcellular processes, are rarely covered. Rather, the focus is on population-level events over time scales spanning years or longer. This is consistent with an observed lack of explanations regarding how randomly occurring mutations provide the basis for variation (and thus natural selection). The findings imply, among other things, that some components of natural selection warrant far more attention in biology teaching and science education research.
Gygi, Brian; Shafiro, Valeriy
Previously, Gygi and Shafiro (2011) found that when environmental sounds are semantically incongruent with the background scene (e.g., horse galloping in a restaurant), they can be identified more accurately by young normal-hearing listeners (YNH) than sounds congruent with the scene (e.g., horse galloping at a racetrack). This study investigated how age and high-frequency audibility affect this Incongruency Advantage (IA) effect. In Experiments 1a and 1b, elderly listeners (N = 18 for 1a; N = 10 for 1b) with age-appropriate hearing (EAH) were tested on target sounds and auditory scenes at five sound-to-scene ratios (So/Sc) between -3 and -18 dB. Experiment 2 tested 11 YNH listeners on the same sound-scene pairings lowpass-filtered at 4 kHz (YNH-4k). The EAH and YNH-4k groups exhibited an almost identical pattern of significant IA effects, but both occurred at an So/Sc approximately 3.9 dB higher than for the previously tested YNH listeners. However, the psychometric functions revealed a shallower slope for EAH listeners than for YNH listeners for the congruent stimuli only, suggesting greater difficulty for EAH listeners in attending to sounds expected to occur in a scene. These findings indicate that semantic relationships between environmental sounds in soundscapes are mediated by both audibility and cognitive factors, and suggest a method for dissociating these factors.
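Setting a sound-to-scene ratio amounts to scaling the target sound relative to the background before mixing. The sketch assumes So/Sc is defined as 20·log10 of the RMS ratio of target to scene, which the abstract does not state explicitly; the signals in the usage lines are hypothetical sinusoids. On that definition, the roughly 3.9 dB shift reported for the EAH and YNH-4k groups corresponds to the target needing about 10^(3.9/20) ≈ 1.57 times the RMS amplitude.

```python
import math


def rms(signal):
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def mix_at_ratio(sound, scene, ratio_db):
    """Scale `sound` so that 20*log10(rms(sound)/rms(scene)) == ratio_db,
    then add it to `scene`. Returns (mixture, gain); assumes equal lengths."""
    gain = (rms(scene) / rms(sound)) * 10.0 ** (ratio_db / 20.0)
    mixture = [gain * s + c for s, c in zip(sound, scene)]
    return mixture, gain


def db_to_amplitude_factor(db):
    return 10.0 ** (db / 20.0)


sound = [math.sin(0.05 * n) for n in range(8000)]               # hypothetical target
scene = [0.3 * math.sin(0.013 * n + 1.0) for n in range(8000)]  # hypothetical background
mixture, gain = mix_at_ratio(sound, scene, -12.0)
```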
Madden, Christopher S.; Richards, Noel J.; Culpepper, Joanne B.
This paper investigates the ability to develop synthetic scenes in an image generation tool, E-on Vue, and a gaming engine, Unity 3D, which can be used to generate synthetic imagery of target objects across a variety of conditions in land environments. Developments within these tools and gaming engines have allowed the computer gaming industry to dramatically enhance the realism of the games they develop; however, they utilise short cuts to ensure that the games run smoothly in real time to create an immersive effect. Whilst these short cuts may have an impact upon the realism of the synthetic imagery, they do promise a much more time-efficient method of developing imagery of different environmental conditions and of investigating the dynamic aspect of military operations that is currently not evaluated in signature analysis. The results presented investigate how some of the common image metrics used in target acquisition modelling, namely the Δμ1, Δμ2, Δμ3, RSS, and Doyle metrics, perform on the synthetic scenes generated by E-on Vue and Unity 3D compared to real imagery of similar scenes. An exploration of the time required to develop the various aspects of the scene to enhance its realism is included, along with an overview of the difficulties associated with trying to recreate specific locations as a virtual scene. This work is an important start towards utilising virtual worlds for visible signature evaluation, and evaluating how equivalent synthetic imagery is to real photographs.
Kwon, M-W; Kim, S-C; Yoon, S-E; Ho, Y-S; Kim, E-S
A new object tracking mask-based novel-look-up-table (OTM-NLUT) method is proposed and implemented on graphics-processing-units (GPUs) for real-time generation of holographic videos of three-dimensional (3-D) scenes. Since the proposed method is designed to be matched with software and memory structures of the GPU, the number of compute-unified-device-architecture (CUDA) kernel function calls and the computer-generated hologram (CGH) buffer size of the proposed method have been significantly reduced. It therefore results in a great increase of the computational speed of the proposed method and enables real-time generation of CGH patterns of 3-D scenes. Experimental results show that the proposed method can generate 31.1 frames of Fresnel CGH patterns with 1,920 × 1,080 pixels per second, on average, for three test 3-D video scenarios with 12,666 object points on three GPU boards of NVIDIA GTX TITAN, and confirm the feasibility of the proposed method in the practical application of electro-holographic 3-D displays.
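For context, the baseline that look-up-table methods accelerate is direct point-source summation. The sketch below computes a bipolar-intensity Fresnel CGH that way; it is not the OTM-NLUT method, and the pitch, wavelength and depth values are assumptions. It does illustrate the shift property that NLUT-type methods are generally described as exploiting: within one depth plane, each point's fringe pattern is a translated copy of the pattern for an on-axis point, so one precomputed pattern per depth can be reused. Direct summation touches every hologram pixel once per object point; for the quoted 1,920 × 1,080 pixels and 12,666 points that is about 2.6 × 10^10 fringe evaluations per frame.

```python
import numpy as np


def fresnel_cgh(points, shape=(256, 256), pitch=8e-6, wavelength=532e-9):
    """Bipolar-intensity hologram by direct summation over object points.

    points: iterable of (x, y, z, amplitude); x, y, z in metres, z > 0 being
    the distance from the hologram plane. Pitch and wavelength defaults are
    illustrative assumptions. Uses the Fresnel (paraxial) approximation.
    """
    k = 2.0 * np.pi / wavelength
    ny, nx = shape
    ys = (np.arange(ny) - ny // 2) * pitch
    xs = (np.arange(nx) - nx // 2) * pitch
    Y, X = np.meshgrid(ys, xs, indexing="ij")
    holo = np.zeros(shape)
    for x0, y0, z0, a in points:
        # constant path term plus paraxial quadratic phase of a point source
        phase = k * z0 + k * ((X - x0) ** 2 + (Y - y0) ** 2) / (2.0 * z0)
        holo += a * np.cos(phase)
    return holo


pitch = 8e-6
on_axis = fresnel_cgh([(0.0, 0.0, 0.5, 1.0)], shape=(64, 64), pitch=pitch)
shifted = fresnel_cgh([(3 * pitch, 0.0, 0.5, 1.0)], shape=(64, 64), pitch=pitch)
# Same depth: the shifted point's fringes are the on-axis fringes moved 3 pixels.
```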
Extensive research has been conducted on the effects of the natural environment on people’s well-being, starting with the short-term restorative effects on the brain and continuing with the long-term effects on emotional self-regulation processes. In the present research we focused on the latter, trying to connect two of the problems of our world: violent behavior and the preservation of the natural environment. Thus, the objective was to study the effects of watching a nature wildlife video on anger (the feeling and its expression) and state anxiety. The statistical analysis indicated that, while there were no significant differences regarding anxiety (worry, internal tension) or general mechanisms for dealing with fury, watching the video significantly decreased the feeling of anger and the tendency to express it either verbally or physically. As a main conclusion, we highlight the link between the accessibility of the natural environment and violent expressions of anger.
Synaptic noise is thought to be a limiting factor for computational efficiency in the brain. In visual cortex (V1), ongoing activity is present in vivo, and spiking responses to simple stimuli are highly unreliable across trials. Stimulus statistics used to plot receptive fields, however, are quite different from those experienced during natural visuomotor exploration. We recorded V1 neurons intracellularly in the anaesthetized and paralyzed cat and compared their spiking and synaptic responses to full-field natural images animated by simulated eye movements with those evoked by simpler (grating) or higher-dimensionality (dense noise) statistics. In most cells, natural scene animation was the only condition in which high temporal precision (in the 10-20 ms range) was maintained during sparse and reliable activity. At the subthreshold level, irregular but highly reproducible membrane potential dynamics were observed, even during long (several hundred ms) spike-less periods. We showed that both the spatial structure of natural scenes and the temporal dynamics of eye movements increase the signal-to-noise ratio by a nonlinear amplification of the signal combined with a reduction of the subthreshold contextual noise. These data support the view that the sparsening and the time precision of the neural code in V1 may depend primarily on three factors: (1) broadband input spectrum: the bandwidth must be rich enough to optimally recruit the diversity of spatial and time constants during recurrent processing; (2) tight temporal interplay of excitation and inhibition: conductance measurements demonstrate that natural scene statistics selectively narrow the duration of the spiking opportunity window during which the balance between excitation and inhibition changes transiently and reversibly; (3) signal energy in the lower frequency band: a minimal level of power is needed below 10 Hz to consistently reach the spiking threshold, a situation rarely reached with visual
Mathiak, Klaus; Weber, René
Modern video games are highly advanced virtual reality simulations and often contain virtual violence. For a significant number of young males, playing video games is an everyday activity, making it an almost natural behavior. Recordings of brain activation with functional magnetic resonance imaging (fMRI) during gameplay may reflect neuronal correlates of real-life behavior. We recorded 13 experienced gamers (18-26 years; average 14 h/week playing) while they played a violent first-person shooter game (a violent computer game played in self-perspective), using distortion- and dephasing-reduced fMRI (3 T; single-shot triple-echo echo-planar imaging [EPI]). Content analysis of the video and sound with 100 ms time resolution yielded relevant behavioral variables. These variables explained significant signal variance across large distributed networks. The occurrence of violent scenes revealed significant neuronal correlates in an event-related design. Activation of the dorsal and deactivation of the rostral anterior cingulate and the amygdala characterized the mid-frontal pattern related to virtual violence. Statistics and effect sizes in these areas can be considered large. Optimized imaging strategies allowed for single-subject and single-trial analysis with good image quality in basal brain structures. We propose that virtual environments can be used to study neuronal processes involved in semi-naturalistic behavior as determined by content analysis. Importantly, the activation pattern reflects brain-environment interactions rather than stimulus responses as observed in classical experimental designs. We relate our findings to the general discussion on the social effects of playing first-person shooter games. (c) 2006 Wiley-Liss, Inc.
Foulsham, Tom; Kingstone, Alan
Research investigating scene perception normally involves laboratory experiments using static images. Much has been learned about how observers look at pictures of the real world and the attentional mechanisms underlying this behaviour. However, the use of static, isolated pictures as a proxy for studying everyday attention in real environments has led to the criticism that such experiments are artificial. We report a new study that tests the extent to which the real world can be reduced to simpler laboratory stimuli. We recorded the gaze of participants walking on a university campus with a mobile eye tracker, and then showed static frames from this walk to new participants, in either a random or sequential order. The aim was to compare the gaze of participants walking in the real environment with fixations on pictures of the same scene. The data show that picture order affects interobserver fixation consistency and changes looking patterns. Critically, while fixations on the static images overlapped significantly with the actual real-world eye movements, they did so no more than a model that assumed a general bias to the centre. Remarkably, a model that simply takes into account where the eyes are normally positioned in the head, independent of what is actually in the scene, does far better than any other model. These data reveal that viewing patterns to static scenes are a relatively poor proxy for predicting real-world eye movement behaviour, while raising intriguing possibilities for how best to measure attention in everyday life. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Juan Manuel Galeazzi
Neurons that respond to visual targets in a hand-centred frame of reference have been found within various areas of the primate brain. We investigate how hand-centred visual representations may develop in a neural network model of the primate visual system called VisNet, when the model is trained on images of the hand seen against natural visual scenes. The simulations show how such neurons may develop through a biologically plausible process of unsupervised competitive learning and self-organisation. In an advance on our previous work, the visual scenes consisted of multiple targets presented simultaneously with respect to the hand. Three experiments are presented. First, VisNet was trained with computerized images consisting of a realistic image of a hand and a variety of natural objects, presented against different textured backgrounds during training. The network was then tested with just one textured object near the hand in order to verify whether the output cells were capable of building hand-centred representations with a single localised receptive field. We explain the underlying principles of the statistical decoupling that allows the output cells of the network to develop single localised receptive fields even when the network is trained with multiple objects. In a second simulation, we examined how some of the cells with hand-centred receptive fields decreased their shape selectivity and started responding to a localised region of hand-centred space as the number of objects presented in overlapping locations during training increased. Lastly, we explored the same learning principles by training the network with natural visual scenes collected by volunteers. These results provide an important step in showing how single, localised, hand-centred receptive fields could emerge under more ecologically realistic visual training conditions.
The paper presents an NP-video rendering system based on natural phenomena. It provides a simple non-photorealistic video synthesis system with which the user can obtain flow-like stylized paintings and infinite video scenes. First, using anisotropic Kuwahara filtering in conjunction with line integral convolution, the natural-phenomena video scene is rendered as a flow-like stylized painting. Second, frame division and patch synthesis are used to synthesize an infinitely playing video. Using selected examples of different natural video textures, our system can generate flow-like stylized and infinite video scenes. Visual discontinuities between neighboring frames are reduced, and the features and details of the frames are preserved. The rendering system is simple to implement.
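The anisotropic Kuwahara filter mentioned above builds on the classic Kuwahara filter, which replaces each pixel with the mean of its most homogeneous neighboring quadrant. As an illustration only, the following Python sketch implements the classic (isotropic, square-quadrant) variant on a grayscale image stored as a list of rows. It is not the anisotropic version or the line integral convolution stage described in the abstract, and the function name and `radius` parameter are hypothetical.

```python
from statistics import mean, pvariance


def kuwahara(img, radius=2):
    """Classic (isotropic) Kuwahara filter for a grayscale image.

    img is a list of equal-length rows of numbers. For every pixel, the four
    (radius+1) x (radius+1) quadrants that share that pixel as a corner are
    examined, and the output is the mean of the quadrant with the smallest
    variance. Quadrants are clipped at the image border.
    Illustrative sketch only: this is not the anisotropic variant.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = None  # (variance, mean) of the most homogeneous quadrant so far
            for dy, dx in ((-1, -1), (-1, 1), (1, -1), (1, 1)):
                rows = range(y, y + dy * (radius + 1), dy)
                cols = range(x, x + dx * (radius + 1), dx)
                # Keep only in-bounds pixels; the centre pixel is always included.
                vals = [img[j][i] for j in rows for i in cols
                        if 0 <= j < h and 0 <= i < w]
                cand = (pvariance(vals), mean(vals))
                if best is None or cand[0] < best[0]:
                    best = cand
            out[y][x] = best[1]
    return out
```

Because each output value is taken from a quadrant that can lie entirely on one side of an edge, step edges pass through unchanged while isolated speckles are averaged down, which is the edge-preserving smoothing that painterly video filters exploit.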
Atmospheric turbulence is a naturally occurring phenomenon that can severely degrade the quality of long-range surveillance video footage. Major effects include image blurring, image warping and temporal wavering of objects in the scene. Mitigating...
Hunter, MaryCarol R; Askarinejad, Ali
It is well-established that the experience of nature produces an array of positive benefits to mental well-being. Much less is known about the specific attributes of green space which produce these effects. In the absence of translational research that links theory with application, it is challenging to design urban green space for its greatest restorative potential. This translational research provides a method for identifying which specific physical attributes of an environmental setting are most likely to influence preference and restoration responses. Attribute identification was based on a triangulation process invoking environmental psychology and aesthetics theories, principles of design founded in mathematics and aesthetics, and empirical research on the role of specific physical attributes of the environment in preference or restoration responses. From this integration emerged a list of physical attributes defining aspects of spatial structure and environmental content found to be most relevant to the perceptions involved with preference and restoration. The physical attribute list offers a starting point for deciphering which scene stimuli dominate or collaborate in preference and restoration responses. To support this, functional definitions and metrics (efficient methods for attribute quantification) are presented. Use of these research products and the process for defining place-based metrics can provide (a) greater control in the selection and interpretation of the scenes/images used in tests of preference and restoration and (b) an expanded evidence base for well-being designers of the built environment.
Vision is only a part of a system that converts visual information into knowledge structures. These structures drive the vision process, resolving ambiguity and uncertainty via feedback, and provide image understanding, that is, an interpretation of visual information in terms of these knowledge models. These mechanisms provide reliable recognition even if the object is occluded or cannot be recognized as a whole. It is hard to split the entire system apart, and reliable solutions to target recognition problems are possible only within the solution of a more generic Image Understanding Problem. The brain reduces informational and computational complexity using implicit symbolic coding of features, hierarchical compression, and selective processing of visual information. A biologically inspired Network-Symbolic representation, in which both systematic structural/logical methods and neural/statistical methods are parts of a single mechanism, is the most feasible for such models. It converts visual information into relational Network-Symbolic structures, avoiding artificially precise computations of 3-dimensional models. Network-Symbolic Transformations derive abstract structures, which allow for invariant recognition of an object as an exemplar of a class. Active vision helps create consistent models. Attention, separation of figure from ground, and perceptual grouping are special kinds of Network-Symbolic transformations. Such Image/Video Understanding Systems will be able to recognize targets reliably.
Hickey, Clayton; Peelen, Marius V
be affected by outcome that occurs later in time? Here, we show that reward acts on lingering representations of environmental stimuli that sustain through the interval between stimulus and outcome. Using naturalistic scene stimuli and multivariate pattern analysis of fMRI data, we show that reward boosts the representation of attended objects and reduces the representation of unattended objects. This interaction of attention and reward processing acts to prime vision for stimuli that may serve to predict outcome. Copyright © 2017 the authors 0270-6474/17/377297-08$15.00/0.
Bluff, Lucas A; Rutz, Christian
Video tracking is a powerful new tool for studying natural undisturbed behaviour in a wide range of birds, mammals and reptiles. Using integrated animal-borne video tags, video footage and positional data are recorded simultaneously from wild free-ranging animals. At the analysis stage, video scenes are linked to radio fixes, yielding an animal's eye view of resource use and social interactions along a known movement trajectory. Here, we provide a brief description of our basic equipment and ...
Pongakkasira, Kaewmart; Bindemann, Markus
Human face detection might be driven by skin-coloured face-shaped templates. To explore this idea, this study compared the detection of faces for which the natural height-to-width ratios were preserved with distorted faces that were stretched vertically or horizontally. The impact of stretching on detection performance was not obvious when faces were equated to their unstretched counterparts in terms of their height or width dimension (Experiment 1). However, stretching impaired detection when the original and distorted faces were matched for their surface area (Experiment 2), and this was found with both vertically and horizontally stretched faces (Experiment 3). This effect was evident in accuracy, response times, and also observers' eye movements to faces. These findings demonstrate that height-to-width ratios are an important component of the cognitive template for face detection. The results also highlight important differences between face detection and face recognition. Copyright © 2015 Elsevier Ltd. All rights reserved.
Zhu, Yingying; Zhou, Dongru
Scene change detection is an essential step in automatic and content-based video indexing, retrieval and browsing. In this paper, a robust scene change detection and classification approach is presented, which analyzes audio, visual and textual sources and accounts for their inter-relations and coincidence to semantically identify and classify video scenes. Audio analysis focuses on segmenting the audio stream into four types of semantic data: silence, speech, music and environmental sound. Further processing of speech segments aims at locating speaker changes. Video analysis partitions the visual stream into shots. Text analysis can provide a supplementary source of clues for scene classification and indexing information. We integrate the video and audio analysis results to identify video scenes, and use text information, either detected by video OCR or derived from available transcripts, to refine scene classification. Results from single-source segmentation are in some cases suboptimal. By combining visual and aural features with the supplementary text information, scene extraction accuracy is improved and more semantic segmentations are obtained. Experimental results are quite promising.
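The abstract does not say how the visual stream is partitioned into shots. A common baseline for detecting hard cuts compares grey-level histograms of consecutive frames; the Python sketch below assumes each frame is a non-empty flat list of 8-bit pixel values and uses a hypothetical L1-distance threshold. It illustrates the general idea only and is not the authors' combined audio, visual and textual method.

```python
def histogram(frame, bins=16, max_val=256):
    """Normalized grey-level histogram of a frame given as a flat list of 8-bit values."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // max_val] += 1
    return [c / len(frame) for c in counts]


def detect_shot_boundaries(frames, threshold=0.5, bins=16):
    """Return indices i such that a hard cut is declared between frames i-1 and i.

    The L1 distance between consecutive normalized histograms lies in [0, 2];
    the threshold value is an illustrative assumption, not a tuned parameter.
    """
    cuts = []
    prev = histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = histogram(frames[i], bins)
        if sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            cuts.append(i)
        prev = cur
    return cuts
```

Histogram comparison ignores where pixels are, so camera or object motion within a shot changes the score little, while a cut to different content changes it a lot; gradual transitions such as fades would need a different test.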
Surlykke, Annemarie; Ghose, Kaushik; Moss, Cynthia F
without fixating each important object separately. We tested this idea by measuring the directional aim and duration of the bat's sonar beam as it performed in a dual task, obstacle avoidance and insect capture. Bats were trained to fly through one of two openings in a fine net to take a tethered insect...... pointed and shifted their sonar gaze to sequentially inspect closely spaced objects in a manner similar to visual animals using saccades and fixations to scan a scene. The findings presented here from a specialized orientation system, echolocation, offer insights into general principles of active sensing...
Video inpainting, or completion, is a vital video improvement technique used to repair or edit digital videos. This paper describes a framework for temporally consistent video completion. The proposed method makes it possible to remove dynamic objects or to restore missing or corrupted regions in a video sequence by utilizing spatial and temporal information from neighboring scenes. A masking algorithm is used to detect scratches or damaged portions in video frames. The algorithm iteratively performs the following operations: acquire a frame; update the scene model; update the positions of moving objects; replace the parts of the frame occupied by objects marked for removal using a background model. In this paper, we extend an image inpainting algorithm based on texture and structure reconstruction by incorporating an improved strategy for video. Our algorithm is able to deal with a variety of challenging situations that naturally arise in video inpainting, such as the correct reconstruction of dynamic textures, multiple moving objects and moving backgrounds. Experimental comparisons with state-of-the-art video completion methods demonstrate the effectiveness of the proposed approach. It is shown that the proposed spatio-temporal image inpainting method can restore missing blocks and remove text from video scenes.
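The last step of the iterative loop, replacing masked pixels with values from a background model, can be illustrated with a very simple background model. The Python sketch below assumes grayscale frames stored as flat lists of equal length and uses a per-pixel temporal median as the background; this median model is a stand-in chosen for illustration, the paper's texture- and structure-based inpainting is considerably more elaborate, and the function names are hypothetical.

```python
from statistics import median


def median_background(frames):
    """Per-pixel temporal median over equally sized frames (flat lists of values).

    A moving object that covers a given pixel in fewer than half of the frames
    does not affect the median there, so the result approximates the static
    background.
    """
    return [median(values) for values in zip(*frames)]


def fill_masked(frame, mask, background):
    """Replace pixels whose mask entry is True (object to remove or damaged area)
    with the corresponding background value; other pixels are kept unchanged."""
    return [b if m else p for p, m, b in zip(frame, mask, background)]
```

A static median background only works when the camera and background are still; the moving backgrounds and dynamic textures mentioned in the abstract are exactly the cases where this simple model breaks down.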
Quehl, Bernhard; Yang, Haojin; Sack, Harald
Video texts are closely related to the content of a video. They provide a valuable source for indexing and interpretation of video data. Text detection and recognition in images or videos typically distinguishes between overlay text and scene text. Overlay text is artificially superimposed on the image at editing time, whereas scene text is text captured by the recording system. Typically, OCR systems are specialized for one type of text. However, both types of text can be found in video images. In this paper, we propose a method to automatically distinguish between overlay and scene text in order to dynamically control and optimize the post-processing steps following text detection. Based on a combination of features, a Support Vector Machine (SVM) is trained to classify scene and overlay text. We show how this distinction between overlay and scene text improves the word recognition rate. The accuracy of the proposed methods has been evaluated using publicly available test data sets.
Abdollahian, Golnaz; Delp, Edward J.
Although considerable work has been done on the management of "structured" video such as movies, sports, and television programs that have known scene structures, "unstructured" video analysis is still a challenging problem due to its unrestricted nature. The purpose of this paper is to address issues in the analysis of unstructured video, in particular video shot by a typical non-professional user (i.e., home video). We describe how camera motion information can be used for unstructured video analysis. A new concept, "camera viewing direction," is introduced as the building block of home video analysis. Motion displacement vectors are employed to temporally segment the video based on this concept. We then find the correspondence between camera behavior and the subjective importance of the information in each segment, and describe how different patterns in the camera motion can indicate levels of interest in a particular object or scene. By extracting these patterns, the most representative frames (keyframes) for the scenes are determined and aggregated to summarize the video sequence.
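As a rough illustration of temporal segmentation driven by motion displacement vectors, the Python sketch below assumes that a single global horizontal displacement per frame has already been estimated (for example, by averaging block motion vectors) and groups consecutive frames with the same dominant direction. The `still` threshold, the labels and the function name are hypothetical, and this is not the paper's camera-viewing-direction algorithm.

```python
def segment_by_pan(dx, still=0.5):
    """Split per-frame global horizontal displacements into (label, start, end) runs.

    Each frame is labelled 'still' if |dx| <= still, otherwise 'right' or 'left'
    according to the sign of the measured displacement (whether that corresponds
    to a left or right camera pan depends on the motion estimator's sign
    convention). Consecutive frames with the same label are merged; end is
    exclusive.
    """
    def label(v):
        if abs(v) <= still:
            return "still"
        return "right" if v > 0 else "left"

    segments = []
    for i, v in enumerate(dx):
        lab = label(v)
        if segments and segments[-1][0] == lab:
            segments[-1][2] = i + 1  # extend the current run
        else:
            segments.append([lab, i, i + 1])  # start a new run
    return [tuple(s) for s in segments]
```

Long "still" runs following a pan are one plausible cue that the camera operator has settled on something of interest, which is the kind of pattern the abstract relates to subjective importance.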
MaryCarol R. Hunter
It is well-established that the experience of nature produces an array of positive benefits to mental wellbeing. Much less is known about the specific attributes of green space which produce these effects. In the absence of translational research that links theory with application, it is challenging to design urban green space for its greatest restorative potential. This translational research provides a method for identifying which specific physical attributes of an environmental setting are most likely to influence preference and restoration responses. Attribute identification was based on a triangulation process invoking environmental psychology and aesthetics theories, principles of design founded in mathematics and aesthetics, and empirical research on the role of specific physical attributes of the environment in preference or restoration responses. From this integration emerged a list of physical attributes defining aspects of spatial structure and environmental content found to be most relevant to the perceptions involved with preference and restoration. The physical attribute list offers a starting point for deciphering which scene stimuli dominate or collaborate in preference and restoration responses. To support this, functional definitions and metrics (efficient methods for attribute quantification) are presented. Use of these research products can provide (a) greater control in the selection and interpretation of the scenes/images used in tests of preference and restoration and (b) an expanded evidence base for wellbeing designers of the built environment.
Several studies have reported that task instructions influence eye-movement behavior during static image observation. In contrast, for dynamic scene observation we show that while the specificity of the task goal influences observers' beliefs about where they look, the goal does not in turn influence eye-movement patterns. In our study, observers watched short video clips of a single tennis match and were asked to make subjective judgments about the allocation of visual attention to the items presented in the clip (e.g., ball, players, court lines, and umpire). However, before viewing the clips, observers were either told simply to watch the clips (non-specific goal), or told to watch the clips with a view to judging which of the two tennis players was awarded the point (specific goal). The subjective reports suggest that observers believed they allocated their attention more to goal-related items (e.g., court lines) if they performed the goal-specific task. However, we did not find an effect of goal specificity on major eye-movement parameters (i.e., saccadic amplitudes, inter-saccadic intervals, and gaze coherence). We conclude that the specificity of a task goal can alter observers' beliefs about their attention allocation strategy, but such task-driven meta-attentional modulation does not necessarily correlate with eye-movement behavior.
Moezzi, Saied; Katkere, Arun L.; Jain, Ramesh C.
Interactive video and television viewers should have the power to control their viewing position. To make this a reality, we introduce the concept of Immersive Video, which employs computer vision and computer graphics technologies to provide remote users a sense of complete immersion when viewing an event. Immersive Video uses multiple videos of an event, captured from different perspectives, to generate a full 3D digital video of that event. That is accomplished by assimilating important information from each video stream into a comprehensive, dynamic, 3D model of the environment. Using this 3D digital video, interactive viewers can then move around the remote environment and observe the events taking place from any desired perspective. Our Immersive Video System currently provides interactive viewing and `walkthrus' of staged karate demonstrations, basketball games, dance performances, and typical campus scenes. In its full realization, Immersive Video will be a paradigm shift in visual communication which will revolutionize television and video media, and become an integral part of future telepresence and virtual reality systems.
This paper presents a novel object detection method using a single instance from the object category. Our method uses biologically inspired global scene context criteria to check whether each individual location of the image can be naturally replaced by the query instance, which indicates whether there is a similar object at this location. Unlike traditional detection methods that only look at individual locations for the desired objects, our method evaluates the consistency of the entire scene. It is therefore robust to large intra-class variations, occlusions, small pose variations, low-resolution conditions, background clutter, etc., and requires no off-line training. The experimental results on four datasets and two video sequences clearly show the superior robustness of the proposed method, suggesting that global scene context is important for visual detection/localization.
Devillez, Hélène; Guyader, Nathalie; Guérin-Dugué, Anne
The P300 event-related potential has been extensively studied in electroencephalography with classical paradigms that require observers not to move their eyes. This potential is classically used to infer whether a target or a task-relevant stimulus was presented. Few studies have examined this potential in more ecological paradigms in which observers were able to move their eyes. In this study, using an ecological paradigm and an adapted methodology, we examined the P300 potential in a visual search task that involves eye movements to actively explore natural scenes, during which eye movements and electroencephalographic activity were co-registered. Averaging the electroencephalography signal time-locked to fixation onsets, a P300 potential was observed for fixations onto the target object but not for other fixations recorded during the same visual search or for fixations recorded during free viewing without any task. Our approach uses control experimental conditions with similar eye movements to ensure that the P300 potential was attributable to the fact that the observer gazed at the target rather than to other factors such as the eye movement pattern (the size of the previous saccade) or the "overlap issue" between the potentials elicited by two successive fixations. We also proposed to model the temporal overlap of the potentials elicited by consecutive fixations of various durations. Our results show that the P300 potential can be studied in ecological situations without any constraint on the type of visual exploration, with some precautions in the interpretation of results due to the overlap issue.
Marín Arraiza, Paloma; Plank, Margret; Löwe, Peter
Scientific audiovisual media such as videos of research, interactive displays or computer animations have become an important part of scientific communication and education. Dynamic phenomena can be described better by audiovisual media than by words and pictures. For this reason, scientific videos help us to understand and discuss environmental phenomena more efficiently. Moreover, creating scientific videos is easier than ever, thanks to mobile devices and open source editing software. Video clips, webinars or even the interactive part of a PICO are formats of scientific audiovisual media used in the Geosciences. This type of media translates location-referenced Science Communication, such as environmental interpretation, into computer-based Science Communication. A new form of Science Communication is video abstracting. A video abstract is a three- to five-minute video statement that provides background information about a research paper. It also gives authors the opportunity to present their research activities to a wider audience. Since this kind of media has become an important part of scientific communication, there is a need for reliable infrastructures capable of managing the digital assets researchers generate. Using video abstracts as a use case, this paper gives an overview of the activities of the German National Library of Science and Technology (TIB) regarding publishing and linking audiovisual media in a scientifically sound way. TIB, in cooperation with the Hasso Plattner Institute (HPI), developed a web-based portal (av.tib.eu) that optimises access to scientific videos in the fields of science and technology. Videos from the realms of science and technology can easily be uploaded onto the TIB|AV Portal. Within a short period of time the videos are assigned a digital object identifier (DOI). This enables them to be referenced, cited, and linked (e.g. to the
Iersel, M. van; Veerman, H.E.T.; Mark, W. van der
Once a crime has been perpetrated, forensic traces will only be preserved at the crime scene for a limited time. It is therefore necessary to record a crime scene meticulously. Usually, photographs and/or videos are taken at the scene to document it, so that later on one will know the exact
Thurman, James T
.... The book also uses examples of chain of custody and scene administration forms, diagrams and tables, methods of equipment decontamination, explosives residue collection procedures and spreadsheets...
Raabe, Ellen A.; D'Anjou, Robert; Pope, Domonique K.; Robbins, Lisa L.
This project combines underwater video with maps and descriptions to illustrate diverse seafloor habitats from Tampa Bay, Florida, to Mobile Bay, Alabama. A swath of seafloor was surveyed with underwater video to 100 meters (m) water depth in 1999 and 2000 as part of the Gulfstream Natural Gas System Survey. The U.S. Geological Survey (USGS) in St. Petersburg, Florida, in cooperation with Eckerd College and the Florida Department of Environmental Protection (FDEP), produced an archive of analog-to-digital underwater movies. Representative clips of seafloor habitats were selected from hundreds of hours of underwater footage. The locations of video clips were mapped to show the distribution of habitat and habitat transitions. The numerous benthic habitats in the northeastern Gulf of Mexico play a vital role in the region's economy, providing essential resources for tourism, natural gas, recreational water sports (fishing, boating, scuba diving), materials, fresh food, energy, a source of sand for beach renourishment, and more. These submerged natural resources are important to the economy but are often invisible to the general public. This product provides a glimpse of the seafloor with sample underwater video, maps, and habitat descriptions. It was developed to depict the range and location of seafloor habitats in the region but is limited by depth and by the survey track. It should not be viewed as comprehensive, but rather as a point of departure for inquiries and appreciation of marine resources and seafloor habitats. Further information is provided in the Resources section.
This comprehensive and accessible text/reference presents an overview of the state of the art in video coding technology. Specifically, the book introduces the tools of the AVS2 standard, describing how AVS2 can help to achieve a significant improvement in coding efficiency for future video networks and applications by incorporating smarter coding tools such as scene video coding. Topics and features: introduces the basic concepts in video coding, and presents a short history of video coding technology and standards; reviews the coding framework, main coding tools, and syntax structure of AV
This paper explores how movement can be used as a compositional element in installations of multiplex holograms. My holographic images are created from montages of hand-held video and photo-sequences. These spatially dynamic compositions are visually complex but anchored to landmarks and hints of the capturing process - such as the appearance of the photographer's shadow - to establish a sense of connection to the holographic scene. Moving around in front of the hologram, the viewer animates the holographic scene. A perception of motion then results from the viewer's bodily awareness of physical motion and the visual reading of dynamics within the scene or movement of perspective through a virtual suggestion of space. By linking and transforming the physical motion of the viewer with the visual animation, the viewer's bodily awareness - including proprioception, balance and orientation - plays into the holographic composition. How multiplex holography can be a tool for exploring coupled, cross-referenced and transformed perceptions of movement is demonstrated with a number of holographic image installations. Through this process I expanded my creative composition practice to consider how dynamic and spatial scenes can be conveyed through the fragmented view of a multiplex hologram. This body of work was developed through an installation art practice and was the basis of my recently completed doctoral thesis: 'The Emergent Holographic Scene — compositions of movement and affect using multiplex holographic images'.
Ball, Felix; Elzemann, Anne; Busch, Niko A
The change blindness paradigm, in which participants often fail to notice substantial changes in a scene, is a popular tool for studying scene perception, visual memory, and the link between awareness and attention. Some of the most striking and popular examples of change blindness have been demonstrated with digital photographs of natural scenes; in most studies, however, much simpler displays, such as abstract stimuli or "free-floating" objects, are typically used. Although simple displays have undeniable advantages, natural scenes remain a very useful and attractive stimulus for change blindness research. To assist researchers interested in using natural-scene stimuli in change blindness experiments, we provide here a step-by-step tutorial on how to produce changes in natural-scene images with a freely available image-processing tool (GIMP). We explain how changes in a scene can be made by deleting objects or relocating them within the scene or by changing the color of an object, in just a few simple steps. We also explain how the physical properties of such changes can be analyzed using GIMP and MATLAB (a high-level scientific programming tool). Finally, we present an experiment confirming that scenes manipulated according to our guidelines are effective in inducing change blindness and demonstrating the relationship between change blindness and the physical properties of the change and inter-individual differences in performance measures. We expect that this tutorial will be useful for researchers interested in studying the mechanisms of change blindness, attention, or visual memory using natural scenes.
Omodei, M M; McLennan, J
Head-mounted video recording is described as a potentially powerful method for studying decision making in natural settings. Most alternative data-collection procedures are intrusive and disruptive of the decision-making processes involved, while conventional video-recording procedures are either impractical or impossible. As a severe test of the robustness of the methodology we studied the decision making of 6 experienced orienteers who carried a head-mounted lightweight video camera as they navigated, running as fast as possible, around a set of control points in a forest. Use of the Wilcoxon matched-pairs signed-ranks test indicated that compared with free recall, video-assisted recall evoked (a) significantly greater experiential immersion in the recall, (b) significantly more specific recollections of navigation-related thoughts and feelings, (c) significantly more realizations of map and terrain features and aspects of running speed which were not noticed at the time of actual competition, and (d) significantly greater insight into specific navigational errors and the intrusion of distracting thoughts into the decision-making process. Potential applications of the technique in (a) the environments of emergency services, (b) therapeutic contexts, (c) education and training, and (d) sports psychology are discussed.
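With only 6 participants, the null distribution of the Wilcoxon matched-pairs signed-ranks statistic can be enumerated exactly: under the null hypothesis each ranked difference is equally likely to be positive or negative, giving 2^6 = 64 equally probable outcomes. The sketch below, a minimal illustration using only the Python standard library, computes W+ and an exact two-sided p-value this way. The function names are hypothetical, zero differences are dropped, and tied absolute differences receive average ranks; this is one common convention, not necessarily the one used in the study.

```python
from itertools import product


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples x, y.

    Returns (w_plus, w_minus, p_two_sided), where differences are x - y.
    Enumerates all 2**n sign patterns, so it is only practical for small n.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Null distribution of W+: every rank is independently + or - with p = 1/2.
    null = [sum(r for r, positive in zip(ranks, signs) if positive)
            for signs in product((False, True), repeat=len(diffs))]
    p_low = sum(1 for w in null if w <= w_plus) / len(null)
    p_high = sum(1 for w in null if w >= w_plus) / len(null)
    return w_plus, w_minus, min(1.0, 2 * min(p_low, p_high))
```

For example, if all six hypothetical video-assisted ratings exceeded the corresponding free-recall ratings, W+ would be 21 and the exact two-sided p-value 2/64 = 0.03125. That is the smallest two-sided p-value attainable with n = 6, which shows how little room a sample this size leaves for any reversed pair.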
Thida, Myo; Monekosso, Dorothy
Video context analysis is an active and vibrant research area, which provides the means for extracting, analyzing and understanding the behavior of a single target or multiple targets. Over the last few decades, computer vision researchers have been working to improve the accuracy and robustness of algorithms that analyze the context of a video automatically. In general, the research work in this area can be categorized into three major topics: 1) counting the number of people in the scene, 2) tracking individuals in a crowd, and 3) understanding the behavior of a single target or multiple targets in the scene.
Lord, Eric; Shand, David J.; Cantle, Allan J.
This paper describes the techniques which have been developed for an infra-red (IR) target, countermeasure and background image generation system working in real time for HWIL and Trial Proving applications. Operation is in the 3 to 5 and 8 to 14 micron bands. The system may be used to drive a scene projector (otherwise known as a thermal picture synthesizer) or for direct injection into equipment under test. The provision of realistic IR target and countermeasure trajectories and signatures, within representative backgrounds, enables the full performance envelope of a missile system to be evaluated. It also enables an operational weapon system to be proven in a trials environment without compromising safety. The most significant technique developed has been that of line-by-line synthesis. This minimizes the processing delays to the equivalent of 1.5 frames from input of target and sightline positions to the completion of an output image scan. Using this technique, a scene generator has been produced for full closed-loop HWIL performance analysis for the development of an air-to-air missile system. Performance of the synthesis system is as follows: 256 × 256 pixels per frame; 350 target polygons per frame; 100 Hz frame rate; and Gouraud shading, simple reflections, variable geometry targets and atmospheric scaling. A system using a similar technique has also been used for direct insertion into the video path of a ground-to-air weapon system in live firing trials. This has provided realistic targets without degrading the closed-loop performance. Delay of the modified video signal has been kept to less than 5 lines. The technique has been developed using a combination of four high-speed Intel i860 RISC processors in parallel with 4000-series Xilinx field programmable gate arrays (FPGAs). Start and end conditions for each line of target pixels are prepared and ordered in the i860. The merging with background pixels and output shading and scaling is then carried out in
Logical units are semantic video segments above the shot level. Depending on the common semantics within the unit and the data domain, different types of logical unit extraction algorithms have been presented in the literature. Topic units are typically extracted for documentaries or news broadcasts, while scenes are extracted for narrative-driven video such as feature films, sitcoms, or cartoons. Other types of logical units are extracted from home video and sports. The algorithms in the literature used for the extraction of logical units are reviewed in this paper according to the categories unit type, data domain, features used, segmentation method, and thresholds applied. A detailed comparative study is presented for the case of extracting scenes from narrative-driven video. While earlier comparative studies focused on scene segmentation methods only or on complete news-story segmentation algorithms, in this paper various visual features and segmentation methods with their thresholding mechanisms and their combination into complete scene detection algorithms are investigated. The performance of the resulting large set of algorithms is then evaluated on a set of video files including feature films, sitcoms, children's shows, a detective story, and cartoons.
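To make the combination of a visual feature, a segmentation rule and a threshold concrete, here is a deliberately simple, hypothetical scene detector. It is not one of the reviewed algorithms. Each shot is described by a colour histogram, and a shot starts a new scene when its best histogram-intersection similarity to any of the previous `window` shots falls below `threshold`. The feature, the window size of 2 and the threshold of 0.5 are illustrative assumptions.

```python
import numpy as np


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two normalized histograms."""
    return float(np.minimum(h1, h2).sum())


def detect_scene_boundaries(shot_hists, threshold=0.5, window=2):
    """Return the indices of shots that start a new scene.

    shot_hists: one colour histogram per shot (any non-negative counts).
    A shot continues the current scene if it resembles at least one of the
    preceding `window` shots; otherwise a scene boundary is declared.
    """
    hists = [np.asarray(h, dtype=float) / np.sum(h) for h in shot_hists]
    boundaries = []
    for i in range(1, len(hists)):
        start = max(0, i - window)
        best = max(histogram_intersection(hists[i], hists[j]) for j in range(start, i))
        if best < threshold:
            boundaries.append(i)
    return boundaries
```

Comparing against several preceding shots rather than only the immediate predecessor is one way to tolerate shot/reverse-shot dialogue patterns, in which consecutive shots look different but the scene continues.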
Li, Haifeng; Peng, Jian; Tao, Chao; Chen, Jie; Deng, Min
Recently, deep convolutional neural networks (DCNNs) have achieved remarkable success and developed rapidly in the field of natural image recognition. Compared with natural images, remote sensing images cover a larger scale, and the scenes and objects they represent are more macroscopic. This study inquires whether remote sensing scene recognition and natural scene recognition differ and raises the following questions: What are the key factors in remote sensing scene recognition? Is the DCNN r...
Conci, Markus; Müller, Hermann J
Change in the visual scene often goes unnoticed - a phenomenon referred to as "change blindness." This study examined whether the hierarchical structure, i.e., the global-local layout of a scene can influence performance in a one-shot change detection paradigm. To this end, natural scenes of a laid breakfast table were presented, and observers were asked to locate the onset of a new local object. Importantly, the global structure of the scene was manipulated by varying the relations among objects in the scene layouts. The very same items were either presented as global-congruent (typical) layouts or as global-incongruent (random) arrangements. Change blindness was less severe for congruent than for incongruent displays, and this congruency benefit increased with the duration of the experiment. These findings show that global layouts are learned, supporting detection of local changes with enhanced efficiency. However, performance was not affected by scene congruency in a subsequent control experiment that required observers to localize a static discontinuity (i.e., an object that was missing from the repeated layouts). Our results thus show that learning of the global layout is particularly linked to the local objects. Taken together, our results reveal an effect of "global precedence" in natural scenes. We suggest that relational properties within the hierarchy of a natural scene are governed, in particular, by global image analysis, reducing change blindness for local objects through scene learning.
Theeuwes, J. & Hagenzieker, M.P.
The present study investigates top-down governed visual selection in natural traffic scenes. The subjects had to search for a target object (for example, a traffic sign, or other road users) which was embedded in a natural traffic scene. Given a particular prototypical scene, the target was located
van Gemert, J.C.; Geusebroek, J.M.; Veenman, C.J.; Snoek, C.G.M.; Smeulders, A.W.M.
We present a generic and robust approach for scene categorization. A complex scene is described by proto-concepts like vegetation, water, fire, sky, etc. These proto-concepts are represented by low-level features, where we use natural image statistics to compactly represent color invariant texture
Bartelsen, J.; Saur, G.; Teutsch, C.
During the last ten years, the availability of images acquired from unmanned aerial vehicles (UAVs) has been continuously increasing due to the improvements and economic success of flight and sensor systems. From our point of view, reliable and automatic image-based change detection may contribute to overcoming several challenging problems in military reconnaissance, civil security, and disaster management. Changes within a scene can be caused by functional activities, e.g., footprints or skid marks, excavations, or humidity penetration; these might be recognizable in aerial images, but are often overlooked when change detection is executed manually. Depending on the circumstances, these kinds of changes may be an indication of sabotage, terrorist activity, or threatening natural disasters. Although image-based change detection is possible from both ground and aerial perspectives, in this paper we primarily address the latter. We have applied an extended approach to change detection as described by Saur and Krüger,1 and Saur et al.2 and have built upon the ideas of Saur and Bartelsen.3 The commercial simulation environment Virtual Battle Space 3 (VBS3) is used to simulate aerial "before" and "after" image acquisition concerning flight path, weather conditions and objects within the scene and to obtain synthetic videos. Video frames, which depict the same part of the scene, including "before" and "after" changes and not necessarily from the same perspective, are registered pixel-wise against each other by a photogrammetric concept, which is based on a homography. The pixel-wise registration is used to apply an automatic difference analysis, which, to a limited extent, is able to suppress typical errors caused by imprecise frame registration, sensor noise, vegetation and especially parallax effects. The primary concern of this paper is to rigorously evaluate the possibilities and limitations of our current approach for image-based change detection with respect
Wei, Pengxu; Qin, Fei; Wan, Fang; Zhu, Yi; Jiao, Jianbin; Ye, Qixiang
Scene images usually involve semantic correlations, particularly when considering large-scale image data sets. This paper proposes a novel generative image representation, correlated topic vector, to model such semantic correlations. Derived from the correlated topic model, correlated topic vector naturally exploits the correlations among topics, which are seldom considered in conventional feature encoding, e.g., Fisher vector, but do exist in scene images. It is expected that the involvement of correlations can increase the discriminative capability of the learned generative model and consequently improve recognition accuracy. Incorporated with the Fisher kernel method, correlated topic vector inherits the advantages of Fisher vector. The contributions of visual words to the topics have been further employed by incorporating the Fisher kernel framework to indicate the differences among scenes. Combined with deep convolutional neural network (CNN) features and a Gibbs sampling solution, correlated topic vector shows great potential when processing large-scale and complex scene image data sets. Experiments on two scene image data sets demonstrate that correlated topic vector significantly improves on deep CNN features and outperforms existing Fisher kernel-based features.
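For readers unfamiliar with the Fisher kernel baseline, the sketch below shows the conventional Fisher vector that correlated topic vector is compared against, not the correlated topic vector itself. It keeps only the gradient with respect to the means of a diagonal-covariance Gaussian mixture model and applies the usual power and L2 normalization. It assumes the GMM parameters (`weights`, `means`, `sigmas`) have already been fitted elsewhere; the function name is hypothetical.

```python
import numpy as np


def fisher_vector_means(X, weights, means, sigmas):
    """Mean-gradient Fisher vector of local descriptors under a diagonal GMM.

    X: (T, D) local descriptors (e.g. CNN features of image regions).
    weights: (K,) mixture weights; means, sigmas: (K, D) per-component
    means and standard deviations. Returns a (K * D,) encoding.
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    T, D = X.shape

    # Standardized offsets of every descriptor from every component: (T, K, D).
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # Log Gaussian densities and log posteriors (soft assignments), stabilized.
    log_prob = (-0.5 * np.sum(diff ** 2, axis=2)
                - np.sum(np.log(sigmas), axis=1)[None, :]
                - 0.5 * D * np.log(2 * np.pi))
    log_post = np.log(weights)[None, :] + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)  # (T, K)

    # Gradient w.r.t. the means, averaged over descriptors: (K, D).
    G = (gamma[:, :, None] * diff).sum(axis=0) / (T * np.sqrt(weights)[:, None])
    fv = G.ravel()

    # Power ("signed square root") and L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv
```

Each component contributes independently here, which illustrates the limitation the abstract points to: nothing in this encoding models correlations between components or topics.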
Emerging video applications are being developed where multiple views of a scene are captured. Two central issues in the deployment of future multiview video (MVV) systems are compression efficiency and interactive video experience, which makes it necessary to develop advanced technologies on multiview video coding (MVC) and interactive multiview video streaming (IMVS). The former aims at efficient compression of all MVV data in a rate-distortion (RD) optimal manner by exploiting both temporal ...
Cofield, Jay L.
This study investigated whether or not low-bandwidth streaming video could be useful for affective purposes. A group of 30 students in a cinema course at a public, liberal arts university viewed a 10-minute dramatic video scene by either videotape or low-bandwidth streaming video. They also took a survey to determine their affective responses and…
Israël, Menno; van den Broek, Egon; van der Putten, Peter; den Uyl, Marten J.; Petrushin, Valery A.; Khan, Latifur
The work presented here introduces a real-time automatic scene classifier within content-based video retrieval. In our envisioned approach end users like documentalists, not image processing experts, build classifiers interactively, by simply indicating positive examples of a scene. Classification
Oklahoma State Dept. of Vocational and Technical Education, Stillwater. Curriculum and Instructional Materials Center.
This instructor's guide contains the materials required to teach four competency-based course units of instruction in installing compressed natural gas (CNG) systems in motor vehicles. It is designed to accompany an instructional videotape (not included) on CNG installation. The following competencies are covered in the four instructional units:…
Hurri, Jarmo; Hyvärinen, Aapo
Recently, statistical models of natural images have shown the emergence of several properties of the visual cortex. Most models have considered the nongaussian properties of static image patches, leading to sparse coding or independent component analysis. Here we consider the basic time dependencies of image sequences instead of their nongaussianity. We show that simple-cell-type receptive fields emerge when temporal response strength correlation is maximized for natural image sequences. Thus, temporal response strength correlation, which is a nonlinear measure of temporal coherence, provides an alternative to sparseness in modeling simple-cell receptive field properties. Our results also suggest an interpretation of simple cells in terms of invariant coding principles, which have previously been used to explain complex-cell receptive fields.
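The measure of temporal response strength correlation can be written as the average product of a nonlinearity applied to a filter's responses at times t and t - Δt. The minimal sketch below only evaluates that objective for a given filter; it does not perform the optimization over filters described in the paper. It assumes g(y) = y² and normalizes the responses to zero mean and unit variance. The function name and these choices are illustrative assumptions.

```python
import numpy as np


def temporal_response_strength_correlation(patches, w, lag=1):
    """Mean of g(y(t)) * g(y(t - lag)) for linear filter responses y = patches @ w.

    patches: (T, D) vectorized image patches from consecutive frames.
    w: (D,) filter. g(y) = y**2 is an assumed choice of nonlinearity.
    """
    X = np.asarray(patches, dtype=float)
    y = X @ np.asarray(w, dtype=float)  # filter response per time step
    if not 1 <= lag < len(y):
        raise ValueError("lag must be between 1 and T - 1")
    y = (y - y.mean()) / y.std()  # unit variance makes the value scale-free
    g = y ** 2  # response "strength", insensitive to sign
    return float(np.mean(g[lag:] * g[:-lag]))
```

Because g discards the sign, a response that flips from +3 to -3 between frames still counts as temporally coherent. Only the persistence of response energy matters, which is what separates this measure from plain linear temporal correlation. For example, with a single-pixel "patch" and `w = [1.0]`, the sequence 3, 3, -3, -3, 1, 1, -1, -1 scores higher than the same values interleaved as 3, 1, -3, -1, 3, 1, -3, -1.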
Nowadays, editing technology has entered the digital age. Converting analog data to digital has become simpler now that editing technology is integrated into all aspects of society. Understanding how analog data is converted into digital form is important in producing a video. To use this technology, one must first become familiar with the equipment and its features. The next phase is the capturing process, which prepares the footage for editing scene by scene so that the result becomes a watchable video.
McCoy, S.W.; Kean, J.W.; Coe, J.A.; Staley, D.M.; Wasklewicz, T.A.; Tucker, G.E.
Many theoretical and laboratory studies have been undertaken to understand debris-flow processes and their associated hazards. However, complete and quantitative data sets from natural debris flows needed for confirmation of these results are limited. We used a novel combination of in situ measurements of debris-flow dynamics, video imagery, and pre- and postflow 2-cm-resolution digital terrain models to study a natural debris-flow event. Our field data constrain the initial and final reach morphology and key flow dynamics. The observed event consisted of multiple surges, each with clear variation of flow properties along the length of the surge. Steep, highly resistant surge fronts of coarse-grained material without measurable pore-fluid pressure were pushed along by relatively fine-grained and water-rich tails that had a wide range of pore-fluid pressures (some two times greater than hydrostatic). Surges with larger nonequilibrium pore-fluid pressures had longer travel distances. A wide range of travel distances from different surges of similar size indicates that dynamic flow properties are of equal or greater importance than channel properties in determining where a particular surge will stop. Progressive vertical accretion of multiple surges generated the total thickness of mapped debris-flow deposits; nevertheless, deposits had massive, vertically unstratified sedimentological textures. © 2010 Geological Society of America.
The Anthropocene concept encapsulates the planetary-scale changes resulting from accelerating socio-ecological transformations, beyond the stratigraphic definition currently under debate. The emergence of multi-scale and protean complexity requires interdisciplinary and systems approaches. Yet, to reduce the cognitive challenge of tackling this complexity, the global Anthropocene syndrome must now be studied from various topical points of view and grounded at regional and local levels. A systems approach should make it possible to identify AnthropoScenes, i.e. settings where a socio-ecological transformation subsystem is clearly coherent within its boundaries and displays explicit relationships with neighbouring/remote scenes and within a nesting architecture. Hydrology is a key topical point of view to explore, as it is important in many aspects of the Anthropocene, whether through water itself being a resource, hazard or transport force, or through the network, connectivity, interface, teleconnection, emergence and scaling issues it determines. We will schematically exemplify these aspects with three contrasting hydrological AnthropoScenes in Tunisia, France and Iceland, and reframe therein concepts of the hydrological change debate.
Tang, Rui; Wang, Yuhan; Cosker, Darren; Li, Wenbin
In this paper, we present an automatic system for the analysis and labeling of structural scenes, namely floor plan drawings in computer-aided design (CAD) format. The proposed system applies a fusion strategy to detect and recognize various components of CAD floor plans, such as walls, doors, windows and other ambiguous assets. Technically, a general rule-based filter parsing method is first adopted to extract effective information from the original floor plan. Then, an image-processing-based recovery method is employed to correct the information extracted in the first step. Our proposed method is fully automatic and real-time. The system provides high accuracy and has also been evaluated on a public website that, on average, records more than ten thousand effective uses per day and reaches a relatively high satisfaction rate.
Yang, Chen; Liu, Hung-Ping; Chu, Yen; Liu, Yun-Hen; Wu, Ching-Yang; Ko, Po-Jen; Liu, Hui-Ping
This study aimed to determine the feasibility of a novel transtracheal endoscopic technique for thoracic and mediastinal evaluation in a canine model. In two dogs under general anesthesia, a transverse incision was made in the right lateral wall of the lower trachea and used as an entrance for thoracic and mediastinal evaluation. Transtracheal thoracoscopic evaluation was possible in both animals. One animal experienced massive subcutaneous emphysema immediately after evaluation of the thoracic cavity and required chest tube drainage. The follow-up endoscopies 2 weeks after surgery showed good healing of the tracheal openings in both animals. The transtracheal approach to the thoracic cavity and mediastinum appears to be feasible. This technique may provide an intriguing platform for the development of natural orifice transluminal endoscopic surgery (NOTES) in the thoracic cavity.
Otrel-Cass, Kathrin; Khalid, Md. Saifuddin
With an interest in learning that is set in collaborative situations, the data session presents excerpts from video data produced by two of fifteen students in a 5th-semester techno-anthropology course. Students used video cameras to capture the time they spent working with a scientist... video, nature of the interactional space, and material and spatial semiotics...
Lande, R G
Some researchers and theorists are convinced that graphic scenes of violence on television and in movies are inextricably linked to human aggression. Others insist that a link has not been conclusively established. This paper summarizes scientific studies that have informed these two perspectives. Although many instances of children and adults imitating video violence have been documented, no court has imposed liability for harm allegedly resulting from a video program, an indication that considerable doubt still exists about the role of video violence in stimulating human aggression. The author suggests that a small group of vulnerable viewers are probably more impressionable and therefore more likely to suffer deleterious effects from violent programming. He proposes that research on video violence be narrowed to identifying and describing the vulnerable viewer.
Through a licensing agreement, Intergraph Government Solutions adapted a technology originally developed at NASA's Marshall Space Flight Center for enhanced video imaging by developing its Video Analyst(TM) System. Marshall's scientists developed the Video Image Stabilization and Registration (VISAR) technology to help FBI agents analyze video footage of the deadly 1996 Summer Olympic Games bombing in Atlanta, Georgia. VISAR technology enhanced nighttime videotapes made with hand-held camcorders, revealing important details about the explosion. Intergraph's Video Analyst System is a simple, effective, and affordable tool for video enhancement and analysis. The benefits associated with the Video Analyst System include support of full-resolution digital video, frame-by-frame analysis, and the ability to store analog video in digital format. Up to 12 hours of digital video can be stored and maintained for reliable footage analysis. The system also includes state-of-the-art features such as stabilization, image enhancement, and convolution to help improve the visibility of subjects in the video without altering the underlying footage. Adaptable to many uses, Intergraph's Video Analyst System meets the stringent demands of the law enforcement industry in the areas of surveillance, crime scene footage, sting operations, and dash-mounted video cameras.
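The text does not describe how VISAR stabilizes footage. As a generic illustration of the idea only, here is phase correlation, a standard technique swapped in for this purpose: it estimates the global translation between a reference frame and a later frame and then shifts the later frame back into alignment. It assumes purely translational, integer-pixel motion with circular wraparound, which real camcorder footage does not satisfy. The function names are hypothetical.

```python
import numpy as np


def estimate_translation(ref, frame):
    """Estimate integer (dy, dx) such that frame ~ np.roll(ref, (dy, dx), axis=(0, 1))."""
    A = np.fft.fft2(np.asarray(ref, dtype=float))
    B = np.fft.fft2(np.asarray(frame, dtype=float))
    cross = B * np.conj(A)
    cross /= np.abs(cross) + 1e-12  # keep only the phase difference
    corr = np.fft.ifft2(cross).real  # peaks at the displacement
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # Map peak positions past the midpoint to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


def stabilize(ref, frame):
    """Shift `frame` so that it lines up with `ref` (translation only)."""
    dy, dx = estimate_translation(ref, frame)
    return np.roll(frame, (-dy, -dx), axis=(0, 1))
```

Normalizing the cross-power spectrum to unit magnitude makes the correlation peak sharp and largely insensitive to overall brightness, which is one reason phase correlation is a common building block for registering dim, noisy frames. Sub-pixel accuracy, rotation and zoom would need further steps beyond this sketch.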
Jensen, Lars Baunegaard With; Baseski, Emre; Pugeault, Nicolas
In this paper, we propose a hierarchical architecture for representing scenes, covering 2D and 3D aspects of visual scenes as well as the semantic relations between the different aspects. We argue that labeled graphs are a suitable representational framework for this representation and demonstrat...
example, 3D movies. This change in demand draws attention to the smooth visual quality of the reconstructed scene. In this case, visual quality of the...Vergauwen, and L. Van Gool, "Automated reconstruction of 3D scenes from sequences of images," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 55
Recently, computer understanding of pictures and stories has become one of the most important research topics in computer science. However, there has been little research on human-like understanding by computers, because pictures have no fixed format and contain more lyrical aspects than natural language. For picture understanding, comics are a suitable target because they consist of clear and simple story plots and separated scenes. In this paper, we propose two different types of picture models for a 2-scene comic creation system. We also show how the 2-scene comic creation system can be applied by means of the proposed picture models.
Humans can quickly and accurately recognize objects within briefly presented natural scenes. Previous work has provided evidence that scene context contributes to this process, demonstrating improved naming of objects that were presented in semantically consistent scenes (e.g., a sandcastle on a beach) relative to semantically inconsistent scenes (e.g., a sandcastle on a football field). The current study was aimed at investigating which processes underlie the scene consistency effect. Specifically, we tested: (1) whether the effect is due to increased visual feature and/or shape overlap for consistent relative to inconsistent scene-object pairs; and (2) whether the effect is mediated by attention to the background scene. Experiment 1 replicated the scene consistency effect of a previous report (Davenport & Potter, 2004). Using a new, carefully controlled stimulus set, Experiment 2 showed that the scene consistency effect could not be explained by low-level feature or shape overlap between scenes and target objects. Experiments 3a and 3b investigated whether focused attention modulates the scene consistency effect. By using a location cueing manipulation, participants were correctly informed about the location of the target object on a proportion of trials, allowing focused attention to be deployed towards the target object. Importantly, the effect of scene consistency on target object recognition was independent of spatial attention, and was observed both when attention was focused on the target object and when attention was focused on the background scene. These results indicate that a semantically consistent scene context benefits object recognition independently of the focus of attention. We suggest that the scene consistency effect is primarily driven by global scene properties, or scene gist, that can be processed with minimal attentional resources.
Zou, Li-hui; Zhang, Dezheng; Wulamu, Aziguli
Dynamic scene stitching remains highly challenging when it comes to preserving global key information without omission or deformation in the presence of multiple motion interferences in the image acquisition system. Object clipping, motion blur, and other synthetic defects easily occur in the final stitched image. In our research work, we start from the human visual cognitive mechanism and construct a hybrid-saliency-based cognitive model to automatically guide video volume stitching. The model consists of three saliency components corresponding to different visual stimuli: intensity, edge contour, and scene depth. Combined with the manifold-based mosaicing framework, dynamic scene stitching is formulated as a cut path optimization problem in a constructed space-time graph. The cutting energy function for column width selection is defined according to the proposed visual cognition model. The optimum cut path minimizes the cognitive saliency difference throughout the whole video volume. The experimental results show that the method effectively avoids synthetic defects caused by different motion interferences and summarizes the key contents of the scene without loss. The proposed method makes full use of the human visual cognitive mechanism for stitching and is of high practical value for environmental surveillance and other applications.
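The paper optimizes a cut path in a space-time graph. A much simpler 2D analogue illustrates the two ingredients. First, a hybrid saliency map is formed as a weighted sum of intensity, edge and depth saliency maps; the equal weights are an assumption. Second, dynamic programming finds the top-to-bottom path of minimum accumulated cost through that map, with each step moving at most one column left or right. This is the same recurrence as seam carving, and the sketch is illustrative only, not the authors' energy function or graph construction.

```python
import numpy as np


def hybrid_saliency(intensity, edge, depth, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of three saliency maps (equal weights are an assumption)."""
    maps = [np.asarray(m, dtype=float) for m in (intensity, edge, depth)]
    return sum(w * m for w, m in zip(weights, maps))


def min_cost_vertical_cut(cost):
    """Column index per row of the cheapest 8-connected top-to-bottom path."""
    C = np.asarray(cost, dtype=float)
    H, W = C.shape
    acc = C.copy()  # accumulated cost of the best path ending at each pixel
    back = np.zeros((H, W), dtype=int)  # column chosen in the previous row
    for r in range(1, H):
        for c in range(W):
            lo, hi = max(0, c - 1), min(W, c + 2)
            k = lo + int(np.argmin(acc[r - 1, lo:hi]))
            back[r, c] = k
            acc[r, c] += acc[r - 1, k]
    # Trace back from the cheapest end point in the last row.
    path = [int(np.argmin(acc[-1]))]
    for r in range(H - 1, 0, -1):
        path.append(int(back[r, path[-1]]))
    return path[::-1]
```

Routing the cut through low-saliency pixels is what keeps salient content, such as moving objects, from being sliced at the seam. A space-time version additionally couples the cut across frames so that it does not jump from one frame to the next.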
Gao, Jiang; Yang, Jie; Zhang, Ying; Waibel, Alex
.... The paper addresses challenges in automatic sign extraction and translation, describes methods for automatic sign extraction, and extends example-based machine translation technology for sign translation...
Tenenbaum, J. M.; Barrow, H. G.; Weyl, S. A.
Cooperative (man-machine) scene analysis techniques were developed whereby humans can provide a computer with guidance when completely automated processing is infeasible. An interactive approach promises significant near-term payoffs in analyzing various types of high volume satellite imagery, as well as vehicle-based imagery used in robot planetary exploration. This report summarizes the work accomplished over the duration of the project and describes in detail three major accomplishments: (1) the interactive design of texture classifiers; (2) a new approach for integrating the segmentation and interpretation phases of scene analysis; and (3) the application of interactive scene analysis techniques to cartography.
Spampinato, Concetto; Palazzo, Simone; Giordano, Daniela
Video object segmentation is arguably one of the most challenging computer vision problems. Indeed, no existing solution can yet deal effectively with the peculiarities of real-world videos, especially articulated motion and object occlusions; these limitations become more evident when the performance of automated methods is compared with that of humans. However, manually segmenting objects in videos is largely impractical, as it requires a lot of time and concentration. To address this problem, in this paper we propose an interactive video object segmentation method which exploits, on the one hand, the human ability to correctly identify objects in visual scenes and, on the other hand, collective human brainpower to solve challenging, large-scale tasks. In particular, our method relies on a game with a purpose to collect human inputs on object locations, followed by an accurate segmentation phase achieved by optimizing an energy function encoding spatial and temporal constraints between object regions as well as human-provided location priors. Performance analysis carried out on complex video benchmarks, exploiting data provided by over 60 users, demonstrated that our method offers a better trade-off between annotation time and segmentation accuracy than interactive video annotation and automated video object segmentation approaches.
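The abstract describes an energy function with human-provided location priors plus spatial and temporal constraints. The sketch below evaluates a generic energy of that shape for a binary foreground/background labeling of a video volume: a unary term given by the negative log-probability of each label under a per-pixel prior, plus Potts penalties for label changes between 4-neighbours within a frame and between the same pixel in consecutive frames. This is a textbook MRF-style form chosen for illustration, not the authors' formulation. The names, the weights `lam_s` and `lam_t`, and the assumption that clicks have already been turned into a probability map are all hypothetical. Minimizing such an energy (for example with graph cuts) is not shown.

```python
import numpy as np


def segmentation_energy(labels, prior, lam_s=1.0, lam_t=1.0):
    """Energy of a binary labeling of a (T, H, W) video volume.

    labels: 0 = background, 1 = foreground.
    prior: per-pixel foreground probability, e.g. derived from human clicks.
    """
    L = np.asarray(labels, dtype=int)
    P = np.clip(np.asarray(prior, dtype=float), 1e-6, 1 - 1e-6)
    # Unary term: cost of each label under the location prior.
    unary = np.where(L == 1, -np.log(P), -np.log(1 - P)).sum()
    # Spatial Potts term: label changes between vertical and horizontal neighbours.
    spatial = (L[:, 1:, :] != L[:, :-1, :]).sum() + (L[:, :, 1:] != L[:, :, :-1]).sum()
    # Temporal Potts term: label changes at the same pixel in consecutive frames.
    temporal = (L[1:] != L[:-1]).sum()
    return float(unary + lam_s * spatial + lam_t * temporal)
```

Raising `lam_t` makes the cheapest labeling more temporally stable. That helps with occlusions but can also blur genuine motion boundaries, which is one reason the balance between terms usually needs tuning.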
Little, Charles Q.; Peters, Ralph R.; Rigdon, J. Brian; Small, Daniel E.
Traditionally, law enforcement agencies have relied on basic measurement and imaging tools, such as tape measures and cameras, to record a crime scene. A disadvantage of these methods is that they are slow and cumbersome. A portable system that can rapidly record a crime scene with current camera imaging and 3D geometric surface maps, and provide quantitative measurements such as the accurate relative positions of crime scene objects, would be an asset to law enforcement agents in collecting and recording significant forensic data. The purpose of this project is to develop a feasible prototype of a fast, accurate, 3D measurement and imaging system that would help law enforcement agents quickly document and accurately record a crime scene.
Do individuals from different cultures perceive scenes differently? Does culture have an influence on visual attention processes? This thesis investigates not only what these influences are, and how they affect eye movements, but also examines some of the proposed mechanisms that underlie the cultural influence in scene perception. Experiments 1 & 2 showed that Saudi participants directed a higher number of fixations to the background of images than the British participants. Brit...
Alexander P N Van der Jagt
Attention Restoration Theory (ART) states that built scenes place greater load on attentional resources than natural scenes. This is explained in terms of the "hard" and "soft" fascination of built and natural scenes, respectively. Given a lack of direct empirical evidence for this assumption, we propose that the perceptual saliency of scene content can function as an empirically derived indicator of fascination. Saliency levels were established by measuring the speed of scene category detection using a Go/No-Go detection paradigm. Experiment 1 shows that built scenes are more salient than natural scenes. Experiment 2 replicates these findings using greyscale images, ruling out a colour-based response strategy, and additionally shows that built objects in natural scenes affect saliency to a greater extent than the reverse. Experiment 3 demonstrates that the saliency of scene content is directly linked to cognitive restoration using an established restoration paradigm. Overall, these findings demonstrate an important link between the saliency of scene content and related cognitive restoration.
Chen, Chen; Kuo, C -C Jay
This book offers an overview of traditional big visual data analysis approaches and provides state-of-the-art solutions for several scene comprehension problems: indoor/outdoor classification, outdoor scene classification, and outdoor scene layout estimation. It is illustrated with numerous natural and synthetic color images, and extensive statistical analysis is provided to help readers visualize big visual data distribution and the associated problems. Although there has been some research on big visual data analysis, little work has been published on big image data distribution analysis using the modern statistical approach described in this book. By presenting a complete methodology for big visual data analysis with three illustrative scene comprehension problems, it provides a generic framework that can be applied to other big visual data analysis tasks.
Wakefield, Corey B.; Lewis, Paul D.; Coutts, Teresa B.; Fairclough, David V.; Langlois, Timothy J.
Marine embayments and estuaries play an important role in the ecology and life history of many fish species. Cockburn Sound is one of relatively few marine embayments on the west coast of Australia. Its sheltered waters and close proximity to a capital city have resulted in anthropogenic intrusion and extensive seascape modification. This study aimed to compare the sampling efficiencies of baited videos and fish traps in determining the relative abundance and diversity of temperate demersal fish species associated with naturally occurring (seagrass, limestone outcrops and soft sediment) and modified (rockwall and dredge channel) habitats in Cockburn Sound. Baited videos sampled a greater range of species in higher total and mean abundances than fish traps. The larger amount of data collected by baited videos allowed greater discrimination of fish assemblages between habitats. The markedly higher diversity and abundances of fish associated with seagrass and limestone outcrops, and the fact that these habitats are very limited within Cockburn Sound, suggest that they play an important role in the fish ecology of this embayment. Fish assemblages associated with modified habitats comprised a subset of species in lower abundances when compared to natural habitats with similar physical characteristics. This suggests that modified habitats may not have provided the necessary resource requirements (e.g. shelter and/or diet) for some species, resulting in alterations to the natural trophic structure and interspecific interactions. Baited videos provided a more efficient and non-extractive method for comparing fish assemblages and habitat associations of smaller bodied species and juveniles in a turbid environment. PMID:23555847
Abdolhosseini Moghadam, Abdolreza; Kumar, Mrityunjay; Radha, Hayder
Efficient video representation models are critical for many video analysis and processing tasks. In this paper, we present a framework based on the concept of finding the sparsest solution to model video frames. To model the spatio-temporal information, frames from one scene are decomposed into two components: (i) a common frame, which describes the visual information common to all the frames in the scene/segment, and (ii) a set of innovative frames, which depict the dynamic behaviour of the scene. The proposed approach exploits and builds on recent results in the field of compressed sensing to jointly estimate the common frame and the innovative frames for each video segment. We refer to the proposed modeling framework as CIV (Common and Innovative Visuals). We show how the proposed model can be used to find scene change boundaries, and we extend CIV to videos from multiple scenes. Furthermore, the proposed model is robust to noise and can be used for various video processing applications without relying on motion estimation and detection or image segmentation. Results for object tracking, video editing (object removal, inpainting) and scene change detection are presented to demonstrate the efficiency and performance of the proposed model.
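The common/innovation split described above can be illustrated with a minimal sketch. It is not the authors' compressed-sensing estimator: as a simplifying assumption, it minimizes ½Σₜ‖xₜ − c − iₜ‖² + λΣₜ‖iₜ‖₁ by block-coordinate descent, alternately soft-thresholding each frame's residual (which keeps the innovations sparse) and averaging the unexplained part into the common frame. Frames are flattened pixel lists, and the names `decompose` and `soft_threshold` and the default `lam` are hypothetical.

```python
from statistics import median


def soft_threshold(v, lam):
    """Proximal operator of lam*|v|: shrink v toward zero by lam."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def decompose(frames, lam=10.0, iters=50):
    """Split frames (equal-length pixel lists) into a common frame and sparse innovations.

    Block-coordinate descent on sum_t ||x_t - c - i_t||^2 / 2 + lam * ||i_t||_1.
    """
    n = len(frames[0])
    # Start the common frame at the per-pixel median, which ignores sparse changes.
    common = [median(f[p] for f in frames) for p in range(n)]
    innovations = [[0.0] * n for _ in frames]
    for _ in range(iters):
        # Innovation step: the exact minimizer for fixed c is a soft-thresholded residual.
        innovations = [[soft_threshold(f[p] - common[p], lam) for p in range(n)]
                       for f in frames]
        # Common-frame step: the exact minimizer for fixed i_t is the mean of x_t - i_t.
        common = [sum(f[p] - i[p] for f, i in zip(frames, innovations)) / len(frames)
                  for p in range(n)]
    return common, innovations
```

For three frames in which only one pixel of the middle frame changes, the change ends up almost entirely in that frame's innovation, while the other innovations stay exactly zero.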
Surveillance videos contain a considerable amount of data, in which information of interest to the user is sparsely distributed. Researchers construct video synopses that contain key information extracted from a surveillance video for efficient browsing and analysis. The geospatial–temporal information of a surveillance video plays an important role in the efficient description of video content. However, current approaches to video synopsis lack the introduction and analysis of geospatial–temporal information. To address these problems, this paper proposes an approach called “surveillance video synopsis in GIS”. Based on an integration model of video moving objects and GIS, the virtual visual field and the expression model of the moving object are constructed by spatially locating and clustering the trajectory of the moving object. The subgraphs of the moving object are reconstructed frame by frame in a virtual scene. Results show that the approach described in this paper comprehensively analyzes and creates fusion expression patterns between video dynamic information and geospatial–temporal information in GIS, and reduces the playback time of video content.
Schyma, C; Schyma, P
The authors report in Part 1 on their experiences with the Canon Ex1 Hi camcorder and the possibilities of documentation with modern video techniques. Application examples in legal medicine and criminalistics are described: autopsy, scene, reconstruction of crimes, etc. Online video documentation of microscopic sessions makes the discussion of findings easier. Video films used for instruction are well received. The use of video documentation can be extended by digitizing (Part 2). Two frame grabbers are presented, with which we obtained good results in digitizing images captured from video. The best image quality is achieved by online use of an image analysis chain. Corel 5.0 and PicEd Cora 4.0 allow complete image processing and analysis. Digital image processing influences the objectivity of the documentation. The applications of image libraries are discussed.
Using the concept of augmented reality, this article investigates how places have, in various ways, become augmented by means of different mediatization strategies. Augmentation of reality implies an enhancement of a place's emotional character: a certain mood, atmosphere or narrative surplus of meaning has been implemented. This may take place at different levels, which are presented and investigated in this article and exemplified by cases from the fields of tourism and computer games. The article suggests that we may use the forensic term crime scene in order... ...physical damage: they are all readable and interpretable signs. As augmented reality, the crime scene carries a narrative which at first is hidden and must be revealed. Through the process of investigation and the detective's ability to reason and deduce, the crime scene as place is reconstructed as virtual...
Pinsky, Ephi; Siman-tov, Avihay; Peles, David
A novel multispectral video system that continuously and autonomously optimizes both its spectral range channels and the exposure time of each channel under dynamic scenes, ranging from short-range clear scenes to long-range scenes with poor visibility, is currently being developed. In highly scattering media, channels with spectral ranges in the near infrared offer better transparency and contrast than the visible channels, particularly the blue range. Longer-wavelength spectral ranges, which yield higher contrast, are therefore favored. Images of 3 spectral channels are fused and displayed for (pseudo)color visualization as an integrated high-contrast video stream. In addition to the dynamic optimization of the spectral channels, the optimal exposure time of each channel is adjusted simultaneously and autonomously in real time. A criterion of maximum average signal, derived dynamically from previous frames of the video stream, is used (Patent Application - International Publication Number: WO2009/093110 A2, 30.07.2009). This configuration keeps the exposure time of each channel matched to a dynamically changing scene. It also maximizes the signal-to-noise ratio and compensates each channel for the specified daylight reflection and sensor response of its spectral range. A possible implementation is a color video camera based on 4 synchronized, highly responsive CCD imaging detectors attached to a 4CCD dichroic prism and combined with a common, color-corrected lens. Principal Components Analysis (PCA) is then applied for real-time "dimensional collapse" in color space, in order to select and fuse, for clear color visualization, the 3 most significant principal channels out of at least 4 channels characterized by high contrast and rich detail in the image data.
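The PCA "dimensional collapse" step can be sketched in plain Python. This is an illustration under stated assumptions, not the patented pipeline: each pixel is assumed to be a 4-tuple of co-registered channel values, the covariance is computed over one frame, its eigenvectors are found with the classical cyclic Jacobi method, and the 3 components with the largest eigenvalues are kept as pseudo-color channels. The function names are hypothetical, and scaling the outputs to a display range is omitted.

```python
import math


def covariance(pixels):
    """Population covariance matrix (and channel means) of equal-length channel tuples."""
    n, d = len(pixels), len(pixels[0])
    means = [sum(p[c] for p in pixels) / n for c in range(d)]
    cov = [[sum((p[i] - means[i]) * (p[j] - means[j]) for p in pixels) / n
            for j in range(d)] for i in range(d)]
    return cov, means


def jacobi_eigen(a, sweeps=50):
    """Eigen-decomposition of a symmetric matrix; eigenvector k is column k of vecs."""
    n = len(a)
    a = [row[:] for row in a]
    vecs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = vecs[k][p], vecs[k][q]
                    vecs[k][p], vecs[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], vecs


def pca_fuse(pixels, keep=3):
    """Project centered pixels onto the `keep` principal axes with the largest variance."""
    cov, means = covariance(pixels)
    vals, vecs = jacobi_eigen(cov)
    order = sorted(range(len(vals)), key=lambda k: vals[k], reverse=True)[:keep]
    fused = []
    for p in pixels:
        centered = [x - m for x, m in zip(p, means)]
        fused.append(tuple(sum(centered[i] * vecs[i][k] for i in range(len(centered)))
                           for k in order))
    return fused, [vals[k] for k in order]
```

When one of the 4 channels is redundant (fully correlated with another), the 3 retained components carry essentially all of the frame's variance, which is the situation in which collapsing to 3 display channels loses nothing.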
Coutrot, Antoine; Guyader, Nathalie
Conversation scenes are a typical example in which classical models of visual attention dramatically fail to predict eye positions. Indeed, these models rarely consider faces as particular gaze attractors and never take into account the important auditory information that always accompanies dynamic social scenes. We recorded the eye movements of participants viewing dynamic conversations taking place in various contexts. Conversations were seen either with their original soundtracks or with unrelated soundtracks (unrelated speech and abrupt or continuous natural sounds). First, we analyze how auditory conditions influence the eye movement parameters of participants. Then, we model the probability distribution of eye positions across each video frame with a statistical method (Expectation-Maximization), allowing the relative contributions of different visual features, such as static low-level visual saliency (based on luminance contrast), dynamic low-level visual saliency (based on motion amplitude), faces, and center bias, to be quantified. Through experimental and modeling results, we show that regardless of the auditory condition, participants look more at faces, and especially at talking faces. Hearing the original soundtrack makes participants follow the speech turn-taking more closely. However, we do not find any difference between the different types of unrelated soundtracks. These eye-tracking results are confirmed by our model, which shows that faces, and particularly talking faces, are the features that best explain the recorded gaze positions, especially in the original soundtrack condition. Low-level saliency is not a relevant feature for explaining eye positions on social scenes, even dynamic ones. Finally, we propose groundwork for an audiovisual saliency model. © 2014 ARVO.
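The mixture-model analysis described above can be sketched as follows, under assumptions: each feature map (for example static saliency, dynamic saliency, faces, or center bias) is taken to be a probability distribution over pixel indices, eye positions are given as pixel indices, and EM estimates only the mixture weights that quantify each map's contribution. The function name `em_weights` and this data layout are hypothetical illustrations, not the authors' code.

```python
def em_weights(maps, fixations, iters=100):
    """Estimate mixture weights w_k for p(x) = sum_k w_k * maps[k][x] with EM.

    maps: list of K lists, each a probability distribution over pixel indices.
    fixations: list of pixel indices where eye positions were recorded.
    """
    k_maps = len(maps)
    weights = [1.0 / k_maps] * k_maps  # uniform start; a zero weight would stay zero
    for _ in range(iters):
        totals = [0.0] * k_maps
        used = 0
        for x in fixations:
            dens = [w * m[x] for w, m in zip(weights, maps)]
            z = sum(dens)
            if z == 0.0:
                continue  # fixation not explained by any map; leave it out
            used += 1
            # E-step: responsibility of each map for this fixation.
            for k in range(k_maps):
                totals[k] += dens[k] / z
        if used == 0:
            break
        # M-step: each new weight is the average responsibility of its map.
        weights = [t / used for t in totals]
    return weights
```

The estimated weights sum to 1, so they can be read directly as the relative contribution of each feature to the recorded gaze positions.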
Ji, Xiangyang; Miao, Changlong; Zhang, Yongbing; Lin, Xing; Dai, Qionghai
Separating reflective and fluorescent components by hyperspectral (HS) imaging is important in many applications. This paper designs an imaging system in which both HS reflective images and HS fluorescent images can be obtained from the same scene, even scenes containing moving objects. The system consists of a high-frequency-spectra light source and a spatially-spectrally encoded camera. During the capture phase, the light source alternately illuminates the scene with two high-frequency lighting patterns that are complementary in the spectral domain, and the encoded camera captures an image pair accordingly. During the reconstruction phase, the sparsity of natural reflective and fluorescent HS images is used to recover reflective and fluorescent spectra from the encoded image pair. Because the system needs only two shots, dynamic scenes can also be handled. The method is tested on various datasets (including synthetic and real data), and the results demonstrate that the system achieves high-resolution, high-accuracy hyperspectral reflectance and fluorescence recovery for dynamic scenes, which can be applied to spectral relighting of real scenes.
Biel, Joan-Isaac; Gatica-Perez, Daniel
While multimedia and social computing research have used crowdsourcing techniques to annotate objects, actions, and scenes in social video sites like YouTube, little work has addressed the crowdsourcing of personal and social traits in online social video or social media content in general. In this paper, we address the problems of (1) crowdsourcing the annotation of first impressions of video bloggers' (vloggers') personal and social traits in conversational YouTube videos, and (2) mining th...
Ealet, Fabienne; Collin, Bertrand; Sella, G.; Garbay, Catherine
Scene interpretation is a crucial problem for navigation and guidance systems. The necessary integration of a large variety of heterogeneous knowledge led us to design an architecture that distributes knowledge and performs parallel and concurrent processing. We chose a multi-agent approach in which the implementation of the specialized agents is based on incrementality, distribution, cooperation, attention mechanisms, and adaptability.
Motion blur from camera shake is a major problem in videos captured by hand-held devices. Unlike single-image deblurring, video-based approaches can take advantage of the abundant information that exists across neighboring frames. As a result, the best-performing methods rely on aligning nearby frames. However, aligning images is a computationally expensive and fragile procedure, and methods that aggregate information must therefore be able to identify which regions have been accurately aligned and which have not, a task that requires high-level scene understanding. In this work, we introduce a deep learning solution to video deblurring, where a CNN is trained end-to-end to learn how to accumulate information across frames. To train this network, we collected a dataset of real videos recorded with a high-framerate camera, which we use to generate synthetic motion blur for supervision. We show that the features learned from this dataset extend to deblurring motion blur that arises from camera shake in a wide range of videos, and compare the quality of results to a number of other baselines.
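Generating synthetic blur from high-framerate footage is often done by averaging a short window of consecutive sharp frames, which approximates a longer exposure; a minimal sketch of that idea follows. The window length, the pairing of each blurred frame with the sharp center frame as the training target, and the function name are assumptions for illustration, since the abstract does not state the exact procedure. Averaging in linear (not gamma-encoded) intensity would be more physically faithful.

```python
def synthesize_blur(frames, window=3):
    """Pair each average of `window` consecutive frames with its sharp center frame.

    frames: list of equal-length lists of pixel intensities from a high-framerate video.
    Returns a list of (blurred, sharp) training pairs.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that a center frame exists")
    pairs = []
    half = window // 2
    for start in range(len(frames) - window + 1):
        chunk = frames[start:start + window]
        # Averaging the frames simulates integrating light over a longer exposure.
        blurred = [sum(px) / window for px in zip(*chunk)]
        pairs.append((blurred, frames[start + half]))
    return pairs
```

A sliding window yields one training pair per valid center frame, so a clip of N frames produces N − window + 1 pairs.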
Al-Atabany, Walid I.
Background: In this paper we present a novel scene retargeting technique that reduces the visual scene while maintaining the size of its key features. The algorithm scales to implementation on portable devices and thus has potential for augmented reality systems that provide visual support for those with tunnel vision. We therefore test the efficacy of our algorithm in shrinking the visual scene into the remaining field of view of those patients. Methods: Simple spatial compression of visual scenes makes objects appear further away. We have therefore developed an algorithm that removes low-importance information while maintaining the size of the significant features. Previous approaches in this field have included seam carving, which removes low-importance seams from the scene, and shrinkability, which dynamically shrinks the scene according to a generated importance map. The former method causes significant artifacts and the latter is inefficient. In this work we have developed a new algorithm that combines the best aspects of these two previous methods. In particular, our approach is to generate a shrinkability importance map using a seam-based approach. We then use it to dynamically shrink the scene in a similar fashion to the shrinkability method. Importantly, we have implemented it so that it can be used in real time without prior knowledge of future frames. Results: We have evaluated and compared our algorithm to the seam carving and image shrinkability approaches from a content preservation perspective and a compression quality perspective. Our technique was also evaluated in a trial that included 20 participants with simulated tunnel vision. Results show the robustness of our method in reducing scenes by up to 50% with minimal distortion. We also demonstrate its efficacy for participants with simulated tunnel vision of 22 degrees field of view or less. Conclusions: Our approach allows us to perform content-aware video
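Seam carving, which the abstract cites as a prior method, finds the connected top-to-bottom path of pixels with the lowest total energy by dynamic programming and removes it. The sketch below shows that classic step on a grayscale image stored as a list of rows, with a simple gradient-magnitude energy as an assumption; it is not the authors' combined shrinkability algorithm, and the function names are hypothetical.

```python
def energy(img):
    """Gradient energy |dI/dx| + |dI/dy| from central differences, clamped at borders."""
    h, w = len(img), len(img[0])
    return [[abs(img[i][min(j + 1, w - 1)] - img[i][max(j - 1, 0)])
             + abs(img[min(i + 1, h - 1)][j] - img[max(i - 1, 0)][j])
             for j in range(w)] for i in range(h)]


def find_vertical_seam(e):
    """Return one column index per row forming the minimum-energy 8-connected seam."""
    h, w = len(e), len(e[0])
    cost = [row[:] for row in e]
    # Cumulative cost: each pixel adds the cheapest of its three upper neighbours.
    for i in range(1, h):
        for j in range(w):
            cost[i][j] += min(cost[i - 1][max(j - 1, 0):min(j + 2, w)])
    # Backtrack from the cheapest bottom pixel.
    seam = [min(range(w), key=lambda k: cost[h - 1][k])]
    for i in range(h - 2, -1, -1):
        j = seam[-1]
        seam.append(min(range(max(j - 1, 0), min(j + 2, w)), key=lambda k: cost[i][k]))
    seam.reverse()
    return seam


def carve_once(img):
    """Remove one minimum-energy vertical seam, narrowing the image by one column."""
    seam = find_vertical_seam(energy(img))
    return [row[:j] + row[j + 1:] for row, j in zip(img, seam)]
```

Repeating `carve_once` narrows the image one column at a time; the artifacts the abstract mentions arise when the cheapest remaining seams start cutting through important structures.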
Verma, Brijesh; Stockwell, David
This book highlights the methods and applications for roadside video data analysis, with a particular focus on the use of deep learning to solve roadside video data segmentation and classification problems. It describes system architectures and methodologies that are specifically built upon learning concepts for roadside video data processing, and offers a detailed analysis of the segmentation, feature extraction and classification processes. Lastly, it demonstrates the applications of roadside video data analysis including scene labelling, roadside vegetation classification and vegetation biomass estimation in fire risk assessment.
Karaoglu, S.; Tao, R.; Gevers, T.; Smeulders, A.W.M.
Text in natural images typically adds meaning to an object or scene. In particular, text specifies which business places serve drinks (e.g., cafe, teahouse) or food (e.g., restaurant, pizzeria), and what kind of service is provided (e.g., massage, repair). The mere presence of text, its words, and
Kovar, Bohumil; Hanjalic, Alan
This paper presents a novel approach to detecting and classifying a trademark logo in frames of a sports video. Because we attempt to detect and recognize a logo in a natural scene, the algorithm developed in this paper differs from traditional logo detection and classification techniques, which are applicable either to well-structured general text documents (e.g. invoices, memos, bank cheques) or to specialized trademark logo databases, where logos appear isolated on a clear background and their detection and classification is not disturbed by surrounding visual detail. Although the development of our algorithm is still at an early stage, the experimental results obtained so far on a set of soccer TV broadcasts are very encouraging.
Williams, Glenn L.
When video replaces film, the digitized video data accumulate very rapidly, leading to a difficult and costly data storage problem. One solution exists for cases when the video images represent continuously repetitive 'static scenes' containing negligible activity, occasionally interrupted by short events of interest. Minutes or hours of redundant video frames can be ignored, and not stored, until activity begins. A new, highly parallel digital state machine generates a digital trigger signal at the onset of a video event. High-capacity random access memory storage coupled with newly available fuzzy logic devices permits the monitoring of a video image stream for long-term or short-term changes caused by spatial translation, dilation, appearance, disappearance, or color change in a video object. Pretrigger and post-trigger storage techniques can then be adapted to archive only the significant video images from the digital stream.
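The pretrigger/post-trigger idea can be sketched with a bounded buffer that always holds the last few quiet frames and is flushed to storage when a change is detected. As a simplifying assumption, the trigger here is a mean absolute frame difference against a fixed threshold rather than the fuzzy-logic state machine described above, frames are flat pixel lists, and the function names and parameters are hypothetical.

```python
from collections import deque


def mean_abs_diff(a, b):
    """Average absolute pixel difference between two equal-length frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def record_events(frames, threshold, pre=2, post=2):
    """Return the indices of frames worth archiving around detected events.

    pre: number of frames kept from just before a trigger (pretrigger storage).
    post: number of frames kept after the last trigger (post-trigger storage).
    """
    stored = []
    pre_buffer = deque(maxlen=pre)  # oldest quiet frames fall out automatically
    post_remaining = 0
    prev = None
    for idx, frame in enumerate(frames):
        changed = prev is not None and mean_abs_diff(frame, prev) > threshold
        prev = frame
        if changed:
            stored.extend(pre_buffer)  # frames leading up to the event
            pre_buffer.clear()
            stored.append(idx)
            post_remaining = post  # a new trigger restarts the post-trigger window
        elif post_remaining > 0:
            stored.append(idx)
            post_remaining -= 1
        else:
            pre_buffer.append(idx)  # quiet frame: buffered, not yet stored
    return stored
```

Storage then grows with the number of events rather than with recording time, since a long static stretch occupies at most `pre` buffered frames.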
Kret, M.E.; Roelofs, K.; Stekelenburg, J.J.; de Gelder, B.
We receive emotional signals from different sources, including the face, the whole body, and the natural scene. Previous research has shown the importance of context provided by the whole body and the scene on the recognition of facial expressions. This study measured physiological responses to
Kreindel, Erica; Intraub, Helene
Behavioral and neuroscience research on boundary extension (false memory beyond the edges of a view of a scene) has provided new insights into the constructive nature of scene representation, and motivates questions about development. Early research with children (as young as 6-7 years) was consistent with boundary extension, but relied on an…
van den Stock, J.B.; Vandenbulcke, Mathieu; Sinke, C.B.A.; Goebel, Rainer; de Gelder, B.
Facial expression perception can be influenced by the natural visual context in which the face is perceived. We performed an fMRI experiment presenting participants with fearful or neutral faces against threatening or neutral background scenes. Triangles and scrambled scenes served as control
Shaffer, Rebecca C.; Pedapati, Ernest V.; Shic, Frederick; Gaietto, Kristina; Bowers, Katherine; Wink, Logan K.; Erickson, Craig A.
In this study, we present an eye-tracking paradigm, adapted from previous work with toddlers, for assessing social-interaction looking preferences in youth aged 5-17 with ASD and typically-developing controls (TDC). Videos of children playing together (Social Scenes, SS) were presented side-by-side with animated geometric shapes (GS). Participants…
Crime scenes are constituted by a combination of a plot and a place. The crime scene is a place which has been in a certain state of transformation at a certain moment in time, the moment at which the place constituted the scene for some kind of criminal activity. As such the place has been encod...
Killius, Jim; Elder, Brent; Siegel, Larry; Allweiss, Michael B.
A Scophony Infrared Scene Projector (IRSP) is being developed for use in evaluating thermal-imaging guidance systems. The Scophony IRSP is configured as a very-high-frame-rate laser-scanned projection system incorporating Scophony modulation. Scophony modulation offers distinct advantages over conventional flying-spot scanning, for example, longer pixel dwell times and multiple-pixel projection. The Scophony IRSP serves as the image projection system in a 'hardware in the loop' terminal-phase guidance simulation. It is capable of projecting multiband target engagement scenarios with high fidelity using Aura's proprietary software/electronic control system. The Scophony IRSP uses acousto-optical (AO) devices to produce the required imagery at four separate wavelengths simultaneously. The four separate scenes are then combined and projected into the imaging guidance system.
Kircher, James R.; Marlow, Steven A.; Bastow, Michael
A Scophony infrared scene projector (IRSP) was developed by AURA Systems Inc. for use in evaluating thermal imaging guidance systems. The IRSP is a laser-scanned projector system incorporating Scophony modulation with acousto-optical devices to produce multiband 96 × 96 image frames. A description of the system and preliminary test results with the Seeker Endo/Exo Demonstration Development breadboard interceptor are presented.
Bornoe, Nis; Barkhuus, Louise
Microblogging is a recently popular phenomenon, and with the increasing trend for video cameras to be built into mobile phones, a new type of microblogging has entered the arena of electronic communication: video microblogging. In this study we examine video microblogging, which is the broadcasting of short videos. A series of semi-structured interviews offers an understanding of why and how video microblogging is used and what users post and broadcast.
Li, Weixin; Mahadevan, Vijay; Vasconcelos, Nuno
The detection and localization of anomalous behaviors in crowded scenes is considered, and a joint detector of temporal and spatial anomalies is proposed. The proposed detector is based on a video representation that accounts for both appearance and dynamics, using a set of mixture-of-dynamic-textures models. These models are used to implement 1) a center-surround discriminant saliency detector that produces spatial saliency scores, and 2) a model of normal behavior that is learned from training data and produces temporal saliency scores. Spatial and temporal anomaly maps are then defined at multiple spatial scales, by considering the scores of these operators at progressively larger regions of support. The multiscale scores act as potentials of a conditional random field that guarantees global consistency of the anomaly judgments. A data set of densely crowded pedestrian walkways is introduced and used to evaluate the proposed anomaly detector. Experiments on this and other data sets show that the proposed detector achieves state-of-the-art anomaly detection results.
Slater, Dan; Kozacik, Stephen; Kelmelis, Eric
Long-range telescopic video imagery of distant terrestrial scenes, aircraft, rockets and other aerospace vehicles can be a powerful observational tool. But what about the associated acoustic activity? A new technology, Remote Acoustic Sensing (RAS), may provide a method to remotely listen to the acoustic activity near these distant objects. Local acoustic activity sometimes weakly modulates the ambient illumination in a way that can be remotely sensed. RAS is a new type of microphone that separates an acoustic transducer into two spatially separated components: 1) a naturally formed in situ acousto-optic modulator (AOM) located within the distant scene and 2) a remote sensing readout device that recovers the distant audio. These two elements are passively coupled over long distances at the speed of light by naturally occurring ambient light energy or other electromagnetic fields. Stereophonic, multichannel, and acoustic beamforming capabilities are all possible using RAS techniques, and when combined with high-definition video imagery, RAS can help provide a more cinema-like immersive viewing experience. A practical implementation of a remote acousto-optic readout device can be a challenging engineering problem. The acoustic influence on the optical signal is generally weak and often accompanied by a strong bias term. The optical signal is further degraded by atmospheric seeing turbulence. In this paper, we consider two fundamentally different optical readout approaches: 1) a low-pixel-count photodiode-based RAS photoreceiver and 2) audio extraction directly from a video stream. Most of our RAS experiments to date have used the first method for reasons of performance and simplicity. But there are potential advantages to extracting audio directly from a video stream. These advantages include the straightforward ability to work with multiple AOMs (useful for acoustic beamforming), simpler optical configurations, and a potential ability to use certain preexisting video recordings. However
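Audio extraction directly from a video stream can be illustrated, under simplifying assumptions, by treating the mean brightness of a region of interest (ROI) containing the in situ modulator as one audio sample per frame, removing the strong bias (DC) term, and normalizing. The ROI layout and the function name are hypothetical; turbulence compensation and filtering are omitted, and the recoverable audio bandwidth is limited to half the frame rate (the Nyquist limit).

```python
def extract_audio(frames, roi):
    """Recover a normalized audio waveform from per-frame ROI brightness.

    frames: list of 2-D images (lists of rows of intensities), one per video frame.
    roi: (row_start, row_stop, col_start, col_stop), half-open ranges.
    Returns one sample per frame in [-1, 1]; the sample rate equals the frame rate.
    """
    r0, r1, c0, c1 = roi
    signal = []
    for frame in frames:
        region = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
        signal.append(sum(region) / len(region))
    # The weak acoustic modulation rides on a large constant illumination level; remove it.
    bias = sum(signal) / len(signal)
    ac = [s - bias for s in signal]
    peak = max(abs(s) for s in ac) or 1.0  # avoid dividing by zero for a silent ROI
    return [s / peak for s in ac]
```

Averaging over many ROI pixels also reduces per-pixel noise, which matters because the modulation is typically a small fraction of the bias level.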
Fragkiadaki, Katerina; Arbelaez, Pablo; Felsen, Panna; Malik, Jitendra
We segment moving objects in videos by ranking spatio-temporal segment proposals according to "moving objectness": how likely they are to contain a moving object. In each video frame, we compute segment proposals using multiple figure-ground segmentations on per-frame motion boundaries. We rank them with a Moving Objectness Detector trained on image and motion fields to detect moving objects and discard over/under segmentations or background parts of the scene. We extend the top ranked segmen...
Terrain classification allows a mobile robot to create an annotated map of its local environment from the three-dimensional (3D) and two-dimensional (2D) datasets collected by its array of sensors, including a GPS receiver, gyroscope, video camera, and range sensor. However, parts of objects that are outside the measurement range of the range sensor will not be detected. To overcome this problem, this paper describes an edge estimation method for complete scene recovery and complete terrain reconstruction. Here, a Gibbs-Markov random field is used to segment the ground from 2D videos and 3D point clouds. Further, a masking method is proposed to classify buildings and trees in a terrain mesh.
This international bestseller and essential reference is the "bible" for digital video engineers and programmers worldwide. It is by far the most informative analog and digital video reference available, and it includes the hottest new trends and cutting-edge developments in the field. Video Demystified, Fourth Edition is a "one-stop" reference guide for the various digital video technologies. The fourth edition is completely updated with all new chapters on MPEG-4, H.264, SDTV/HDTV, ATSC/DVB, and Streaming Video (Video over DSL, Ethernet, etc.), as well as discussions of the latest standards throughout. The accompanying CD-ROM is updated to include a unique set of video test files in the newest formats.
Dang, T.K.; Worring, M.; Bui, T.D.
In scene investigation, creating a video log captured using a handheld camera is more convenient and more complete than taking photos and notes. By introducing video analysis and computer vision techniques, it is possible to build a spatio-temporal representation of the investigation. Such a
This book presents a detailed analysis of spectral imaging, describing how it can be used for the purposes of material identification, object recognition and scene understanding. The opportunities and challenges of combining spatial and spectral information are explored in depth, as are a wide range of applications. Features: discusses spectral image acquisition by hyperspectral cameras, and the process of spectral image formation; examines models of surface reflectance, the recovery of photometric invariants, and the estimation of the illuminant power spectrum from spectral imagery; describes
Machuca, R.; Gilbert, A. L.
Edge detection in the presence of noise is a well-known problem. This paper examines an applications-motivated approach for solving the problem using novel techniques and presents a method developed by the authors that performs well on a large class of targets. ROC curves are used to compare this method with other well-known edge detection operators, with favorable results. A theoretical argument is presented that favors LMMSE filtering over median filtering in extremely noisy scenes. Simulated results of the research are presented.
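The LMMSE-versus-median comparison can be made concrete with a one-dimensional sketch. The abstract does not specify the authors' filter; as an assumption, the code shows a standard local LMMSE (adaptive Wiener, Lee-type) estimator that shrinks each sample toward its local mean by a gain of max(σ² − σₙ², 0)/σ², alongside a plain median filter. The noise variance σₙ² is assumed known, windows are edge-replicated, and the names are hypothetical.

```python
from statistics import mean, median, pvariance


def _window(signal, i, radius):
    """Samples in [i - radius, i + radius], replicating the end values at the borders."""
    n = len(signal)
    return [signal[min(max(k, 0), n - 1)] for k in range(i - radius, i + radius + 1)]


def lmmse_filter(signal, noise_var, radius=1):
    """Local LMMSE estimate: mu + max(var - noise_var, 0) / var * (y - mu)."""
    out = []
    for i, y in enumerate(signal):
        w = _window(signal, i, radius)
        mu = mean(w)
        var = pvariance(w, mu)
        # Flat regions (var <= noise_var) collapse to the local mean; edges keep detail.
        gain = max(var - noise_var, 0.0) / var if var > 0 else 0.0
        out.append(mu + gain * (y - mu))
    return out


def median_filter(signal, radius=1):
    """Replace each sample by the median of its edge-replicated window."""
    return [median(_window(signal, i, radius)) for i in range(len(signal))]
```

On a clean step both filters preserve the edge; the LMMSE gain adapts to the local signal-to-noise ratio, whereas the median filter has no noise model at all.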
Guimaraes, Miguel; Arcanjo, Marcelo; Murta Vale, Maria Helena; Visacro, Silverio
The development of downward and upward leaders that formed two negative cloud-to-ground return strokes in natural lightning, spaced only about 200 µs apart and terminating on ground only a few hundred meters away, was monitored at Morro do Cachimbo Station, Brazil. The simultaneous records of current, close electric field, relative luminosity, and corresponding high-speed video frames (sampling rate of 20,000 frames per second) reveal that the initiation of the first return stroke interfered with the development of the second negative leader, leading it to an apparently continuous development before the attachment, without stepping, and at a regular two-dimensional speed. Based on the experimental data, the formation processes of the two return strokes are discussed, and plausible interpretations for their development are provided.
Ward, N J; Parkes, A; Crone, P R
This study examined the legibility of information presented on head-up displays (HUDs) for automotive application as a function of background scene complexity, the position of the HUD within the field of view relative to the background scene, and the perceptual capacity of the perceiver. Groups of field-dependent and field-independent subjects viewed video footage from the perspective of following a lead car on an open road with low, moderate, and high scene complexity. Subjects were required to track the lead vehicle and identify HUD-presented targets of a specified orientation and specified changes in a HUD-presented speedometer. The results indicate that (a) HUD legibility deteriorated with increased visual complexity of the background scene; (b) positioning the HUD on the roadway reduced the effect of background scene complexity on HUD legibility; and (c) field-dependent subjects made fewer correct and more false positive target identifications than did field-independent subjects.
Kong, Wanzeng; Zhao, Xinxin; Hu, Sanqing; Vecchiato, Giovanni; Babiloni, Fabio
Evaluating the effect of commercials is highly important in neuromarketing. In this paper, we propose an EEG-based way to evaluate the influence of video commercials on consumers by means of an impression index. The impression index combines both the memorization and attention indices, tracking EEG activity while consumers watch video commercials. It extracts features from scalp EEG to evaluate the effectiveness of video commercials in the time-frequency-space domain. In addition, the general global field power was used as an impression index to evaluate video commercial scenes as a time series. Experimental results demonstrate that the proposed approach is able to track variations of cerebral activity related to cognitive tasks such as watching video commercials, and can help judge from EEG signals whether a scene in a video commercial is impressive or not.
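Global field power (GFP) has a standard definition: at each time sample it is the spatial standard deviation of the potential across all K electrodes, GFP(t) = sqrt((1/K) Σ_i (V_i(t) − V̄(t))²). The sketch below implements only that definition, assuming an EEG array shaped (channels, samples). The per-scene averaging helper and its `scene_bounds` argument are hypothetical, since the abstract does not say how the index was aggregated over commercial scenes or which frequency bands were used.

```python
import numpy as np

def global_field_power(eeg):
    """GFP per time sample for an EEG array shaped (channels, samples)."""
    eeg = np.asarray(eeg, dtype=float)
    # Subtract the instantaneous mean across channels (average reference) ...
    centred = eeg - eeg.mean(axis=0, keepdims=True)
    # ... then take the root-mean-square across channels.
    return np.sqrt((centred ** 2).mean(axis=0))

def mean_gfp_per_scene(gfp, scene_bounds):
    """Hypothetical helper: mean GFP inside each (start, stop) sample range."""
    return [float(np.mean(gfp[start:stop])) for start, stop in scene_bounds]

# Usage with synthetic data: 32 channels, 1000 samples, two "scenes".
rng = np.random.default_rng(1)
gfp = global_field_power(rng.normal(size=(32, 1000)))
print(mean_gfp_per_scene(gfp, [(0, 500), (500, 1000)]))
```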
Hiatt, J R; Shabot, M M; Phillips, E H; Haines, R F; Grant, T L
Objective: To determine the clinical acceptability of various levels of video compression for remote proctoring of laparoscopic surgical procedures. Design: Observational, controlled study. Setting: Community-based teaching hospital. Participants: Physician and nurse observers. Interventions: Controlled surgical video scenes were subjected to various levels of data compression for digital transmission and display and shown to participant observers. Main Outcome Measure: Clinical acceptability of video scenes after application of video compression. Results: Clinically acceptable video compression was achieved at a 1.25-megabit/second data rate, with the use of odd-screen 43.3:1 Joint Photographic Experts Group compression and a small screen for remote viewing. Conclusions: With proper video compression, remote proctoring of laparoscopic procedures may be performed with standard 1.5-megabit/second telecommunication data lines and services.
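The reported figures can be cross-checked with simple arithmetic. This is a minimal sketch that assumes the 43.3:1 ratio applies to the transmitted video stream as a whole; the abstract does not state this, so the implied pre-compression rate is an inference rather than a reported result.

```python
def implied_source_rate(compressed_mbps, compression_ratio):
    """Uncompressed rate implied by a compressed rate and an N:1 ratio (assumption: ratio covers the whole stream)."""
    return compressed_mbps * compression_ratio

def headroom(line_mbps, stream_mbps):
    """Spare capacity left on a line carrying the compressed stream."""
    return line_mbps - stream_mbps

source_mbps = implied_source_rate(1.25, 43.3)  # roughly 54 Mbit/s before compression
spare_mbps = headroom(1.5, 1.25)               # capacity left on a 1.5 Mbit/s line
print(source_mbps, spare_mbps)
```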
Abbey, Craig K.; Sohl-Dickstein, Jascha N.; Olshausen, Bruno A.; Eckstein, Miguel P.; Boone, John M.
Researchers studying human and computer vision have found description and construction of these systems greatly aided by analysis of the statistical properties of naturally occurring scenes. More specifically, it has been found that receptive fields with directional selectivity and bandwidth properties similar to mammalian visual systems are more closely matched to the statistics of natural scenes. It is argued that this allows for sparse representation of the independent components of natural images [Olshausen and Field, Nature, 1996]. These theories have important implications for medical image perception. For example, will a system that is designed to represent the independent components of natural scenes, where objects occlude one another and illumination is typically reflected, be appropriate for X-ray imaging, where features superimpose on one another and illumination is transmissive? In this research we begin to examine these issues by evaluating higher-order statistical properties of breast images from X-ray projection mammography (PM) and dedicated breast computed tomography (bCT). We evaluate kurtosis in responses of octave bandwidth Gabor filters applied to PM and to coronal slices of bCT scans. We find that kurtosis in PM rises and quickly saturates for filter center frequencies with an average value above 0.95. By contrast, kurtosis in bCT peaks near 0.20 cyc/mm with kurtosis of approximately 2. Our findings suggest that the human visual system may be tuned to represent breast tissue more effectively in bCT over a specific range of spatial frequencies.
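The core measurement here, kurtosis of octave-bandwidth Gabor filter responses, can be sketched in NumPy under stated assumptions: the filter is built in the Fourier domain as an even-symmetric (cosine) Gabor, the one-octave bandwidth is measured at half amplitude, frequencies are in cycles/pixel (converting to cyc/mm would need the pixel pitch), and the statistic is excess kurtosis, which is zero for Gaussian data. The paper's exact filter parameterisation is not given, so these choices are illustrative only.

```python
import numpy as np

def octave_gabor_response(image, f0, theta=0.0):
    """Even-symmetric Gabor response at centre frequency f0 (cycles/pixel)."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    # Half-amplitude points at f0 - f0/3 and f0 + f0/3 have a 2:1 ratio (one octave).
    sigma_f = (f0 / 3.0) / np.sqrt(2.0 * np.log(2.0))
    u0, v0 = f0 * np.cos(theta), f0 * np.sin(theta)
    # Gaussian lobes at +/-(u0, v0) give a real, even-symmetric spatial kernel.
    envelope = (np.exp(-((fx - u0) ** 2 + (fy - v0) ** 2) / (2 * sigma_f ** 2))
                + np.exp(-((fx + u0) ** 2 + (fy + v0) ** 2) / (2 * sigma_f ** 2)))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * envelope))

def excess_kurtosis(values):
    """Fourth standardised moment minus 3 (0 for a Gaussian)."""
    v = np.ravel(values).astype(float)
    d = v - v.mean()
    return float((d ** 4).mean() / (d ** 2).mean() ** 2 - 3.0)

# Usage: Gaussian noise versus a sparse image of isolated points.
rng = np.random.default_rng(0)
noise = rng.standard_normal((256, 256))
sparse = np.zeros((256, 256))
sparse[[30, 90, 150, 200, 240], [40, 100, 160, 20, 220]] = 1.0
print(excess_kurtosis(octave_gabor_response(noise, 0.25)))
print(excess_kurtosis(octave_gabor_response(sparse, 0.25)))
```

Because a linear filter applied to Gaussian white noise yields a Gaussian response, the first value should be near zero. The isolated impulses produce a heavy-tailed response with large positive excess kurtosis. Sweeping `f0` over octave-spaced values gives a kurtosis-versus-frequency curve of the kind the abstract describes.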
Verbrugge, R.; Israël, Menno; Taatgen, N.; van den Broek, Egon; van der Putten, Peter; Schomaker, L.; den Uyl, Marten J.
This work has been done as part of the EU VICAR (IST) project and the EU SCOFI project (IAP). The aim of the first project was to develop a real-time video indexing, classification, annotation and retrieval system. For our systems, we have adapted the approach of Picard and Minka, who categorized
Simpson, E L; Gaffan, E A
Dark Agouti rats learned to discriminate large visual displays ("scenes") in a computer-controlled Y-maze. Each scene comprised several shapes ("objects") against a contrasting background. The constant-negative paradigm was used; in each problem, one constant scene was presented on every trial together with a trial-unique variable scene, and rats were rewarded for approaching the variable scene. By varying the manner in which variables differed from the constant, we investigated what aspects of scenes and the objects comprising them were salient. In Experiment 1, rats discriminated constant scenes more easily if they contained four objects rather than six, and they showed a slight attentional bias towards the lower halves of the screens. That bias disappeared in Experiment 2. Experiments 3 and 4 showed that rats could discriminate scenes even if the objects that comprised them were closely matched in position, luminance, and area. Therefore, they encoded the form of individual objects. Rats perceived shapes of the same class (e.g. two ellipses) as more similar than shapes from different classes (e.g. ellipse and polygon) regardless of whether they also differed in area. This paradigm is suitable for studying the neuropsychology of perceiving spatial relationships in multi-object scenes and of identifying visual objects.
Tan, Jye-Sheng; Yeh, Su-Ling
Meanings of masked complex scenes can be extracted without awareness; however, it remains unknown whether audiovisual integration occurs with an invisible complex visual scene. The authors examine whether a scenery soundtrack can facilitate unconscious processing of a subliminal visual scene. The continuous flash suppression paradigm was used to render a complex scene picture invisible, and the picture was paired with a semantically congruent or incongruent scenery soundtrack. Participants were asked to respond as quickly as possible if they detected any part of the scene. Release-from-suppression time was used as an index of unconscious processing of the complex scene, which was shorter in the audiovisual congruent condition than in the incongruent condition (Experiment 1). The possibility that participants adopted different detection criteria for the two conditions was excluded (Experiment 2). The audiovisual congruency effect did not occur for objects-only (Experiment 3) and background-only (Experiment 4) pictures, and it did not result from consciously mediated conceptual priming (Experiment 5). The congruency effect was replicated when catch trials without scene pictures were added to exclude participants with high false-alarm rates (Experiment 6). This is the first study demonstrating unconscious audiovisual integration with subliminal scene pictures, and it suggests expanding scene-perception theories to include unconscious audiovisual integration. (c) 2015 APA, all rights reserved.
Johnson, Don; Johnson, Mike
The process of digitally capturing, editing, and archiving video has become an important aspect of documenting arthroscopic surgery. Recording the arthroscopic findings before and after surgery is an essential part of the patient's medical record. The hardware and software have become more affordable, but the learning curve to master the software is steep. Digital video is captured at the time of arthroscopy to a hard disk, and written to a CD at the end of the operative procedure. The process of obtaining video of open procedures is more complex. Outside video of the procedure is recorded on digital tape with a digital video camera. The camera must be plugged into a computer to capture the video on the hard disk. Adobe Premiere software is used to edit the video and render the finished video to the hard drive. This finished video is burned onto a CD. We outline the choice of computer hardware and software for the manipulation of digital video. The techniques of backing up and archiving the completed projects and files are also outlined. The uses of digital video for education and the formats that can be used in PowerPoint presentations are discussed.
Hess-Flores, Mauricio [Univ. of California, Davis, CA (United States)]
Scene reconstruction from video sequences has become a prominent computer vision research area in recent years, due to its large number of applications in fields such as security, robotics and virtual reality. Despite recent progress in this field, there are still a number of issues that manifest as incomplete, incorrect or computationally expensive reconstructions. The engine behind achieving reconstruction is the matching of features between images, where common conditions such as occlusions, lighting changes and texture-less regions can all affect matching accuracy. Subsequent processes that rely on matching accuracy, such as camera parameter estimation, structure computation and non-linear parameter optimization, are also vulnerable to additional sources of error, such as degeneracies and mathematical instability. Detection and correction of errors, along with robustness in parameter solvers, are a must in order to achieve a very accurate final scene reconstruction. However, error detection is in general difficult due to the lack of ground-truth information about the given scene, such as the absolute position of scene points or GPS/IMU coordinates for the camera(s) viewing the scene. In this dissertation, methods are presented for the detection, factorization and correction of error sources present in all stages of a scene reconstruction pipeline from video, in the absence of ground-truth knowledge. Two main applications are discussed. The first set of algorithms derives total structural error measurements after an initial scene structure computation and factorizes errors into those related to the underlying feature matching process and those related to camera parameter estimation. A brute-force local correction of inaccurate feature matches is presented, as well as an improved conditioning scheme for non-linear parameter optimization which applies weights on input parameters in proportion to estimated camera parameter errors. Another application is in
Zhang, Guofeng; Dong, Zilong; Jia, Jiaya; Wan, Liang; Wong, Tien-Tsin; Bao, Hujun
Compared to still image editing, content-based video editing faces the additional challenges of maintaining the spatiotemporal consistency with respect to geometry. This brings up difficulties of seamlessly modifying video content, for instance, inserting or removing an object. In this paper, we present a new video editing system for creating spatiotemporally consistent and visually appealing refilming effects. Unlike the typical filming practice, our system requires no labor-intensive construction of 3D models/surfaces mimicking the real scene. Instead, it is based on an unsupervised inference of view-dependent depth maps for all video frames. We provide interactive tools requiring only a small amount of user input to perform elementary video content editing, such as separating video layers, completing background scene, and extracting moving objects. These tools can be utilized to produce a variety of visual effects in our system, including but not limited to video composition, "predator" effect, bullet-time, depth-of-field, and fog synthesis. Some of the effects can be achieved in real time.
Kelly, R. F.
Realistic 3-D scene generation is now a possibility for many applications. One barrier to increased use of this technique is the large amount of computer processing time needed to render a scene. With the advent of parallel processors, that barrier may be overcome if efficient parallel scene generation algorithms can be developed. In general, this has not been the case because of restrictions imposed by non-shared memory and limited processor interconnect architectures. In addition, vector processors do not efficiently support the adaptive nature of many of the algorithms. A new parallel computer, the NYU Ultracomputer, has been developed which features a shared memory with a combining network. The combining network permits simultaneous reads and writes to the same memory location using a new instruction, Fetch-and-Op. These memory references are resolved in the memory access network and result in particularly efficient shared data structures. Basic elements of this architecture are also being used in the design of the gigaflop-range RP3 at IBM. Some algorithms typical of image synthesis are explored in the paper, and a class of equivalent queue-based algorithms is developed. These algorithms are particularly well suited to Ultracomputer-class processors and hold promise for many new applications of realistic scene generation.
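A minimal sketch of the basic idea behind fetch-and-add work distribution follows: workers self-schedule scanlines from a shared counter, and each atomic increment hands out a unique row index. It is not the paper's queue-based algorithms. Python has no hardware Fetch-and-Op, so a lock emulates the atomic instruction. The Ultracomputer's combining network, which merges simultaneous requests to the same location, is not modelled. The `shade` callback is a hypothetical stand-in for per-pixel rendering work.

```python
import threading

class FetchAndAdd:
    """Software stand-in for an atomic Fetch-and-Add memory cell (lock-based)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, increment):
        # Return the old value and add the increment as one indivisible step.
        with self._lock:
            old = self._value
            self._value += increment
            return old

def render_scanlines(height, width, shade, workers=4):
    """Each worker claims the next unrendered scanline via fetch-and-add."""
    next_row = FetchAndAdd(0)
    image = [None] * height
    claims = [0] * height  # how many times each row was handed out

    def worker():
        while True:
            row = next_row.fetch_and_add(1)
            if row >= height:
                return  # queue exhausted
            claims[row] += 1  # safe: each row index is handed to exactly one worker
            image[row] = [shade(row, col) for col in range(width)]

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image, claims

image, claims = render_scanlines(50, 8, lambda r, c: r * c)
```

The design point the sketch illustrates is that no worker ever waits on a central dispatcher. Contention is limited to the single increment, which is exactly the operation a combining network can serve in parallel.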
Juan F. Ramírez Villegas
The bottom-up visual attention model proposed by Itti et al. (2000) has been a popular model because it exhibits certain neurobiological evidence of primate vision. This work complements the computational model of this phenomenon using a neural network with realistic dynamics. The approach is based on several topographic maps representing the saliency of objects in the visual field, which are combined into a general representation (the saliency map); this representation is the input to a dynamic neural network whose local and global collaborative and competitive interactions converge on the main particularities (objects) of the visual scene.
Henderson, John M.; Nuthmann, Antje; Luke, Steven G.
Recent research on eye movements during scene viewing has primarily focused on where the eyes fixate. But eye fixations also differ in their durations. Here we investigated whether fixation durations in scene viewing are under the direct and immediate control of the current visual input. Subjects freely viewed photographs of scenes in preparation…
Van den Stock, Jan; Vandenbulcke, Mathieu; Sinke, Charlotte B A; Goebel, Rainer; de Gelder, Beatrice
Facial expression perception can be influenced by the natural visual context in which the face is perceived. We performed an fMRI experiment presenting participants with fearful or neutral faces against threatening or neutral background scenes. Triangles and scrambled scenes served as control stimuli. The results showed that the valence of the background influences face selective activity in the right anterior parahippocampal place area (PPA) and subgenual anterior cingulate cortex (sgACC) with higher activation for neutral backgrounds compared to threatening backgrounds (controlled for isolated background effects) and that this effect correlated with trait empathy in the sgACC. In addition, the left fusiform gyrus (FG) responds to the affective congruence between face and background scene. The results show that valence of the background modulates face processing and support the hypothesis that empathic processing in sgACC is inhibited when affective information is present in the background. In addition, the findings reveal a pattern of complex scene perception showing a gradient of functional specialization along the posterior-anterior axis: from sensitivity to the affective content of scenes (extrastriate body area: EBA and posterior PPA), over scene emotion-face emotion interaction (left FG) via category-scene interaction (anterior PPA) to scene-category-personality interaction (sgACC). © The Author (2013). Published by Oxford University Press.
Brenda M Stoesz
Typically developing individuals show a strong visual preference for faces and face-like stimuli; however, this may come at the expense of attending to bodies or to other aspects of a scene. The primary goal of the present study was to provide additional insight into the development of attentional mechanisms that underlie perception of real people in naturalistic scenes. We examined the looking behaviours of typical children, adolescents, and young adults as they viewed static and dynamic scenes depicting one or more people. Overall, participants showed a bias to attend to faces more than to other parts of the scenes. Adding motion cues led to a reduction in the number, but an increase in the average duration, of face fixations in single-character scenes. When multiple characters appeared in a scene, motion-related effects were attenuated and participants shifted their gaze from faces to bodies, or made off-screen glances. Children showed the largest effects related to the introduction of motion cues or additional characters, suggesting that they find dynamic faces difficult to process, and are especially prone to look away from faces when viewing complex social scenes – a strategy that could reduce the cognitive and the affective load imposed by having to divide one’s attention between multiple faces. Our findings provide new insights into the typical development of social attention during natural scene viewing, and lay the foundation for future work examining gaze behaviours in typical and atypical development.
In this article, the author introduces a social studies lesson that allows students to learn history and practice reading skills, critical thinking, and writing. The activity is called History Scene Investigation or HSI, which derives its name from the popular television series based on crime scene investigations (CSI). HSI uses discovery learning…
Xia, L.; Pont, S.C.; Heynderickx, I.E.J.R.
Human observers’ ability to infer the light field in empty space is known as the “visual light field.” While most relevant studies were performed using images on computer screens, we investigate the visual light field in a real scene by using a novel experimental setup. A “probe” and a scene were
Hada, Yoshiaki; Ogata, Hiroaki; Yano, Yoneo
This paper focuses on an online, video-based correction system for language learning. The prototype system using the proposed model supports learning between a native English teacher and a non-native learner using a videoconference system. It extends the videoconference system so that it can record the conversation of a learning scene. If a teacher…
Conte, Donatello; Foggia, Pasquale; Percannella, Gennaro; Tufano, Francesco; Vento, Mario
People counting is an important problem in video surveillance applications. This problem has been faced either by trying to detect people in the scene and then counting them or by establishing a mapping between some scene feature and the number of people (avoiding the complex detection problem). This paper presents a novel method, following this second approach, that is based on the use of SURF features and of an ε-SVR regressor to provide an estimate of this count. The algorithm takes specifically into account problems due to partial occlusions and to perspective. In the experimental evaluation, the proposed method has been compared with the algorithm by Albiol et al., winner of the PETS 2009 contest on people counting, using the same PETS 2009 database. The provided results confirm that the proposed method yields an improved accuracy, while retaining the robustness of Albiol's algorithm.
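The abstract does not give the feature vector or the regressor settings, so the following is a sketch of the ε-SVR step only. It assumes a linear model and a hypothetical per-frame feature, for example a normalised count of SURF keypoints on moving regions. In practice a kernelised solver such as scikit-learn's `SVR` would be the usual choice. This version instead runs plain NumPy subgradient descent on the primal objective, so that the ε-insensitive loss (zero error inside a tube of half-width ε) is explicit.

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, epsilon):
    """Per-sample loss max(0, |y - f(x)| - epsilon): zero inside the epsilon-tube."""
    return np.maximum(0.0, np.abs(np.asarray(y_true) - np.asarray(y_pred)) - epsilon)

def fit_linear_eps_svr(X, y, epsilon=0.5, lam=1e-3, lr=0.05, epochs=4000):
    """Minimise 0.5*lam*||w||^2 + mean(eps-insensitive loss) by full-batch subgradient descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        residual = y - (X @ w + b)
        # Samples inside the tube contribute no subgradient; outside ones push by their sign.
        s = np.sign(residual) * (np.abs(residual) > epsilon)
        w -= lr * (lam * w - (s @ X) / n)
        b += lr * s.mean()
    return w, b

# Usage with a hypothetical, centred 1-D feature and noise-free counts.
x = np.linspace(-1.0, 1.0, 41)[:, None]
counts = 20.0 * x[:, 0] + 12.0
w, b = fit_linear_eps_svr(x, counts)
predicted = x @ w + b
```

Because errors smaller than ε cost nothing, the fit prefers the flattest line that keeps the training counts within the tube. That tolerance to small deviations is one reason ε-SVR suits noisy feature-to-count mappings.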
This thesis is based on a detailed analysis of various topics related to the question of whether video games can be art. In the first place, it analyzes the current academic discussion on this subject and confronts different opinions of both supporters and objectors of the idea that video games can be a full-fledged art form. The second point of this paper is to analyze the properties that are inherent to video games, in order to find the reason why the cultural elite considers video games as i...
Wei, Xian; Li, Yuanxiang; Shen, Hao; Chen, Fang; Kleinsteuber, Martin; Wang, Zhongfeng
Video representation is an important and challenging task in the computer vision community. In this paper, we consider the problem of modeling and classifying video sequences of dynamic scenes which could be modeled in a dynamic textures (DT) framework. First, we assume that image frames of a moving scene can be modeled as a Markov random process. We propose a sparse coding framework, named joint video dictionary learning (JVDL), to model a video adaptively. By treating the sparse coefficients of image frames over a learned dictionary as the underlying "states", we learn an efficient and robust linear transition matrix between two adjacent frames of sparse events in time series. Hence, a dynamic scene sequence is represented by an appropriate transition matrix associated with a dictionary. In order to ensure the stability of JVDL, we impose several constraints on the transition matrix and dictionary. The developed framework is able to capture the dynamics of a moving scene by exploring both sparse properties and the temporal correlations of consecutive video frames. Moreover, the learned JVDL parameters can be used for various DT applications, such as DT synthesis and recognition. Experimental results demonstrate the strong competitiveness of the proposed JVDL approach in comparison with state-of-the-art video representation methods. In particular, it performs significantly better in dealing with DT synthesis and recognition on heavily corrupted data.
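In JVDL the dictionary, the sparse codes, and the transition matrix are learned jointly under stability constraints that the abstract does not spell out. The sketch below covers only the transition-matrix step, under simplifying assumptions. The sparse codes are taken as already computed over a fixed dictionary, stored column-wise as `codes[:, t]`. A is then fitted by least squares so that codes[:, t+1] ≈ A codes[:, t]. Stability is enforced by rescaling A until its spectral radius is below 1, which is a simple stand-in and not the paper's constraints.

```python
import numpy as np

def fit_transition(codes, max_radius=0.99):
    """Least-squares A with codes[:, t+1] ~= A @ codes[:, t]; rescale if unstable."""
    codes = np.asarray(codes, dtype=float)
    x_prev, x_next = codes[:, :-1], codes[:, 1:]
    A = x_next @ np.linalg.pinv(x_prev)
    radius = np.max(np.abs(np.linalg.eigvals(A)))
    if radius > max_radius:
        # Shrink A so repeated application cannot blow up during synthesis.
        A = A * (max_radius / radius)
    return A

def synthesize(A, x0, steps):
    """Roll the state forward; frames would then be D @ states for a dictionary D."""
    states = [np.asarray(x0, dtype=float)]
    for _ in range(steps - 1):
        states.append(A @ states[-1])
    return np.stack(states, axis=1)

# Usage: recover a known stable rotation-and-decay transition from its own trajectory.
A_true = np.array([[0.9, -0.2], [0.2, 0.9]])
codes = synthesize(A_true, [1.0, 0.5], 30)
A_fit = fit_transition(codes)
```

Synthesis in a DT setting then amounts to calling `synthesize(A_fit, codes[:, -1], n)` and mapping each state back to pixels through the (here hypothetical) dictionary D.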
Li, Yucheng; Han, Dantao; Yan, Juanli
A wireless video surveillance system based on ARM was designed and implemented in this article. The newest ARM11 S3C6410 was used as the main chip of the monitoring terminal, running the embedded Linux operating system. The video input was obtained from an analog CCD and converted from analog to digital by the TVP5150 video chip. After being compressed by the H.264 encoder in the S3C6410, the video was packed into RTP packets and transmitted through the TL-WN322G+ wireless USB adapter. Furthermore, the video images were preprocessed, so that the system can detect abnormalities in the specified scene and raise alarms. The video is transmitted at standard definition (480p), and the video stream can be monitored in real time. The system has been used for real-time intelligent video surveillance of the specified scene.
category in Fig. 1. This video contains segments focusing on the snowboard, the person jumping, is shot in an outdoor, ski-resort scene, and has fast... snowboard trick, but is unlikely to include all three. Grouping segments into their relevant scene types can improve recognition. Finally, the model must
Picos, Kenia; Hirales-Carbajal, Adán.; Díaz-Ramírez, Victor H.
A reliable approach for object segmentation based on template-matching filters is proposed. The system employs an adaptive strategy for generating space-variant filters that take into account several versions of the target and the local statistical properties of the input scene. Moreover, the proposed method accounts for geometric modifications of the target while it moves through a video sequence. Detection with the matched filter yields the location of the target of interest. The estimated location coordinates are used to compute the support area covered by the target using the watershed segmentation technique. In each frame, the filter adapts to the geometric changes of the target in order to estimate its current support region. Experimental tests carried out on a video sequence show that the proposed system performs very well in terms of detection accuracy and object segmentation efficiency in real-life scenes.
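The paper's filters are adaptive and space-variant, and they are followed by watershed segmentation. As a much simpler point of reference, the sketch below shows only the basic detection step that such filters refine: locating a fixed template by normalised cross-correlation (NCC). It is not the authors' filter design. The estimated location is what would seed the segmentation stage.

```python
import numpy as np

def ncc_map(scene, template):
    """Normalised cross-correlation of the template at every valid scene position."""
    scene = np.asarray(scene, dtype=float)
    t = np.asarray(template, dtype=float)
    t = t - t.mean()
    t_norm = np.sqrt((t ** 2).sum())
    # windows has shape (H - h + 1, W - w + 1, h, w).
    windows = np.lib.stride_tricks.sliding_window_view(scene, t.shape)
    w = windows - windows.mean(axis=(-2, -1), keepdims=True)
    w_norm = np.sqrt((w ** 2).sum(axis=(-2, -1)))
    num = (w * t).sum(axis=(-2, -1))
    denom = w_norm * t_norm
    # Flat windows (zero variance) get a score of 0 instead of dividing by zero.
    return np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)

def locate(scene, template):
    """Top-left corner of the best-matching window."""
    scores = ncc_map(scene, template)
    return np.unravel_index(np.argmax(scores), scores.shape)

# Usage: cut a patch out of a random scene and find it again.
rng = np.random.default_rng(0)
scene = rng.random((60, 80))
template = scene[20:30, 35:47].copy()
print(locate(scene, template))
```

NCC is invariant to affine changes in brightness within each window. It is not invariant to the geometric changes of a moving target, which is the gap the paper's adaptive, multi-version filters are meant to close.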
the stationary dataset, we include downsampled versions of dataset obtained by downsampling the original HD videos to lower framerates and pixel... when video framerates and pixel resolutions are low. This is a relatively unexplored area... Figure 2. Six example scenes in VIRAT Video Dataset... A Large-scale Benchmark Dataset for Event Recognition in Surveillance Video. Sangmin Oh, Anthony Hoogs, Amitha Perera, Naresh Cuntoor, Chia-Chih Chen
Gaspar, John G; Street, Whitney N; Windsor, Matthew B; Carbonari, Ronald; Kaczmarski, Henry; Kramer, Arthur F; Mathewson, Kyle E
Cell-phone use impairs driving safety and performance. This impairment may stem from the remote partner's lack of awareness about the driving situation. In this study, pairs of participants completed a driving simulator task while conversing naturally in the car and while talking on a hands-free cell phone. In a third condition, the driver drove while the remote conversation partner could see video of both the road ahead and the driver's face. We tested the extent to which this additional visual information diminished the negative effects of cell-phone distraction and increased situational awareness. Collision rates for unexpected merging events were high when participants drove in a cell-phone condition but were reduced when they were in a videophone condition, reaching a level equal to that observed when they drove with an in-car passenger or drove alone. Drivers and their partners made shorter utterances and made longer, more frequent traffic references when they spoke in the videophone rather than the cell-phone condition. Providing a view of the driving scene allows remote partners to help drivers by modulating their conversation and referring to traffic more often. © The Author(s) 2014.
Jan M. Broekman
Trance shows the Self as a process involved in all sorts and forms of life. A Western perspective on a self and its reifying tendencies is only one (or one series) of those variations. The process character of the self does not allow any coherent theory but shows, in particular when confronted with trance, its variability in all regards. What is more: the Self is always first on the scene of itself―a situation in which it becomes a sign for itself. That particular semiotic feature is again not a unified one but leads, as the Self in view of itself does, to series of scenes with changing colors, circumstances and environments. Our first scene “Beyond Monotheism” shows semiotic importance in that a self as determining component of a trance-phenomenon must abolish its own referent and seems unable to answer the question of what makes trance a trance. The Pizzica is an example here. Other social features of trance appear in the second scene, US post-traumatic psychological treatments included. Our third scene underlines structures of an unfolding self: beginning with ‘split-ego’ conclusions, a self’s engenderment appears dependent on linguistic events and on spoken words in the first place. A fourth scene explores that theme and explains modern forms of an ego ―in particular those inherent to ‘citizenship’ or a ‘corporation’. The legal consequences are concentrated in the fifth scene, which considers a legal subject by revealing its ‘standing’. Our sixth and final scene pertains to the relation between trance and commerce. All scenes tie together and show parallels between Pizzica, rights-based behavior, RAVE music versus disco, commerce and trance; they demonstrate the meaning of trance as a multifaceted social phenomenon.
Lukes, George E.
Recent activity in synthetic reference scene generation from geographic data bases has led to new and expanding production responsibilities for the mapping community. It has also spawned a new and growing population of geographic data base users. Optimum utilization of this data requires an understanding of the natural and cultural patterns represented as well as knowledge of the conventions and specifications which guide data base preparation. Prudence dictates effective mechanisms for data base inspection by the user. Appropriate implementation of data display procedures can provide this capability while also supporting routine analysis of data base content. This paper first illustrates a set of convenient mechanisms for the display of the elevation and planimetric components of geographic data files. Then, a new USAETL program in Computer-Assisted Photo Interpretation Research (CAPIR) is introduced. The CAPIR program will explore issues of direct data entry to create geographic data bases from stereo aerial photography. CAPIR also provides a technique for displaying geographic data base contents in corresponding three-dimensional photo models. This capability, termed superposition, will impact the critical tasks of data validation, revision and intensification which are essential for effective management of geographic files.
Gonzalo H Otazu
The identification of the sound sources present in the environment is essential for the survival of many animals. However, these sounds are not presented in isolation, as natural scenes consist of a superposition of sounds originating from multiple sources. The identification of a source under these circumstances is a complex computational problem that is readily solved by most animals. We present a model of the thalamocortical circuit that performs level-invariant recognition of auditory objects in complex auditory scenes. The circuit identifies the objects present from a large dictionary of possible elements and operates reliably for real sound signals with multiple concurrently active sources. The key model assumption is that the activities of some cortical neurons encode the difference between the observed signal and an internal estimate. Reanalysis of awake auditory cortex recordings revealed neurons with patterns of activity corresponding to such an error signal.
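The key assumption above, that some neurons carry the difference between the observed signal and an internal estimate, can be illustrated with a generic error-driven decomposition. The sketch below is not the authors' thalamocortical model: it assumes a spectrum-like input vector, a small hypothetical dictionary of nonnegative templates, and a plain projected-gradient update in which every activation is driven by the explicit error vector. Dividing the activations by their sum is one simple way to make the recognized pattern independent of overall sound level.

```python
def infer_activations(observed, dictionary, steps=500, rate=0.1):
    """Estimate nonnegative activations a[k] so that sum_k a[k] * dictionary[k] ~ observed.

    Each iteration forms an explicit error signal (observed minus the current
    internal estimate) and moves every activation along that error, clipped at 0.
    `rate` must be small relative to the overlap between templates for stability.
    """
    n_items, n_bins = len(dictionary), len(observed)
    a = [0.0] * n_items
    for _ in range(steps):
        # Internal estimate reconstructed from the current activations.
        estimate = [sum(a[k] * dictionary[k][i] for k in range(n_items))
                    for i in range(n_bins)]
        # The "error units": observed signal minus internal estimate.
        error = [o - e for o, e in zip(observed, estimate)]
        for k in range(n_items):
            drive = sum(error[i] * dictionary[k][i] for i in range(n_bins))
            a[k] = max(0.0, a[k] + rate * drive)
    return a


def normalized(activations):
    """Divide by the total so the pattern no longer depends on overall level."""
    total = sum(activations)
    return [x / total for x in activations] if total > 0 else list(activations)


if __name__ == "__main__":
    # Hypothetical 4-bin templates for three sound "objects".
    templates = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    quiet = infer_activations([2.0, 0.0, 0.5, 0.0], templates)
    loud = infer_activations([20.0, 0.0, 5.0, 0.0], templates)
    print(normalized(quiet), normalized(loud))
```

With orthogonal templates the update converges to the exact coefficients; the quiet and loud mixtures yield the same normalized pattern, which is the level invariance the abstract refers to.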
This book presents state-of-the-art computational attention models that have been successfully tested in diverse application areas and can build the foundation for artificial systems to efficiently explore, analyze, and understand natural scenes. It gives a comprehensive overview of the most recent computational attention models for processing visual and acoustic input. It covers the biological background of visual and auditory attention, as well as bottom-up and top-down attentional mechanisms and discusses various applications. In the first part new approaches for bottom-up visual and acoustic saliency models are presented and applied to the task of audio-visual scene exploration of a robot. In the second part the influence of top-down cues for attention modeling is investigated.
The main goal of this research is to develop the theory and implement practical tools (in both software and hardware) for the capture and recreation of 3D auditory scenes. Our research is expected to have applications in virtual reality, telepresence, film, music, video games, auditory user interfaces, and sound-based surveillance. The first part of our research is concerned with sound capture via a spherical microphone array. The advantage of this array is that it can be steered digitally in any 3D direction with the same beampattern. We develop design methodologies to achieve flexible microphone layouts, optimal beampattern approximation and robustness constraints. We also design novel hemispherical and circular microphone array layouts for more spatially constrained auditory scenes. Using the captured audio, we then propose a unified and simple approach for recreating the scenes by exploiting the reciprocity principle that holds between the two processes. Our approach makes the system easy to build and practical. Using this approach, we can capture the 3D sound field with a spherical microphone array, recreate it using a spherical loudspeaker array, and ensure that the recreated sound field matches the recorded field up to a high order of spherical harmonics. For some regular or semi-regular microphone layouts, we design an efficient parallel implementation of the multi-directional spherical beamformer by using the rotational symmetries of the beampattern and of the spherical microphone array. This can be implemented in either software or hardware and easily adapted for other regular or semi-regular layouts of microphones. In addition, we extend this approach to headphone-based systems. Design examples and simulation results are presented to verify our algorithms. Prototypes are built and tested in real-world auditory scenes.
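Digital steering and the rotational symmetry mentioned above can be seen in a much simpler setting than the spherical array. The sketch below swaps in a narrowband, far-field delay-and-sum beamformer for a uniform circular array in the horizontal plane; it is not the spherical-harmonic beamformer of the abstract. The microphone count, radius, frequency and speed of sound are hypothetical defaults.

```python
import cmath
import math


def uca_response(look_deg, steer_deg, num_mics=8, radius_m=0.05,
                 freq_hz=2000.0, speed_of_sound=343.0):
    """Magnitude response of a delay-and-sum uniform circular array.

    A far-field plane wave arriving from azimuth `look_deg` is weighted by
    phase-compensating weights for azimuth `steer_deg`. The response is 1.0
    when the two directions coincide and never exceeds 1.0 elsewhere.
    """
    k = 2.0 * math.pi * freq_hz / speed_of_sound  # wavenumber
    look = math.radians(look_deg)
    steer = math.radians(steer_deg)
    total = 0j
    for m in range(num_mics):
        phi = 2.0 * math.pi * m / num_mics  # angular position of microphone m
        # Relative phase of the incoming plane wave at microphone m.
        arrival = cmath.exp(1j * k * radius_m * math.cos(look - phi))
        # Steering weight cancels that phase for the chosen direction.
        weight = cmath.exp(-1j * k * radius_m * math.cos(steer - phi)) / num_mics
        total += weight * arrival
    return abs(total)


if __name__ == "__main__":
    for angle in (0, 30, 60, 120, 210):
        print(angle, round(uca_response(angle, steer_deg=30), 3))
```

Because rotating both the look and steering directions by one microphone spacing (45° for eight microphones) only permutes the microphones, the beampattern is identical in all those directions; a regular layout lets one set of weights be reused by rotation, which is the kind of symmetry the abstract exploits for its parallel implementation.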
Pandit, Medha; Yusoff, Yusseri; Kittler, Josef; Christmas, William J.; Chilton, E. H. S.
Video database research is commonly concerned with the storage and retrieval of visual information involving sequence segmentation, shot representation and video clip retrieval. In multimedia applications, video sequences are usually accompanied by a sound track. The sound track contains potential cues to aid shot segmentation, such as different speakers, background music, singing and distinctive sounds. These different acoustic categories can be modeled to allow for effective database retrieval. In this paper, we address the problem of automatic segmentation of the audio track of multimedia material. This audio-based segmentation can be combined with video scene shot detection in order to partition the multimedia material into semantically significant segments.
Zhong, Yanfei; Fei, Feng; Zhang, Liangpei
The increase in the spatial resolution of remote-sensing sensors helps to capture the abundant details related to the semantics of surface objects. However, it is difficult for the popular object-oriented classification approaches to acquire higher level semantics from high spatial resolution remote-sensing (HSR-RS) images, which is often referred to as the "semantic gap." Instead of relying on sophisticated hand-designed operators, convolutional neural networks (CNNs), a typical deep learning method, can automatically discover intrinsic feature descriptors from a large number of input images to bridge the semantic gap. Because the available HSR-RS scene datasets are far smaller than natural scene datasets, there have been few reports of CNN approaches for HSR-RS image scene classification. We propose a practical CNN architecture for HSR-RS scene classification, named the large patch convolutional neural network (LPCNN). Large patch sampling is used to generate hundreds of possible scene patches for feature learning, and a global average pooling layer is used to replace the fully connected network as the classifier, which can greatly reduce the total number of parameters. The experiments confirm that the proposed LPCNN can learn effective local features to form an effective representation for different land-use scenes, and can achieve performance comparable to the state of the art on public HSR-RS scene datasets.
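Why replacing the fully connected classifier with global average pooling cuts the parameter count can be shown with a little arithmetic. The sketch below pools each channel of a feature map to a single mean and compares the weight counts of two classifier heads. The layer sizes (512 channels, a 7×7 map, 21 classes) are hypothetical and are not taken from the LPCNN paper; it also assumes a single linear layer after pooling, whereas some GAP designs have the last convolution emit one channel per class directly.

```python
def global_average_pool(feature_maps):
    """Reduce each H x W channel to its mean: C channels in, C numbers out.

    The pooling itself has no trainable weights.
    """
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in feature_maps]


def fc_head_params(channels, height, width, classes):
    """Weights plus biases of a dense layer on the flattened C*H*W feature map."""
    return channels * height * width * classes + classes


def gap_head_params(channels, classes):
    """Weights plus biases of a linear layer applied after global average pooling."""
    return channels * classes + classes


if __name__ == "__main__":
    print(global_average_pool([[[1, 2], [3, 4]], [[0, 0], [0, 8]]]))  # [2.5, 2.0]
    print(fc_head_params(512, 7, 7, 21))  # 526869
    print(gap_head_params(512, 21))       # 10773
```

The saving grows with the spatial size of the final feature map, because the pooled head no longer has a weight for every spatial position.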
This paper presents a scene composition approach that allows the combined use of standard three-dimensional objects, called models, in order to create X3D scenes. The module is an integral part of a broader design aiming to construct large-scale online advertising infrastructures that rely on virtual reality technologies. The architecture addresses a number of problems regarding remote rendering for low-end devices and, last but not least, the provision of scene composition and integration. Since viewers do not keep information regarding individual input models or scenes, composition requires mechanisms that add state to viewing technologies. In this work, we extended a well-known, open-source X3D authoring tool.
Nortvig, Anne Mette; Sørensen, Birgitte Holm
This project’s aim was to support and facilitate master’s students’ preparation and collaboration by making video podcasts of short lectures available on YouTube prior to students’ first face-to-face seminar. The empirical material stems from group interviews, from statistical data created through YouTube analytics and from surveys answered by students after the seminar. The project sought to explore how video podcasts support learning and reflection online and how students use and reflect on the integration of online activities in the videos. Findings showed that students engaged actively...
Yu, Xunyi; Ganz, Aura
In this paper, we propose an identity-aware video analytic system that can assist in securing the perimeter of a mass casualty incident scene and generate identity-annotated video records for forensics and training purposes. Establishing a secure incident scene perimeter and enforcing access control to different zones is a demanding task for current video surveillance systems, which lack the ability to provide the identity of the target and its security clearance. Our system, which combines active RFID sensors with video analytic tools, recovers the identity of the target, enabling the activation of suitable alert policies. The system also enables annotation of incident scene video with identity metadata, facilitating the reconstruction of the incident response process for forensic analysis and emergency response training.
The interactive scenes of The Crystal Cabinet (2008) constitute the first part in my choreographic research project exploring volatile bodies and multistable corporealities. This performance took the form of a dream play opera in twelve scenes including texts and images from William Blake’s (1757-1827) illuminated books. To create his books Blake invented a printing-machine with which he could print his handwritten poems and images. We transformed this idea into an interactive stage area wher...
training the algorithm to learn the background parameters. The need to train such algorithms for each scene separately limits their ability to be ... deployed for automatic surveillance tasks, where manual retraining of the module to operate in each new scene is not feasible. A further shortcoming in ... and (b). The camera panning is such that the objects of interest, viz. the two cyclists, undergo very small motion in the image coordinates. Figure 1
Berman, Daniel; Golomb, Julie D; Walther, Dirk B
In complex real-world scenes, image content is conveyed by a large collection of intertwined visual features. The visual system disentangles these features in order to extract information about image content. Here, we investigate the role of one integral component: the content of spatial frequencies in an image. Specifically, we measure the amount of image content carried by low versus high spatial frequencies for the representation of real-world scenes in scene-selective regions of human visual cortex. To this end, we attempted to decode scene categories from the brain activity patterns of participants viewing scene images that contained the full spatial frequency spectrum, only low spatial frequencies, or only high spatial frequencies, all carefully controlled for contrast and luminance. Contrary to the findings from numerous behavioral studies and computational models that have highlighted how low spatial frequencies preferentially encode image content, decoding of scene categories from the scene-selective brain regions, including the parahippocampal place area (PPA), was significantly more accurate for high than low spatial frequency images. In fact, decoding accuracy was just as high for high spatial frequency images as for images containing the full spatial frequency spectrum in scene-selective areas PPA, RSC, OPA and object selective area LOC. We also found an interesting dissociation between the posterior and anterior subdivisions of PPA: categories were decodable from both high and low spatial frequency scenes in posterior PPA but only from high spatial frequency scenes in anterior PPA; and spatial frequency was explicitly decodable from posterior but not anterior PPA. Our results are consistent with recent findings that line drawings, which consist almost entirely of high spatial frequencies, elicit a neural representation of scene categories that is equivalent to that of full-spectrum color photographs. Collectively, these findings demonstrate the
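The low- versus high-spatial-frequency manipulation described above can be sketched with a simple decomposition: a blurred copy keeps the low frequencies, and the residual (image minus blur) keeps the high frequencies. The study's actual filters are not given here; the box blur below is a stand-in assumption, and `match_luminance_contrast` is a hypothetical helper that equalizes mean luminance and RMS contrast, in the spirit of the "carefully controlled for contrast and luminance" step.

```python
import statistics


def box_blur(img, radius=1):
    """Mean over a (2r+1) x (2r+1) window, clipped at the borders: a crude low-pass filter."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - radius), min(h, y + radius + 1))
                      for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def split_spatial_frequencies(img, radius=1):
    """Return (low, high) with low + high == img; high is the residual after blurring."""
    low = box_blur(img, radius)
    high = [[p - q for p, q in zip(row, low_row)] for row, low_row in zip(img, low)]
    return low, high


def match_luminance_contrast(img, target_mean, target_rms):
    """Rescale pixel values to a given mean (luminance) and standard deviation (RMS contrast)."""
    flat = [p for row in img for p in row]
    mean = statistics.fmean(flat)
    rms = statistics.pstdev(flat)
    scale = target_rms / rms if rms > 0 else 0.0
    return [[target_mean + (p - mean) * scale for p in row] for row in img]


if __name__ == "__main__":
    checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]
    low, high = split_spatial_frequencies(checker)
    print(match_luminance_contrast(high, 0.5, 0.2))
```

A fine checkerboard ends up mostly in the high-frequency residual, while a uniform image has an all-zero residual; after matching, the two filtered versions differ only in spatial frequency content, not in overall brightness or contrast.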
Alsmirat, Mohammad Abdullah
Video streaming has recently grown dramatically in popularity over the Internet, Cable TV, and wireless networks. Because of the resource-demanding nature of video streaming applications, maximizing resource utilization in any video streaming system is a key factor to increase the scalability and decrease the cost of the system. Resources to…
Many algorithms for temporal video partitioning rely on the analysis of uncompressed video features. Since the information relevant to the partitioning process can be extracted directly from the MPEG compressed stream, higher efficiency can be achieved by utilizing information from the MPEG compressed domain. This paper introduces a real-time algorithm for scene change detection that analyses the statistics of the macroblock features extracted directly from the MPEG stream. A method for extracting the continuous frame difference, which transforms the 3D video stream into a 1D curve, is presented. This transform is then further employed to extract temporal units within the analysed video sequence. Results of computer simulations are reported.
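The idea of collapsing a video into a 1D frame-difference curve and reading cuts off its peaks can be sketched as follows. This is a generic illustration, not the paper's algorithm: each frame is assumed to be a flat list of per-macroblock feature values (for example, DC coefficients taken from the compressed stream), and the `window` and `factor` parameters of the hypothetical peak test are illustrative choices.

```python
def difference_curve(frames):
    """Mean absolute feature difference between consecutive frames: a 1D curve.

    Entry i compares frame i with frame i + 1.
    """
    return [sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
            for prev, cur in zip(frames, frames[1:])]


def detect_cuts(curve, window=3, factor=3.0):
    """Flag a cut where the difference exceeds `factor` times its local average.

    The local average uses up to `window` values on each side, excluding the
    value under test. Returned indices are the first frames of new shots.
    """
    cuts = []
    for i, d in enumerate(curve):
        neighbours = curve[max(0, i - window):i] + curve[i + 1:i + 1 + window]
        baseline = sum(neighbours) / len(neighbours) if neighbours else 0.0
        if d > 0 and d > factor * baseline:
            cuts.append(i + 1)
    return cuts


if __name__ == "__main__":
    # Two "shots" with small within-shot fluctuations (2 macroblocks per frame).
    frames = [[10, 10], [11, 10], [10, 11], [11, 11],
              [50, 50], [51, 50], [50, 51]]
    curve = difference_curve(frames)
    print(curve)               # [0.5, 1.0, 0.5, 39.0, 0.5, 1.0]
    print(detect_cuts(curve))  # [4]
```

Comparing each value with its local neighbourhood, rather than with a single global threshold, lets the detector tolerate shots that differ in overall activity; gradual transitions would need a different test.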
Hu, Ronghang; Xu, Huazhe; Rohrbach, Marcus; Feng, Jiashi; Saenko, Kate; Darrell, Trevor
In this paper, we address the task of natural language object retrieval: localizing a target object within a given image based on a natural language query of the object. Natural language object retrieval differs from the text-based image retrieval task, as it involves spatial information about objects within the scene and global scene context. To address this issue, we propose a novel Spatial Context Recurrent ConvNet (SCRC) model as a scoring function on candidate boxes for object retrieval, integ...
Xiang, Tao; Gong, Shaogang
This paper aims to address the problem of modelling video behaviour captured in surveillance videos for the applications of online normal behaviour recognition and anomaly detection. A novel framework is developed for automatic behaviour profiling and online anomaly sampling/detection without any manual labelling of the training dataset. The framework consists of the following key components: (1) A compact and effective behaviour representation method is developed based on discrete scene event detection. The similarity between behaviour patterns is measured based on modelling each pattern using a Dynamic Bayesian Network (DBN). (2) Natural grouping of behaviour patterns is discovered through a novel spectral clustering algorithm with unsupervised model selection and feature selection on the eigenvectors of a normalised affinity matrix. (3) A composite generative behaviour model is constructed which is capable of generalising from a small training set to accommodate variations in unseen normal behaviour patterns. (4) A run-time accumulative anomaly measure is introduced to detect abnormal behaviour, while normal behaviour patterns are recognised when sufficient visual evidence has become available based on an online Likelihood Ratio Test (LRT) method. This ensures robust and reliable anomaly detection and normal behaviour recognition in the shortest possible time. The effectiveness and robustness of our approach are demonstrated through experiments using noisy and sparse datasets collected from both indoor and outdoor surveillance scenarios. In particular, it is shown that a behaviour model trained using an unlabelled dataset is superior to those trained using the same but labelled dataset in detecting anomalies from an unseen video. The experiments also suggest that our online LRT-based behaviour recognition approach is advantageous over the commonly used Maximum Likelihood (ML) method in differentiating ambiguities among different behaviour classes observed online.
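Recognizing a behaviour "when sufficient visual evidence has become available" is the core idea of a sequential likelihood ratio test: evidence is accumulated event by event, and a decision is made only once it crosses a threshold. The sketch below is a generic Wald-style test between two hypothetical behaviour classes, each modelled as a categorical distribution over discrete scene events. It stands in for, and is much simpler than, the paper's DBN-based models; the event names and the threshold log(19) (roughly 95:1 odds) are assumptions.

```python
import math


def online_lrt(events, model_a, model_b, threshold=math.log(19)):
    """Sequential log-likelihood-ratio test between two event models.

    model_a / model_b map each discrete event to its probability under that class.
    Returns ("A" or "B", index of the deciding event) as soon as the cumulative
    log-likelihood ratio leaves [-threshold, threshold], or (None, len(events))
    if the evidence stays ambiguous.
    """
    llr = 0.0
    for t, event in enumerate(events):
        llr += math.log(model_a[event]) - math.log(model_b[event])
        if llr >= threshold:
            return "A", t
        if llr <= -threshold:
            return "B", t
    return None, len(events)


if __name__ == "__main__":
    # Hypothetical event statistics for two behaviour classes.
    entering = {"door_open": 0.8, "desk_idle": 0.2}
    leaving = {"door_open": 0.2, "desk_idle": 0.8}
    print(online_lrt(["door_open"] * 3, entering, leaving))              # ('A', 2)
    print(online_lrt(["door_open", "desk_idle"], entering, leaving))     # (None, 2)
```

Each "door_open" event adds log 4 ≈ 1.39 to the score, so two events are not yet enough to pass log 19 ≈ 2.94 but three are; ambiguous sequences remain undecided instead of being forced into the most likely class, which is the advantage over a one-shot maximum-likelihood decision that the abstract highlights.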
The Nutrition Communication Project has overseen production of a training video on interpersonal communication for health workers involved in growth monitoring and promotion (GMP) programs in Latin America, entitled "Comuniquemonos, Ya!" Producers used the following questions as their guidelines: Who is the audience? Why is the training needed? What are the objectives and advantages of using video? Communication specialists, anthropologists, educators, and nutritionists worked together to write the script. Then video camera specialists taped the video in Bolivia and Guatemala. A facilitator's guide, complete with an outline of an entire workshop, comes with the video. The guide encourages trainees to participate in various situations. Trainees are able to compare their interpersonal skills with those of the health workers on the video. Further, they can determine cause and effect. The video has 2 scenes that demonstrate poor and good communication skills using the same health worker in both situations. Other scenes highlight 6 communication skills: developing a warm environment, asking questions, sharing results, listening, observing, and doing demonstrations. All types of health workers, ranging from physicians to community health workers, as well as health workers from various countries (Guatemala, Honduras, Bolivia, and Ecuador), approve of the video. Some trainers have used the video without the guide and comment that it began a debate on communication's role in GMP efforts.
that we can explore in detail exploits the fact that even though each φ_m is testing a different 2D image slice, the image slices are often related ... space-time cube. We related temporal bandwidth to the spatial resolution of the camera and the speed of objects in the scene. We applied our findings to ... performed directly on the compressive measurements without requiring a potentially expensive video reconstruction. Accomplishments In our work exploring
The drug scene generally comprises the following four distinct categories of young people: neophytes, addicts who enjoy a high status vis-à-vis other addicts, multiple drug addicts, and non-addicted drug dealers. It has its own evolution, hierarchy, structure and criteria of success and failure. The members are required to conform to the established criteria. The integration of the young addict into the drug scene is not voluntary in the real sense of the word, for he is caught between the culture that he rejects and the pseudo-culture of the drug scene. To be accepted into the drug scene, the neophyte must furnish proof of his reliability, which often includes certain forms of criminal activities. The addict who has achieved a position of importance in the drug world serves as a role model for behaviour to the neophyte. In a more advanced phase of addiction, the personality of the addict and the social functions of the drug scene are overwhelmed by the psychoactive effects of the drug, and this process results in the social withdrawal of the addict. The life-style of addicts and the subculture they develop are largely influenced by the type of drug consumed. For example, it is possible to speak of a heroin subculture and a cocaine subculture. In time, every drug scene deteriorates so that it becomes fragmented into small groups, which is often caused by legal interventions or a massive influx of new addicts. The fragmentation of the drug scene is followed by an increase in multiple drug abuse, which often aggravates the medical and social problems of drug addicts.
Botella, Guillermo; García, Carlos; Meyer-Bäse, Uwe
This contribution focuses on the topics covered by the special issue titled 'Hardware Implementation of Machine Vision Systems', including FPGAs, GPUs, embedded systems, and multicore implementations for image analysis such as edge detection, segmentation, pattern recognition and object recognition/interpretation, image enhancement/restoration, image/video compression, image similarity and retrieval, satellite image processing, medical image processing, motion estimation, neuromorphic and bioinspired vision systems, video processing, image formation and physics-based vision, 3D processing/coding, scene understanding, and multimedia.
Horst, A.R.A. van der
TNO Human Factors conducted long-term video observations to collect data on the pre-crash phase of real accidents (what exactly happened just before the collision?). The video recordings of collisions were used to evaluate and validate the safety value of in-depth accident analyses, road scene
Horst, A.R.A. van der
TNO conducted long-term video observations to collect data on the pre-crash phase of real accidents (what exactly happened just before the collision?). The video recordings of collisions were used to evaluate and validate the safety value of in-depth accident analyses, road scene analyses, and
This chapter focuses on methodological issues that arise when using (digital) video in research communication, not least online. Video has long been used in research for data collection and research communication. With digitalization and the internet, however, new opportunities and challenges have emerged for disseminating and distributing research results to different target groups via video. At the same time, classic methodological issues, such as the researcher's positioning in relation to the object of study, remain relevant. Both classic and new issues are discussed in the chapter, which frames the discussion in terms of different possible positions: disseminator, storyteller, or dialogist. These positions relate to genres within 'academic video'. Finally, a methodological toolbox with tools for planning...
This book collects the papers presented at two workshops during the 23rd International Conference on Pattern Recognition (ICPR): the Third Workshop on Video Analytics for Audience Measurement (VAAM) and the Second International Workshop on Face and Facial Expression Recognition (FFER) from Real World Videos. The workshops were run on December 4, 2016, in Cancun in Mexico. The two workshops together received 13 papers. Each paper was then reviewed by at least two expert reviewers in the field. In all, 11 papers were accepted to be presented at the workshops. The topics covered in the papers...
Lecca, Michela; Smolka, Bogdan
This text covers state-of-the-art color image and video enhancement techniques. The book examines the multivariate nature of color image/video data as it pertains to contrast enhancement, color correction (equalization, harmonization, normalization, balancing, constancy, etc.), noise removal and smoothing. This book also discusses color and contrast enhancement in vision sensors and applications of image and video enhancement. · Focuses on enhancement of color images/video · Addresses algorithms for enhancing color images and video · Presents coverage on super resolution, restoration, inpainting, and colorization.
Parraman, Carinna E.; McCann, John J.; Rizzi, Alessandro
The presentation provides an update on ongoing research using three-dimensional Colour Mondrians. Two still life arrangements comprising hand-painted coloured blocks of 11 different colours were subjected to two different lighting conditions of a nearly uniform light and directed spotlights. The three-dimensional nature of these test targets adds shadows and multiple reflections, not found in flat Mondrian targets. Working from exactly the same pair of scenes, an author painted them using watercolour inks and paints to recreate both LDR and HDR Mondrians on paper. This provided us with a second set of appearance measurements of both scenes. Here we measured appearances by measuring reflectances of the artist's rendering. Land's Colour Mondrian extended colour constancy from a pixel to a complex scene. Since it used a planar array in uniform illumination, it did not measure the appearances of real life 3-D scenes in non-uniform illumination. The experiments in this paper, by simultaneously studying LDR and HDR renditions of the same array of reflectances, extend Land's Mondrian towards real scenes in non-uniform illumination. The results show that the appearances of many areas in complex scenes do not correlate with reflectance.
Ihssen, Niklas; Keil, Andreas
Perceptual processing of natural scene pictures is enhanced when the scene conveys emotional content. Such "motivated attention" to pleasant and unpleasant pictures has been shown to improve identification accuracy in non-speeded behavioural tasks. An open question is whether emotional content also modulates the speed of visual scene processing. In the present studies we show that unpleasant content reliably slowed two-choice categorization of pictures, irrespective of physical image properties, perceptual complexity, and categorization instructions. Conversely, pleasant content did not slow or even accelerated choice reactions, relative to neutral scenes. As indicated by lateralized readiness potentials, these effects occurred at cognitive processing rather than motor preparation/execution stages. Specifically, analysis of event-related potentials showed a prolongation of early scene discrimination for stimuli perceived as emotionally arousing, regardless of valence, and reflected in delayed peaks of the N1 component. In contrast, the timing of other processing steps, reflected in the P2 and late positive potential components and presumably related to post-discriminatory processes such as stimulus-response mapping, appeared to be determined by hedonic valence, with more pleasant scenes eliciting faster processing. Consistent with this model, varying arousal (low/high) within the emotional categories mediated the effects of valence on choice reaction speed. Functionally, arousal may prolong stimulus analysis in order to prevent erroneous and potentially harmful decisions. Pleasantness may act as a safety signal allowing rapid initiation of overt responses.
In recent years, interest in multimedia services has become a global trend, and this trend is still rising. Video quality is a very significant part of the bundle of multimedia services, which leads to a requirement for quality assessment in the video domain. The quality of video streamed across IP networks is generally influenced by two factors: transmission link imperfections and the efficiency of compression standards. This paper deals with subjective video quality assessment and the impact of the compression standards H.264, H.265 and VP9 on perceived video quality. The evaluation is done for four full HD sequences whose scenes differ in content; the distinction is based on the Spatial (SI) and Temporal (TI) Index of the test sequences. Finally, experimental results show up to 30% bitrate reduction for H.265 and VP9 compared with the reference H.264.
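The SI and TI indices used above to characterize test sequences are, to our understanding of ITU-T P.910, the maximum over time of the spatial standard deviation of the Sobel-filtered luminance frame (SI), and the maximum over time of the spatial standard deviation of the difference between consecutive frames (TI). The sketch below follows that definition on tiny luminance arrays; using the population standard deviation and ignoring border pixels for the Sobel step are our assumptions. A one-line helper also shows how a relative bitrate saving such as "up to 30%" is computed; the bitrates in the example are hypothetical.

```python
import math
import statistics


def sobel_magnitude(frame):
    """Sobel gradient magnitudes for the interior pixels of a 2D luminance frame."""
    h, w = len(frame), len(frame[0])
    mags = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            mags.append(math.hypot(gx, gy))
    return mags


def spatial_information(frames):
    """SI: max over frames of the spatial std of the Sobel-filtered frame."""
    return max(statistics.pstdev(sobel_magnitude(f)) for f in frames)


def temporal_information(frames):
    """TI: max over consecutive pairs of the spatial std of the pixel differences."""
    return max(
        statistics.pstdev([c - p for row_c, row_p in zip(cur, prev)
                           for c, p in zip(row_c, row_p)])
        for prev, cur in zip(frames, frames[1:]))


def bitrate_saving(reference_kbps, candidate_kbps):
    """Fractional bitrate reduction of a candidate codec relative to a reference."""
    return (reference_kbps - candidate_kbps) / reference_kbps


if __name__ == "__main__":
    edge = [[0, 0, 0, 10, 10] for _ in range(4)]
    moved = [[0, 0, 10, 10, 10] for _ in range(4)]
    print(spatial_information([edge, moved]), temporal_information([edge, moved]))
    print(bitrate_saving(10000, 7000))  # about 0.3
```

A static, flat sequence scores zero on both indices; sequences with more edges raise SI and sequences with more motion raise TI, which is why the pair is a convenient way to show that test content spans different coding difficulties.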
Bager, Gitte; Vilic, Kenan; Vilic, Adnan
This paper introduces a method for tracking patients under video surveillance based on a color marker system. The patients are not restricted in their movements, which requires a tracking system that can overcome non-ideal scenes e.g. occlusions, very fast movements, lighting issues and other mov...
Bager, Gitte; Vilic, Kenan; Alving, Jørgen
This report introduces a method for tracking patients under video surveillance based on a marker system. The patients are not restricted in their movements, which requires a tracking system that can overcome non-ideal scenes, e.g. occlusions, very fast movements, lighting issues and other moving...
Scenes containing many polygons generated in real time. Video pipeline subsystem having branched structure performs scan conversion of polygons in images generated by computers. New subsystem divides polygons into triangles, each of which processed rapidly in parallel, modular fashion and merged into image.
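Splitting polygons into triangles that can then be processed independently, as the brief describes, can be sketched with a fan triangulation. The brief does not say how its subsystem divides polygons; the sketch assumes convex polygons given as ordered vertex lists and uses the shoelace formula only to check that the triangles cover the same area as the original polygon.

```python
def fan_triangulate(polygon):
    """Split a convex polygon (ordered list of (x, y) vertices) into len - 2 triangles.

    All triangles share vertex 0, so each can be scan-converted independently
    (for example by parallel workers) and the results merged into the image.
    """
    if len(polygon) < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    apex = polygon[0]
    return [(apex, polygon[i], polygon[i + 1]) for i in range(1, len(polygon) - 1)]


def signed_area(points):
    """Shoelace formula: positive for counter-clockwise vertex order."""
    n = len(points)
    return 0.5 * sum(points[i][0] * points[(i + 1) % n][1]
                     - points[(i + 1) % n][0] * points[i][1]
                     for i in range(n))


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    triangles = fan_triangulate(square)
    # Each triangle is independent, so this map could be distributed.
    areas = list(map(signed_area, triangles))
    print(len(triangles), sum(areas), signed_area(square))  # 2 1.0 1.0
```

Fan triangulation is only valid for convex polygons; concave input would need a method such as ear clipping before the triangles could safely be rasterized in parallel.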
Students of the English Language Education Program in the Faculty of Cultural Studies, Universitas Brawijaya, should ideally master Grammar before taking the degree of Sarjana Pendidikan. In fact, however, they are still weak in Grammar, especially tenses. Therefore, the researchers set out to develop a video as a medium for teaching tenses. The objective is that, by using the video, students gain a better understanding of tenses so that they can communicate in English accurately and contextually. To develop the video, the researchers used the ADDIE model (Analysis, Design, Development, Implementation, Evaluation). First, the researchers analyzed the students' learning needs to determine the product to be developed, in this case a movie about English tenses. Then, the researchers developed the video as the product. The product was then validated by a media expert, who assessed attractiveness, typography, audio, image, and usefulness, and by a content expert, who assessed the language aspects and English tenses used by the actors in the video, including grammar content, pronunciation, and fluency. The validation results show that the developed video was considered good; theoretically, it is appropriate for use in English Grammar classes. However, the media expert suggests that it still needs some improvement in the next development, especially in the synchronization between lip movement and sound in the scenes, while the content expert suggests that the Grammar content of the video should focus on only one tense to provide a more detailed concept of that tense.
Thompson, Matthew B; Tangen, Jason M; McCarthy, Duncan J
There has been very little research into the nature and development of fingerprint matching expertise. Here we present the results of an experiment testing the claimed matching expertise of fingerprint examiners. Expert (n = 37), intermediate trainee (n = 8), new trainee (n = 9), and novice (n = 37) participants performed a fingerprint discrimination task involving genuine crime scene latent fingerprints, their matches, and highly similar distractors, in a signal detection paradigm. Results show that qualified, court-practicing fingerprint experts were exceedingly accurate compared with novices. Experts showed a conservative response bias, tending to err on the side of caution by making more errors of the sort that could allow a guilty person to escape detection than errors of the sort that could falsely incriminate an innocent person. The superior performance of experts was not simply a function of their ability to match prints, per se, but a result of their ability to identify the highly similar, but nonmatching, fingerprints as such. Comparing these results with previous experiments, experts were even more conservative in their decision making when dealing with these genuine crime scene prints than when dealing with simulated crime scene prints, and this conservatism made them relatively less accurate overall. Intermediate trainees, despite their lack of qualification and an average of 3.5 years' experience, performed about as accurately as qualified experts who had an average of 17.5 years' experience. New trainees, despite their 5-week, full-time training course or their 6 months' experience, were not any better than novices at discriminating matching and similar nonmatching prints; they were just more conservative. Further research is required to determine the precise nature of fingerprint matching expertise and the factors that influence performance. The findings of this representative, lab-based experiment may have implications for the way fingerprint examiners testify in
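In a signal detection paradigm like the one above, accuracy and response bias are usually summarized by sensitivity d' and criterion c, computed from hit and false-alarm rates. The sketch below treats "match" as the signal, so a positive c means a tendency to answer "no match" (the conservative bias reported for experts). The log-linear correction (adding 0.5 to counts) is one common convention for avoiding infinite z-scores and is an assumption here, as are the example counts; they are not the study's data.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion_c) from response counts.

    hits: matching pairs called "match"; misses: matching pairs called "no match";
    false_alarms: nonmatching pairs called "match"; correct_rejections: the rest.
    A log-linear correction keeps rates strictly between 0 and 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -(z(hit_rate) + z(fa_rate)) / 2
    return d_prime, criterion


if __name__ == "__main__":
    # Hypothetical counts: 50 matching and 50 nonmatching trials each.
    print(sdt_measures(45, 5, 1, 49))   # high d', c > 0 (conservative)
    print(sdt_measures(49, 1, 10, 40))  # c < 0 (liberal)
```

Separating d' from c is what allows a result like "new trainees were no better than novices, just more conservative": their criterion shifted without any gain in sensitivity.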
Schildwachter, Eric F.; Boreman, Glenn D.
The Scophony scene projector has been examined in detail. Modulation transfer function was measured and found to be significantly lower than expected. The discrepancy is shown to be due to variation in the Bragg angle with input frequency. Experimental data is compared with calculated performance.
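The Bragg condition for an acousto-optic cell, sin θ_B = λf / (2v), makes the explanation above concrete: the optimum diffraction angle grows with acoustic frequency f, so a cell aligned for one frequency is increasingly mismatched for the others, which lowers diffraction efficiency and hence modulation. The sketch below only evaluates that standard relation; the wavelength and acoustic velocity are illustrative assumptions, not values from the paper.

```python
import math


def bragg_angle(wavelength_m, acoustic_freq_hz, acoustic_velocity_m_s):
    """Bragg angle in radians for an acousto-optic cell: sin(theta) = lambda * f / (2 * v)."""
    return math.asin(wavelength_m * acoustic_freq_hz / (2 * acoustic_velocity_m_s))


# Illustrative (assumed) parameters, not taken from the paper.
WAVELENGTH = 10.6e-6  # optical wavelength in metres
VELOCITY = 5500.0     # acoustic velocity in the cell, m/s

for f_mhz in (20, 40, 60):
    theta = bragg_angle(WAVELENGTH, f_mhz * 1e6, VELOCITY)
    print(f"{f_mhz} MHz -> {math.degrees(theta):.2f} deg")
```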
navigate along interstate routes at speeds in excess of 110 mph, and the inclusion of the first down line in televised football games. These... roughly 2000 feet above the target based on the Sadr City scene dimensions and scaling factors. Images were rendered at a resolution of 1000×1000 as...
Goodman, Jane; Gillis, Sarah
This article summarizes the work of a diverse group of researchers and practitioners from 5 continents on "Vocational Guidance Requests Within the International Scene" presented in the discussion group at a symposium of the International Association for Educational and Vocational Guidance, the Society for Vocational Psychology, and the…
On 21 October the LHC inauguration ceremony will take place and people from all over CERN have been busy preparing. With delegations from 38 countries attending, including ministers and heads of state, the Bulletin has gone behind the scenes to see what it takes to put together an event of this scale.
Calvo, Manuel G.; Lang, Peter J.
The authors investigated whether emotional pictorial stimuli are especially likely to be processed in parafoveal vision. Pairs of emotional and neutral visual scenes were presented parafoveally (2.1° or 2.5° of visual angle from a central fixation point) for 150-3,000 ms, followed by an immediate recognition test (500-ms delay)…
Doerschner, Katja; Maloney, Laurence T; Boyaci, Huseyin
We investigated how spatial pattern, background, and dynamic range affect perceived gloss in brightly lit real scenes. Observers viewed spherical objects against uniform backgrounds. There were three possible objects. Two were black matte spheres with circular matte white dots painted on them (matte-dot spheres). The third sphere was painted glossy black (glossy black sphere). Backgrounds were either black or white matte, and observers saw each of the objects in turn on each background. Scenes were illuminated by an intense collimated source. On each trial, observers matched the apparent albedo of the sphere to an albedo reference scale and its apparent gloss to a gloss reference scale. We found that matte-dot spheres and the black glossy sphere were perceived as glossy on both backgrounds. All spheres were judged to be significantly glossier when in front of the black background. In contrast with previous research using conventional computer displays, we find that background markedly affects perceived gloss. This finding is surprising because darker surfaces are normally perceived as glossier (F. Pellacini, J. A. Ferwerda, & D. P. Greenberg, 2000). We conjecture that there are cues to surface material signaling glossiness present in high dynamic range scenes that are absent or weak in scenes presented using conventional computer displays.
This is a cookbook full of recipes with practical examples enriched with code and the required screenshots for easy and quick comprehension. You should be familiar with the basic concepts of the OpenSceneGraph API and should be able to write simple programs. Some OpenGL and math knowledge will help a lot, too.
Rhee, Taehyun; Petikam, Lohit; Allen, Benjamin; Chalmers, Andrew
This paper presents a novel immersive system called MR360 that provides interactive mixed reality (MR) experiences using a conventional low dynamic range (LDR) 360° panoramic video (360-video) shown in head mounted displays (HMDs). MR360 seamlessly composites 3D virtual objects into a live 360-video using the input panoramic video as the lighting source to illuminate the virtual objects. Image based lighting (IBL) is perceptually optimized to provide fast and believable results using the LDR 360-video as the lighting source. Regions containing the most salient lights in the input panoramic video are detected to optimize the number of lights used to cast perceptible shadows. The areas of the detected lights are then used to adjust the penumbra of each shadow, producing realistic soft shadows. Finally, our real-time differential rendering synthesizes illumination of the virtual 3D objects into the 360-video. MR360 provides the illusion of interacting with objects in a video, which are actually 3D virtual objects seamlessly composited into the background of the 360-video. MR360 was implemented in a commercial game engine and tested using various 360-videos. Since our MR360 pipeline does not require any pre-computation, it can synthesize an interactive MR scene using a live 360-video stream while providing realistic high performance rendering suitable for HMDs.
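Using a panoramic frame as a light source requires mapping bright pixels to 3D light directions. The sketch below shows only that basic step: it ranks pixels by Rec. 709 luminance and converts the top k from equirectangular coordinates to unit directions (y up). This simple brightest-pixel rule is an assumption standing in for the paper's perceptual saliency detection. A fuller version would also cluster neighbouring bright pixels into regions and weight them by solid angle (proportional to the cosine of latitude).

```python
import math


def pixel_to_direction(u, v, width, height):
    """Map equirectangular pixel centre (u, v) to a unit direction (x, y, z), y up."""
    lon = ((u + 0.5) / width) * 2 * math.pi - math.pi   # longitude in [-pi, pi)
    lat = math.pi / 2 - ((v + 0.5) / height) * math.pi  # +pi/2 at top row, -pi/2 at bottom
    return (math.cos(lat) * math.sin(lon), math.sin(lat), math.cos(lat) * math.cos(lon))


def brightest_lights(image, k):
    """Return directions of the k brightest pixels of an RGB image given as a list of rows."""
    height, width = len(image), len(image[0])
    scored = []
    for v, row in enumerate(image):
        for u, (r, g, b) in enumerate(row):
            luminance = 0.2126 * r + 0.7152 * g + 0.0722 * b  # Rec. 709 luma weights
            scored.append((luminance, u, v))
    scored.sort(reverse=True)
    return [pixel_to_direction(u, v, width, height) for _, u, v in scored[:k]]
```

For example, a single bright pixel in the top row of the panorama yields a direction with a positive y component, that is, a light above the horizon.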
Ranganath, Heggere S.; Chipman, Laure J.
The ability to match two scenes is a fundamental requirement in a variety of computer vision tasks. A graph-theoretic approach to inexact scene matching is presented which is useful in dealing with problems due to imperfect image segmentation. A scene is described by a set of graphs, with nodes representing objects and arcs representing relationships between objects. Each node has a set of values representing the relations between pairs of objects, such as angle, adjacency, or distance. With this method of scene representation, the task in scene matching is to match two sets of graphs. Because of segmentation errors, variations in camera angle, illumination, and other conditions, an exact match between the sets of observed and stored graphs is usually not possible. In the developed approach, the problem is represented as an association graph, in which each node represents a possible mapping of an observed region to a stored object, and each arc represents the compatibility of two mappings. Nodes and arcs have weights indicating the merit of a region-object mapping and the degree of compatibility between two mappings. A match between the two graphs corresponds to a clique, or fully connected subgraph, in the association graph. The task is to find the clique that represents the best match. Fuzzy relaxation is used to update the node weights using the contextual information contained in the arcs and neighboring nodes. This simplifies the evaluation of cliques. A method of handling oversegmentation and undersegmentation problems is also presented. The approach is tested with a set of realistic images which exhibit many types of segmentation errors.
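The association-graph formulation above can be sketched in a few lines. In this sketch, nodes are (region, object) mappings with merit weights, and an arc joins two mappings that use different regions and different objects and whose compatibility exceeds a threshold. The best match is then taken as the clique with the largest summed node weight. The exhaustive clique search, the threshold value, and the function names are illustrative assumptions; the paper instead uses fuzzy relaxation to simplify clique evaluation.

```python
from itertools import combinations


def build_association_graph(node_weights, compatibility, threshold=0.5):
    """node_weights: {(region, obj): merit}; compatibility(m1, m2) -> value in [0, 1]."""
    nodes = list(node_weights)
    arcs = set()
    for m1, m2 in combinations(nodes, 2):
        # A region may map to only one object and vice versa.
        distinct = m1[0] != m2[0] and m1[1] != m2[1]
        if distinct and compatibility(m1, m2) > threshold:
            arcs.add(frozenset((m1, m2)))
    return nodes, arcs


def best_clique(nodes, arcs, node_weights):
    """Exhaustively find the clique with the largest summed node weight (small graphs only)."""
    best, best_score = (), 0.0
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            if all(frozenset(pair) in arcs for pair in combinations(subset, 2)):
                score = sum(node_weights[m] for m in subset)
                if score > best_score:
                    best, best_score = subset, score
    return best, best_score
```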
Su, Po-Chyi; Wang, Yu-Wei; Chen, Chien-Chang
This paper presents a highlight extraction scheme for sports videos. The approach makes use of the transition logos that the broadcaster inserts before and after slow-motion replays, which show the highlights of the game. First, features are retrieved from the MPEG-compressed video for subsequent processing. After shot boundary detection, processing units are formed and the units containing fast-moving scenes are selected. Finally, overlaid objects are detected to signal the appearance of a transition logo. Experimental results show the feasibility of this promising method for sports video highlight extraction.
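Shot boundary detection, one step in the pipeline above, is often done by comparing intensity histograms of consecutive frames. The sketch below uses that generic technique on decoded grey-level frames, given as flat lists of 0-255 values. This is an assumption: the paper works with features of the MPEG-compressed stream, and the 16 bins and the L1 threshold of 0.5 are hypothetical settings.

```python
def histogram(frame, bins=16):
    """Normalised grey-level histogram of a frame given as a flat list of 0-255 values."""
    counts = [0] * bins
    for p in frame:
        counts[min(p * bins // 256, bins - 1)] += 1
    return [c / len(frame) for c in counts]


def shot_boundaries(frames, threshold=0.5, bins=16):
    """Indices i where frame i starts a new shot (L1 histogram distance above threshold)."""
    cuts = []
    prev = histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = histogram(frames[i], bins)
        # L1 distance between normalised histograms lies in [0, 2].
        if sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            cuts.append(i)
        prev = cur
    return cuts
```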
Pauly, Olivier; Diotte, Benoit; Fallavollita, Pascal; Weidert, Simon; Euler, Ekkehard; Navab, Nassir
In orthopedic and trauma surgery, AR technology can support surgeons in the challenging task of understanding the spatial relationships between the anatomy, the implants and their tools. In this context, we propose a novel augmented visualization of the surgical scene that intelligently mixes the different sources of information provided by a mobile C-arm combined with a Kinect RGB-Depth sensor. To this end, we introduce a learning-based paradigm that aims at (1) identifying the relevant objects or anatomy in both Kinect and X-ray data, and (2) creating an object-specific pixel-wise alpha map that permits relevance-based fusion of the video and the X-ray images within a single view. In 12 simulated surgeries, we show very promising results toward providing surgeons with a better understanding of the surgical scene as well as improved depth perception. Copyright © 2014 Elsevier Ltd. All rights reserved.
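Once a pixel-wise alpha map exists, the final fusion step is a per-pixel convex combination of the two registered images. This minimal sketch assumes alpha weights the video image and 1 - alpha weights the X-ray image, with all images given as equally sized 2-D lists of floats. In the paper the alpha map is learned per object; here it is simply an input.

```python
def alpha_blend(video, xray, alpha):
    """Pixel-wise fusion: out = alpha * video + (1 - alpha) * xray.

    All inputs are equally sized 2-D lists of floats; alpha values lie in [0, 1].
    """
    return [
        [a * v + (1.0 - a) * x for v, x, a in zip(v_row, x_row, a_row)]
        for v_row, x_row, a_row in zip(video, xray, alpha)
    ]
```

Setting alpha to 1 where a tool is detected and lower elsewhere would keep the tool from the video while revealing the X-ray anatomy around it.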
Yu, Haiping; Guo, Lei; Wang, Shenggang; Lippert, Jack; Li, Le
A Multispectral Polarized Scene Projector (MPSP) has been developed in the short-wave infrared (SWIR) regime for the test & evaluation (T&E) of spectro-polarimetric imaging sensors. This MPSP generates multispectral and hyperspectral video images (up to 200 Hz) with 512×512 spatial resolution, with active spatial, spectral, and polarization modulation and controlled bandwidth. It projects input SWIR radiant intensity scenes from stored memory with user-selectable wavelength and bandwidth, as well as polarization states (six different states) controllable at the pixel level. The spectral content is implemented by a tunable filter with variable bandpass built from liquid crystal (LC) material, together with one passive visible and one passive SWIR cholesteric liquid crystal (CLC) notch filter, and one switchable CLC notch filter. The core of the MPSP hardware consists of liquid-crystal-on-silicon (LCoS) spatial light modulators (SLMs) for intensity control and polarization modulation.
Korjoukov, I.; Jeurissen, D.; Kloosterman, N.A.; Verhoeven, J.E.; Scholte, H.S.; Roelfsema, P.R.
Visual perception starts with localized filters that subdivide the image into fragments that undergo separate analyses. The visual system has to reconstruct objects by grouping image fragments that belong to the same object. A widely held view is that perceptual grouping occurs in parallel across
Hansen, Thorsten; Gegenfurtner, Karl R
The magnitudes of chromatic and achromatic edge contrast are statistically independent and thus provide independent information, which can be used for object-contour perception. However, it is unclear if and how much object-contour perception benefits from chromatic edge contrast. To address this question, we investigated how well human-marked object contours can be predicted from achromatic and chromatic edge contrast. We used four data sets of human-marked object contours with a total of 824 images. We converted the images to the Derrington-Krauskopf-Lennie color space to separate chromatic from achromatic information in a physiologically meaningful way. Edges were detected in the three dimensions of the color space (one achromatic and two chromatic) and compared to human-marked object contours using receiver operating-characteristic (ROC) analysis for a threshold-independent evaluation. Performance was quantified by the difference of the area under the ROC curves (ΔAUC). Results were consistent across different data sets and edge-detection methods. If chromatic edges were used in addition to achromatic edges, predictions were better for 83% of the images, with a prediction advantage of 3.5% ΔAUC, averaged across all data sets and edge detectors. For some images the prediction advantage was considerably higher, up to 52% ΔAUC. Interestingly, if achromatic edges were used in addition to chromatic edges, the average prediction advantage was smaller (2.4% ΔAUC). We interpret our results as indicating that chromatic information is important for object-contour perception.
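The threshold-independent ROC evaluation above can be expressed through the equivalence between ROC AUC and the Mann-Whitney statistic. The AUC is the probability that a randomly chosen contour pixel has a higher edge score than a randomly chosen non-contour pixel, with ties counting one half. The sketch below uses hypothetical scores and a quadratic pairwise loop, which is fine for illustration; a rank-based computation scales better to full images. How achromatic and chromatic edge maps are combined before scoring is left open here.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC as P(contour pixel scores higher than non-contour pixel); ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def delta_auc(pos_a, neg_a, pos_b, neg_b):
    """Prediction advantage (Delta AUC) of edge predictor B over predictor A."""
    return roc_auc(pos_b, neg_b) - roc_auc(pos_a, neg_a)
```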
This book collects the papers presented at two workshops during the 23rd International Conference on Pattern Recognition (ICPR): the Third Workshop on Video Analytics for Audience Measurement (VAAM) and the Second International Workshop on Face and Facial Expression Recognition (FFER) from Real... include: re-identification, consumer behavior analysis, utilizing pupillary response for task difficulty measurement, logo detection, saliency prediction, classification of facial expressions, face recognition, face verification, age estimation, super-resolution, pose estimation, and pain recognition...
Automated video object recognition is a topic of emerging importance in both defense and civilian applications. This work describes an accurate and low-power neuromorphic architecture and system for real-time automated video object recognition. Our system, Neuromorphic Visual Understanding of Scenes (NEOVUS), is inspired by recent findings in computational neuroscience on feed-forward object detection and classification pipelines for processing and extracting relevant information from visual data. The NEOVUS architecture is inspired by the ventral (what) and dorsal (where) streams of the mammalian visual pathway and combines retinal processing, form-based and motion-based object detection, and convolutional-neural-net-based object classification. Our system was evaluated by the Defense Advanced Research Projects Agency (DARPA) under the NEOVISION2 program on a variety of urban area video datasets collected from both stationary and moving platforms. The datasets are challenging as they include a large number of targets in cluttered scenes with varying illumination and occlusion conditions. The NEOVUS system was also mapped to commercially available off-the-shelf hardware. The dynamic power requirement for the system, which includes a 5.6-Mpixel retinal camera processed by object detection and classification algorithms at 30 frames per second, was measured at 21.7 watts (W), for an effective energy consumption of 5.4 nanojoules (nJ) per bit of incoming video. In a systematic evaluation of five different teams by DARPA on three aerial datasets, NEOVUS demonstrated the best performance, with the highest recognition accuracy and at least three orders of magnitude lower energy consumption than two independent state-of-the-art computer vision systems. These unprecedented results show that NEOVUS has the potential to revolutionize automated video object recognition towards enabling practical low-power and mobile video processing applications.
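The energy-per-bit figure above follows from dividing power by the incoming bit rate. The abstract does not state the pixel depth. Assuming 24 bits per pixel (8-bit RGB) reproduces the quoted 5.4 nJ/bit from 21.7 W, 5.6 Mpixel, and 30 frames per second, so treat that bit depth as an inference rather than a reported parameter.

```python
def energy_per_bit(power_w, pixels_per_frame, frames_per_s, bits_per_pixel):
    """Joules per bit of incoming video = power / (pixels * fps * bits per pixel)."""
    return power_w / (pixels_per_frame * frames_per_s * bits_per_pixel)


# 24 bits/pixel is an assumption; the abstract does not state the pixel depth.
e = energy_per_bit(21.7, 5.6e6, 30, 24)
print(f"{e * 1e9:.1f} nJ/bit")  # prints "5.4 nJ/bit"
```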
Calvo, Manuel G; Nummenmaa, Lauri; Hyönä, Jukka
To investigate preferential processing of emotional scenes competing for limited attentional resources with neutral scenes, prime pictures were presented briefly (450 ms), peripherally (5.2 degrees away from fixation), and simultaneously (one emotional and one neutral scene) versus singly. Primes were followed by a mask and a probe for recognition. Hit rate was higher for emotional than for neutral scenes in the dual- but not in the single-prime condition, and A' sensitivity decreased for neutral but not for emotional scenes in the dual-prime condition. This preferential processing involved both selective orienting and efficient encoding, as revealed, respectively, by a higher probability of first fixation on, and shorter saccade latencies to, emotional scenes, and by shorter fixation time needed to accurately identify emotional scenes, in comparison with neutral scenes.
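A' is a nonparametric sensitivity index computed from the hit rate H and false-alarm rate F. Values of 0.5 indicate chance and 1.0 indicates perfect discrimination. The sketch below implements the commonly used formula with separate branches for H ≥ F and H < F. That this is exactly the variant used in the study is an assumption.

```python
def a_prime(hit_rate, fa_rate):
    """Nonparametric sensitivity A' from hit and false-alarm rates (0.5 = chance, 1.0 = perfect)."""
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance performance mirrors the formula.
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))
```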
... This fundamental process of auditory perception is called auditory scene analysis. Of particular importance in auditory scene analysis is the separation of speech from interfering sounds, or speech segregation...
Mallick, Mahendra; La Scala, Barbara F.
Tracking people and vehicles in an urban environment using video cameras onboard unmanned aerial vehicles (UAVs) has drawn a great deal of interest in recent years due to the cameras' low cost compared with expensive radar systems. Video cameras onboard a number of small UAVs can provide inexpensive, effective, and highly flexible airborne intelligence, surveillance, and reconnaissance as well as situational awareness functions. The perspective transformation is a commonly used general measurement model for the video camera when the variation in terrain height in the object scene is not negligible and the distance between the camera and the scene is not large. The perspective transformation is a nonlinear function of the object position. Most video tracking applications use a nearly constant velocity model (NCVM) of the target in the local horizontal plane. The filtering problem is nonlinear due to nonlinearity in the measurement model. In this paper, we present algorithms for quantifying the degree of nonlinearity (DoN) by calculating the differential-geometry-based parameter-effects curvature and intrinsic curvature measures of nonlinearity for the video tracking problem. We use the constant velocity model (CVM) of a target in 2D with simulated video measurements in the image plane. We present preliminary results using 200 Monte Carlo simulations; future work will focus on detailed numerical results. Our results for the chosen video tracking problem indicate that the DoN is low, and we therefore expect the extended Kalman filter to be a reasonable choice.
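The building blocks named above can be sketched together:

- a constant velocity (CV) motion model for a ground target with state [x, y, vx, vy];
- a pinhole perspective measurement of the ground point (x, y, 0);
- the measurement Jacobian an extended Kalman filter would linearize with.

This is a minimal sketch under stated assumptions. Process noise is omitted, R is a world-to-camera rotation given as a list of rows, and f is the focal length in pixels. It does not compute the curvature-based degree-of-nonlinearity measures from the paper.

```python
def predict_state(state, dt):
    """Constant-velocity model: [x, y, vx, vy] -> state after dt seconds (no process noise)."""
    x, y, vx, vy = state
    return [x + vx * dt, y + vy * dt, vx, vy]


def _camera_coords(state, R, cam):
    """Ground point (x, y, 0) expressed in camera coordinates: R @ (p - cam)."""
    d = [state[0] - cam[0], state[1] - cam[1], 0.0 - cam[2]]
    return [sum(R[i][j] * d[j] for j in range(3)) for i in range(3)]


def project(state, R, cam, f):
    """Pinhole perspective measurement (u, v) = f * (X / Z, Y / Z)."""
    X, Y, Z = _camera_coords(state, R, cam)
    return [f * X / Z, f * Y / Z]


def measurement_jacobian(state, R, cam, f):
    """2x4 Jacobian of project() w.r.t. [x, y, vx, vy]; velocity columns are zero."""
    X, Y, Z = _camera_coords(state, R, cam)
    rows = []
    for num, idx in ((X, 0), (Y, 1)):
        # d(f * num / Z)/d(p_k) = f * (R[idx][k] * Z - num * R[2][k]) / Z**2 for k in {x, y}
        rows.append([f * (R[idx][k] * Z - num * R[2][k]) / Z ** 2 for k in (0, 1)] + [0.0, 0.0])
    return rows
```

Because the Jacobian depends on the target position through X, Y, and Z, an oblique camera makes the measurement model genuinely nonlinear. For a nadir-looking camera over flat terrain, the Jacobian is constant.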
Kim, Soyun; Dede, Adam J O; Hopkins, Ramona O; Squire, Larry R
We evaluated two different perspectives about the function of the human hippocampus: one that emphasizes the importance of memory and another that emphasizes the importance of spatial processing and scene construction. We gave tests of boundary extension, scene construction, and memory to patients with lesions limited to the hippocampus or large lesions of the medial temporal lobe. The patients were intact on all of the spatial tasks and impaired on all of the memory tasks. We discuss earlier studies that associated performance on these spatial tasks with hippocampal function. Our results demonstrate the importance of medial temporal lobe structures for memory and raise doubts about the idea that these structures have a prominent role in spatial cognition.
Fullerton, Dan; Bonner, David
Building students' ability to transfer physics fundamentals to real-world applications establishes a deeper understanding of underlying concepts while enhancing student interest. Forensic science offers a great opportunity for students to apply physics to highly engaging, real-world contexts. Integrating these opportunities into inquiry-based problem solving in a team environment provides a terrific backdrop for fostering communication, analysis, and critical thinking skills. One such activity, inspired jointly by the museum exhibit "CSI: The Experience" [2] and David Bonner's TPT article "Increasing Student Engagement and Enthusiasm: A Projectile Motion Crime Scene," [3] provides students with three different crime scenes, each requiring an analysis of projectile motion. In this lesson students socially engage in higher-order analysis of two-dimensional projectile motion problems by collecting information from 3-D scale models and collaborating with one another on its interpretation, in addition to the diagramming and mathematical analysis typical of problem solving in physics.
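A typical projectile-motion crime scene calculation works backward from where an object landed to how fast it left its launch point. For a horizontal launch from height h, the fall time follows from h = g t^2 / 2, and the launch speed is the horizontal distance divided by that time. The scenario and numbers below are hypothetical and not taken from the lesson.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def horizontal_launch_speed(drop_height_m, horizontal_distance_m, g=G):
    """Launch speed of an object leaving height h horizontally and landing d metres away."""
    fall_time = math.sqrt(2 * drop_height_m / g)  # from h = g * t**2 / 2
    return horizontal_distance_m / fall_time      # horizontal velocity stays constant


# Hypothetical scene: an object leaves a 4.9 m balcony and lands 5.0 m from its base.
print(round(horizontal_launch_speed(4.9, 5.0), 2))
```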
Ball, Felix; Elzemann, Anne; Busch, Niko A
The change blindness paradigm, in which participants often fail to notice substantial changes in a scene, is a popular tool for studying scene perception, visual memory, and the link between awareness and attention...
Error; NUC: Non-Uniformity Correction; RMSE: Root Mean Squared Error; RSD: Relative Standard Deviation; S3NUC: Static Scene Statistical Non-Uniformity... Deviation (RSD), which normalizes the standard deviation, σ, to the mean estimated value, µ, using the equation $\mathrm{RSD} = \frac{\sigma}{\mu} \times 100$. The RSD plot of the gain... estimates is shown in Figure 4.1(b). The RSD plot shows that after a sample size of approximately 10, the different photocount values and the inclusion...
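The RSD equation above translates directly into code. This sketch uses the sample standard deviation (n - 1 denominator). Whether the source used the sample or population form is an assumption, and the estimates passed in are hypothetical.

```python
import statistics


def relative_std_dev(samples):
    """RSD in percent: sample standard deviation divided by the mean, times 100."""
    return statistics.stdev(samples) / statistics.mean(samples) * 100
```

Computing RSD over gain estimates from increasing sample sizes and watching it level off is one way to judge when additional samples stop improving the estimate.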
Bogdan Harasymowicz-Boggio; Barbara Siemiątkowska
Awareness of its own limitations is a fundamental feature of human vision, yet it has been almost completely omitted in computer vision systems. In this paper we present a method of explicitly using information about the perceptual limitations of a 3D vision system, such as occluded areas, limited field of view, loss of precision with increasing distance, and imperfect segmentation, for a better understanding of the observed scene. The proposed mechanism integrates metric and semantic infere...
Elías Gómez Macías
Starting from commercial magnesium oxide, an aqueous suspension was prepared, which was dried and calcined to give it thermal stability. The material, both fresh and used, was characterized by XRD, BET surface area, and SEM-EPMA. The catalyst showed a periclase-type MgO matrix with CaO on the surface. Catalytic activity tests were carried out in a fixed bed packed with particles obtained by pressing, crushing, and sieving the material. The reactant stream consisted of natural gas/air mixtures below the lower flammability limit. For different flow rates and inlet temperatures of the reactant mixture, the concentrations of CH4, CO2, and CO in the combustion gases were measured with a non-dispersive infrared (NDIR) gas analyzer. Achieving complete methane conversion required raising the bed inlet temperature as the flow rate of the reacting gases increased. The results make it possible to develop a low-cost catalytic combustion system with a thermally stable material that promotes high efficiency in natural gas combustion and eliminates the stability, safety, and negative environmental impact problems inherent in conventional thermal combustion processes.
Huster, Todd P.; Ross, Timothy D.; Culbertson, Jared L.
In this paper, we propose a method for the principled evaluation of scene understanding systems in a query-based framework. A query-based scene understanding system can be thought of as a generalization of typical sensor exploitation systems in which, instead of performing a narrowly defined task (e.g., detect, track, classify), the system can perform general user-defined tasks specified in a query language. Examples of this type of system have been developed as part of DARPA's Mathematics of Sensing, Exploitation, and Execution (MSEE) program. There is a body of literature on the evaluation of typical sensor exploitation systems, but the open-ended nature of the query interface introduces new aspects to the evaluation problem that have not been widely considered before. In this paper, we state the evaluation problem and propose an approach to efficiently learn about the quality of the system under test. We consider the objective of the evaluation to be building a performance model of the system under test, and we rely on the principles of Bayesian experiment design to help construct and select optimal queries for learning about the parameters of that model.
Yuan, Yuan; Fang, Jianwu; Wang, Qi
Abnormal behavior detection in crowd scenes remains a challenge in the field of computer vision. To tackle this problem, this paper starts from a novel structural model of crowd behavior. We first propose an informative structural context descriptor (SCD) for describing each crowd individual, which introduces the potential energy function of interparticle forces from solid-state physics to intuitively conduct visual contextual cueing. To compute the crowd SCD variation effectively, we then design a robust multi-object tracker to associate targets across frames, which exploits the incremental analytical ability of the 3-D discrete cosine transform (DCT). By analyzing the spatio-temporal SCD variation of the crowd online, the abnormality is finally localized. Our contributions are threefold: 1) a new exploration of abnormality detection through structural modeling, in which the motion difference between individuals is computed by a novel selective histogram of optical flow that enables the proposed method to deal with more kinds of anomalies; 2) the SCD description, which can effectively represent the relationships among individuals; and 3) the 3-D DCT multi-object tracker, which can robustly associate a limited number of (instead of all) targets, making tracking analysis feasible in high-density crowds. Experimental results on several publicly available crowd video datasets verify the effectiveness of the proposed method.
Gutschalk, Alexander; Dykstra, Andrew R
Our auditory system is constantly faced with the task of decomposing the complex mixture of sound arriving at the ears into perceptually independent streams constituting accurate representations of individual sound sources. This decomposition, termed auditory scene analysis, is critical for both survival and communication, and is thought to underlie both speech and music perception. The neural underpinnings of auditory scene analysis have been studied utilizing invasive experiments with animal models as well as non-invasive (MEG, EEG, and fMRI) and invasive (intracranial EEG) studies conducted with human listeners. The present article reviews human neurophysiological research investigating the neural basis of auditory scene analysis, with emphasis on two classical paradigms termed streaming and informational masking. Other paradigms - such as the continuity illusion, mistuned harmonics, and multi-speaker environments - are briefly addressed thereafter. We conclude by discussing the emerging evidence for the role of auditory cortex in remapping incoming acoustic signals into a perceptual representation of auditory streams, which are then available for selective attention and further conscious processing. This article is part of a Special Issue entitled Human Auditory Neuroimaging. Copyright © 2013 Elsevier B.V. All rights reserved.
Baktashmotlagh, Mahsa; Harandi, Mehrtash; Lovell, Brian C; Salzmann, Mathieu
Low-dimensional representations are key to the success of many video classification algorithms. However, the commonly-used dimensionality reduction techniques fail to account for the fact that only part of the signal is shared across all the videos in one class. As a consequence, the resulting representations contain instance-specific information, which introduces noise in the classification process. In this paper, we introduce non-linear stationary subspace analysis: a method that overcomes this issue by explicitly separating the stationary parts of the video signal (i.e., the parts shared across all videos in one class), from its non-stationary parts (i.e., the parts specific to individual videos). Our method also encourages the new representation to be discriminative, thus accounting for the underlying classification problem. We demonstrate the effectiveness of our approach on dynamic texture recognition, scene classification and action recognition.
Keller, Sune Høgild; Lauze, Francois Bernard; Nielsen, Mads
In this paper we propose an energy-based algorithm for motion-compensated video super-resolution (VSR) targeted at upscaling standard definition (SD) video to high definition (HD) video. Since the motion (flow field) of the image sequence is generally unknown, we introduce a formulation... for super-resolved sequences. Computing super-resolved flows has, to our knowledge, not been done before. Most advanced super-resolution (SR) methods found in the literature cannot be applied to general video with arbitrary scene content and/or arbitrary optical flows, as is possible with our simultaneous VSR... method. A series of experiments shows that our method outperforms other VSR methods when dealing with general video input and that it continues to provide good results even for large scaling factors, up to 8×8...
Xue, Hongyang; Zhao, Zhou; Cai, Deng
Video question answering is an important task towards scene understanding and visual data retrieval. However, current visual question answering work mainly focuses on single static images, which differ from the dynamic, sequential visual data of the real world; such approaches cannot exploit the temporal information in videos. In this paper we introduce the task of free-form open-ended video question answering. Open-ended answers enable wider applications than the common multiple-choice tasks in Visual-QA. We first propose a dataset for open-ended Video-QA built with automatic question generation approaches. We then propose our sequential video attention and temporal question attention models. These two models apply the attention mechanism to videos and questions while preserving the sequential and temporal structures of the guides. The two models are integrated into a unified attention model. After the video and the question are encoded, the answers are generated word by word by a decoder. Finally, we evaluate our models on the proposed dataset. The experimental results demonstrate the effectiveness of our proposed model.
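The core idea of attending over video frames can be illustrated with generic dot-product attention. Each frame feature is scored against the question encoding, the scores are normalized with a softmax, and the frames are averaged with those weights. This is a minimal sketch, not the authors' sequential video attention model. The feature vectors are assumed to come from some upstream encoder.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(frame_features, question_vec):
    """Weight per-frame feature vectors by dot-product relevance to the question.

    Returns (attended_vector, weights); the weights sum to 1 over frames.
    """
    scores = [sum(a * b for a, b in zip(feat, question_vec)) for feat in frame_features]
    weights = softmax(scores)
    dim = len(frame_features[0])
    attended = [sum(w * feat[d] for w, feat in zip(weights, frame_features)) for d in range(dim)]
    return attended, weights
```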
Jia, Lili; Liu, Dan; Jiang, Mu-Jin; Cao, Ning
Conventional optical video surveillance systems usually just record what they view; they cannot make sense of what they are viewing. Because large amounts of useless video are stored and transmitted, memory space and bandwidth are wasted every day. To reduce the overall cost of the system and improve the application value of the monitoring system, we use the Kinect sensor, with its CMOS infrared sensor, as a supplement to the traditional video surveillance system to establish a natural user interface system for indoor surveillance. In this paper, the architecture of the natural user interface system, the separation of monitored objects from complex backgrounds, and user behavior analysis algorithms are discussed. By analyzing the monitored object rather than relying on a command-language grammar, the natural user interface system sends help information when the monitored person needs immediate help. We also introduce a method of combining the new system with the traditional monitoring system. In conclusion, theoretical analysis and experimental results show that the proposed system is reasonable and efficient. It can satisfy the system requirements of non-contact, online, real-time operation with higher precision and rapid response for controlling the situation at the scene.
We investigated the contribution of binocular disparity to the rapid recognition of scenes and simpler spatial patterns using a paradigm combining backward masked stimulus presentation and short-term match-to-sample recognition. First, we showed that binocular disparity did not contribute significantly to the recognition of briefly presented natural and artificial scenes, even when the availability of monocular cues was reduced. Subsequently, using dense random dot stereograms as stimuli, we showed that observers were in principle able to extract spatial patterns defined only by disparity under brief, masked presentations. Comparing our results with the predictions from a cue-summation model, we showed that combining disparity with luminance did not per se disrupt the processing of disparity. Our results suggest that the rapid recognition of scenes is mediated mostly by a monocular comparison of the images, although we can rely on stereo in fast pattern recognition.
Halperin, Tavi; Poleg, Yair; Arora, Chetan; Peleg, Shmuel
The possibility of sharing one's point of view makes use of wearable cameras compelling. These videos are often long, boring and coupled with extreme shake, as the camera is worn on a moving person. Fast forwarding (i.e. frame sampling) is a natural choice for quick video browsing. However, this accentuates the shake caused by natural head motion in an egocentric video, making the fast forwarded video useless. We propose EgoSampling, an adaptive frame sampling that gives stable, fast forwarde...
Bjorn M Kampa
How are visual scenes encoded in local neural networks of visual cortex? In rodents, visual cortex lacks a columnar organization, so processing of diverse features from a spot in visual space could be performed locally by populations of neighboring neurons. To examine how complex visual scenes are represented by local microcircuits in mouse visual cortex, we measured visually evoked responses of layer 2/3 neuronal populations using 3D two-photon calcium imaging. Both natural and artificial movie scenes (10-s duration) evoked distributed and sparsely organized responses in local populations of 70 to 150 neurons within the sampled volumes. About 50% of neurons showed calcium transients during visual scene presentation, of which about half displayed reliable temporal activation patterns. The majority of the reliably responding neurons were activated primarily by one of the four visual scenes applied. Consequently, single neurons performed poorly in decoding which visual scene had been presented. In contrast, high levels of decoding performance (>80%) were reached when considering population responses, requiring about 80 randomly picked cells or 20 reliable responders. Furthermore, reliably responding neurons tended to have neighbors sharing the same stimulus preference. Because of this local redundancy, it was beneficial for efficient scene decoding to read out activity from spatially distributed rather than locally clustered neurons. Our results suggest a population code in layer 2/3 of visual cortex, where the visual environment is dynamically represented in the activation of distinct functional sub-networks.
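The contrast between single-neuron and population decoding can be illustrated with a small sketch. It assumes a simple nearest-centroid (template-matching) decoder evaluated with leave-one-out cross-validation; the abstract does not state which decoder was used, and the function names below are hypothetical.

```python
import math
from statistics import mean


def centroid(vectors):
    """Element-wise mean of equal-length population response vectors."""
    return [mean(column) for column in zip(*vectors)]


def nearest_centroid_decode(trials, response):
    """Return the scene label whose mean population response lies closest
    (Euclidean distance) to a single-trial response vector.

    trials: dict mapping scene label -> list of response vectors (one per trial)
    """
    best_label, best_distance = None, math.inf
    for label, vectors in trials.items():
        distance = math.dist(centroid(vectors), response)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


def leave_one_out_accuracy(trials):
    """Fraction of trials decoded correctly when each trial is held out in turn.
    Every label needs at least two trials so that a training centroid exists."""
    correct = total = 0
    for label, vectors in trials.items():
        for i, held_out in enumerate(vectors):
            # Drop the held-out trial from its own scene's training set only.
            training = {
                other: (v[:i] + v[i + 1:] if other == label else v)
                for other, v in trials.items()
            }
            correct += nearest_centroid_decode(training, held_out) == label
            total += 1
    return correct / total
```

Restricting each vector to one neuron's response (one-element vectors) shows how decoding from a single weakly selective cell can fall well below decoding from the whole population.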
Kaakinen, Johanna K; Hyönä, Jukka; Viljanen, Minna
In the study, 33 participants viewed photographs from either a potential homebuyer's or a burglar's perspective, or in preparation for a memory test, while their eye movements were recorded. A free recall and a picture recognition task were performed after viewing. The results showed that perspective had rapid effects, in that the second fixation after the scene onset was more likely to land on perspective-relevant than on perspective-irrelevant areas within the scene. Perspective-relevant areas also attracted longer total fixation time, more visits, and longer first-pass dwell times than did perspective-irrelevant areas. As for the effects of visual saliency, the first fixation was more likely to land on a salient than on a nonsalient area; salient areas also attracted more visits and longer total fixation time than did nonsalient areas. Recall and recognition performance reflected the eye fixation results: Both were overall higher for perspective-relevant than for perspective-irrelevant scene objects. The relatively low error rates in the recognition task suggest that participants had gained an accurate memory for scene objects. The findings suggest that the role of bottom-up versus top-down factors varies as a function of viewing task and the time-course of scene processing. © 2011 The Experimental Psychology Society
Schildwachter, Eric F.; Boreman, Glenn D.
A Scophony-configuration infrared scene projector, consisting of a raster-scanned CO2 laser and an acoustooptic (AO) modulator, was characterized for modulation transfer function (MTF) performance. The MTF components considered in the model were the Gaussian beam input to the AO cell, the finite aperture of the scan mirror, the width of the detector in the image plane, the transfer function of the amplifier electronics, and a term caused by Bragg-angle detuning over the bandwidth of the AM video signal driving the AO cell. The finite bandwidth of the input video signal caused a spread in the Bragg angle required for maximum diffraction efficiency. In the Scophony configuration, a collimated laser beam enters the AO cell at only one particular angle, so a falloff of diffraction efficiency (and hence MTF) resulted as the modulation frequency was increased. The Bragg-angle detuning term was found to dominate the measured system MTF.
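Because the projector is modeled as a cascade of linear stages, its system MTF at each frequency is the product of the component MTFs. The sketch below shows that cascade for three of the named terms using textbook functional forms (Gaussian beam, finite-width detector, single-pole amplifier). These forms and all parameter values are illustrative assumptions, not the paper's model; the scan-mirror aperture and Bragg-detuning terms are omitted and would multiply in the same way.

```python
import math


def gaussian_mtf(xi, sigma):
    """MTF of a Gaussian point-spread function with standard deviation sigma.
    xi is spatial frequency in cycles per unit length (same unit as sigma)."""
    return math.exp(-2.0 * (math.pi * sigma * xi) ** 2)


def detector_mtf(xi, width):
    """MTF of a uniformly responsive detector of the given width: |sinc(width * xi)|."""
    x = math.pi * width * xi
    return 1.0 if x == 0 else abs(math.sin(x) / x)


def rc_lowpass_mtf(frequency, cutoff):
    """Magnitude response of a first-order (single-pole RC) electronic low-pass stage."""
    return 1.0 / math.sqrt(1.0 + (frequency / cutoff) ** 2)


def system_mtf(component_values):
    """For cascaded linear shift-invariant stages, the system MTF at a given
    frequency is the product of the component MTFs at that frequency."""
    return math.prod(component_values)


def projector_mtf(xi, beam_sigma, detector_width, scan_velocity, electronics_cutoff):
    """Hypothetical three-term cascade. The scan velocity converts spatial
    frequency into the temporal frequency seen by the amplifier electronics."""
    temporal_frequency = scan_velocity * xi
    return system_mtf([
        gaussian_mtf(xi, beam_sigma),
        detector_mtf(xi, detector_width),
        rc_lowpass_mtf(temporal_frequency, electronics_cutoff),
    ])
```

Since every factor is at most 1, the product can only fall as terms are added, which is why a single steeply falling term (here, in the paper, the Bragg-detuning term) can dominate the measured system MTF.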
Aminoff, Elissa M; Toneva, Mariya; Shrivastava, Abhinav; Chen, Xinlei; Misra, Ishan; Gupta, Abhinav; Tarr, Michael J
How do we understand the complex patterns of neural responses that underlie scene understanding? Studies of the network of brain regions held to be scene-selective, namely the parahippocampal/lingual region (PPA), the retrosplenial complex (RSC), and the occipital place area (TOS), have typically focused on single visual dimensions (e.g., size) rather than the high-dimensional feature space in which scenes are likely to be neurally represented. Here we leverage well-specified artificial vision systems to explicate a more complex understanding of how scenes are encoded in this functional network. We correlated similarity matrices within three different scene-spaces arising from: (1) BOLD activity in scene-selective brain regions; (2) behaviorally measured judgments of visually perceived scene similarity; and (3) several different computer vision models. These correlations revealed that: (1) models that relied on mid- and high-level scene attributes showed the highest correlations with the patterns of neural activity within the scene-selective network; (2) NEIL and SUN, the models that best accounted for the patterns obtained from PPA and TOS, differed from the GIST model, which best accounted for the pattern obtained from RSC; (3) the best-performing models outperformed behaviorally measured judgments of scene similarity in accounting for neural data. One computer vision method, NEIL ("Never-Ending-Image-Learner"), which incorporates visual features learned as statistical regularities across web-scale numbers of scenes, showed significant correlations with neural activity in all three scene-selective regions and was one of the two models best able to account for variance in the PPA and TOS. We suggest that these results are a promising first step in explicating more fine-grained models of neural scene understanding, including developing a clearer picture of the division of labor among the components of the functional scene-selective brain network.
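Correlating similarity matrices across scene-spaces is commonly done by comparing the off-diagonal upper triangles of the two matrices with a rank correlation. The sketch below assumes Spearman correlation with average ranks for ties; the abstract does not name the correlation measure, and the helper names are hypothetical.

```python
import math


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries of a square similarity matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rsa_spearman(similarity_a, similarity_b):
    """Spearman correlation between the upper triangles of two similarity matrices,
    e.g. a brain-region matrix and a computer-vision-model matrix."""
    return pearson(average_ranks(upper_triangle(similarity_a)),
                   average_ranks(upper_triangle(similarity_b)))
```

A rank correlation is a common choice here because it only asks whether two scene-spaces order scene pairs the same way, not whether their similarity values are linearly related.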
Gorski, Charlotta; Minnis, Helen
Video feedforward is a solution-focused intervention used to improve desired behaviour. We present two case studies of using video feedforward in reactive attachment disorder. Children with reactive attachment disorder, their caregivers and their clinician completed storyboards of behaviours desired during a 'miracle day' and filmed the individual scenes. These scenes were edited to a prolonged sequence of successful behaviour which was fed back to the child and their caregiver using principles of video interaction guidance. Families reported major improvements in the targeted behaviours, usually within a week of filming the 'miracle day'. © The Author(s) 2013.
Lazar, Aurel A; Pnevmatikakis, Eftychios A
We investigate architectures for time encoding and time decoding of visual stimuli such as natural and synthetic video streams (movies, animation). The architecture for time encoding is akin to models of the early visual system. It consists of a bank of filters in cascade with single-input multi-output neural circuits. Neuron firing is based on either a threshold-and-fire or an integrate-and-fire spiking mechanism with feedback. We show that analog information is represented by the neural circuits as projections on a set of band-limited functions determined by the spike sequence. Under Nyquist-type and frame conditions, the encoded signal can be recovered from these projections with arbitrary precision. For the video time encoding machine architecture, we demonstrate that band-limited video streams of finite energy can be faithfully recovered from the spike trains and provide a stable algorithm for perfect recovery. The key condition for recovery calls for the number of neurons in the population to be above a threshold value.
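The integrate-and-fire encoding step can be sketched for a single neuron: the input plus a bias is integrated, and a spike is emitted each time the integral reaches a threshold, so stronger input yields denser spikes. This minimal sketch assumes an ideal integrate-and-fire neuron on a sampled signal with rectangle-rule integration and reset by subtraction. It omits the filter bank, the multi-output circuits, the feedback, and the decoder described in the abstract, and the parameter names (bias, kappa, delta) are illustrative.

```python
def iaf_encode(signal, dt, bias, kappa, delta):
    """Ideal integrate-and-fire time encoding of a sampled signal.

    signal: samples u(t) spaced dt apart
    bias:   constant added to the input; must exceed max|u| so the integrator
            always rises and spikes keep occurring
    kappa:  integration constant; delta: firing threshold

    Returns spike times t_k such that the integral of (u + bias) / kappa between
    consecutive spikes equals delta (up to rectangle-rule discretization error).
    """
    if signal and bias <= max(abs(u) for u in signal):
        raise ValueError("bias must exceed max|u(t)| so the integrator is strictly increasing")
    spikes = []
    v = 0.0
    for n, u in enumerate(signal):
        v += (u + bias) * dt / kappa
        if v >= delta:
            spikes.append((n + 1) * dt)
            v -= delta  # reset by subtraction keeps any residual charge
    return spikes
```

The spike times alone carry the signal: each inter-spike interval constrains one integral of u, which is the kind of projection the recovery algorithm inverts.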
The number of malware attacks exploiting the Internet is increasing day by day, and they have become a serious threat. The latest malware spreads through media players, embedded in funny video clips to lure end users. Once it is executed and installed, the behavior of the malware is in the malware author's hands. The malware spreads through the Internet, USB drives, and shared files and folders; its carrier can be almost anything, which keeps its presence concealed. The funny video was named to suggest a connection to a film celebrity, and the malware variant was collected from the laptop of a terrorist organization. It runs in the background and contains malicious code that steals sensitive user information, such as banking credentials, usernames, and passwords, and sends it to a remote host known as the command-and-control server. The stolen data is sent to an email address embedded in the malicious code. The malware can also spread through USB and other devices. In summary, the analysis reveals the presence of malicious code in an executable video file and describes its behavior.
Delgado, Francisco J.; Noyes, Matthew
Our Hybrid Reality and Advanced Operations Lab is developing incredibly realistic and immersive systems that could be used to provide training, support engineering analysis, and augment data collection for various human performance metrics at NASA. To get a better understanding of what Hybrid Reality is, let's go through the two most commonly known types of immersive realities: Virtual Reality and Augmented Reality. Virtual Reality creates immersive scenes that are made up entirely of digital information. This technology has been used to train astronauts at NASA, during teleoperation of remote assets (arms, rovers, robots, etc.), and in other activities. One challenge with Virtual Reality is that, in real-time applications (like landing an airplane), the information used to create the virtual scenes can be out of date (i.e., visualized long after physical objects have moved in the scene) and not accurate enough to land the airplane safely. This is where Augmented Reality comes in. Augmented Reality takes real-time environment information (from a camera or a see-through window) and places digitally created information into the scene so that it matches the video/glass information. Augmented Reality enhances real environment information collected with a live sensor or viewport (e.g., camera, window, etc.) with the information-rich visualization provided by Virtual Reality. Hybrid Reality takes Augmented Reality even further by creating a higher level of immersion in which interactivity can take place. Hybrid Reality takes Virtual Reality objects and a trackable, physical representation of those objects, places them in the same coordinate system, and allows people to interact with both representations of the objects (virtual and physical) simultaneously. After a short period of adjustment, individuals begin to interact with all the objects in the scene as if they were real-life objects. The ability to physically touch and interact with digitally created
Yamamoto, Daisuke; Nagao, Katashi
In this paper, we developed a Web-based video annotation system named iVAS (intelligent Video Annotation Server). Audiences can associate any video content on the Internet with annotations. The system analyzes video content to acquire cut/shot information and color histograms. It also automatically generates a Web page for editing annotations. Audiences can then create annotation data using two methods. The first helps users create text data, such as person/object names, scene descriptions, and comments, interactively. The second lets users associate any video fragment with their subjective impression just by clicking a mouse button. The generated annotation data are accumulated and managed by an XML database connected to iVAS. We also developed several application systems based on annotations, such as video retrieval, video simplification, and video-content-based community support. One of the major advantages of our approach is the easy integration of hand-coded and automatically generated annotations (such as color histograms and cut/shot information). Additionally, since our annotation system is open to the public, we must consider the reliability and correctness of annotation data. We therefore also developed an automatic method for evaluating annotation reliability using users' feedback. In the future, these fundamental technologies will contribute to the formation of new communities centered around video content.
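Cut/shot information is often derived from color histograms: a cut is declared where the histograms of consecutive frames differ sharply. The sketch below assumes coarsely quantized RGB histograms, a half-L1 distance, and a fixed threshold. The abstract does not describe iVAS's actual cut-detection method, so the function names, bin count, and threshold are all hypothetical.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Normalized, coarsely quantized RGB histogram of a frame given as
    (r, g, b) tuples with channel values in 0..255."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {key: count / total for key, count in counts.items()}


def histogram_distance(h1, h2):
    """Half the L1 distance between two normalized histograms
    (0 = identical, 1 = no overlapping bins)."""
    keys = set(h1) | set(h2)
    return 0.5 * sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in keys)


def detect_cuts(frames, threshold=0.5):
    """Indices i at which frame i starts a new shot, i.e. its color histogram
    differs from that of frame i - 1 by more than the threshold."""
    histograms = [color_histogram(frame) for frame in frames]
    return [
        i for i in range(1, len(histograms))
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold
    ]
```

Histogram comparison ignores where colors appear in the frame, which makes it fairly robust to camera and object motion within a shot but blind to cuts between shots with similar color distributions.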
Rafique, Sara A; Solomon-Harris, Lily M; Steeves, Jennifer K E
Viewing the world involves many computations across a great number of regions of the brain, all the while appearing seamless and effortless. We sought to determine the connectivity of object and scene processing regions of cortex through the influence of transient focal neural noise in discrete nodes within these networks. We consecutively paired repetitive transcranial magnetic stimulation (rTMS) with functional magnetic resonance-adaptation (fMR-A) to measure the effect of rTMS on functional response properties at the stimulation site and in remote regions. In separate sessions, rTMS was applied to the object preferential lateral occipital region (LO) and scene preferential transverse occipital sulcus (TOS). Pre- and post-stimulation responses were compared using fMR-A. In addition to modulating BOLD signal at the stimulation site, TMS affected remote regions revealing inter and intrahemispheric connections between LO, TOS, and the posterior parahippocampal place area (PPA). Moreover, we show remote effects from object preferential LO to outside the ventral perception network, in parietal and frontal areas, indicating an interaction of dorsal and ventral streams and possibly a shared common framework of perception and action. Copyright © 2015 Elsevier Ltd. All rights reserved.
Pedestrian movement is woven into the fabric of urban regions. With more people living in cities than ever before, there is an increased need to understand and model how pedestrians utilize and move through space for a variety of applications, ranging from urban planning and architecture to security. Pedestrian modeling has traditionally faced the challenge of collecting data to calibrate and validate such models of pedestrian movement. With the increased availability of mobility datasets from video surveillance and enhanced geolocation capabilities in consumer mobile devices, we are now presented with the opportunity to change the way we build pedestrian models. Within this paper we explore the potential that such information offers for the improvement of agent-based pedestrian models. We introduce a Scene- and Activity-Aware Agent-Based Model (SA2-ABM), a method for harvesting scene activity information in the form of spatiotemporal trajectories, and incorporate this information into our models. To assess and evaluate the improvement offered by such information, we carry out a range of experiments using real-world datasets. We demonstrate that the use of real scene information allows us to better inform our model and enhance its predictive capabilities.
Starting from the recognized tension between reality and fiction in contemporary theatre, generally defined as theatre of the real, we intend to intersect this phenomenon with the theoretical field of performativity, which focuses on work in process, dynamic transformation, and experience. The intention is to relate the theory of performativity to observations about the latest work of Theatre Vertigo, directed by Antonio Araujo, Bom Retiro 958 metros. Genetic approaches to theatre will serve as a starting point for interpreting some aspects of the creative process and the scene.
Ângela Cristina Salgueiro Marques
This paper discusses, with a focus on Jacques Rancière, how an image policy can be observed in the creative production of scenes of dissent, from which the political agent emerges, appears, and constitutes himself in a process of subjectivation. The political and critical power of the image is linked to acts of survival: operations and attempts that make it possible to resist the captures, silences, and excesses committed by media discourses, social institutions, and the State.
This essay presents a contemporary translation of and brief commentary on the gay bashing scene found in Marcel Proust's A la Recherche du Temps perdu: Le Côté de Guermantes, Tome I. The paper notes that Proust argues in this passage for the acceptance of homosexuality for two main reasons: because gay bashing won't eradicate it; and because gayness is the simple, direct movement of a being toward perceived beauty. The paper suggests that Proust reveals his own gayness (and that of his protagonist) by employing the latter argument in defense of homosexuality whereas, throughout the novel, he presents heterosexual attraction as an immensely indirect, artistically manufactured construct.
Mullally, Sinéad L.; Vargha-Khadem, Faraneh; Maguire, Eleanor A.
Amnesic patients with bilateral hippocampal damage sustained in adulthood are generally unable to construct scenes in their imagination. By contrast, patients with developmental amnesia (DA), where hippocampal damage was acquired early in life, have preserved performance on this task, although the reason for this sparing is unclear. One possibility is that residual function in remnant hippocampal tissue is sufficient to support basic scene construction in DA. Such a situation was found in the one amnesic patient with adult-acquired hippocampal damage (P01) who could also construct scenes. Alternatively, DA patients' scene construction might not depend on the hippocampus, perhaps being instead reliant on non-hippocampal regions and mediated by semantic knowledge. To adjudicate between these two possibilities, we examined scene construction during functional MRI (fMRI) in Jon, a well-characterised patient with DA who has previously been shown to have preserved scene construction. We found that when Jon constructed scenes he activated many of the regions known to be associated with imagining scenes in control participants, including ventromedial prefrontal cortex, posterior cingulate, retrosplenial and posterior parietal cortices. Critically, however, activity was not increased in Jon's remnant hippocampal tissue. Direct comparisons with a group of control participants and patient P01 confirmed that they activated their right hippocampus more than Jon. Our results show that a type of non-hippocampal-dependent scene construction is possible and occurs in DA, perhaps mediated by semantic memory, which does not appear to involve the vivid visualisation of imagined scenes. PMID:24231038
Hwang, Alex D; Wang, Hsueh-Cheng; Pomplun, Marc
The perception of objects in our visual world is influenced not only by their low-level visual features, such as shape and color, but also by their high-level features, such as meaning and the semantic relations among them. While it has been shown that low-level features in real-world scenes guide eye movements during scene inspection and search, the influence of semantic similarity among scene objects on eye movements in such situations has not been investigated. Here we study guidance of eye movements by semantic similarity among objects during real-world scene inspection and search. By selecting scenes from the LabelMe object-annotated image database and applying latent semantic analysis (LSA) to the object labels, we generated semantic saliency maps of real-world scenes based on the semantic similarity of scene objects to the currently fixated object or the search target. An ROC analysis of these maps as predictors of subjects' gaze transitions between objects during scene inspection revealed a preference for transitions to objects that were semantically similar to the currently inspected one. Furthermore, during the course of a scene search, subjects' eye movements were progressively guided toward objects that were semantically similar to the search target. These findings demonstrate substantial semantic guidance of eye movements in real-world scenes and show its importance for understanding real-world attentional control. Copyright © 2011 Elsevier Ltd. All rights reserved.
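The two computational steps, scoring objects by semantic similarity to a reference object and testing those scores as predictors of gaze with an ROC analysis, can be sketched as follows. This assumes each object label already has an LSA vector (the SVD step that produces them is not shown, and the toy vectors in the test are made up), uses cosine similarity as the similarity measure, and computes the ROC area via the Mann-Whitney statistic. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def semantic_saliency(vectors, reference):
    """Semantic saliency of every other object: cosine similarity between its
    (assumed precomputed) LSA vector and that of the reference object, i.e.
    the currently fixated object or the search target."""
    ref = vectors[reference]
    return {name: cosine(vec, ref) for name, vec in vectors.items() if name != reference}


def auc(positives, negatives):
    """Area under the ROC curve via the Mann-Whitney statistic: the probability
    that a fixated object's score exceeds a non-fixated object's score,
    counting ties as one half."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positives) * len(negatives))
```

An AUC above 0.5 indicates that gaze transitions go to semantically similar objects more often than chance would predict; exactly 0.5 means the saliency scores carry no information about the next fixation.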
Lescroart, Mark D.; Stansbury, Dustin E.; Gallant, Jack L.
Perception of natural visual scenes activates several functional areas in the human brain, including the Parahippocampal Place Area (PPA), Retrosplenial Complex (RSC), and the Occipital Place Area (OPA). It is currently unclear what specific scene-related features are represented in these areas. Previous studies have suggested that PPA, RSC, and/or OPA might represent at least three qualitatively different classes of features: (1) 2D features related to Fourier power; (2) 3D spatial features such as the distance to objects in a scene; or (3) abstract features such as the categories of objects in a scene. To determine which of these hypotheses best describes the visual representation in scene-selective areas, we applied voxel-wise modeling (VM) to BOLD fMRI responses elicited by a set of 1386 images of natural scenes. VM provides an efficient method for testing competing hypotheses by comparing predictions of brain activity based on encoding models that instantiate each hypothesis. Here we evaluated three different encoding models that instantiate each of the three hypotheses listed above. We used linear regression to fit each encoding model to the fMRI data recorded from each voxel, and we evaluated each fit model by estimating the amount of variance it predicted in a withheld portion of the data set. We found that voxel-wise models based on Fourier power or the subjective distance to objects in each scene predicted much of the variance predicted by a model based on object categories. Furthermore, the response variance explained by these three models is largely shared, and the individual models explain little unique variance in responses. Based on an evaluation of previous studies and the data we present here, we conclude that there is currently no good basis to favor any one of the three alternative hypotheses about visual representation in scene-selective areas. We offer suggestions for further studies that may help resolve this issue. PMID:26594164
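The core of voxel-wise modeling can be sketched for one voxel: fit a linear map from stimulus features to the voxel's response on training images, then score the fit by the fraction of variance it predicts in held-out data. This sketch assumes plain ordinary least squares solved through the normal equations, with no regularization or hemodynamic delay modeling. The abstract only says "linear regression", so the study's actual fitting procedure may differ, and the function names are hypothetical.

```python
def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(features, responses):
    """Least-squares weights w minimizing ||features @ w - responses||^2.
    No intercept is added; include a constant feature column if one is needed."""
    p = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(features, responses)) for i in range(p)]
    return solve(xtx, xty)


def variance_explained(weights, features, responses):
    """Held-out R^2: 1 - residual sum of squares / total sum of squares."""
    predictions = [sum(w * x for w, x in zip(weights, row)) for row in features]
    mean_y = sum(responses) / len(responses)
    ss_res = sum((y - p) ** 2 for y, p in zip(responses, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in responses)
    return 1.0 - ss_res / ss_tot


def evaluate_voxel(train_x, train_y, test_x, test_y):
    """Fit one encoding model to one voxel and score it on withheld data."""
    weights = fit_ols(train_x, train_y)
    return variance_explained(weights, test_x, test_y)
```

Running evaluate_voxel separately for the Fourier-power, distance, and object-category feature sets on the same voxel gives the per-model predicted variance that the abstract compares. Fitting a model on the combined feature sets would be one way to estimate how much of that variance is shared.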
According to a national telephone survey by the Pew Internet Project, 99 percent of boys and 94 percent of girls ages 12-17 play computer, Web, portable, or console games; and 50 percent play such games daily. The survey report, Teens, Video Games, and Civics, examines the extent and nature of teens' game playing and sheds some light on the…
Yang, Jie; Messinger, David W.; Dube, Roger R.; Ientilucci, Emmett J.
Filtered multispectral imaging is a potential method for crime scene documentation and evidence detection, owing to its abundant spectral information and its non-contact, non-destructive nature. A low-cost, portable multispectral crime scene imaging device would be highly useful and efficient. The second-generation crime scene imaging system uses a CMOS imaging sensor to capture the spatial scene and bandpass Interference Filters (IFs) to capture spectral information. Unfortunately, CMOS sensors suffer from severe spatial non-uniformity compared to CCD sensors, and the major cause is Fixed Pattern Noise (FPN). IFs suffer from the "blue shift" effect and introduce spatial-spectral correlated errors. Therefore, FPN correction is critical for enhancing crime scene image quality and also helps de-correlate spatial-spectral noise. In this paper, a pixel-wise linear radiance-to-Digital Count (DC) conversion model is constructed for the crime scene imaging CMOS sensor. The pixel-wise conversion gain G(i,j) and Dark Signal Non-Uniformity (DSNU) Z(i,j) are calculated. The conversion gain is further divided into four components: an FPN row component, an FPN column component, a defects component, and an effective photo-response signal component. The conversion gain is then corrected by averaging out the FPN column and row components and the defects component, so that the sensor conversion gain is uniform. Based on the corrected conversion gain and the incident radiance estimated by inverting the pixel-wise linear radiance-to-DC model, the spatial uniformity of the corrected image can be enhanced to 7 times that of the raw image, and the larger the image DC value within its dynamic range, the greater the enhancement.
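The pixel-wise linear model DC(i,j) = G(i,j) * L + Z(i,j) can be calibrated and inverted as sketched below. This uses a standard two-point calibration: a dark frame gives Z and a uniformly lit frame at known radiance gives G. It is an assumption for illustration, not the paper's exact procedure, and it does not reproduce the paper's row/column/defect decomposition of the gain. The function names and the min_gain defect cutoff are hypothetical.

```python
def calibrate(dark_frame, flat_frame, flat_radiance):
    """Two-point calibration of DC[i][j] = G[i][j] * L + Z[i][j].

    dark_frame: digital counts recorded with no light (L = 0), giving Z (the DSNU)
    flat_frame: digital counts under a uniform, known radiance flat_radiance
    Returns (G, Z) as per-pixel nested lists.
    """
    Z = [list(row) for row in dark_frame]
    G = [
        [(f - z) / flat_radiance for f, z in zip(flat_row, dark_row)]
        for flat_row, dark_row in zip(flat_frame, dark_frame)
    ]
    return G, Z


def correct(raw_frame, G, Z, min_gain=1e-6):
    """Invert the per-pixel model to estimate incident radiance, L_hat = (DC - Z) / G.
    Pixels whose gain falls below min_gain are treated as defective and returned as None."""
    corrected = []
    for raw_row, gain_row, dark_row in zip(raw_frame, G, Z):
        corrected.append([
            None if g < min_gain else (dc - z) / g
            for dc, g, z in zip(raw_row, gain_row, dark_row)
        ])
    return corrected
```

After correction, a uniformly lit scene should map to a uniform radiance estimate despite pixel-to-pixel differences in gain and dark offset. This per-pixel inversion is what removes the fixed pattern.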
Founded as a media training project in 1969, Appalshop (Whitesburg, Kentucky) has become a center that produces award-winning films and videos; provides summer internships for high school students; maintains a traveling theater company; publishes musical recordings; operates a community radio station; and presents a variety of workshops,…
Sussman, Elyse S.
Assessment of the neural correlates of auditory scene analysis, using an index of sound change detection that does not require the listener to attend to the sounds [a component of event-related brain potentials called the mismatch negativity (MMN)], has previously demonstrated that segregation processes can occur without attention focused on the sounds and that within-stream contextual factors influence how sound elements are integrated and represented in auditory memory. The current study investigated the relationship between the segregation and integration processes when they were called upon to function together. The pattern of MMN results showed that the integration of sound elements within a sound stream occurred after the segregation of sounds into independent streams and, further, that the individual streams were subject to contextual effects. These results are consistent with a view of auditory processing that suggests that the auditory scene is rapidly organized into distinct streams and that the integration of sequential elements into perceptual units takes place on the already formed streams. This would allow for the flexibility required to identify changing within-stream sound patterns, which is needed to appreciate music or comprehend speech.
Potter, Mary C.
Three times per second, our eyes make a new fixation that generates a new bottom-up analysis in the visual system. How much is extracted from each glimpse? For how long and in what form is that information remembered? To answer these questions, investigators have mimicked the effect of continual shifts of fixation by using rapid serial visual presentation of sequences of unrelated pictures. Experiments in which viewers detect specified target pictures show that detection on the basis of meaning is possible at presentation durations as brief as 13 ms, suggesting that understanding may be based on feedforward processing, without feedback. In contrast, memory for what was just seen is poor unless the viewer has about 500 ms to think about the scene: the scene does not need to remain in view. Initial memory loss after brief presentations occurs over several seconds, suggesting that at least some of the information from the previous few fixations persists long enough to support a coherent representation of the current environment. In contrast to marked memory loss shortly after brief presentations, memory for pictures viewed for 1 s or more is excellent. Although some specific visual information persists, the form and content of the perceptual and memory representations of pictures over time indicate that conceptual information is extracted early and determines most of what remains in longer-term memory. PMID:22371707
Kwon, TaeKyu; Li, Yunfeng; Sawada, Tadamasa; Pizlo, Zygmunt
This study, which was strongly influenced by Gestalt ideas, extends our prior work on the role of a priori constraints in the veridical perception of 3D shapes to the perception of 3D scenes. Our experiments tested how human subjects perceive the layout of a naturally illuminated indoor scene that contains common symmetrical 3D objects standing on a horizontal floor. In one task, the subject was asked to draw a top view of a scene that was viewed either monocularly or binocularly. The top views the subjects reconstructed were configured accurately except for their overall size. These size errors varied from trial to trial and were shown to most likely result from a response bias. There was little, if any, evidence of systematic distortions of the subjects' perceived visual space, the kind of distortions that have been reported in numerous experiments run under very unnatural conditions. Having shown this, we used Foley's (Vision Research 12 (1972) 323-332) isosceles right triangle experiment to test the intrinsic geometry of visual space directly. This was done with natural viewing, with the impoverished viewing conditions Foley had used, and with a number of intermediate viewing conditions. Our subjects produced very accurate triangles when the viewing conditions were natural, but their performance deteriorated systematically as the viewing conditions were progressively impoverished. Their perception of visual space became more compressed as their natural visual environment was degraded. We then developed a computational model that emulated the most salient features of our psychophysical results. We concluded that human observers see 3D scenes veridically when they view natural 3D objects within natural 3D environments. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
Zhang, Fang-Lue; Wang, Jue; Zhao, Han; Martin, Ralph R; Hu, Shi-Min
A major difference between amateur and professional video lies in the quality of camera paths. Previous work on video stabilization has considered how to improve amateur video by smoothing the camera path. In this paper, we show that additional changes to the camera path can further improve video aesthetics. Our new optimization method achieves multiple simultaneous goals: 1) stabilizing video content over short time scales; 2) ensuring simple and consistent camera paths over longer time scales; and 3) improving scene composition by automatically removing distractions, a common occurrence in amateur video. Our approach uses an L1 camera path optimization framework, extended to handle multiple constraints. Two passes of optimization are used to address both low-level and high-level constraints on the camera path. The experimental and user study results show that our approach outputs video that is perceptually better than the input, or the results of using stabilization only.
Eminsoy, Sertac; Dogan, Safak; Kondoz, Ahmet M.
Transcoding is an effective method to provide video adaptation for heterogeneous internetwork video access and communication environments, which require the tailoring (i.e., repurposing) of coded video properties to channel conditions, terminal capabilities, and user preferences. This paper presents a video transcoding system that is capable of applying a suite of error resilience tools on the input compressed video streams while controlling the output rates to provide robust communications over error-prone and bandwidth-limited 3G wireless networks. The transcoder is also designed to employ a new adaptive intra-refresh algorithm, which is responsive to the detected scene activity inherently embedded into the video content and the reported time-varying channel error conditions of the wireless network. Comprehensive computer simulations demonstrate significant improvements in the received video quality performances using the new transcoding architecture without an extra computational cost.
Zhou, Bolei; Lapedriza, Agata; Khosla, Aditya; Oliva, Aude; Torralba, Antonio
The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification performance at tasks such as visual object and scene recognition. Here we describe the Places Database, a repository of 10 million scene photographs, labeled with scene semantic categories, comprising a large and diverse list of the types of environments encountered in the world. Using state-of-the-art Convolutional Neural Networks (CNNs), we provide scene classification CNNs (Places-CNNs) as baselines, which significantly outperform previous approaches. Visualization of the CNNs trained on Places shows that object detectors emerge as an intermediate representation of scene classification. With its high coverage and high diversity of exemplars, the Places Database, along with the Places-CNNs, offers a novel resource to guide future progress on scene recognition problems.
Wu, Zhi-guo; Wang, Ming-jia
Dynamic target recognition is an important issue in image processing research and is widely used in photoelectric detection, target tracking, and video surveillance. In complex cruise scenes, unlike scenes with a static background, both the target and the background objects are in motion, which greatly increases the complexity of moving-target detection and recognition. Motivated by practical engineering applications, and combining embedded systems with real-time image detection technology, this paper proposes a real-time motion detection method on an embedded system with an FPGA + DSP architecture. The digital image processing system uses the high-speed digital signal processor TMS320C6416T as its main computing component and a large-capacity FPGA as a coprocessor, forming a high-performance image processing card. The FPGA is responsible for receiving and dispatching data, and the DSP for data processing. The FPGA collects image data and controls the SDRAM according to the digital image sequence; the SDRAM implements a multiport image buffer. The DSP reads real-time images from the SDRAM and runs the scene motion detection algorithm, so that data reception and data processing proceed in parallel. The system was designed and built for complex cruise-scene motion detection in engineering applications. Image edge information is robust to illumination changes and has strong anti-interference ability. First, the adjacent frame and the current frame are processed by a convolution operation to extract edge images. Then the correlation strength and the motion offset are computed, and the scene motion parameters are estimated from the result to achieve real-time, accurate motion detection. Real-time cruise experiments were run on images at 768 × 576 resolution and a 25 Hz frame rate. The results show that the proposed system achieves real
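The edge-extraction and correlation steps described above can be illustrated with a minimal Python sketch. It assumes a simple central-difference gradient magnitude as the convolution-based edge operator and an exhaustive search over integer shifts with raw (unnormalized) correlation as the offset estimator; the function names `edge_image` and `estimate_offset` and the `max_shift` parameter are hypothetical, and this is not the authors' FPGA + DSP implementation.

```python
def edge_image(img):
    """Approximate edge strength with |horizontal| + |vertical| central differences.

    img is a list of rows of grayscale values; border pixels are left at 0.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            out[y][x] = abs(gx) + abs(gy)
    return out


def estimate_offset(prev, curr, max_shift=3):
    """Return the (dy, dx) displacement of content from prev to curr.

    The displacement is the integer shift that maximizes the correlation
    between the two edge images, searched exhaustively within +/- max_shift.
    """
    e_prev, e_curr = edge_image(prev), edge_image(curr)
    h, w = len(prev), len(prev[0])
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0
            for y in range(h):
                for x in range(w):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        score += e_prev[y][x] * e_curr[yy][xx]
            if score > best_score:
                best, best_score = (dy, dx), score
    return best
```

Because the score is computed on edge images rather than raw intensities, a uniform additive brightness change between frames leaves the gradients, and hence the estimated offset, unchanged; this is one reading of why the abstract favors edge information.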
E.M.A.L. Beauxis-Aussalet (Emmanuelle); E. Arslanova (Elvira); L. Hardman (Lynda); J.R. van Ossenbruggen (Jacco)
In-situ video recording of underwater ecosystems is able to provide valuable information for biology research and natural resources management, e.g. changes in species abundance. Searching the videos manually, however, requires costly human effort. Our video analysis tool supports the
Illumination changes cause challenging problems for video surveillance algorithms, as objects of interest become masked by changes in background appearance. Such algorithms should maintain a consistent perception of a scene regardless of illumination variation. This work introduces a concept we call BigBackground, a model for representing large, persistent scene features based on chromatic self-similarity. This model is found to comprise 50% to 90% of surveillance scenes. The large, stable regions represented by the model are used as reference points for performing illumination compensation. The presented compensation technique is shown to decrease false-positive classification of background pixels by an average of 83% compared with the uncompensated case and by 25% to 43% compared with compensation techniques from the literature.
Avramescu, A. M.
Today, with the help of computers, we can create special effects that look so real that we barely perceive them as different from the real elements on the screen. As the 3D field becomes increasingly accessible and finds more and more areas of application, 3D technology moves easily from architecture to product design. Realistic 3D animations are used as learning tools, in multimedia presentations by large global corporations, for special effects, and even for virtual actors in movies. Technology, as part of the art of film, is considered a prerequisite, but cinematography is the first art that had to wait for the right intersection of technological development, innovation, and human vision in order to reach full achievement. Increasingly, most industries use three-dimensional (3D) sequences; graphics, commercials, and movie special effects are all designed in 3D. The key to realistic visual effects is successfully combining distinct elements (characters, objects, images, and video scenes) so that they form a whole that works in perfect harmony. This article aims to present a contemporary game design. Given advanced technology and the futuristic vision of designers, we now have diverse and multifarious game models. Special effects contribute decisively to the creation of a realistic three-dimensional scene and are essential for conveying the emotional state of the scene. Creating special effects is a work of finesse required to achieve high-quality scenes. Special effects can also be used to draw the viewer's attention to an object in a scene. According to the study conducted, the best-selling game of 2010 was Call of Duty: Modern Warfare 2. The article therefore aims for the presented scene to resemble many locations from games of this type, more
As the public education system in Northern Ontario continues its downward spiral, a plethora of secondary school students are being placed in alternative educational environments. Juxtaposing the two educational settings reveals very similar methods and characteristics of educating our youth, rather than a truly alternative approach to education. This video reviews the relationship between public education and alternative education in a remote Northern Ontario setting. It is my belief that traditional teaching methods are not appropriate for educating at-risk students in alternative schools. Paper-and-pencil worksheets do not motivate these students to learn and succeed. Alternative education should emphasize experiential learning and a just-in-time curriculum based on each unique individual and the student's true passion for everyday life. Cameron Culbert was born on February 3rd, 1977 in North Bay, Ontario. His teenage years were split between attending public school and his willed curriculum on the ski hill. Culbert spent 10 years (1996-2002 and 2006-2010) competing for Canada as an alpine ski racer. His passion for teaching and coaching began as an athlete and has now transferred into the classroom and the community. Cameron is a graduate of Nipissing University (BA, BEd, MEd); his research interests are alternative education, physical education, and technology in the classroom. Currently Cameron is an active educator and coach in Northern Ontario.
Smith, Rachel Charlotte; Christensen, Kasper Skov; Iversen, Ole Sejer
We introduce Video Design Games to train educators in teaching design. The Video Design Game is a workshop format consisting of three rounds in which participants observe, reflect and generalize based on video snippets from their own practice. The paper reports on a Video Design Game workshop...
Ostrowski, Jeffrey R.; Sarhan, Nabil J.
The popularity of social media has grown dramatically over the World Wide Web. In this paper, we analyze the video popularity distribution of well-known social video websites (YouTube, Google Video, and the AOL Truveo Video Search engine) and characterize their workload. We identify trends in the categories, lengths, and formats of those videos, as well as characterize the evolution of those videos over time. We further provide an extensive analysis and comparison of video content amongst the main regions of the world.
Adam Eichenbaum; Daphne Bavelier; C Shawn Green
The authors review recent research that reveals how today's video games instantiate naturally and effectively many principles psychologists, neuroscientists, and educators believe critical for learning...
Höferlin, Markus Johannes
The amount of video data recorded worldwide is growing tremendously and has already reached barely manageable dimensions. It originates from a wide range of application areas, such as surveillance, sports analysis, scientific video analysis, surgery documentation, and entertainment, and its analysis represents one of the challenges in computer science. The vast amount of video data makes manual analysis by watching the footage impractical. However, automatic evaluation of video material...
Li, Shuohao; Han, Anqi; Chen, Xu; Yin, Xiaoqing; Zhang, Jun
Recognizing text in images captured in the wild is a fundamental preprocessing task for many computer vision and machine learning applications and has gained significant attention in recent years. This paper proposes an end-to-end trainable deep review neural network for scene text recognition, which is a combination of feature extraction, feature reviewing, feature attention, and sequence recognition. Our model can generate the predicted text without any segmentation or grouping algorithm. Because the attention model in the feature attention stage lacks global modeling ability, a review network is applied to extract the global context of sequence data in the feature reviewing stage. We perform rigorous experiments across a number of standard benchmarks, including IIIT5K, SVT, ICDAR03, and ICDAR13 datasets. Experimental results show that our model is comparable to or outperforms state-of-the-art techniques.
We present a system for automatically synthesizing a diverse set of semantically valid and well-arranged 3D interior scenes for a given empty room shape. Unlike existing work on layout synthesis, which typically assumes the potentially needed 3D models are known and optimizes their locations through cost functions, our technique performs both the retrieval and the placement of 3D models by discovering the relationships between the room space and the models' categories. This is enabled by a new analytical structure, called Wall Grid Structure, which jointly considers the categories and locations of 3D models. Our technique greatly reduces the amount of user intervention and provides users with suggestions and inspirations. We demonstrate the applicability of our approach on three types of scenarios: conference rooms, living rooms and bedrooms.
Awareness of its own limitations is a fundamental feature of human sight, which has been almost completely omitted in computer vision systems. In this paper we present a method of explicitly using information about the perceptual limitations of a 3D vision system, such as occluded areas, a limited field of view, loss of precision with increasing distance, and imperfect segmentation, for a better understanding of the observed scene. The proposed mechanism integrates metric and semantic inference using Dempster-Shafer theory, which makes it possible to handle observations that have different degrees and kinds of uncertainty. The system has been implemented and tested in a real indoor environment, showing the benefits of the proposed approach.
Berg, Alex Rune; Jordán, Tibor
We investigate algorithmic questions and structural problems concerning graph families defined by 'edge-counts'. Motivated by recent developments in the unique realization problem of graphs, we give an efficient algorithm to compute the rigid, redundantly rigid, M-connected, and globally rigid ... components of a graph. Our algorithm is based on (and also extends and simplifies) the idea of Hendrickson and Jacobs, as it uses orientations as the main algorithmic tool. We also consider families of bipartite graphs which occur in parallel drawings and scene analysis. We verify a conjecture of Whiteley ... by showing that 2d-connected bipartite graphs are d-tight. We give a new algorithm for finding a maximal d-sharp subgraph. We also answer a question of Imai and show that finding a maximum size d-sharp subgraph is NP-hard.
Every doctor, regardless of specialization, may in practice need to provide assistance to victims of crime. This article discusses informing the investigative authorities about the crime and ensuring the safety of oneself and of others at the scene. It also presents the specific elements of the procedures and practices needed when dealing with victims so that any evidence of a potential or committed crime is secured properly. Special attention is given to medical and other actions required for certain categories of crime, notably offenses against sexual freedom and decency, against bodily integrity, and against human life and well-being, especially homicide, infanticide, and suicide.
Now that the electromagnetic calorimeter support and the mini space frame have been installed, practically all ALICE’s infrastructure is in place. The calorimeter support, an austenitic stainless steel shell weighing 30 tonnes, was slid gently inside the detector, in between the face of the magnet and the space frame. With the completion of two major installation projects, the scene is finally set for the ALICE experiment…or at least it nearly is, as a few design studies, minor installation jobs and measurements still need to be carried out before the curtain can finally be raised. The experiment’s chief engineer Diego Perini confirms: "All the heavy infrastructure for ALICE has been in place and ready for the grand opening since December 2007." The next step will be the installation of additional modules on the TOF and TRD detectors between January and March 2008, and physicists have already started testing the equipment with co...
Lelas, Marko; Pribanić, Tomislav
This paper presents a new stereo matching method based on a combination of active and passive stereo approaches. The reconstructed scene is scanned with a laser line, while a pair of stereo cameras is used to acquire a video clip. Each pixel of the reconstructed scene is swept by the laser line at a particular moment, so the brightness intensity profiles in the time domain are highly correlated for the pixels of the left and right cameras that correspond to the same pixel of the recon...
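The principle described above (the laser line reaches corresponding left and right pixels at the same moment, so their temporal intensity profiles correlate) can be sketched in Python as follows. The sketch assumes rectified cameras, so correspondences are searched along the same row, and uses zero-mean normalized cross-correlation as the similarity measure; the abstract only states that the profiles are highly correlated, so both the measure and the helper names `ncc` and `match_row` are illustrative assumptions, not the authors' method.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(p * q for p, q in zip(da, db))
    den = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    # A constant (never-lit) profile carries no timing information.
    return num / den if den > 0 else 0.0


def match_row(left_row, right_row):
    """For each left pixel's temporal profile, return the index of the best right pixel.

    left_row and right_row are lists of per-pixel intensity-over-time profiles
    taken from the same (rectified) image row of the two cameras.
    """
    matches = []
    for profile in left_row:
        scores = [ncc(profile, other) for other in right_row]
        matches.append(max(range(len(scores)), key=scores.__getitem__))
    return matches
```

Normalizing the profiles makes the match insensitive to gain and offset differences between the two cameras, which matters when the laser stripe appears with different brightness in each view.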
Harkey, Ann Marie
Contents: Publicly released videos on technology transfer items available for licensing from NASA. Includes: 1. Powder Handling Device for Analytical Instruments (Ames); 2. Fiber Optic Shape Sensing (FOSS) (Armstrong); 3. Robo-Glove (Johnson); 4. Modular Robotic Vehicle (Johnson); 5. Battery Management System (Johnson); 6. Active Response Gravity Offload System (ARGOS) (Johnson); 7. Contaminant Resistant Coatings for Extreme Environments (Langley); 8. Molecular Adsorber Coating (MAC) (Goddard); 9. Ultrasonic Stir Welding (Marshall). Also includes scenes from the International Space Station.
Quigg, Stephanie L; Want, Stephen C
Exposure to idealized media portrayals of women induces appearance dissatisfaction in females in the short term. Interventions that highlight the artificial nature of media portrayals can mitigate this effect. The present research investigated whether a 75-second television commercial that demonstrates behind-the-scenes techniques used to artificially enhance media models could be similarly effective. Eighty-seven Caucasian female undergraduates were randomly assigned to one of three conditions. The first group viewed music videos and ordinary television commercials. A second group viewed the same music videos and the "intervention" commercial. A final control group viewed television and commercials featuring no people. Viewing music videos resulted in significantly lower levels of self-reported appearance satisfaction than viewing control television, p<.05, d=-.67. However, exposure to the intervention commercial counteracted this effect. Demonstrating the extent to which media portrayals of women are artificially enhanced can mitigate detrimental effects on female appearance satisfaction. Copyright © 2010 Elsevier Ltd. All rights reserved.
Pavšič Mrevlje, Tinkara
Crime scene technicians collect evidence related to crimes and are therefore exposed to many traumatic situations. The coping strategies they use are thus very important in facing the psychological consequences of such work. The available literature shows that crime scene technicians are an understudied subgroup of police workers. Our study therefore provides the first insights into technicians' coping strategies, post-traumatic symptomatology, and somatic health, based on a sample of 64 male crime scene technicians (85% of all Slovene technicians). Crime scene technicians mainly use avoidance coping strategies. Approach strategies, which are more effective in the long term (i.e. they lead to greater buffering of the effects of traumatic stress), are used more frequently when technicians are familiar with the nature of the task, when they have time to prepare for it, and when they feel that past situations have been positively resolved. Behavioural avoidance strategies were found to be the least effective when dealing with traumatic experiences and are also related to more frequent physical health problems. The results indicate that appropriate training for future technicians would facilitate the use of more effective coping strategies and consequently lead to more effective and satisfied workers. Copyright © 2014 John Wiley & Sons, Ltd. Copyright © 2015 John Wiley & Sons, Ltd.
To view brain activity in register with visual stimuli, a technique here referred to as "retinotopic projection," which translates functional measurements into retinotopic space, is employed. Retinotopic projection is here first applied to a previously acquired fMRI dataset in which a large set of grayscale photos of real scenes were presented to three subjects. A simple model of local contrast integration accounts for much of the data in early visual areas (V1 and V2). However, consistent discrepancies were discovered: Human faces tend to evoke stronger responses relative to other scene elements than predicted by the model, whereas periodic patterns evoke weaker responses than predicted by the model. Next, in new fMRI experiments, three subjects directed attention toward various elements of naturalistic scenes (Vermeer paintings). Retinotopic projection applied to these data showed that attending to an object increased activation in cortex corresponding to the location of that object. Together the results suggest that even during passive viewing, the visual system differentially processes natural scenes in a manner consistent with deployment of visual attention to salient elements.
Medioni, Gerard; Kang, Zhuoliang
A product may receive each image in a stream of video images of a scene and, before processing the next image, generate information indicative of the position and orientation of an image capture device that captured the image at the time of capturing the image. The product may do so by identifying distinguishable image feature points in the image; determining a coordinate for each identified image feature point; and, for each identified image feature point, attempting to identify one or more distinguishable model feature points in a three-dimensional (3D) model of at least a portion of the scene that appear likely to correspond to the identified image feature point. Thereafter, the product may find each of the following that, in combination, produce a consistent projection transformation of the 3D model onto the image: a subset of the identified image feature points for which one or more corresponding model feature points were identified; and, for each image feature point that has multiple likely corresponding model feature points, one of the corresponding model feature points. The product may update a 3D model of at least a portion of the scene following the receipt of each video image and before processing the next video image, based on the generated information indicative of the position and orientation of the image capture device at the time of capturing the received image. The product may display the updated 3D model after each update to the model.
Green, C Shawn; Li, Renjie; Bavelier, Daphne
Action video games have been shown to enhance behavioral performance on a wide variety of perceptual tasks, from those that require effective allocation of attentional resources across the visual scene, to those that demand the successful identification of fleetingly presented stimuli. Importantly, these effects have not only been shown in expert action video game players, but a causative link has been established between action video game play and enhanced processing through training studies. Although an account based solely on attention fails to capture the variety of enhancements observed after action game playing, a number of models of perceptual learning are consistent with the observed results, with behavioral modeling favoring the hypothesis that avid video game players are better able to form templates for, or extract the relevant statistics of, the task at hand. This may suggest that the neural site of learning is in areas where information is integrated and actions are selected; yet changes in low-level sensory areas cannot be ruled out. Copyright © 2009 Cognitive Science Society, Inc.
Jia, Lili; Tao, Junjie; You, Ying
Video surveillance systems play an important role in crime scene investigation, and in digital surveillance systems the video data, with characters superimposed on it, are always subjected to compression. The purpose of this paper is to study the use of inpainting techniques to remove the superimposed characters and inpaint the target region. We present an efficient framework comprising extraction of the character superimposition mask, removal of the superimposition, and inpainting of the resulting blanks. The character region is located by manual ROI selection and by a text extractor for varying text, such as the time. The superimposed characters usually have colors distinct from the original background, so their edges are easily detected; we use the Canny operator to obtain the edge image. The missing information that affects the structure of the original image is reconstructed using a structure-propagating algorithm. The experiments were implemented in C/C++ in the Visual Studio 2010 IDE. The framework presented in this paper is shown to be effective at recreating the character superimposition region and helpful for crime scene investigation.
Effective emergency mental health intervention for victims of crime, natural disaster or terrorism begins the moment the first responders arrive. This article describes a range of on-scene crisis intervention options, including verbal communication, body language, behavioral strategies, and interpersonal style. The correct intervention in the first few moments and hours of a crisis can profoundly influence the recovery course of victims and survivors of catastrophic events.
salient objects and environments providing mutual context (i.e., a primary or key object in an outdoor scene embedded in a realistic environmental ...tracking: a two part vision system for small robot navigation in forested environment . Proc. SPIE 8387, Unmanned Systems Technology XIV Conference; 2012...of realistic autonomous outdoor missions in complex and changing environments . Scene understanding for realistic outdoor missions has been
Castelhano, Monica S.; Henderson, John M.
In 3 experiments the authors used a new contextual bias paradigm to explore how quickly information is extracted from a scene to activate gist, whether color contributes to this activation, and how color contributes, if it does. Participants were shown a brief presentation of a scene followed by the name of a target object. The target object could…
Gottesman, Carmela V.
Four experiments examined whether scene processing is facilitated by layout representation, including layout that was not perceived but could be predicted based on a previous partial view (boundary extension). In a priming paradigm (after Sanocki, 2003), participants judged objects' distances in photographs. In Experiment 1, full scenes (target),…
Nuthmann, Antje; Smith, Tim J.; Engbert, Ralf; Henderson, John M.
Eye-movement control during scene viewing can be represented as a series of individual decisions about where and when to move the eyes. While substantial behavioral and computational research has been devoted to investigating the placement of fixations in scenes, relatively little is known about the mechanisms that control fixation durations.…
Nummenmaa, Lauri; Hyona, Jukka; Calvo, Manuel G.
The authors assessed whether parafoveal perception of emotional content influences saccade programming. In Experiment 1, paired emotional and neutral scenes were presented to parafoveal vision. Participants performed voluntary saccades toward either of the scenes according to an imperative signal (color cue). Saccadic reaction times were faster…
Lefevre, Sebastien; Tuia, Devis; Wegner, Jan Dirk; Produit, Timothee; Nassar, Ahmed Samy
In this paper, we discuss and review how combined multiview imagery from satellite to street level can benefit scene analysis. Numerous works exist that merge information from remote sensing and images acquired from the ground for tasks such as object detection, robots guidance, or scene
Lueken, Ulrike; Hoyer, Jürgen; Siegert, Jens; Gloster, Andrew T; Wittchen, Hans-Ulrich
Although video stimulation has been successfully employed in dental phobia, conclusions regarding the specificity of reactions are limited. A novel, video-based paradigm using cross-phobic video stimulation was validated based on subjective and autonomic responses. Forty subjects were stratified according to dental anxiety as measured by the Dental Fear Survey (DFS) using a median-split procedure (high-DFS and low-DFS groups). Anxiety stimuli comprised dental-anxiety scenes and non-dental-anxiety control scenes (snake stimuli). Neutral scenes were tailored to each anxiety stimulus. Dental, but not snake, stimuli were rated as more anxiety provoking only in the high-DFS group. Elevated skin-conductance amplitudes were observed in the high-DFS group for dental anxiety vs. neutral videos, but not for snake anxiety vs. neutral videos. State and trait anxiety and autonomic reactivity were correlated according to expectations. Using cross-phobic video stimulation, it was demonstrated that phobogenic reactions in dental anxiety are specific to the respective stimulus material and do not generalize to other non-dental-anxiety control conditions. The validation of the paradigm may support and stimulate future research on the characterization of dental anxiety on different response systems, including its underlying neural substrates. © 2011 Eur J Oral Sci.
Hu, Tao; Qi, Yuxiao; Li, Shipeng
For intelligent service robots, indoor scene classification is an important issue. To overcome the weak real-time performance of conventional algorithms, a new method based on cloud computing is proposed for extracting global image features for indoor scene classification. Using MapReduce, the global PHOG feature of each indoor scene image is extracted in parallel, and the resulting feature vectors are used to train the decision classifier with an SVM concurrently. The indoor scene is then classified by the decision classifier. To verify the algorithm's performance, we carried out an experiment with 350 typical indoor scene images from the MIT LabelMe image library. Experimental results show that the proposed algorithm attains better real-time performance: it is generally 1.4 to 2.1 times faster than traditional classification methods that rely on a single computation, while maintaining a stable classification accuracy of 70%.
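A minimal sketch of the feature-extraction step, assuming a common formulation of PHOG (magnitude-weighted orientation histograms over a spatial pyramid of 1, 4, 16, ... cells). The function names `phog` and `extract_all` are hypothetical, and a local thread pool stands in for the MapReduce cluster; the SVM training stage is not shown.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def phog(image: np.ndarray, bins: int = 8, levels: int = 3) -> np.ndarray:
    """Return an L1-normalised PHOG descriptor for a 2-D grayscale image."""
    img = image.astype(float)
    gy, gx = np.gradient(img)                 # gradients along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    h, w = img.shape
    feats = []
    for level in range(levels):
        cells = 2 ** level                    # level l splits the image into 2^l x 2^l cells
        ys = np.linspace(0, h, cells + 1).astype(int)
        xs = np.linspace(0, w, cells + 1).astype(int)
        for i in range(cells):
            for j in range(cells):
                cell_idx = idx[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].ravel()
                cell_mag = mag[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].ravel()
                # Histogram of orientations, weighted by gradient magnitude.
                feats.append(np.bincount(cell_idx, weights=cell_mag, minlength=bins))
    vec = np.concatenate(feats)
    total = vec.sum()
    return vec / total if total > 0 else vec


def extract_all(images, workers: int = 4) -> np.ndarray:
    """'Map' step: compute descriptors for many images concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return np.stack(list(pool.map(phog, images)))
```

With the defaults, each descriptor has 8 × (1 + 4 + 16) = 168 entries; the stacked matrix would be the input to the SVM.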
Modern graphic/programming tools like Unity enable the creation of 3D scenes as well as 3D scene-based applications, including full physical models, motion, sounds, lighting effects, etc. This paper deals with the use of a dynamic-frames-based generator for the automatic generation of 3D scenes and related source code. The suggested model makes it possible to specify features of the 3D scene in a textual specification, as well as to export such features from a 3D tool. This approach enables a higher level of code generation flexibility and the reuse of the main code and scene artifacts in the form of textual templates. An example of the generated application is presented and discussed.
Nho, Seon Mi; Kim, Eun A
The purpose of this study was to verify the relationships among social support, resilience and post-traumatic stress disorder (PTSD), and especially to identify factors influencing PTSD in police crime scene investigators. A cross-sectional design was used, with a convenience sample of 226 police crime scene investigators from 7 Metropolitan Police Agencies. Data were collected through self-report questionnaires during July and August 2015 and were analyzed using the t-test, χ²-test, Fisher's exact test, and binary logistic regression analysis with the SPSS/WIN 21.0 program. The mean PTSD score in police crime scene investigators was 13.69 … 11 points. Of the crime scene investigators, 181 (80.1%) were in the low-risk group and 45 (19.9%) in the high-risk group. Social support (t=5.68, p … crime scene investigators, intervention programs including social support and strategies to increase … should be established.
Straub, Julian; Rosman, Guy; Freifeld, Oren; Leonard, John J.; Fisher, John W., III
In one embodiment, a method of identifying the dominant orientations of a scene comprises representing a scene as a plurality of directional vectors. The scene may comprise a three-dimensional representation of a scene, and the plurality of directional vectors may comprise a plurality of surface normals. The method further comprises determining, based on the plurality of directional vectors, a plurality of orientations describing the scene. The determined plurality of orientations explains the directionality of the plurality of directional vectors. In certain embodiments, the plurality of orientations may have independent axes of rotation. The plurality of orientations may be determined by representing the plurality of directional vectors as lying on a mathematical representation of a sphere, and inferring the parameters of a statistical model to adapt the plurality of orientations to explain the positioning of the plurality of directional vectors lying on the mathematical representation of the sphere.
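The idea of placing surface normals on a sphere and fitting orientations to them can be illustrated with spherical k-means. This is a deliberately simpler stand-in, not the statistical model described above: it returns k mean directions rather than full rotations with independent axes, and it treats n and -n as different directions. The function name is hypothetical.

```python
import numpy as np


def spherical_kmeans(normals, k, iters=50, seed=0):
    """Cluster 3-D directions on the unit sphere.

    Returns (centers, labels): k unit mean directions and a cluster index per normal.
    """
    x = normals / np.linalg.norm(normals, axis=1, keepdims=True)  # project onto sphere
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation: start from a random normal, then repeatedly
    # add the normal least similar to the centers chosen so far.
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        sim = np.max(x @ np.array(centers).T, axis=1)
        centers.append(x[np.argmin(sim)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmax(x @ centers.T, axis=1)  # assign by cosine similarity
        new = centers.copy()
        for c in range(k):
            members = x[labels == c]
            if len(members):
                s = members.sum(axis=0)
                norm = np.linalg.norm(s)
                if norm > 0:
                    new[c] = s / norm  # mean direction, re-normalised onto the sphere
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.argmax(x @ centers.T, axis=1)
    return centers, labels
```

For a box-like indoor scan, the recovered centers would roughly align with the wall and floor normals.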
Ishikawa, Tomoya; Yamazawa, Kazumasa; Sato, Tomokazu; Ikeda, Sei; Nakamura, Yutaka; Fujikawa, Kazutoshi; Sunahara, Hideki; Yokoya, Naokazu
In this paper, we describe a new telepresence system that enables a user to easily look around a virtualized real world in network environments. The proposed system includes omni-directional video viewers for web browsers that let the user look around omni-directional video content in the browser. The viewer is implemented as an ActiveX program so that it is installed automatically when the user opens a web site containing omni-directional video content. Using a multicast protocol, the system allows many users at different sites to look around the scene, much like an interactive TV, without increasing network traffic. This paper describes the implemented system and experiments using live and stored video streams. In the experiment with stored video streams, the system uses an omni-directional multi-camera system for video capture, and users can look around high-resolution, high-quality video content. In the experiment with live video streams, a car-mounted omni-directional camera acquires omni-directional video of the surroundings while the car drives in an outdoor environment. The acquired video streams are transferred to the remote site through wireless and wired networks using a multicast protocol, and users can view the live video freely in any direction. In both experiments, we implemented a view-dependent presentation with a head-mounted display (HMD) and a gyro sensor to provide a richer sense of presence.
Mullally, Sinéad L; Vargha-Khadem, Faraneh; Maguire, Eleanor A
Amnesic patients with bilateral hippocampal damage sustained in adulthood are generally unable to construct scenes in their imagination. By contrast, patients with developmental amnesia (DA), where hippocampal damage was acquired early in life, have preserved performance on this task, although the reason for this sparing is unclear. One possibility is that residual function in remnant hippocampal tissue is sufficient to support basic scene construction in DA. Such a situation was found in the one amnesic patient with adult-acquired hippocampal damage (P01) who could also construct scenes. Alternatively, DA patients' scene construction might not depend on the hippocampus, perhaps being instead reliant on non-hippocampal regions and mediated by semantic knowledge. To adjudicate between these two possibilities, we examined scene construction during functional MRI (fMRI) in Jon, a well-characterised patient with DA who has previously been shown to have preserved scene construction. We found that when Jon constructed scenes he activated many of the regions known to be associated with imagining scenes in control participants, including ventromedial prefrontal cortex, posterior cingulate, retrosplenial and posterior parietal cortices. Critically, however, activity was not increased in Jon's remnant hippocampal tissue. Direct comparisons with a group of control participants and patient P01 confirmed that they activated their right hippocampus more than Jon. Our results show that a type of non-hippocampal dependent scene construction is possible and occurs in DA, perhaps mediated by semantic memory, which does not appear to involve the vivid visualisation of imagined scenes. © 2013 Published by Elsevier Ltd.
Wang, Hong-jie; Qian, Li-xun; Cao, Chun; Li, Zhuo
Infrared scene generation technologies are used to simulate the infrared radiation characteristics of targets and backgrounds in the laboratory. They provide synthetic infrared imagery for the test and evaluation of thermal imagers in infrared imaging systems. At present, many infrared scene generation technologies are in wide use and have achieved considerable success. In this paper, we design and manufacture a high-performance IR scene generation device whose key component is a thin-film transducer fabricated with micro-electro-mechanical systems (MEMS) technology. The specific MEMS process parameters were obtained from a large number of experiments, and the properties of the infrared scene generation chip were investigated experimentally. The chip achieves high resolution, a high frame rate, and reliable performance, which can meet the requirements of most simulation systems. The radiation coefficient of the thin-film transducer is measured to be 0.86. The frame rate is 160 Hz. The emission spectrum spans 2 μm to 12 μm in the infrared band. When illuminated by visible light of different intensities, the equivalent blackbody temperature of the transducer can be varied from 290 K to 440 K. The spatial resolution exceeds 256×256. The geometric distortion and the uniformity of the generated infrared scene are 5 percent. The infrared scene generator based on the chip includes three parts: a visual image projector, a visual-to-thermal transducer, and an infrared scene projector. The experimental results show that this thin-film infrared scene generation chip meets the requirements of most hardware-in-the-loop scene simulation systems for testing IR sensors.
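To give a feel for what the reported 290 K to 440 K range means radiometrically, the sketch below integrates Planck's law over the stated 2 μm to 12 μm band. It assumes, as a simplification not stated in the abstract, that the transducer is a graybody whose emissivity equals the reported radiation coefficient of 0.86 at all wavelengths.

```python
import math

H = 6.62607015e-34   # Planck constant, J*s
C = 2.99792458e8     # speed of light, m/s
KB = 1.380649e-23    # Boltzmann constant, J/K


def spectral_exitance(wavelength_m, temp_k):
    """Blackbody spectral exitance M_lambda in W / (m^2 * m) (Planck's law)."""
    a = 2.0 * math.pi * H * C ** 2 / wavelength_m ** 5
    return a / math.expm1(H * C / (wavelength_m * KB * temp_k))


def band_exitance(temp_k, emissivity=0.86, lo_um=2.0, hi_um=12.0, steps=2000):
    """Graybody exitance (W/m^2) in [lo_um, hi_um] via the trapezoidal rule."""
    lo, hi = lo_um * 1e-6, hi_um * 1e-6
    dx = (hi - lo) / steps
    total = 0.5 * (spectral_exitance(lo, temp_k) + spectral_exitance(hi, temp_k))
    total += sum(spectral_exitance(lo + i * dx, temp_k) for i in range(1, steps))
    return emissivity * total * dx
```

Comparing `band_exitance(440.0) / band_exitance(290.0)` shows how much in-band radiant power the visible-light drive has to modulate across the stated temperature range.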
Proverbio, Alice Mado; Adorni, Roberta; Zani, Alberto; Trestianu, Laura
Recent findings have demonstrated that women might be more reactive than men to viewing painful stimuli (vicarious response to pain), and therefore more empathic [Han, S., Fan, Y., & Mao, L. (2008). Gender difference in empathy for pain: An electrophysiological investigation. Brain Research, 1196, 85-93]. We investigated whether the two sexes differed in their cerebral responses to affective pictures portraying humans in different positive or negative contexts compared to natural or urban scenarios. 440 IAPS slides were presented to 24 Italian students (12 women and 12 men). Half the pictures displayed humans while the remaining scenes lacked visible persons. ERPs were recorded from 128 electrodes and swLORETA (standardized weighted Low-Resolution Electromagnetic Tomography) source reconstruction was performed. Occipital P115 was greater in response to persons than to scenes and was affected by the emotional valence of the human pictures. This suggests that processing of biologically relevant stimuli is prioritized. Orbitofrontal N2 was greater in response to positive than negative human pictures in women but not in men, and not to scenes. A late positivity (LP) to suffering humans far exceeded the response to negative scenes in women but not in men. In both sexes, the contrast suffering-minus-happy humans revealed a difference in the activation of the occipito/temporal, right occipital (BA19), bilateral parahippocampal, left dorsal prefrontal cortex (DPFC) and left amygdala. However, increased right amygdala and right frontal area activities were observed only in women. The humans-minus-scenes contrast revealed a difference in the activation of the middle occipital gyrus (MOG) in men, and of the left inferior parietal (BA40), left superior temporal gyrus (STG, BA38) and right cingulate (BA31) in women (270-290 ms). These data indicate a sex-related difference in the brain response to humans, possibly supporting human empathy.
Ning, Shuangning; Sang, Xinzhu; Chen, Duo
A markerless client-server augmented reality system is presented. In this research, the more widespread and mature virtual reality head-mounted display is adopted to implement augmented reality. The head-mounted display presents an image in front of the viewer's eyes. A front-facing camera captures video and sends it to the workstation, where the generated virtual scene is merged with the real-world view received from the camera. The combined video is then sent to the helmet display. The distinguishing feature and novelty of the system is that it realizes augmented reality with natural features instead of markers, which addresses the limitations of markers: they are restricted to black and white, are unsuitable under varying environmental conditions, and, in particular, fail when partially occluded. Furthermore, 3D stereoscopic perception of the virtual animation model is achieved. A fast and stable native socket communication method is adopted to transmit the key video stream data, which reduces the computational burden of the system.
Ren, Zhuo-Ming; Shi, Yu-Qiang; Liao, Hao
Online popularity has a major impact on videos, music, news and other content in online systems. Characterizing online popularity dynamics is a natural way to explain the observed properties in terms of the popularity each item has already acquired. In this paper, we provide a quantitative, large-scale, temporal analysis of the popularity dynamics on two online video websites, MovieLens and Netflix. The two collected data sets contain over 100 million records and span a decade. We show that the popularity dynamics of online videos evolve over time, and find that they can be characterized by burst behaviors, typically occurring early in a video's life span, followed by the classic preferential mechanism of popularity growth.
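The "classic preferential popularity increase mechanism" can be illustrated with a toy simulation in which each new rating goes to a video with probability proportional to its current popularity. This is a hypothetical sketch of the later phase only; it does not model the early bursts, and the +1 smoothing term is an assumption that lets unrated videos be chosen at all.

```python
import random


def simulate_popularity(n_items, n_events, seed=0):
    """Preferential (rich-get-richer) growth: each event, e.g. a rating,
    picks an item with probability proportional to (current count + 1)."""
    rng = random.Random(seed)
    counts = [0] * n_items
    for _ in range(n_events):
        weights = [c + 1 for c in counts]
        i = rng.choices(range(n_items), weights=weights, k=1)[0]
        counts[i] += 1
    return counts
```

Sorting the returned counts typically reveals a skewed distribution, with a few items collecting a large share of the events.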
Zhao, Hongjian; Xia, Shixiong; Yao, Rui; Niu, Qiang; Zhou, Yong
Concatenating multicamera videos with differing centers of projection into a single panoramic video is a critical technology for many important applications. We propose a real-time video fusion approach to create wide field-of-view video. To provide a fast and accurate video registration method, we propose multistage hashing to find matched feature-point pairs from coarse to fine. In the first stage of multistage hashing, a short compact binary code is learned from all feature points, and we then calculate the Hamming distance between each pair of points to find the candidate matched points. In the second stage, a long binary code is obtained by remapping the candidate points for fine matching. To tackle the distortion and scene depth variation of multiview frames in videos, we build a hybrid transformation with depth adjustment. The depth compensation between two adjacent frames extends into multiple frames in an iterative model for successive video frames. We conduct several experiments with different dynamic scenes and camera numbers to verify the performance of the proposed real-time video fusion approach.
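The coarse-to-fine idea behind multistage hashing can be sketched as follows. The abstract describes *learned* binary codes; here random-hyperplane hashing (the sign of random projections) is swapped in as a simpler, untrained substitute, and the bit lengths and search radius are illustrative assumptions.

```python
import numpy as np


def binary_codes(desc, planes):
    """Sign of random projections -> boolean code (random-hyperplane LSH)."""
    return (desc @ planes) > 0


def hamming(a, b):
    """Pairwise Hamming distances between rows of boolean arrays a (n, d) and b (m, d)."""
    return (a[:, None, :] != b[None, :, :]).sum(axis=2)


def multistage_match(desc_a, desc_b, short_bits=16, long_bits=128, radius=3, seed=0):
    """Return (i, j) pairs matching rows of desc_a to rows of desc_b."""
    rng = np.random.default_rng(seed)
    dim = desc_a.shape[1]
    short_planes = rng.standard_normal((dim, short_bits))
    long_planes = rng.standard_normal((dim, long_bits))
    # Stage 1: cheap short codes shortlist candidates within `radius` bits.
    d_short = hamming(binary_codes(desc_a, short_planes), binary_codes(desc_b, short_planes))
    la = binary_codes(desc_a, long_planes)
    lb = binary_codes(desc_b, long_planes)
    matches = []
    for i in range(len(desc_a)):
        cand = np.flatnonzero(d_short[i] <= radius)
        if cand.size == 0:
            continue  # no candidate passed the coarse stage
        # Stage 2: longer, more discriminative codes re-rank only the shortlist.
        d_long = (la[i] != lb[cand]).sum(axis=1)
        matches.append((i, int(cand[np.argmin(d_long)])))
    return matches
```

The short codes keep the all-pairs comparison cheap; the long codes are evaluated only on the shortlist, which is where the speed-up of a coarse-to-fine scheme comes from.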
The video coding and distribution approach presented in this paper has two key characteristics that make it ideal for integration of video communication services over common broadband digital networks. The modular multi-resolution nature of the coding scheme provides the necessary flexibility to accommodate future advances in video technology as well as robust distribution over various network environments. This paper will present an efficient and scalable coding scheme for video communications. The scheme is capable of encoding and decoding video signals in a hierarchical, multilayer fashion to provide video at differing quality grades. Subsequently, the utilization of this approach to enable efficient bandwidth sharing and robust distribution of video signals in multipoint communications is presented. Coding and distribution architectures are discussed which include multi-party communications in a multi-window fashion within ATM environments. Furthermore, under the limited capabilities typical of wideband/broadband access networks, this architecture accommodates important video-based service applications such as Interactive Distance Learning.
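Hierarchical, multilayer coding at differing quality grades can be illustrated with successive quantization: a coarse base layer plus an enhancement layer that refines the residual. This is a generic, hypothetical sketch of layered (SNR-style) scalability, not the coding scheme of the paper; the step sizes are arbitrary assumptions.

```python
import numpy as np


def encode_layers(frame, base_step=32, enh_step=4):
    """Two-layer scalable coding by successive quantization."""
    base = np.round(frame / base_step).astype(np.int32)   # coarse base layer
    residual = frame - base * base_step
    enh = np.round(residual / enh_step).astype(np.int32)  # refinement layer
    return base, enh


def decode(base, enh=None, base_step=32, enh_step=4):
    """Decode the base layer alone, or base + enhancement for higher quality."""
    rec = base.astype(float) * base_step
    if enh is not None:
        rec += enh * enh_step
    return rec
```

A receiver on a constrained link could take only `base`, while a better-connected one also receives `enh`; the worst-case error drops from half the base step to half the enhancement step.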
Matsushita, Yasuyuki; Ofek, Eyal; Ge, Weina; Tang, Xiaoou; Shum, Heung-Yeung
Video stabilization is an important video enhancement technology that aims at removing annoying shaky motion from videos. We propose a practical and robust approach to video stabilization that produces full-frame stabilized videos with good visual quality. While most previous methods produce smaller, cropped stabilized videos, our completion method produces full-frame videos by naturally filling in missing image regions through locally aligning image data from neighboring frames. To achieve this, motion inpainting is proposed to enforce spatial and temporal consistency of the completion in both static and dynamic image areas. In addition, image quality in the stabilized video is enhanced with a new practical deblurring algorithm. Instead of estimating point spread functions, our method transfers and interpolates sharper image pixels from neighboring frames to increase the sharpness of the frame. The proposed video completion and deblurring methods enabled us to develop a complete video stabilizer that naturally keeps the original image quality in the stabilized videos. The effectiveness of our method is confirmed by extensive experiments over a wide variety of videos.
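Before any completion or deblurring, a stabilizer has to decide how far to shift each frame. A common generic approach, shown here as a hypothetical sketch rather than the authors' method, accumulates inter-frame motion into a camera path, smooths it with a Gaussian kernel, and applies the difference as a per-frame correction. Only 2-D translations are modelled, which is a simplifying assumption.

```python
import numpy as np


def smooth_path(frame_motion, sigma=2.0):
    """Given per-frame translations (n, 2) between consecutive frames, return
    per-frame corrective offsets that move each frame onto a Gaussian-smoothed path."""
    path = np.cumsum(frame_motion, axis=0)  # accumulated camera trajectory
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    # Edge padding keeps the output the same length as the input path.
    padded = np.pad(path, ((radius, radius), (0, 0)), mode="edge")
    smooth = np.stack([np.convolve(padded[:, k], kernel, mode="valid")
                       for k in range(path.shape[1])], axis=1)
    return smooth - path  # shift to apply to each frame
```

Shifting frames by these offsets is what exposes the empty border regions that the full-frame completion described above is designed to fill.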
Bourgonjon, Jeroen; Soetaert, Ronald
... by exploring a particular aspect of digitization that affects young people, namely video games. They explore the new social spaces which emerge in video game culture and how these spaces relate to community building and citizenship...
Is video becoming “the new black” in academia, and if so, what are the challenges? The integration of video in research methodology (for collection, analysis) is well known, but the use of “academic video” for dissemination is relatively new (Eriksson and Sørensen). The focus of this paper is academic video, or short video essays produced for the explicit purpose of communicating research processes, topics, and research-based knowledge (see the journal of academic videos: www.audiovisualthinking.org). Video is increasingly used in popular showcases for video online, such as YouTube and Vimeo, as well … questions of our media literacy pertaining to authoring multimodal texts (visual, verbal, audial, etc.) in research practice and the status of multimodal texts in academia. The implications of academic video extend to wider issues of how researchers harness opportunities to author different types of texts …
This article is an introduction to video screen capture. Basic information on two software programs, QuickTime for Mac and BlueBerry Flashback Express for PC, is also discussed. Practical applications for video screen capture are given.
The Art Toys phenomenon, better known as the Art Toy Movement, was born in China in the mid-nineties and quickly spread to the rest of the world. The toys are an artistic production of serial sculpture, made by hand or on an industrial scale. There are several types of toys, such as custom toys and canvas toys, synonyms of designer toys, although they are often defined by their constituent material, such as vinyl toys (plastic) and plush toys (fabric). Art toys are the heirs of a pop-surrealist and neo-pop circuit that, since the 1980s, has pervaded the Japanese-American art scene, winking at the playful spirit of the early-twentieth-century avant-garde. Some psychoanalytic, pedagogical and anthropological studies of “play theories” may also help us to understand and identify these heterogeneous products as real works of art and not simply as collectible toys.
Durnal, Evan W
A mysterious green ooze is injected into a brightly illuminated and humming machine; 10 s later, a printout containing a complete biography of the substance is at the fingertips of an attractive young investigator who exclaims "we found it!" We have all seen this event occur countless times on any and all of the three CSI dramas, Cold Cases, Crossing Jordans, and many more. With this new style of "infotainment" (Surette, 2007), comes an increasingly blurred line between the hard facts of reality and the soft, quick solutions of entertainment. With these advances in technology, how can crime rates be anything but plummeting as would-be criminals cringe at the idea of leaving the smallest speck of themselves at a crime scene? Surely there are very few serious crimes that go unpunished in today's world of high-tech, fast-paced gadgetry. Science and technology have come a great distance since Sir Arthur Conan Doyle described the first famous forensic scientist (Sherlock Holmes), but still have light-years to go. (c) 2010. Published by Elsevier Ireland Ltd.
Barr, David A; Haigh, Craig A; Haller, Jeannie M; Smith, Denise L
The objective of this study was to retrospectively investigate aspects of medical monitoring, including medical complaints, vital signs at entry, and vital sign recovery, in firefighters during rehabilitation following operational firefighting duties. Incident scene rehabilitation logs obtained over a 5-year span that included 53 incidents, approximately 40 fire departments, and more than 530 firefighters were reviewed. Only 13 of 694 cases involved a firefighter reporting a medical complaint. In most cases, vital signs were similar between firefighters who registered a complaint and those who did not. On average, heart rate was 104 ± 23 beats·min(-1), systolic blood pressure was 132 ± 17 mmHg, diastolic blood pressure was 81 ± 12 mmHg, and respiratory rate was 19 ± 3 breaths·min(-1) upon entry into rehabilitation. At least two measurements of heart rate, systolic blood pressure, diastolic blood pressure, and respiratory rate were obtained for 365, 383, 376, and 160 cases, respectively. Heart rate, systolic and diastolic blood pressures, and respiratory rate decreased significantly (p … firefighters recovered from the physiological stress of firefighting without any medical complaint or symptoms. Furthermore, vital signs were within fire service suggested guidelines for release within 10 or 20 minutes of rehabilitation. The data suggested that vital signs of firefighters with medical symptoms were not significantly different from vital signs of firefighters who had an unremarkable recovery.
Pasch, H. L.
An overview of video coding is presented. The aim is not to give a technical summary of possible coding techniques, but to address subjects related to video compression in general and to the transmission of compressed video in more detail. Bit rate reduction is in general possible by removing redundant information; removing information the eye does not use anyway; and reducing the quality of the video. The codecs used for reducing the bit rate can be divided into two groups: Constant Bit rate Codecs (CBCs), which keep the bit rate constant but vary the video quality; and Variable Bit rate Codecs (VBCs), which keep the video quality constant by varying the bit rate. VBCs can in general reach a higher video quality than CBCs using less bandwidth, but need a transmission system that allows the bandwidth of a connection to fluctuate over time. The current and next generations of the PSTN do not allow this; ATM might. There are several factors that influence the quality of video: the bit error rate of the transmission channel, slip rate, packet loss rate/packet insertion rate, end-to-end delay, phase shift between voice and video, and bit rate. Based on the bit rate of the coded video, coded video can be classified as follows: High Definition Television (HDTV); Broadcast Quality Television (BQTV); video conferencing; and video telephony. The properties of these classes are given. The video conferencing and video telephony equipment available now and in the next few years can be divided into three categories: conforming to the 1984 CCITT standard for video conferencing; conforming to the 1988 CCITT standard; and conforming to no standard.
Online videos are an increasingly important way technology is contributing to the improvement of physics teaching. Students and teachers have begun to rely on online videos to provide them with content knowledge and instructional strategies. Online audiences are expecting greater production value, and departments are sometimes asking educators to post video pre-labs or to flip their classrooms. In this article, I share my advice on creating engaging physics videos.
Potter, Ray; Roberts, Deborah
This guide aims to provide an introduction to Desktop Video Conferencing. You may be familiar with video conferencing, where participants typically book a designated conference room and communicate with another group in a similar room on another site via a large screen display. Desktop video conferencing (DVC), as the name suggests, allows users to video conference from the comfort of their own office, workplace or home via a desktop/laptop Personal Computer. DVC provides live audio and visua...
... 47 CFR, Telecommunication, Vol. 4 (2010-10-01 edition). § 79.3 Video description of video programming. ... CLOSED CAPTIONING AND VIDEO DESCRIPTION OF VIDEO PROGRAMMING § 79.3 Video description of video programming. (a) Definitions. For purposes of this section the following definitions shall apply: (1...
Castelhano, Monica S; Pereira, Effie J
Many studies in reading have shown the enhancing effect of context on the processing of a word before it is directly fixated (parafoveal processing of words; Balota et al., 1985; Balota & Rayner, 1983; Ehrlich & Rayner, 1981). Here, we examined whether scene context influences the parafoveal processing of objects and enhances the extraction of object information. Using a modified boundary paradigm (Rayner, 1975), the Dot-Boundary paradigm, participants fixated on a suddenly-onsetting cue before the preview object would onset 4° away. The preview object could be identical to the target, visually similar, visually dissimilar, or a control (black rectangle). The preview changed to the target object once a saccade toward the object was made. Critically, the objects were presented on either a consistent or an inconsistent scene background. Results revealed that there was a greater processing benefit for consistent than inconsistent scene backgrounds and that identical and visually similar previews produced greater processing benefits than other previews. In the second experiment, we added an additional context condition in which the target location was inconsistent, but the scene semantics remained consistent. We found that changing the location of the target object disrupted the processing benefit derived from the consistent context. Most importantly, across both experiments, the effect of preview was not enhanced by scene context. Thus, preview information and scene context appear to independently boost the parafoveal processing of objects without any interaction from object-scene congruency.
Buggey, Tom; Ogle, Lindsey
Video self-modeling (VSM) first appeared on the psychology and education stage in the early 1970s. The practical applications of VSM were limited by lack of access to tools for editing video, which is necessary for almost all self-modeling videos. Thus, VSM remained in the research domain until the advent of camcorders and VCR/DVD players and,…
Epley, Hannah K.
There is a need for Extension professionals to show clientele the benefits of their program. This article shares how promotional videos are one way of reaching audiences online. An example is given on how a promotional video has been used and developed using iMovie software. Tips are offered for how professionals can create a promotional video and…
Adamczyk, Marcin; Hołowko, Elwira; Lech, Krzysztof; Michoński, Jakub; Mączkowski, Grzegorz; Bolewicki, Paweł; Januszkiewicz, Kamil; Sitnik, Robert
Three-dimensional measurement techniques (such as photogrammetry, time of flight, structure from motion or structured light) are becoming a standard in the crime scene documentation process. The use of 3D measurement techniques provides an opportunity to prepare more insightful investigations and helps to show every trace in the context of the entire crime scene. In this paper we present a hierarchical, three-dimensional measurement system designed for the crime scene documentation process. Our system reflects the current standards in crime scene documentation: it performs measurement in two stages. The first, most general stage of documentation uses a scanner with relatively low spatial resolution but a large measuring volume, and covers the whole scene. The second stage is much more detailed, with high resolution but a smaller measuring volume, for areas that require a more detailed approach. The documentation process is supervised by a specialised application, CrimeView3D, a software platform for measurement management (connecting with scanners and carrying out measurements, automatic or semi-automatic data registration in real time) and data visualisation (3D visualisation of documented scenes). It also provides a series of useful tools for forensic technicians: a virtual measuring tape, searching for sources of blood spatter, a virtual walk through the crime scene and many others. In this paper we present our measuring system and the developed software. We also report the outcome of research on the metrological validation of the scanners, performed according to the VDI/VDE standard, and the outcome of measurement sessions conducted on real crime scenes in cooperation with technicians from the Central Forensic Laboratory of Police.
Sheng, Lu; Ngan, King Ngi; Lim, Chern-Loon; Li, Songnan
In this paper, we propose a new method to enhance the quality of a depth video online based on the intermediary of a so-called static structure of the captured scene. The static and dynamic regions of the input depth frame are robustly separated by a layer assignment procedure, in which the dynamic part stays in the front while the static part fits and helps to update this structure by a novel online variational generative model with added spatial refinement. The dynamic content is enhanced spatially, while the static region is instead substituted by the updated static structure so as to favor long-range spatiotemporal enhancement. The proposed method both enforces long-range temporal consistency on the static region and keeps necessary depth variations in the dynamic content. Thus, it can produce flicker-free and spatially optimized depth videos with reduced motion blur and depth distortion. Our experimental results reveal that the proposed method is effective in both static and dynamic indoor scenes and is compatible with depth videos captured by Kinect and time-of-flight cameras. We also demonstrate that excellent performance can be achieved by the proposed method in comparison with existing spatiotemporal approaches. In addition, our enhanced depth videos and static structures can act as effective cues to improve various applications, including depth-aided background subtraction and novel view synthesis, showing satisfactory results with few visual artifacts.
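The layer assignment idea (dynamic content lies in front of the static structure; static pixels update that structure and are replaced by it) can be sketched in a heavily simplified form. A fixed depth margin and an exponential running average stand in for the paper's online variational generative model, and the function name and parameters are hypothetical.

```python
import numpy as np


def enhance_depth_frame(depth, static, alpha=0.1, margin=0.05):
    """One online step. depth/static: (H, W) arrays in metres, 0 = invalid.

    Returns (enhanced frame, updated static structure, dynamic mask)."""
    valid = depth > 0
    # Pixels clearly in front of the known static structure are treated as dynamic.
    dynamic = valid & (static > 0) & (depth < static - margin)
    stat = valid & ~dynamic
    new_static = static.copy()
    init = stat & (static == 0)   # first observation of a static pixel
    upd = stat & (static > 0)     # blend new measurements into the structure
    new_static[init] = depth[init]
    new_static[upd] = (1 - alpha) * static[upd] + alpha * depth[upd]
    # Keep measured depth on dynamic pixels; substitute the static structure
    # elsewhere, which also fills invalid (zero) measurements.
    out = np.where(dynamic, depth, new_static)
    return out, new_static, dynamic
```

Because static pixels are averaged over many frames, sensor flicker is suppressed there, while objects moving in front of the background keep their per-frame depth.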
Obtaining a 3D description of man-made and natural environments is a basic task in Computer Vision and Remote Sensing. To this end, laser scanning is currently one of the dominant techniques for gathering reliable 3D information. The scanning principle inherently needs a certain time interval to acquire the 3D point cloud. On the other hand, new active sensors can capture range information as an image in a single measurement. This new technique makes image-based active ranging possible, which allows capturing dynamic scenes, e.g., walking pedestrians in a yard or moving vehicles. Unfortunately, most of these range imaging sensors have strong technical limitations and are not yet sufficient for airborne data acquisition. The recent development of highly specialized far-range imaging sensors (so-called flash-light lasers) suggests that most of these limitations could be alleviated soon, so that future systems will offer larger image sizes and potentially an expanded operating range. The presented work is a first step towards methods capable of applying range images in outdoor environments. To this end, an experimental setup was built to investigate these possibilities. A measurement campaign was carried out with this setup, and first results are presented in this paper.
Repeated elements are ubiquitous and abundant in both man-made and natural scenes. Editing such images while preserving the repetitions and their relations is nontrivial due to overlap, missing parts, deformation across instances, illumination variation, etc. Manually enforcing such relations is laborious and error-prone. We propose a novel framework where user scribbles are used to guide detection and extraction of such repeated elements. Our detection process, which is based on a novel boundary band method, robustly extracts the repetitions along with their deformations. The algorithm only considers the shape of the elements and ignores similarity based on color, texture, etc. We then use topological sorting to establish a partial depth ordering of overlapping repeated instances. Missing parts on occluded instances are completed using information from other instances. The extracted repeated instances can then be seamlessly edited and manipulated for a variety of high-level tasks that are otherwise difficult to perform. We demonstrate the versatility of our framework on a large set of inputs of varying complexity, showing applications to image rearrangement, edit transfer, deformation propagation, and instance replacement. © 2010 ACM.
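The depth-ordering step can be illustrated with a standard topological sort (Kahn's algorithm). A minimal sketch, assuming pairwise "A lies in front of B" relations between overlapping instances have already been extracted; the function name and input format are hypothetical:

```python
from collections import deque


def depth_order(num_instances, occludes):
    """Return instance indices front-to-back using Kahn's algorithm.

    occludes: list of (a, b) pairs meaning instance a lies in front of b.
    Raises ValueError if the relations contain a cycle.
    """
    # Count how many instances lie in front of each instance.
    in_degree = [0] * num_instances
    behind = [[] for _ in range(num_instances)]
    for front, back in occludes:
        behind[front].append(back)
        in_degree[back] += 1

    # Start with instances that nothing occludes (the frontmost layer).
    queue = deque(i for i in range(num_instances) if in_degree[i] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in behind[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)

    if len(order) != num_instances:
        raise ValueError("cyclic occlusion relations; no consistent depth order")
    return order


# Instance 0 covers 1 and 2; instance 2 covers 3.
print(depth_order(4, [(0, 1), (0, 2), (2, 3)]))  # prints [0, 1, 2, 3]
```

Instances with no ordering constraint between them may appear in any order consistent with the graph, which is why the result is only a partial depth ordering.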
Heilbron, Fabian Caba
This paper describes a framework for recognizing human actions in videos by incorporating a new set of visual cues that represent the context of the action. We develop a weak foreground-background segmentation approach in order to robustly extract not only foreground features that are focused on the actors, but also global camera motion and contextual scene information. Using dense point trajectories, our approach separates and describes the foreground motion from the background, represents the appearance of the extracted static background, and encodes the global camera motion, which, interestingly, is shown to be discriminative for certain action classes. Our experiments on four challenging benchmarks (HMDB51, Hollywood2, Olympic Sports, and UCF50) show that our contextual features enable a significant performance improvement over state-of-the-art algorithms.
Ren, Zhuo-Ming; Shi; Liao, Hao
Online popularity has a major impact on videos, music, news and other content in online systems. Characterizing online popularity dynamics is a natural way to explain the observed properties in terms of the popularity already acquired by each individual item. In this paper, we provide a quantitative, large-scale, temporal analysis of the popularity dynamics in two online video-providing websites, namely MovieLens and Netflix. The two collected data sets contain over 100 million records and even span...
Liu, Ruixu; Asari, Vijayan K.
A new methodology for 3D change detection that can support effective robot sensing and navigation in a reconstructed indoor environment is presented in this paper. We register RGB-D images acquired with an untracked camera into a globally consistent and accurate point-cloud model. This paper introduces a robust system that estimates the camera position for multiple RGB video frames by using both photometric error and a feature-based method. It uses the iterative closest point (ICP) algorithm to establish geometric constraints between the point clouds as they become aligned. For change detection, a bag-of-words (DBoW) model is used to match the current frame with previous key frames based on RGB images with Oriented FAST and Rotated BRIEF (ORB) features. The key-frame translation and ICP are then combined to align the current point cloud with the reconstructed 3D scene and localize the robot. Meanwhile, the camera position and orientation are used to aid robot navigation. After preprocessing the data, we create an OctoMap model to detect scene changes. Experimental evaluations show that the robot's location and orientation are accurately determined and that change detection gives promising results, indicating all object changes with a very limited false alarm rate.
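The ICP alignment step mentioned above can be sketched as a generic point-to-point ICP with brute-force nearest neighbours and the SVD-based (Kabsch) rigid fit, written with NumPy. It assumes two roughly pre-aligned 3D point clouds and does not reproduce the paper's photometric/ORB tracking, DBoW matching or OctoMap stages; the function names are hypothetical:

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with R @ src_i + t ~ dst_i (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(src, dst, iterations=30):
    """Point-to-point ICP; returns the accumulated (R, t) mapping src onto dst."""
    R_total, t_total = np.eye(3), np.zeros(3)
    current = src.copy()
    for _ in range(iterations):
        # Pair each current point with its closest destination point (brute force).
        d2 = ((current[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        matched = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(current, matched)
        current = current @ R.T + t
        # Compose with the previous estimate: x -> R (R_total x + t_total) + t.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

Brute-force matching costs O(N·M) per iteration; for real point clouds a spatial index such as a k-d tree is the usual choice.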
In the IVF clinic, a place designed principally for the production and implantation of embryos, scientists and IVF recipients are faced with decisions regarding the disposition of frozen embryos. At this time there are hundreds of thousands of cryopreserved embryos awaiting such determinations. They may be thawed for transfer to the woman herself, they may be donated for research or for use by other infertile couples, they may remain in frozen storage, or they may variously be discarded by being allowed to 'succumb', or 'perish'. Where the choice is discard, some IVF clients have chosen to formalise the process through ceremony. A new language is emerging in response to the desires of would-be parents who might wish to characterise the discard experience as a ‘good death’. This article examines the procedure known as ‘compassionate transfer’, in which the embryo to be discarded is placed in the woman’s vagina, where it clearly will not develop further. An alternative method has the embryo transferred in the usual manner but without the benefit of fertility-enhancing hormones, at a point in the cycle unreceptive to implantation. The embryo destined for disposal is thus removed from the realm of technological possibility and ‘returned’ to the female body for a homely death. While debates continue about whether or not embryos constitute life, new practices are developing in response to the emotional experience of embryo discard. We argue that compassionate transfer is a death scene taking shape. In this article, we take the measure of this new death scene’s fabrication, and consider the form, significance, and legal complexity of its ceremonies.
Belonging to the wider academic field of computer vision, video analytics has aroused a phenomenal surge of interest since the turn of the millennium. Video analytics is intended to address the inability to exploit video streams in real time for the purpose of detection or anticipation. It involves analyzing videos using algorithms that detect and track objects of interest over time and that indicate the presence of events or suspect behavior involving these objects. The aims of this book are to highlight the operational attempts of video analytics, to identify possi
There has been phenomenal growth in video applications over the past few years. An accurate traffic model of Variable Bit Rate (VBR) video is necessary for evaluating the performance of a network design and for generating synthetic traffic that can be used to benchmark a network. A large number of VBR video traffic models for different types of video have been proposed in the literature over the past 20 years. Here, the authors classify and survey these models, evaluate them for H.264 AVC and MVC encoded video, and discuss their findings.
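As an illustration of what such a traffic model generates, a first-order autoregressive (AR(1)) process is one of the simplest classic models for VBR frame sizes. A minimal sketch, assuming Gaussian innovations and illustrative parameter values; it ignores the I/P/B frame structure that H.264 traffic models usually capture, and it is not necessarily one of the models surveyed here. The function name is hypothetical:

```python
import random


def ar1_frame_sizes(n, mean, std, rho, seed=0):
    """Synthetic per-frame sizes (bits) from a Gaussian AR(1) process.

    x_k = mean + rho * (x_{k-1} - mean) + e_k, with e_k ~ N(0, std^2 * (1 - rho^2)),
    so the stationary mean and standard deviation are `mean` and `std`.
    Negative draws are clipped to zero because frame sizes cannot be negative.
    """
    rng = random.Random(seed)
    noise_std = std * (1.0 - rho ** 2) ** 0.5
    x = mean
    sizes = []
    for _ in range(n):
        x = mean + rho * (x - mean) + rng.gauss(0.0, noise_std)
        sizes.append(max(0.0, x))
    return sizes


# 10 seconds of 25 fps traffic with strongly correlated frame sizes.
trace = ar1_frame_sizes(250, mean=50_000.0, std=10_000.0, rho=0.9)
```

The lag-one correlation `rho` controls how slowly frame sizes drift; setting it to zero reduces the model to independent Gaussian frame sizes.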
Recent developments in video technology, such as liquid crystal displays and shutters, have made it feasible to incorporate stereoscopic depth into 3-D representations on 2-D displays. However, depth has already been vividly portrayed in video displays without stereopsis, using the classical artists' depth cues described by Helmholtz (1866) and the dynamic depth cues described in detail by Ittleson (1952). Successful static depth cues include overlap, size, linear perspective, texture gradients, and shading. Effective dynamic cues include looming (Regan and Beverly, 1979) and motion parallax (Rogers and Graham, 1982). Stereoscopic depth is superior to the monocular distance cues under certain circumstances. It is most useful for portraying depth intervals as small as 5 to 10 arc seconds. For this reason it is extremely useful in user-video interactions such as telepresence. For example, objects can be manipulated in 3-D space while the person who controls the operations views a virtual image of the manipulated object on a remote 2-D video display. Stereopsis also provides structure and form information in camouflaged surfaces such as tree foliage. Motion parallax also reveals form; however, without other monocular cues such as overlap, motion parallax can yield an ambiguous perception. For example, a turning sphere portrayed as solid by parallax can appear to rotate either leftward or rightward. However, only one direction of rotation is perceived when stereo depth is included. If the scene is static, then stereopsis is the principal cue for revealing the camouflaged surface structure. Finally, dynamic stereopsis provides information about the direction of motion in depth (Regan and Beverly, 1979). Clearly there are many spatial constraints, including spatial frequency content, retinal eccentricity, exposure duration, target spacing, and disparity gradient, which, when properly adjusted, can greatly enhance stereo depth in video displays.
Pasqualotto, Achille; Finucane, Ciara M; Newell, Fiona N
We investigated the effects of indirect, ambient visual information on haptic spatial memory. Using touch only, participants first learned an array of objects arranged in a scene and were subsequently tested on their recognition of that scene, which was always hidden from view. During haptic scene exploration, participants could either see the surrounding room or were blindfolded. We found a benefit in haptic memory performance only when ambient visual information was available in the early stages of the task, but not when participants were initially blindfolded. Specifically, when ambient visual information was available, a performance benefit was found in a subsequent block of trials during which the participant was blindfolded (Experiment 1), and this benefit persisted over a delay of one week (Experiment 2). However, we found that the benefit of ambient visual information did not transfer to a novel environment (Experiment 3). In Experiment 4 we further investigated the nature of the visual information that improved haptic memory and found that geometric information about a surrounding (virtual) room, rather than isolated object landmarks, facilitated haptic scene memory. Our results suggest that vision improves haptic memory for scenes by providing an environment-centred, allocentric reference frame for representing object location through touch. Copyright © 2013 Elsevier B.V. All rights reserved.
The full-color guide to shooting great video with the Flip Video camera. The inexpensive Flip Video camera is currently one of the hottest must-have gadgets. It's portable and connects easily to any computer to transfer video you shoot onto your PC or Mac. Although the Flip Video camera comes with a quick-start guide, it lacks a how-to manual, and this full-color book fills that void! Packed with full-color screen shots throughout, Flip Video For Dummies shows you how to shoot the best possible footage in a variety of situations. You'll learn how to transfer video to your computer and then edi
Zhao, Fan; Yao, Zao; Song, XiaoFang; Yao, Yi
Image and video dehazing is a popular topic in computer vision and digital image processing. A fast, optimized dehazing algorithm was recently proposed that enhances contrast and reduces flickering artifacts in a dehazed video sequence by minimizing a cost function that makes transmission values spatially and temporally coherent. However, its fixed-size block partitioning leads to block artifacts. Furthermore, it does not address the weak edges in a hazy image. Hence, a video dehazing algorithm based on customized spectral clustering is proposed. To avoid block artifacts, the spectral clustering is customized to segment static scenes so that the same target receives the same transmission value. Assuming that dehazed edge images have richer detail than the hazy input, an edge cost function is added to the transmission model. The experimental results demonstrate that the proposed method provides higher dehazing quality and lower time complexity than the previous technique.
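Transmission-based dehazing methods such as those discussed above typically build on the atmospheric scattering model I = J * t + A * (1 - t), where I is the hazy image, J the haze-free scene radiance, t the transmission and A the atmospheric light. A minimal NumPy sketch, assuming t and A have already been estimated, that inverts this model and assigns one transmission value per segment label (a stand-in for the idea that each target receives a single transmission value). The spectral clustering, edge cost function and temporal coherence term of the paper are not reproduced, and the function names are hypothetical:

```python
import numpy as np


def per_segment_transmission(t_map, labels):
    """Give every pixel of a segment the mean transmission of that segment."""
    sums = np.bincount(labels.ravel(), weights=t_map.ravel())
    counts = np.bincount(labels.ravel())
    return (sums / np.maximum(counts, 1))[labels]


def recover_scene(hazy, transmission, airlight, t_min=0.1):
    """Invert I = J * t + A * (1 - t) for the haze-free radiance J.

    hazy:         H x W x 3 image with values in [0, 1]
    transmission: H x W map of t values in (0, 1]
    airlight:     length-3 atmospheric light A
    t_min:        lower bound on t to avoid amplifying noise in dense haze
    """
    t = np.clip(transmission, t_min, 1.0)[..., None]  # broadcast over colour channels
    A = np.asarray(airlight, dtype=float)
    J = (hazy - A) / t + A
    return np.clip(J, 0.0, 1.0)
```

Averaging the transmission within each segment removes the block boundaries that a fixed grid would introduce, at the cost of depending on the quality of the segmentation.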
Cantey, Thomas M.; Bowden, Mark; Cosby, David; Ballard, Gary
This paper continues the work of merging two dynamic infrared scene projector technologies to provide a unique and innovative solution for simulating high dynamic temperature ranges when testing infrared imaging sensors. It presents some of the challenges and performance issues encountered in implementing this unique projector system in a Hardware-in-the-Loop (HWIL) simulation facility. The projection system combines the technologies of a Honeywell BRITE II extended voltage range emissive resistor array device and an optically scanned laser diode array projector (LDAP). The high apparent temperature simulations are produced from the luminescent infrared radiation emitted by the high power laser diodes. The hybrid infrared projector system is being integrated into an existing HWIL simulation facility and is used to provide real-world high radiance imagery to an imaging infrared unit under test. The performance and operation of the projector are presented, demonstrating the merit and success of the hybrid approach. The high dynamic range capability simulates signatures from a 250 K apparent background temperature up to an 850 K maximum apparent temperature. This is a large increase in radiance projection over current infrared scene projection capabilities.
National Aeronautics and Space Administration — In response to the NASA need for a free-standing immersive virtual scene display system interfaced with an exercise treadmill to mimic terrestrial exercise...
Mudrik, Liad; Shalgi, Shani; Lamy, Dominique; Deouell, Leon Y
Whether contextual regularities facilitate perceptual stages of scene processing is widely debated, and empirical evidence is still inconclusive. Specifically, it was recently suggested that contextual violations affect early processing of a scene only when the incongruent object and the scene are presented asynchronously, creating expectations. We compared event-related potentials (ERPs) evoked by scenes that depicted a person performing an action using either a congruent or an incongruent object (e.g., a man shaving with a razor or with a fork) when scene and object were presented simultaneously. We also explored the role of attention in contextual processing by using a pre-cue to direct subjects' attention towards or away from the congruent/incongruent object. Subjects' task was to determine how many hands the person in the picture used to perform the action. We replicated our previous findings of frontocentral negativity for incongruent scenes that started ~210 ms post stimulus presentation, even earlier than previously found. Surprisingly, this incongruency ERP effect was negatively correlated with the reaction-time cost on incongruent scenes. Due to a weak attention manipulation, the results did not allow us to draw conclusions about the role of attention in detecting the regularity. By replicating the 200-300 ms incongruity effect with a new group of subjects at even earlier latencies than previously reported, the results strengthen the evidence for contextual processing during this time window, even when simultaneous presentation of the scene and object prevents the formation of prior expectations. We discuss possible methodological limitations that may account for previous failures to find this effect, and conclude that contextual information affects object model selection processes prior to full object identification, with semantic knowledge activation stages unfolding only later on. Copyright © 2014 Elsevier Ltd. All rights reserved.
Mullally, Sinéad L.; Vargha-Khadem, Faraneh; Maguire, Eleanor A.
Amnesic patients with bilateral hippocampal damage sustained in adulthood are generally unable to construct scenes in their imagination. By contrast, patients with developmental amnesia (DA), where hippocampal damage was acquired early in life, have preserved performance on this task, although the reason for this sparing is unclear. One possibility is that residual function in remnant hippocampal tissue is sufficient to support basic scene construction in DA. Such a situation was found in the...
Dalrymple, Kirsten A; Birmingham, Elina; Bischof, Walter F; Barton, Jason J S; Kingstone, Alan
Simultanagnosia is a disorder of visual attention, defined as an inability to see more than one object at once. It has been conceived as being due to a constriction of the visual "window" of attention, a metaphor that we examine in the present article. A simultanagnosic patient (SL) and two non-simultanagnosic control patients (KC and ES) described social scenes while their eye movements were monitored. These data were compared to a group of healthy subjects who described the same scenes under the same conditions as the patients, or through an aperture that restricted their vision to a small portion of the scene. Experiment 1 demonstrated that SL showed unusually low proportions of fixations to the eyes in social scenes, which contrasted with all other participants who demonstrated the standard preferential bias toward eyes. Experiments 2 and 3 revealed that when healthy participants viewed scenes through a window that was contingent on where they looked (Experiment 2) or where they moved a computer mouse (Experiment 3), their behavior closely mirrored that of patient SL. These findings suggest that a constricted window of visual processing has important consequences for how simultanagnosic patients explore their world. Our paradigm's capacity to mimic simultanagnosic behaviors while viewing complex scenes implies that it may be a valid way of modeling simultanagnosia in healthy individuals, providing a useful tool for future research. More broadly, our results support the thesis that people fixate the eyes in social scenes because they are informative to the meaning of the scene. Copyright © 2010 Elsevier B.V. All rights reserved.
Joshi, Amit M.; Mishra, Vivekanand; Patrikar, R. M.
With the advent of technology, video has become a prominent entity that is shared over networks. With easy availability of various editing tools, data integrity and ownership issues have caused great concern worldwide. Video watermarking is an evolving field that may be used to address such issues. To date, most algorithms have been developed for uncompressed-domain watermarking and implemented on software platforms. They provide flexibility and simplicity, but they are not suited for real-time applications: they work offline, where videos are captured first and the watermark is embedded afterwards. In the present work, a hardware-based implementation of video watermarking is proposed that overcomes the limitations of software watermarking methods and can be readily adapted to the H.264 standard. This paper focuses on an invisible and robust video watermarking scheme that can easily be implemented as an integral part of the standard H.264 encoder. The proposed watermarking algorithm uses an integer DCT-based watermark embedding method, in which the integer DCT is computed with a fully parallel approach, resulting in better speed. The proposed video watermarking is designed with a pipelined and parallel architecture for real-time implementation. A scene change detection technique is used to improve performance. Different planes of the watermark are embedded in different frames of a particular scene in order to achieve robustness against various temporal attacks.
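The core of the H.264 4x4 forward integer transform is the matrix product Y = C X C^T with the integer matrix C below (the normalising scale factors are folded into quantisation), which is why it can be computed with additions, subtractions and shifts in parallel hardware. A minimal pure-Python sketch of that transform, plus a purely hypothetical parity-based rule showing where a watermark bit could be placed in one coefficient; the paper's actual embedding rule and its pipelined architecture are not reproduced:

```python
# Core matrix of the H.264 4x4 forward integer transform.
C = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain 4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Y = C . X . C^T using only integer arithmetic (no scaling step)."""
    return matmul(matmul(C, block), transpose(C))


def embed_bit(coeffs, bit, pos=(1, 1)):
    """Hypothetical rule: force the parity of one mid-frequency coefficient to `bit`.

    Returns a modified copy; the input coefficients are left untouched.
    """
    out = [row[:] for row in coeffs]
    i, j = pos
    if out[i][j] % 2 != bit:
        out[i][j] += 1  # adding 1 flips the parity
    return out
```

A decoder following this hypothetical rule would read the bit back as `coeffs[1][1] % 2`; a practical scheme would also have to survive quantisation, which this sketch ignores.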
This paper focuses on two main topics: crime scene reconstruction, based on a geomatic approach, and crime scene analysis, through GIS-based procedures. Drawing on the authors' experience in performing forensic analysis for real cases, these topics are examined with the specific goal of verifying the relationship between human walk paths at a crime scene and blood patterns on the floor. To perform such analyses, pictures taken by first aiders must be available, since they provide information about the crime scene before items are moved or interfered with. These pictures are generally affected by large geometric distortions. Therefore, after a brief description of the geomatic techniques suitable for acquiring reference data (total station surveying, photogrammetry and laser scanning), the paper presents the developed methodology, based on photogrammetric algorithms, for calibrating, georeferencing and mosaicking the available images acquired at the scene. The crime scene analysis is based on a collection of GIS functionalities for simulating human walk movements and creating a statistically significant sample. The developed GIS software component is described in detail, showing how the analysis of this statistical sample of simulated human walks allows the probability of following a certain walk path without touching the bloodstains on the floor to be rigorously defined.
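The idea of estimating, from a sample of simulated walks, the probability that a walk path avoids the bloodstains can be sketched as a Monte Carlo estimate. Purely as a hypothetical stand-in for the paper's GIS walk simulation, walks are modelled below as uniformly random monotone paths on a grid of floor cells, and stains as a set of cells; all names are illustrative:

```python
import random


def simulate_walk(start, goal, rng):
    """One uniformly random monotone grid path from start to goal (hypothetical walk model).

    Assumes goal lies to the right of and above start (gx >= x, gy >= y).
    """
    (x, y), (gx, gy) = start, goal
    moves = ["x"] * (gx - x) + ["y"] * (gy - y)
    rng.shuffle(moves)  # a random ordering of the moves is a uniformly random monotone path
    path = [(x, y)]
    for m in moves:
        if m == "x":
            x += 1
        else:
            y += 1
        path.append((x, y))
    return path


def prob_avoiding_stains(start, goal, stains, samples=20000, seed=0):
    """Monte Carlo estimate of P(walk touches no bloodstained cell)."""
    rng = random.Random(seed)
    clean = sum(
        1
        for _ in range(samples)
        if not stains.intersection(simulate_walk(start, goal, rng))
    )
    return clean / samples


# On a 2 x 2 grid, 2 of the 6 monotone paths avoid the centre cell (1, 1).
estimate = prob_avoiding_stains((0, 0), (2, 2), {(1, 1)})
```

The estimate converges to the exact fraction of avoiding paths as `samples` grows; a realistic walk model would replace `simulate_walk` with simulated pedestrian trajectories in the georeferenced floor plan.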
Marc Ciufo Green
The classification of acoustic scenes and events is an emerging area of research in the field of machine listening. Most research conducted so far uses spectral features extracted from monaural or stereophonic audio rather than spatial features extracted from multichannel recordings. This is partly due to the lack, thus far, of a substantial body of spatial recordings of acoustic scenes. This paper formally introduces EigenScape, a new database of fourth-order Ambisonic recordings of eight different acoustic scene classes. The potential applications of a spatial machine listening system are discussed before detailed information on the recording process and dataset is provided. A baseline spatial classification system using directional audio coding (DirAC) techniques is detailed, and results from this classifier are presented. The classifier is shown to give good overall scene classification accuracy across the dataset, with 7 of 8 scenes classified with an accuracy greater than 60%, and an 11% improvement in overall accuracy compared with the use of Mel-frequency cepstral coefficient (MFCC) features. Further analysis of the results shows potential improvements to the classifier. It is concluded that the results validate the new database and show that spatial features can characterise acoustic scenes and as such are worthy of further investigation.
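DirAC estimates a direction of arrival from first-order (B-format) Ambisonic components. A simplified, broadband sketch of that idea, assuming FuMa-style channel scaling and using only the first-order channels: for a single plane-wave source, the time-averaged products of the omnidirectional W signal with the dipole signals X, Y and Z point towards the source. Real DirAC analysis works per time-frequency tile and also estimates diffuseness, and the encoder below is a hypothetical helper for testing only:

```python
import math


def encode_bformat(signal, azimuth, elevation):
    """Hypothetical first-order encoder for a plane wave from (azimuth, elevation) in radians."""
    w = [s / math.sqrt(2) for s in signal]  # FuMa-style W scaling
    x = [s * math.cos(azimuth) * math.cos(elevation) for s in signal]
    y = [s * math.sin(azimuth) * math.cos(elevation) for s in signal]
    z = [s * math.sin(elevation) for s in signal]
    return w, x, y, z


def estimate_direction(w, x, y, z):
    """Broadband DirAC-style DOA: direction of the time-averaged W * (X, Y, Z) products."""
    n = len(w)
    ix = sum(a * b for a, b in zip(w, x)) / n
    iy = sum(a * b for a, b in zip(w, y)) / n
    iz = sum(a * b for a, b in zip(w, z)) / n
    azimuth = math.atan2(iy, ix)
    elevation = math.atan2(iz, math.hypot(ix, iy))
    return azimuth, elevation


# 440 Hz tone at 48 kHz arriving from 60 degrees azimuth, 20 degrees elevation.
tone = [math.sin(2 * math.pi * 440 * k / 48000) for k in range(4800)]
az, el = estimate_direction(*encode_bformat(tone, math.radians(60), math.radians(20)))
```

Per-band direction (and diffuseness) statistics of this kind, collected over time, are the sort of spatial features a DirAC-based scene classifier can use in place of spectral features such as MFCCs.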
User-generated video content has grown tremendously fast, to the point of outpacing professional content creation. In this work we develop methods that analyze contextual information from multiple user-generated videos in order to obtain semantic information about public happenings (e.g., sport and live music events) being recorded in these videos. One of the key contributions of this work is the joint utilization of different data modalities, including those captured by auxiliary sensors during the video recording performed by each user. In particular, we analyze GPS data, magnetometer data, accelerometer data, and video- and audio-content data. We use these data modalities to infer information about the event being recorded, in terms of layout (e.g., stadium), genre, indoor versus outdoor scene, and the main area of interest of the event. Furthermore, we propose a method that automatically identifies the optimal set of cameras to be used in a multicamera video production. Finally, we detect the camera users who fall within the field of view of other cameras recording at the same public happening. We show that the proposed multimodal analysis methods perform well on various recordings obtained at real sport events and live music performances.
With the development of wireless networks and the improvement of mobile device capability, video streaming is increasingly widespread in such environments. Under limited resources and inherent constraints, appropriate video adaptation has become one of the most important and challenging issues in wireless multimedia applications. In this paper, we propose a novel content-aware video adaptation scheme to use resources effectively and improve visual perceptual quality. First, the attention model is derived by analyzing brightness, location, motion vector, and energy features in the compressed domain to reduce computational complexity. Then, by integrating the attention model, the capability of the client device and a correlational statistical model, attractive regions of video scenes are derived. The information object (IOB)-weighted rate distortion model is used to adjust the bit allocation. Finally, the video adaptation scheme dynamically adjusts the video bitstream at frame level and object level. Experimental results validate that the proposed scheme achieves better visual quality effectively and efficiently.
Davis, James W.
With the use of large video networks, there is a need to coordinate and interpret the video imagery for decision support systems with the goal of reducing the cognitive and perceptual overload of human operators. We present computer vision strategies that enable efficient control and management of cameras to effectively monitor wide-coverage areas, and examine the framework within an actual multi-camera outdoor urban video surveillance network. First, we construct a robust and precise camera control model for commercial pan-tilt-zoom (PTZ) video cameras. In addition to providing a complete functional control mapping for PTZ repositioning, the model can be used to generate wide-view spherical panoramic viewspaces for the cameras. Using the individual camera control models, we next individually map the spherical panoramic viewspace of each camera to a large aerial orthophotograph of the scene. The result provides a unified geo-referenced map representation to permit automatic (and manual) video control and exploitation of cameras in a coordinated manner. The combined framework provides new capabilities for video sensor networks that are of significance and benefit to the broad surveillance/security community.
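The final step of placing each camera's viewing direction in a common spherical viewspace can be illustrated with the standard equirectangular projection. A minimal sketch under assumed angle conventions (the function names and conventions are hypothetical; the paper's calibrated PTZ control model and its mapping to the aerial orthophotograph are not reproduced):

```python
import math


def pan_tilt_to_unit_vector(pan_deg, tilt_deg):
    """Viewing direction as a unit vector (x forward, y left, z up)."""
    p, t = math.radians(pan_deg), math.radians(tilt_deg)
    return (math.cos(t) * math.cos(p), math.cos(t) * math.sin(p), math.sin(t))


def pan_tilt_to_panorama(pan_deg, tilt_deg, width, height):
    """Map a PTZ viewing direction to pixel coordinates in an equirectangular panorama.

    Assumes pan in [-180, 180) measured from the panorama's centre column and
    tilt in [-90, 90] with +90 pointing straight up.
    """
    u = (pan_deg + 180.0) / 360.0 * width
    v = (90.0 - tilt_deg) / 180.0 * height
    return u, v


# The camera's home direction lands in the centre of a 3600 x 1800 panorama.
centre = pan_tilt_to_panorama(0, 0, 3600, 1800)
```

Because every camera's panorama uses the same angular parameterisation, registering each panorama to the orthophotograph once is enough to relate any pan/tilt setting to map coordinates.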
This paper reports on the development of an automated embedded video surveillance system using two customized embedded RISC processors. The application is partitioned into object tracking and video stream encoding subsystems. The real-time object tracker is able to detect and track moving objects in video images of scenes taken by stationary cameras. It is based on the block-matching algorithm. The video stream encoding involves optimizing an International Telecommunication Union (ITU-T) H.263 baseline video encoder for quarter common intermediate format (QCIF) and common intermediate format (CIF) resolution images. The two subsystems, running on two processor cores, were integrated, and a simple protocol was added to realize the automated video surveillance system. The experimental results show that the system is capable of detecting, tracking, and encoding QCIF and CIF resolution images containing object movements in real time. With its low cycle count, low transistor count, and low power consumption, the system is ideal for deployment in remote locations.
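The block-matching algorithm that the tracker is based on can be sketched as an exhaustive (full) search that minimises the sum of absolute differences (SAD). A minimal pure-Python version, assuming greyscale frames stored as lists of rows; the block size, search range and function names are illustrative, and the paper's optimised embedded implementation is not reproduced:

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of cur and a displaced block of ref."""
    return sum(
        abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
        for i in range(n)
        for j in range(n)
    )


def match_block(cur, ref, bx, by, n=8, search=4):
    """Full-search block matching for the block whose top-left corner is (bx, by).

    Returns the motion vector (dx, dy) pointing from the current block to its best
    match in the reference frame, or None if no candidate fits inside the frame.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that fall outside the reference frame.
            if not (0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return None if best is None else (best[1], best[2])
```

Full search examines (2 * search + 1)^2 candidates per block, which is why embedded implementations usually optimise this loop or switch to faster search patterns.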
Han, Doug Hyun; Bolo, Nicolas; Daniels, Melissa A; Arenella, Lynn; Lyoo, In Kyoon; Renshaw, Perry F
Recent studies have suggested that the brain circuitry mediating cue-induced desire for video games is similar to that elicited by cues related to drugs and alcohol. We hypothesized that desire for Internet video games during cue presentation would activate similar brain regions to those that have been linked with craving for drugs or pathologic gambling. This study involved the acquisition of diagnostic magnetic resonance imaging and functional magnetic resonance imaging data from 19 healthy male adults (age, 18-23 years) following training and a standardized 10-day period of game play with a specified novel Internet video game, "War Rock" (K2 Network, Irvine, CA). Using segments of videotape consisting of 5 contiguous 90-second segments of alternating resting, matched control, and video game-related scenes, desire to play the game was assessed using a 7-point visual analogue scale before and after presentation of the videotape. In responding to Internet video game stimuli, compared with neutral control stimuli, significantly greater activity was identified in left inferior frontal gyrus, left parahippocampal gyrus, right and left parietal lobe, right and left thalamus, and right cerebellum (false discovery rate Internet video game showed significantly greater activity in right medial frontal lobe, right and left frontal precentral gyrus, right parietal postcentral gyrus, right parahippocampal gyrus, and left parietal precuneus gyrus. Controlling for total game time, reported desire for the Internet video game in the subjects who played more Internet video game was positively correlated with activation in right medial frontal lobe and right parahippocampal gyrus. The present findings suggest that cue-induced activation to Internet video game stimuli may be similar to that observed during cue presentation in persons with substance dependence or pathologic gambling. In particular, cues appear to commonly elicit activity in the dorsolateral prefrontal, orbitofrontal
Adegoke Oloruntoba Adelufosi
The Nigerian home video industry, popularly known as Nollywood, is a booming industry, with increasing numbers of easily accessible online videos. The aim of this study was to analyse the contents of popular Nigerian online videos to determine the prevalence of smoking imagery and its public health implications. Using specific search terms, popular English-language and indigenous Yoruba-language Nigerian home videos uploaded on YouTube in 2013 were identified and sorted based on their view counts. Data on smoking-related scenes, such as smoking incidents, context of tobacco use, depiction of cigarette brands, gender of smokers and film rating, were collected. Of the 60 online videos whose contents were assessed in this study, 26 (43.3%) had scenes with cigarette smoking imagery. The mean (SD) number of smoking incidents was 2.7 (1.6), giving an average of one smoking incident for every 26 to 27 min of film. More than half (53.8%) of the films with tobacco use had high smoking imagery. An average of 2 characters per film smoked, mostly in association with acts of criminality or prostitution (57.7%) and alcohol use (57.7%). Main protagonists were shown smoking in 73.1% of the films, with female protagonists (78.9%) shown smoking more often than male protagonists (21.1%). Smoking imagery is common in popular Nigerian online movies. Given the wide reach of online videos and their potential to be viewed by people from different cultures and to negatively influence youngsters, it is important that smoking portrayals in online movies be controlled.
Bulbul, Halil Ibrahim; Yavuzcan, H Guclu; Ozel, Mesut
In order to ensure that digital evidence is collected, preserved, examined, or transferred in a manner that safeguards its accuracy and reliability, law enforcement and digital forensic units must establish and maintain an effective quality assurance system. The very first part of this system is standard operating procedures (SOPs) and/or models that conform to chain-of-custody requirements, which rely on the digital forensics "process-phase-procedure-task-subtask" sequence. An acceptable and thorough Digital Forensics (DF) process depends on sequential DF phases; each phase depends on sequential DF procedures, and each procedure in turn depends on tasks and subtasks. Numerous DF process models in the literature define DF phases, but we identified no DF model that defines phase-based sequential procedures for the crime scene. The analytical crime scene procedure model (ACSPM) that we suggest in this paper is intended to fill this gap. The proposed analytical procedure model for digital investigations at a crime scene is developed and defined for crime scene practitioners, with its main focus on crime scene digital forensic procedures rather than on the whole digital investigation process and the phases that end in court. When reviewing the relevant literature and consulting law enforcement agencies, we found only device-based charts specific to a particular device and/or more general approaches to digital evidence management models from crime scene to court. After analyzing the needs of law enforcement organizations and recognizing the absence of a digital investigation procedure model for crime scene activities, we decided to inspect the relevant literature analytically. The outcome of this inspection is the model explained here, which is intended to provide guidance for the thorough and secure implementation of digital forensic procedures at a crime scene. In digital forensic
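The abstract describes a strictly nested, ordered hierarchy (process, phase, procedure, task, subtask) in which each level depends on the steps before it. The sketch below is a minimal illustration of that nesting. The class names and the example crime-scene steps are hypothetical and do not reproduce the ACSPM's actual contents. For brevity, subtasks are plain strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    subtasks: List[str] = field(default_factory=list)  # ordered subtasks


@dataclass
class Procedure:
    name: str
    tasks: List[Task] = field(default_factory=list)  # ordered tasks


@dataclass
class Phase:
    name: str
    procedures: List[Procedure] = field(default_factory=list)  # ordered procedures


@dataclass
class Process:
    name: str
    phases: List[Phase] = field(default_factory=list)  # ordered phases

    def ordered_steps(self) -> List[str]:
        """Flatten the hierarchy depth-first, preserving the sequence at every
        level, so each step appears after the steps it depends on."""
        steps: List[str] = []
        for phase in self.phases:
            for proc in phase.procedures:
                for task in proc.tasks:
                    prefix = f"{phase.name} > {proc.name} > {task.name}"
                    if task.subtasks:
                        steps.extend(f"{prefix} > {sub}" for sub in task.subtasks)
                    else:
                        steps.append(prefix)
        return steps


# Hypothetical example content, for illustration only.
scene = Process("crime scene", [
    Phase("identification", [
        Procedure("secure the scene", [
            Task("document devices", ["photograph", "label"]),
            Task("isolate network"),
        ]),
    ]),
])
for step in scene.ordered_steps():
    print(step)
```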
Chandrasekaran, Chandramouli; Trubanova, Andrea; Stillittano, Sébastien; Caplier, Alice; Ghazanfar, Asif A
.... Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input space characterization is unavailable is human speech...
Tanabe-Ishibashi, Azumi; Ikeda, Takashi; Osaka, Naoyuki
Many people have experienced the inability to recognize a familiar face in a changed context, a phenomenon known as the "butcher-on-the-bus" effect. Whether this context effect reflects facilitation of memory by old contexts or disturbance of memory by novel contexts is much debated. Here, we investigated how two types of contextual information associated with target faces influence recognition performance for those faces, using meaningful (scene) or meaningless (scrambled scene) backgrounds. The results showed two different effects of context: (1) disturbance of face recognition by changes of scene backgrounds and (2) weak facilitation of face recognition by the re-presentation of the same backgrounds, whether scene or scrambled. The results indicate that the facilitation and disturbance effects of context are actually caused by two different subcomponents of the background information: semantic information available from scene backgrounds, and visual array information common to both a scene and its scrambled picture. This view suggests that the visual working memory system can control such context information, switching the way it deals with it: inhibiting it as a distracter or activating it as a cue for recognizing the current target.
Heide Smith, Jonas; Tosca, Susana Pajares; Egenfeldt-Nielsen, Simon
From Pong to PlayStation 3 and beyond, Understanding Video Games is the first general introduction to the exciting new field of video game studies. This textbook traces the history of video games, introduces the major theories used to analyze games such as ludology and narratology, reviews ... the economics of the game industry, examines the aesthetics of game design, surveys the broad range of game genres, explores player culture, and addresses the major debates surrounding the medium, from educational benefits to the effects of violence. Throughout the book, the authors ask readers to consider ... larger questions about the medium: What defines a video game? Who plays games? Why do we play games? How do games affect the player? Extensively illustrated, Understanding Video Games is an indispensable and comprehensive resource for those interested in the ways video games are reshaping...
Henningsen, Birgitte; Gundersen, Peter Bukovica; Hautopp, Heidi
This paper introduces what we define as a collaborative video sketching process. This process links various sketching techniques with digital storytelling approaches and creative reflection processes in video productions. Traditionally, sketching has been used by designers across various ... forms and through empirical examples, we present and discuss the video recording of sketching sessions, as well as the development of video sketches by rethinking, redoing and editing the recorded sessions. The empirical data are based on workshop sessions with researchers and students from universities ... and university colleges and primary and secondary school teachers. As researchers, we have had different roles in these action research case studies, in which various video sketching techniques were applied. The analysis illustrates that video sketching can take many forms, and two common features are important...
Full Text Available As academics we study, research and teach audiovisual media, yet rarely disseminate and mediate through it. Today, developments in production technologies have enabled academic researchers to create videos and mediate audiovisually. In academia it is taken for granted that everyone can write a text. Is it now time to assume that everyone can make a video essay? Using the online journal of academic videos Audiovisual Thinking and the videos published in it as a case study, this article seeks to reflect on the emergence and legacy of academic audiovisual dissemination. Anchoring academic video and audiovisual dissemination of knowledge in two critical traditions, documentary theory and semiotics, we will argue that academic video is in fact already present in a variety of academic disciplines, and that academic audiovisual essays are bringing trends and developments that have long been part of academic discourse to their logical conclusion.
Joongheon Kim; Eun-Seok Ryu
This paper presents quality analysis results for high-definition video streaming in two-tiered camera sensor network applications. In the camera-sensing system, multiple cameras sense visual scenes in their target fields and transmit the video streams over IEEE 802.15.3c multi-gigabit wireless links. However, each wireless transmission introduces interference on the other links. This paper analyzes the capacity degradation due to the interference impacts from the camera-sensing nodes to the ...
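The abstract does not give the paper's interference model. As a generic illustration of why interference degrades link capacity, the sketch below uses the Shannon capacity with the signal-to-interference-plus-noise ratio (SINR). The function names, the 2.16 GHz bandwidth and the power values are assumptions, not figures from the paper. The bandwidth cancels out of the fractional loss.

```python
import math


def capacity_bps(bandwidth_hz: float, signal_w: float, noise_w: float,
                 interference_w: float = 0.0) -> float:
    """Shannon capacity B * log2(1 + SINR) in bit/s."""
    sinr = signal_w / (noise_w + interference_w)
    return bandwidth_hz * math.log2(1.0 + sinr)


def capacity_loss(bandwidth_hz: float, signal_w: float, noise_w: float,
                  interference_w: float) -> float:
    """Fraction of the interference-free capacity lost to interference."""
    clean = capacity_bps(bandwidth_hz, signal_w, noise_w)
    degraded = capacity_bps(bandwidth_hz, signal_w, noise_w, interference_w)
    return 1.0 - degraded / clean


# Assumed values: SNR of 100 (20 dB); interference equal to the noise power
# halves the SINR to 50.
loss = capacity_loss(2.16e9, signal_w=1.0, noise_w=0.01, interference_w=0.01)
print(f"capacity loss: {loss:.1%}")  # about 14.8%
```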
Ismail Amin Ali
Full Text Available A compressed video bitstream can be partitioned according to the coding priority of the data, allowing prioritized wireless communication or selective dropping in a congested channel. This mechanism is known as data partitioning in the H.264/Advanced Video Coding (AVC) codec. This paper introduces a further sub-partition of one of the codec's three data partitions. Results show a 5 dB improvement in Peak Signal-to-Noise Ratio (PSNR) through this innovation. In particular, the data partition containing intra-coded residuals is subdivided into data from those macroblocks (MBs) that are naturally intra-coded and data from those MBs forcibly inserted for non-periodic intra-refresh. Interactive user-to-user video streaming can benefit, because in that setting HTTP adaptive streaming is inappropriate and the High Efficiency Video Coding (HEVC) codec is too energy-demanding.
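The 5 dB figure is a PSNR difference. The sketch below shows how PSNR is computed for 8-bit samples, using flattened pixel lists in pure Python with made-up sample values. Because PSNR is logarithmic in the mean squared error (MSE), a 5 dB gain corresponds to reducing the MSE by a factor of 10^0.5, about 3.16.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], decoded: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


ref = [52, 55, 61, 59]
dec = [54, 55, 60, 58]  # squared errors 4, 0, 1, 1 -> MSE = 1.5
print(round(psnr(ref, dec), 2))  # about 46.37 dB
```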
Khosla, Deepak; Moore, Christopher K.; Chelian, Suhas
This paper presents a bio-inspired method for spatio-temporal recognition in static and video imagery. It builds upon and extends our previous work on a bio-inspired Visual Attention and object Recognition System (VARS). The VARS approach locates and recognizes objects in a single frame. This work presents two extensions of VARS. The first extension is a Scene Recognition Engine (SCE) that learns to recognize spatial relationships between objects that compose a particular scene category in static imagery. This could be used for recognizing the category of a scene, e.g., office vs. kitchen. The second extension is the Event Recognition Engine (ERE) that recognizes spatio-temporal sequences or events in sequences. This extension uses a working memory model to recognize events and behaviors in video imagery by maintaining and recognizing ordered spatio-temporal sequences. The working memory model is based on an ARTSTORE neural network that combines an ART-based neural network with a cascade of sustained temporal order recurrent (STORE) neural networks. A series of Default ARTMAP classifiers ascribes event labels to these sequences. Our preliminary studies have shown that this extension is robust to variations in an object's motion profile. We evaluated the performance of the SCE and ERE on real datasets. The SCE module was tested on a visual scene classification task using the LabelMe dataset. The ERE was tested on real-world video footage of vehicles and pedestrians in a street scene. Our system is able to recognize the events in this footage involving vehicles and pedestrians.
Achieve professional-quality sound on a limited budget! Harness all-new, Hollywood-style audio techniques to bring your independent film and video productions to the next level. In Sound for Digital Video, Second Edition, industry experts Tomlinson Holman and Arthur Baum give you the tools and knowledge to apply recent advances in audio capture, video recording, editing workflow, and mixing to your own film or video with stunning results. This fresh edition is chock-full of techniques, tricks, and workflow secrets that you can apply to your own projects from preproduction
The Green Power Partnership regularly develops videos that explore a variety of topics, including the Green Power Partnership itself, green power purchasing, and renewable energy certificates.
José Miguel Garrido Miranda
Full Text Available In order to investigate the reasons that motivate students to play strategy video games, an analysis was conducted of the observed discourse and practices of fifteen Chilean high school students during collective gaming sessions. By means of an ethnomethodological analysis, we proceeded to identify and saturate emerging categories to determine the interests that impel these students to play. The findings, seen from a pedagogical perspective, suggest that the feeling of being part of a scene, solving increasingly complex situations, and positively assessing the uncertainty produced by interaction with this type of environment can become guiding elements for improving the design of teaching situations supported by the use of digital technologies in the classroom.
He, Tong; Huang, Weilin; Qiao, Yu; Yao, Jian
Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature globally computed from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative power and poorer robustness. In this paper, we present a new system for scene text detection by proposing a novel text-attentional convolutional neural network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components. We develop a new learning mechanism to train the Text-CNN with multi-level and rich supervised information, including text region mask, character label, and binary text/non-text information. The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where low-level supervised information greatly facilitates the main task of text/non-text classification. In addition, a powerful low-level detector called contrast-enhancement maximally stable extremal regions (MSERs) is developed, which extends the widely used MSERs by enhancing intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in a higher recall. Our approach achieved promising results on the ICDAR 2013 data set, with an F-measure of 0.82, substantially improving the state-of-the-art results.
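The reported F-measure is the harmonic mean of precision and recall. The ICDAR 2013 protocol has its own rules for matching detections to ground truth, and those rules are not reproduced here. The sketch below shows only the final combination step, starting from hypothetical matched counts.

```python
from typing import Tuple


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F-measure (their harmonic mean) from match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Hypothetical counts: 90 correct detections, 10 false alarms, 30 missed text regions.
p, r, f = precision_recall_f(tp=90, fp=10, fn=30)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.900 R=0.750 F=0.818
```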
We propose a method to detect disocclusion in video sequences of three-dimensional scenes and to partition the disoccluded regions into objects, defined by coherent deformation corresponding to surfaces in the scene. Our method infers deformation fields that are piecewise smooth by construction, without the need for an explicit regularizer and the associated choice of weight. It then partitions the disoccluded region and groups its components with objects by leveraging the complementarity of motion and appearance cues: where appearance changes within an object, motion can usually be reliably inferred and used for grouping; where appearance is close to constant, it can be used for grouping directly. We integrate both cues in an energy minimization framework, incorporate prior assumptions explicitly into the energy, and propose a numerical scheme. © 2015 IEEE.
Yu, Litao; Yang, Yang; Huang, Zi; Wang, Peng; Song, Jingkuan; Shen, Heng Tao
In recent years, the task of event recognition from videos has attracted increasing interest in the multimedia area. While most existing research has focused mainly on exploring visual cues to handle relatively fine-grained events, it is difficult to analyze video content directly without any prior knowledge. Therefore, combining visual and semantic analysis is a natural approach to video event understanding. In this paper, we study the problem of Web video event recognition, where Web videos often describe coarse-grained events and carry limited textual information. Key challenges include how to accurately represent event semantics from incomplete textual information and how to effectively explore the correlation between visual and textual cues for video event understanding. We propose a novel framework to perform complex event recognition from Web videos. To compensate for the insufficient expressive power of visual cues, we construct an event knowledge base by deeply mining semantic information from ubiquitous Web documents. This event knowledge base is capable of describing each event with comprehensive semantics. By utilizing this base, the textual cues for a video can be significantly enriched. Furthermore, we introduce a two-view adaptive regression model, which explores the intrinsic correlation between the visual and textual cues of the videos to learn reliable classifiers. Extensive experiments on two real-world video data sets show the effectiveness of our proposed framework and demonstrate that the event knowledge base indeed helps improve the performance of Web video event recognition.
Full Text Available Radio wave propagation scene partitioning is necessary for wireless channel modeling. As far as we know, there are no standards for scene partitioning in high-speed rail (HSR) scenarios, and therefore we propose a radio wave propagation scene partitioning scheme for HSR scenarios in this paper. Based on our measurements along the Wuhan-Guangzhou HSR, Zhengzhou-Xian passenger-dedicated line, Shijiazhuang-Taiyuan passenger-dedicated line, and Beijing-Tianjin intercity line in China, whose operating speeds are above 300 km/h, and on investigations at Beijing South Railway Station, Zhengzhou Railway Station, Wuhan Railway Station, Changsha Railway Station, Xian North Railway Station, Shijiazhuang North Railway Station, Taiyuan Railway Station, and Tianjin Railway Station, we obtain an overview of HSR propagation channels and record a large amount of valuable measurement data for HSR scenarios. On the basis of these measurements and investigations, we partitioned the HSR scene into twelve scenarios. Further theoretical analysis based on radio wave propagation mechanisms, such as reflection and diffraction, may lead us to develop a standard for radio wave propagation scene partitioning for HSR. Our work can also be used as a basis for wireless channel modeling and for the selection of key techniques for HSR systems.
Sun, Dandan; Gao, Jiaobo; Sun, Kefeng; Hu, Yu; Li, Yu; Xie, Junhu; Zhang, Lei
This paper presents a method for simulating hyper-spectral dynamic scenes and image sequences for evaluating hyper-spectral equipment and target detection algorithms. Because of its high spectral resolution, strong band continuity, resistance to interference and other advantages, hyper-spectral imaging technology has developed rapidly in recent years and is widely used in many areas, such as optoelectronic target detection, military defense and remote sensing systems. Digital imaging simulation, as a crucial part of hardware-in-the-loop simulation, can be applied to testing and evaluating hyper-spectral imaging equipment at lower development cost and with a shorter development period. Meanwhile, visual simulation can produce large amounts of original image data under various conditions for hyper-spectral image feature extraction and classification algorithms. Based on a radiation physics model and material characteristic parameters, this paper proposes a method for generating digital scenes. By building multiple sensor models for different bands and bandwidths, hyper-spectral scenes in the visible, MWIR and LWIR bands, with spectral resolutions of 0.01 μm, 0.05 μm and 0.1 μm, have been simulated. The resulting dynamic scenes are realistic and rendered in real time, at frame rates of up to 100 Hz. By saving the grayscale data of all scenes from the same viewpoint, an image sequence is obtained. The analysis results show that, in both the infrared and the visible bands, the grayscale variations of the simulated hyper-spectral images are consistent with the theoretical analysis results.
Cheng, Hui; Butler, Darren
Aerial surveillance has long been used by the military to locate, monitor and track the enemy. Recently, its scope has expanded to include law enforcement activities, disaster management and commercial applications. With the ever-growing amount of aerial surveillance video acquired daily, there is an urgent need for extracting actionable intelligence in a timely manner. Furthermore, to support high-level video understanding, this analysis needs to go beyond current approaches and consider the relationships, motivations and intentions of the objects in the scene. In this paper we propose a system for interpreting aerial surveillance videos that automatically generates a succinct but meaningful description of the observed regions, objects and events. For a given video, the semantics of important regions and objects, and the relationships between them, are summarized into a semantic concept graph. From this, a textual description is derived that provides new search and indexing options for aerial video and enables the fusion of aerial video with other information modalities, such as human intelligence, reports and signal intelligence. Using a Mixture-of-Experts video segmentation algorithm, an aerial video is first decomposed into regions and objects with predefined semantic meanings. The objects are then tracked and coerced into a semantic concept graph, and the graph is summarized spatially, temporally and semantically using ontology-guided sub-graph matching and re-writing. The system exploits domain-specific knowledge and uses a reasoning engine to verify and correct the classes, identities and semantic relationships between the objects. This approach is advantageous because misclassifications lead to knowledge contradictions and hence can be easily detected and intelligently corrected. In addition, the graph representation highlights events and anomalies that a low-level analysis would overlook.
Han, Doug Hyun; Bolo, Nicolas; Daniels, Melissa A.; Arenella, Lynn; Lyoo, In Kyoon; Renshaw, Perry F.
Objective: Recent studies have suggested that the brain circuitry mediating cue-induced desire for video games is similar to that elicited by cues related to drugs and alcohol. We hypothesized that desire for internet video games during cue presentation would activate brain regions similar to those that have been linked with craving for drugs or pathological gambling. Methods: This study involved the acquisition of diagnostic MRI and fMRI data from 19 healthy male adults (ages 18–23 years) following training and a standardized 10-day period of game play with a specified novel internet video game, “War Rock” (K-network®). Using segments of videotape consisting of five contiguous 90-second segments of alternating resting, matched control and video game-related scenes, desire to play the game was assessed using a seven-point visual analogue scale before and after presentation of the videotape. Results: In responding to internet video game stimuli, compared to neutral control stimuli, significantly greater activity was identified in left inferior frontal gyrus, left parahippocampal gyrus, right and left parietal lobe, right and left thalamus, and right cerebellum (FDR video game (MIGP) cohort showed significantly greater activity in right medial frontal lobe, right and left frontal pre-central gyrus, right parietal post-central gyrus, right parahippocampal gyrus, and left parietal precuneus gyrus. Controlling for total game time, reported desire for the internet video game in the MIGP cohort was positively correlated with activation in right medial frontal lobe and right parahippocampal gyrus. Discussion: The present findings suggest that cue-induced activation to internet video game stimuli may be similar to that observed during cue presentation in persons with substance dependence or pathological gambling. In particular, cues appear to commonly elicit activity in the dorsolateral prefrontal cortex, orbitofrontal cortex, parahippocampal gyrus, and thalamus. PMID:21220070
Provides an extensive overview of related work in the three different directions of scene generation and introduces specific solutions in detail. Ideal reference for any reader involved in generating training scenarios, as well as in VR-based training in general. Discusses theoretically unlimited automatic generation of healthy anatomy within natural variability, allowing tedious and time-intensive manual segmentation to be avoided. Presents high-quality synthesis of new textures based on samples and automatic mapping to complex geometries, enabling the drawing and mapping of textures to 3D models to
van der Meij, Hans
This study investigates the effectiveness of a video tutorial for software training whose construction was based on a combination of insights from multimedia learning and Demonstration-Based Training. In the videos, a model of task performance was enhanced with instructional features that were
Monica Adams, head librarian at Robinson Secondary in Fairfax County, Virginia, states that librarians should have the technical knowledge to support projects related to digital video editing. The article describes the digital video editing process, along with cabling, storage issues, and the computer system and software required.
Live drawing video experimenting with low-tech techniques in the field of sketching and visual sense making. In collaboration with Rune Wehner and Teater Katapult.
Online videos are an increasingly important way technology is contributing to the improvement of physics teaching. Students and teachers have begun to rely on online videos to provide them with content knowledge and instructional strategies. Online audiences are expecting greater production value, and departments are sometimes requesting educators…
Chernyshov Alexander V.
Full Text Available The article focuses on the origins of the song video as a TV and Internet genre. In addition, it considers the problems of creating screen images depending on the musical form and the lyrics of a song, in connection with the relevant principles of accent and phraseological video editing and filming techniques, as well as with additional frames and sound elements.
Ørngreen, Rikke; Henningsen, Birgitte Sølbeck; Louw, Arnt Vestergaard
agenda focusing on video productions in combination with digital storytelling, followed by a presentation of the digital storytelling features. The paper concludes with a suggestion to initiate research in what is identified as Personal Digital Video (PDV) Stories within longitudinal settings, while...
Provenzo, Eugene F., Jr.
Video games are neither neutral nor harmless but represent very specific social and symbolic constructs. Research on the social content of today's video games reveals that sex bias and gender stereotyping are widely evident throughout the Nintendo games. Violence and aggression also pervade the great majority of the games. (MLF)
Voronin, V. V.; Marchuk, V. I.; Gapon, N. V.; Zhuravlev, A. V.; Maslennikov, S.; Stradanchenko, S.
This paper describes a novel inpainting approach for removing marked dynamic objects from videos captured with a camera, provided the objects occlude parts of a scene with a static background. The proposed approach can remove objects or restore missing or tainted regions in a video sequence by utilizing spatial and temporal information from neighboring scenes. The algorithm iteratively performs the following operations: acquire a frame; update the scene model; update the positions of moving objects; and replace the parts of the frame occupied by objects marked for removal using a background model. In this paper, we extend an image inpainting algorithm based on texture and structure reconstruction by incorporating an improved strategy for video. We present an image inpainting approach that constructs a composite curve to restore the edges of objects in a frame, using the concepts of parametric and geometric continuity. We show that this approach can restore curved edges and provides more flexibility for curve design in a damaged frame by interpolating object boundaries with cubic splines. After the edge restoration stage, texture reconstruction is carried out using a patch-based method. We demonstrate the performance of the new approach via several examples, show the effectiveness of our algorithm, and compare it with state-of-the-art video inpainting methods.
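The per-frame loop described above can be sketched for its temporal part only. The sketch assumes a static camera, grayscale frames, and a given object mask for each frame. It keeps a running-average background model, fills marked pixels from that model, and leaves pixels that were never seen unoccluded as None. The paper fills such pixels with curve-based edge restoration and patch-based texture synthesis, which are not shown here. All names and the update rate alpha are hypothetical.

```python
from typing import List, Optional

Frame = List[List[float]]            # grayscale frame, rows of pixel values
Mask = List[List[bool]]              # True where a marked object covers the pixel
Model = List[List[Optional[float]]]  # None = background never observed


def update_background(model: Model, frame: Frame, mask: Mask, alpha: float = 0.1) -> Model:
    """Running-average background update that skips pixels covered by objects."""
    new_model: Model = []
    for mrow, frow, krow in zip(model, frame, mask):
        row: List[Optional[float]] = []
        for b, f, covered in zip(mrow, frow, krow):
            if covered:
                row.append(b)                        # keep the previous estimate
            elif b is None:
                row.append(float(f))                 # first unoccluded observation
            else:
                row.append((1 - alpha) * b + alpha * f)
        new_model.append(row)
    return new_model


def remove_objects(frame: Frame, mask: Mask, model: Model) -> Model:
    """Replace object pixels with the background estimate. Pixels never seen
    unoccluded stay None and would need spatial inpainting."""
    return [[b if covered else f for f, b, covered in zip(frow, mrow, krow)]
            for frow, mrow, krow in zip(frame, model, mask)]


def inpaint_sequence(frames: List[Frame], masks: List[Mask], alpha: float = 0.1) -> List[Model]:
    """Per-frame loop: take a frame, update the model, replace marked regions."""
    height, width = len(frames[0]), len(frames[0][0])
    model: Model = [[None] * width for _ in range(height)]
    output = []
    for frame, mask in zip(frames, masks):
        model = update_background(model, frame, mask, alpha)
        output.append(remove_objects(frame, mask, model))
    return output


# A 1x2 toy sequence: an object covers the right pixel in frames 0 and 2.
frames = [[[10, 99]], [[10, 20]], [[10, 99]]]
masks = [[[False, True]], [[False, False]], [[False, True]]]
for out in inpaint_sequence(frames, masks):
    print(out)
```

In frame 0 the covered pixel has no background estimate yet, so it stays None. By frame 2 it is filled from the value observed in frame 1.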
Rheumatoid Arthritis Educational Video Series. This series of five videos was designed to help you learn more about Rheumatoid Arthritis (RA). You will learn how the diagnosis of ...
NEI YouTube Videos: Amblyopia ...
Boyaci, Huseyin; Doerschner, Katja; Snyder, Jacqueline L; Maloney, Laurence T
Researchers studying surface color perception have typically used stimuli that consist of a small number of matte patches (real or simulated) embedded in a plane perpendicular to the line of sight (a "Mondrian," Land & McCann, 1971). Reliable estimation of the color of a matte surface is a difficult if not impossible computational problem in such limited scenes (Maloney, 1999). In more realistic, three-dimensional scenes the difficulty of the problem increases, in part, because the effective illumination incident on the surface (the light field) now depends on surface orientation and location. We review recent work in multiple laboratories that examines (1) the degree to which the human visual system discounts the light field in judging matte surface lightness and color and (2) what illuminant cues the visual system uses in estimating the flow of light in a scene.
Baum, Bryan A.; Trepte, Qing
The authors propose a grouped threshold method for scene identification in Advanced Very High Resolution Radiometer imagery that may contain clouds, fire, smoke, or snow. The philosophy of the approach is to build modules that contain groups of spectral threshold tests that are applied concurrently, not sequentially, to each pixel in an image. The purpose of each group of tests is to identify uniquely a specific class in the image, such as smoke. A strength of this approach is that insight into the limits used in the threshold tests may be gained through the use of radiative transfer theory. Methodology and examples are provided for two different scenes, one containing clouds, forest fires, and smoke; and the other containing clouds over snow in the central United States. For both scenes, a limited amount of supporting information is provided by surface observers.
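In the grouped-threshold approach, each class has its own group of spectral tests. Every group is evaluated on each pixel independently, rather than as a sequential decision tree. The sketch below shows that structure only. The channel names (brightness temperatures and reflectances) and all threshold values are hypothetical placeholders, not the tests derived by the authors.

```python
from typing import Callable, Dict, List, Set

Pixel = Dict[str, float]  # channel name -> value (hypothetical channel names)
Check = Callable[[Pixel], bool]

# Hypothetical thresholds for illustration only; not the published values.
GROUPS: Dict[str, List[Check]] = {
    "fire": [
        lambda p: p["bt_3_7"] > 320.0,              # hot 3.7 um brightness temperature (K)
        lambda p: p["bt_3_7"] - p["bt_11"] > 15.0,  # large 3.7 - 11 um difference
    ],
    "cloud": [
        lambda p: p["bt_11"] < 270.0,               # cold 11 um brightness temperature
        lambda p: p["refl_0_6"] > 0.4,              # bright in the visible
    ],
    "snow": [
        lambda p: p["refl_0_6"] > 0.4,              # bright in the visible
        lambda p: p["refl_1_6"] < 0.15,             # dark at 1.6 um
    ],
}


def classify(pixel: Pixel, groups: Dict[str, List[Check]] = GROUPS) -> Set[str]:
    """Apply every group to the pixel concurrently (no test order).

    A class is assigned when all tests in its group pass, so the result may
    contain several classes (a conflict) or none (unclassified).
    """
    return {name for name, tests in groups.items() if all(t(pixel) for t in tests)}
```

Because groups are evaluated concurrently, a cold, bright pixel can satisfy both the cloud and snow groups. One practical consequence of the design is that such conflicts are surfaced rather than hidden behind test order.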
This paper studies a method for generating SAR raw data of complex airport scenes. A formulation of the SAR raw-signal model of airport scenes is given. The echoes from the background, aircraft, and buildings are generated separately, and the SAR raw data for a unified SAR imaging geometry is obtained by their vector addition. The multipath scattering and shadowing between the background and the different ground covers, namely standing airplanes and buildings, are analyzed. Based on the scattering characteristics, coupled scattering models and SAR raw-data models of the different targets are given. A procedure for generating the SAR raw data of airport scenes is then presented. SAR images formed from the simulated raw data demonstrate the validity of the proposed method.
Wang, Zhi; Zhu, Wenwu
This brief presents a new architecture and strategies for distributing social video content. It provides a primary framework for socially aware video delivery and a thorough overview of the possible approaches. The book identifies the unique characteristics of socially aware video access and social content propagation, revealing the design and integration of individual modules aimed at enhancing user experience in the social network context. Changes in how video content is generated, propagated, and consumed in online social networks have significantly challenged the traditional video delivery paradigm. Given the massive amount of user-generated content shared in online social networks, users are now engaged as active participants in the social ecosystem rather than as passive receivers of media content. This revolution is being driven further by the deep penetration of 3G/4G wireless networks and smart mobile devices that are seamlessly integrated with online social networking and media-sharing s...
Parraman, Carinna; Rizzi, Alessandro; McCann, John J.
In order to gain a deeper understanding of the appearance of coloured objects in a three-dimensional scene, the research introduces a multidisciplinary experimental approach. The experiment employed two identical 3-D Mondrians, which were viewed and compared side by side. Each scene was subjected to different lighting conditions. First, we used an illumination cube to diffuse the light and illuminate all the objects from every direction. This produced a low-dynamic-range (LDR) image of the 3-D Mondrian scene. Second, in order to make a high-dynamic-range (HDR) image of the same objects, we used a directional 150 W spotlight and an array of WLEDs assembled in a flashlight. The scenes were significant in that each contained exactly the same three-dimensional painted colour blocks, arranged in the same positions in the still life. The blocks comprised 6 hue colours and 5 tones from white to black. Participants from the CREATE project were asked to consider the change in the appearance of a selection of colours according to lightness, hue, and chroma, and to rate how the change in illumination affected appearance. We measured the light coming to the eye from still-life surfaces with a colorimeter (Yxy). We captured the scene radiance using multiple exposures with a number of different cameras. We have begun a programme of digital image processing of these scene capture methods. This multidisciplinary programme continues until 2010, so this paper is an interim report on the initial phases and a description of the ongoing project.
Hasan, Taufiq; Bořil, Hynek; Sangwan, Abhijeet; L Hansen, John H.
The ability to detect and organize 'hot spots' representing areas of excitement within video streams is a challenging research problem when techniques rely exclusively on video content. This study presents a generic method for sports video highlight selection that leverages both video/image structure and audio/speech properties. Processing begins by partitioning the video into small segments and extracting several multi-modal features from each segment. Excitability is computed from the likelihood that the segmental features lie in regions of their joint probability density function space that are considered both exciting and rare. The proposed measure is used to rank the partitioned segments, compressing the overall video sequence into a contiguous set of highlights. Experiments are performed on baseball videos. The features are the excitement assessment of the commentators' speech (based on signal processing advancements), audio energy, slow-motion replays, scene-cut density, and motion activity. A detailed analysis of the correlation between user excitability and various speech production parameters is conducted, and an effective scheme is designed to estimate the excitement level of the commentators' speech in sports videos. Subjective evaluation of excitability and of the ranking of video segments shows a higher correlation with the proposed measure than with well-established techniques, indicating the effectiveness of the overall approach.
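The excitability measure scores segments whose features fall in regions of the joint feature space that are both exciting and rare, then ranks segments to build the summary. The sketch below makes two simplifications of our own, not the paper's density model. It assumes every feature is oriented so that larger values are more exciting, and it treats features as independent. Each segment is scored by the product of its per-feature empirical CDF values, so only segments in the upper tail of every feature score near 1.

```python
from typing import List, Sequence


def excitability(features: Sequence[Sequence[float]]) -> List[float]:
    """Score each segment by how far into the upper (exciting, rare) tail it lies.

    Assumptions (ours): larger feature values mean more excitement, and
    features are independent, so the joint score factorises into per-feature
    empirical CDF values.
    """
    n = len(features)
    d = len(features[0])
    columns = [[seg[j] for seg in features] for j in range(d)]
    scores = []
    for seg in features:
        score = 1.0
        for j in range(d):
            # Empirical CDF: fraction of segments whose j-th feature is <= this one
            score *= sum(v <= seg[j] for v in columns[j]) / n
        scores.append(score)
    return scores


def select_highlights(features: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k highest-scoring segments, in playback order."""
    scores = excitability(features)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

Each row could hold hypothetical per-segment values such as commentator pitch, audio energy, and scene-cut density. `select_highlights(rows, k)` then returns the k top segments in temporal order, ready to be concatenated into a summary.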
Mani, Lara; Cole, Paul; Stewart, Iain
Educational outreach plays a vital role in improving the resilience of vulnerable populations at risk from natural disasters. Currently, that activity is undertaken in many guises, including the distribution of leaflets and posters, maps, presentations, education sessions, and radio and TV broadcasts. Such tried-and-tested communication modes generally target traditional stakeholder groups, but it is becoming increasingly important to engage with the new generation of learners who, due to advancements in technology, obtain information in ways different from their predecessors. That new generation is defined by a technological way of life, and it remains a challenge to keep them motivated. On the eastern Caribbean island of St. Vincent, the La Soufriere Volcano has been quiescent since its last eruption in 1979. Since then, an entire generation, over 56% of the population (Worldbank, 2015), has had little or no direct experience of a volcanic eruption. The island more frequently experiences other hazards (hurricanes, flooding, earthquakes, and landslides), so disaster preparedness measures give less priority to volcanic threats, which are deemed to pose less of a risk. With no accurate predictions to warn of the next eruption, it is especially important to educate residents about potential future volcanic hazards on the island and to motivate them to prepare to mitigate their risk. This research critically examines the application of video games in supporting and enhancing existing public education and outreach programmes for volcanic hazards. St. Vincent's Volcano is a computer game designed to improve awareness and knowledge of the eruptive phenomena from La Soufriere that could pose a threat to residents. Within an interactive and immersive environment, players become acquainted with a 3D model of St. Vincent together with an overlay of the established volcanic hazard map (Robertson, 2005). Players are able to view visualisations of two historical
Popa, Teo; Choi, Jae
In this paper, we present the design and implementation of a new rendering method based on high dynamic range (HDR) lighting and exposure control. This rendering method is applied to create video images for a 3D virtual bronchoscopy system. One of the main optical parameters of a bronchoscope's camera is the sensor exposure. Exposure adjustment is needed because the dynamic range of most digital video cameras is narrower than the high dynamic range of real scenes. The dynamic range of a camera is defined as the ratio of the brightest point of an image to the darkest point of the same image where details are present. In a video camera, exposure is controlled by the shutter speed and the lens aperture. To create the virtual bronchoscopic images, we first rendered a raw image in absolute units (luminance). We then simulated exposure by mapping the computed values to values appropriate for video-acquired images using a tone mapping operator. We generated several images with HDR and others with low dynamic range (LDR), and then compared their quality by applying them to a 2D/3D video-based tracking system. We conclude that images with HDR are closer to real bronchoscopy images than those with LDR, and thus that HDR lighting can improve the accuracy of image-based tracking.
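The abstract defines dynamic range as a brightest-to-darkest ratio and simulates exposure by mapping rendered absolute luminances to video values with a tone mapping operator. The sketch below illustrates both steps under stated assumptions. `exposure` is a single gain standing in for shutter speed and aperture. The abstract does not name its operator, so the global Reinhard-style curve L / (1 + L) followed by gamma encoding is our illustrative choice.

```python
from typing import List, Sequence


def dynamic_range(luminances: Sequence[float]) -> float:
    """Ratio of the brightest to the darkest non-zero luminance."""
    nonzero = [v for v in luminances if v > 0]
    return max(nonzero) / min(nonzero)


def expose_and_tonemap(luminances: Sequence[float], exposure: float = 1.0,
                       gamma: float = 2.2) -> List[int]:
    """Map absolute luminances to 8-bit video values.

    exposure stands in for the combined shutter-speed/aperture gain (our
    assumption). The Reinhard-style operator L / (1 + L) is an illustrative
    choice, not necessarily the operator used by the authors.
    """
    out = []
    for lum in luminances:
        scaled = lum * exposure              # simulated sensor exposure
        display = scaled / (1.0 + scaled)    # compress [0, inf) into [0, 1)
        out.append(round(255 * display ** (1.0 / gamma)))  # gamma-encode, quantise
    return out
```

Raising `exposure` brightens dark regions at the cost of compressing highlights. An LDR rendering can be mimicked by clipping `scaled` to a narrow range before tone mapping.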
Thompson, Kevin P.; Kircher, James R.; Marlow, Steven A.; Korniski, Ronald J.; Richwine, Robert A.
An all-acousto-optic infrared scene projector (IRSP) has been developed for evaluating thermal-imaging guidance systems at the Kinetic Kill Vehicle Hardware-in-the-Loop Simulator (KHILS) facility located at Eglin AFB, Florida. The IRSP is a laser-source-based projector that incorporates Scophony illumination and scanning methods to produce 96 × 96 pixel multi-wavelength images at very high frame rates (400 Hz). The IRSP is composed of five functionally similar optical trains, four of which are fed with a different 'color' infrared laser. The separate scenes from each optical train are then combined and projected simultaneously into the imaging guidance system.
Larsen, Kasper Bro
Recognizing the Stranger is the first monographic study of recognition scenes and motifs in the Gospel of John. The recognition type-scene (anagnōrisis) was a common feature in ancient drama and narrative, highly valued by Aristotle as a touching moment of truth, e.g., in Oedipus’ tragic self-discovery and Odysseus’ happy homecoming. The book offers a reconstruction of the conventions of the genre and argues that it is one of the most recurrent and significant literary forms in the Gospel. When portraying Jesus as the divine stranger from heaven, the Gospel employs and transforms the formal and ideological...
Rodner, Erik; Denzler, Joachim
The concept of probabilistic Latent Semantic Analysis (pLSA) has gained much interest as a tool for feature transformation in image categorization and scene recognition scenarios. However, a major issue with this technique is overfitting. Therefore, we propose to use an ensemble of pLSA models that are trained using random fractions of the training data. We empirically analyze the influence of the degree of randomization and the size of the ensemble on the overall classification performance of a scene recognition task. A thorough evaluation shows the benefits of this approach compared to a single pLSA model.
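The sketch below applies the ensemble idea to bag-of-visual-words histograms: several pLSA models, each fitted on a random fraction of the training images. Only the ensemble-on-random-fractions idea comes from the abstract. The following are our assumptions about how such an ensemble could be assembled:

- plain EM for pLSA;
- the usual fold-in step for unseen images (EM with P(w|z) held fixed);
- concatenating the per-model topic mixtures into one feature vector.

All function names are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def _normalise(rows: Matrix) -> Matrix:
    out = []
    for r in rows:
        s = sum(r) or 1.0
        out.append([v / s for v in r])
    return out


def _em(counts, p_w_z, p_z_d, iters, update_words):
    """EM for pLSA, where P(w | d) = sum_z P(z | d) P(w | z)."""
    k, n_w = len(p_w_z), len(p_w_z[0])
    for _ in range(iters):
        new_w_z = [[0.0] * n_w for _ in range(k)]
        new_z_d = [[0.0] * k for _ in counts]
        for d, row in enumerate(counts):
            for w, c in enumerate(row):
                if c == 0:
                    continue
                # E-step: posterior P(z | d, w)
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(k)]
                total = sum(joint) or 1.0
                for z in range(k):
                    r = c * joint[z] / total
                    new_z_d[d][z] += r
                    new_w_z[z][w] += r
        # M-step
        p_z_d = _normalise(new_z_d)
        if update_words:
            p_w_z = _normalise(new_w_z)
    return p_w_z, p_z_d


def fit_plsa(counts: Matrix, k: int, iters: int, rng: random.Random) -> Matrix:
    """Return P(w | z) learned from document-word count histograms."""
    n_w = len(counts[0])
    p_w_z = _normalise([[rng.random() for _ in range(n_w)] for _ in range(k)])
    p_z_d = _normalise([[rng.random() for _ in range(k)] for _ in counts])
    return _em(counts, p_w_z, p_z_d, iters, update_words=True)[0]


def fold_in(counts: Matrix, p_w_z: Matrix, iters: int, rng: random.Random) -> Matrix:
    """Topic mixtures P(z | d) for new documents with P(w | z) held fixed."""
    k = len(p_w_z)
    p_z_d = _normalise([[rng.random() for _ in range(k)] for _ in counts])
    return _em(counts, p_w_z, p_z_d, iters, update_words=False)[1]


def ensemble_features(train: Matrix, test: Matrix, n_models: int = 5,
                      fraction: float = 0.5, k: int = 4, iters: int = 30,
                      seed: int = 0) -> Matrix:
    """Fit each pLSA model on a random fraction of the training histograms and
    concatenate the per-model topic mixtures into one feature vector per image."""
    rng = random.Random(seed)
    size = max(1, int(fraction * len(train)))
    models = [fit_plsa(rng.sample(train, size), k, iters, rng) for _ in range(n_models)]
    features: Matrix = [[] for _ in test]
    for p_w_z in models:
        for i, mix in enumerate(fold_in(test, p_w_z, iters, rng)):
            features[i].extend(mix)
    return features
```

The concatenated vectors could feed any downstream scene classifier. `n_models` and `fraction` correspond to the ensemble size and degree of randomization that the abstract analyzes.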
Puvvada, Krishna C; Simon, Jonathan Z
The ability to parse a complex auditory scene into perceptual objects is facilitated by a hierarchical auditory system. Successive stages in the hierarchy transform an auditory scene of multiple overlapping sources, from peripheral tonotopically based representations in the auditory nerve, into perceptually distinct auditory-object-based representations in the auditory cortex. Here, using magnetoencephalography recordings from men and women, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in distinct hierarchical stages of the auditory cortex. Using systems-theoretic methods of stimulus reconstruction, we show that the primary-like areas in the auditory cortex contain dominantly spectrotemporal-based representations of the entire auditory scene. Here, both attended and ignored speech streams are represented with almost equal fidelity, and a global representation of the full auditory scene with all its streams is a better candidate neural representation than that of individual streams being represented separately. We also show that higher-order auditory cortical areas, by contrast, represent the attended stream separately and with significantly higher fidelity than unattended streams. Furthermore, the unattended background streams are more faithfully represented as a single unsegregated background object rather than as separated objects. Together, these findings demonstrate the progression of the representations and processing of a complex acoustic scene up through the hierarchy of the human auditory cortex.
SIGNIFICANCE STATEMENT: Using magnetoencephalography recordings from human listeners in a simulated cocktail party environment, we investigate how a complex acoustic scene consisting of multiple speech sources is represented in separate hierarchical stages of the auditory cortex. We show that the primary-like areas in the auditory cortex use a dominantly spectrotemporal-based representation of the entire auditory
Pinard, C.; Chevalley, L.; Manzanera, A.; Filliat, D.
We propose a depth map inference system for monocular videos based on a novel navigation dataset that mimics aerial footage from a gimbal-stabilized monocular camera in rigid scenes. Unlike most navigation datasets, this one lacks camera rotation, which makes the structure-from-motion problem easier; this can be leveraged for different kinds of tasks such as depth inference and obstacle avoidance. We also propose an architecture for end-to-end depth inference with a fully convolutional network. Results show that although the problem is tied to the camera's intrinsic parameters, it is locally solvable and leads to good-quality depth prediction.
Current vision systems are designed to perform in normal weather conditions. However, severe weather conditions cannot be avoided. Bad weather reduces scene contrast and visibility, which degrades the performance of various computer vision algorithms such as object tracking, segmentation, and recognition. Thus, current vision systems must include mechanisms that enable them to perform adequately in bad weather conditions such as rain and fog. Rain causes spatial and temporal intensity variations in images or video frames. These intensity changes are due to the
CERN video productions
"What's new @ CERN?", a new monthly video programme, will be broadcast on the Monday of every month on webcast.cern.ch. Aimed at the general public, the programme will cover the latest CERN news, with guests and explanatory features. Tune in on Monday 3 October at 4 pm (CET) to see the programme in English, and then at 4:20 pm (CET) for the French version.
Publicly available long video traces encoded according to H.264/AVC were analyzed from the fractal and multifractal points of view. It was shown that such video traces, like other compressed video (H.261, H.263, and MPEG-4 Version 2), exhibit inherent long-range dependence, that is, a fractal property. Moreover, they have high bit-rate variability, particularly at higher compression ratios. Such signals may be better characterized by multifractal (MF) analysis, since this approach describes both local and global features of the process. The multifractal spectra of the frame-size video traces showed that a higher compression ratio produces broader and less regular MF spectra, indicating a stronger MF nature and the existence of additive components in the video traces. Considering the individual frame types (I, P, and B) and their MF spectra, one can confirm the additive nature of compressed video and the particular influence of each frame type on the whole MF spectrum. Since compressed video occupies the main part of transmission bandwidth, results obtained from MF analysis of compressed video may contribute to more accurate modeling of modern teletraffic. Moreover, with an appropriate choice of the method for estimating MF quantities, an inverse MF analysis is possible. That is, once an MF spectrum has been derived for an observed signal, it is possible to recognize and extract the parts of the signal characterized by particular values of the multifractal parameters. Intensive simulations and the results obtained confirm the applicability and efficiency of MF analysis of compressed video.
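The sketch below shows one standard way to estimate MF quantities from a frame-size trace, which may differ from the estimator the authors chose: the partition-function (box-counting) method. It normalises the trace into a measure and sums each box's mass raised to the power q. The scaling exponent tau(q) is then read off a log-log regression over box sizes. A linear tau(q) indicates a monofractal signal, while curvature indicates multifractality. The spectrum f(alpha) follows by the Legendre transform alpha = d tau / dq, f = q alpha - tau. Function names and the choice of scales are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def partition_tau(trace: Sequence[float], qs: Sequence[float],
                  scales: Sequence[int]) -> Dict[float, float]:
    """Estimate tau(q) for a non-negative trace (e.g. frame sizes in bytes).

    The trace is normalised to a measure; for each box length m (in frames)
    the partition function Z(q, m) = sum_i mu_i**q is computed over
    non-overlapping boxes, and tau(q) is the least-squares slope of
    log Z against log(m / n).
    """
    n = len(trace)
    total = float(sum(trace))
    taus = {}
    for q in qs:
        xs, ys = [], []
        for m in scales:
            boxes = [sum(trace[i:i + m]) / total for i in range(0, n - n % m, m)]
            z = sum(b ** q for b in boxes if b > 0)
            xs.append(math.log(m / n))
            ys.append(math.log(z))
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        taus[q] = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                   / sum((x - mx) ** 2 for x in xs))
    return taus


def spectrum(taus: Dict[float, float]) -> List[Tuple[float, float]]:
    """(alpha, f(alpha)) via the Legendre transform, using central differences in q."""
    qs = sorted(taus)
    out = []
    for i in range(1, len(qs) - 1):
        q = qs[i]
        alpha = (taus[qs[i + 1]] - taus[qs[i - 1]]) / (qs[i + 1] - qs[i - 1])
        out.append((alpha, q * alpha - taus[q]))
    return out
```

For a constant trace the spectrum collapses to the single point (1, 1). The broader spectra reported above for higher compression ratios would show up as a wider spread of alpha values.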
This book covers both algorithms and technologies of interactive videos, so that businesses in IT and data management, scientists and software engineers in video processing and computer vision, coaches and instructors who use video technology in teaching, and finally end users will greatly benefit from it. This book contains excellent scientific contributions made by a number of pioneering scientists and experts from around the globe. It consists of five parts. The first part introduces the reader to interactive video and video summarization and presents effective methodologies for automatic abstraction of a single video sequence, a set of video sequences, and a combined audio-video sequence. In the second part, a list of advanced algorithms and methodologies for automatic and semi-automatic analysis and editing of audio-video documents is presented. The third part tackles a more challenging level of automatic video restructuring, filtering of video streams by extracting highlights, events, and meaningf...
Romero, Mario; Summet, Jay; Stasko, John; Abowd, Gregory
In the established procedural model of information visualization, the first operation is to transform raw data into data tables. The transforms typically include abstractions that aggregate and segment relevant data and are usually defined by a human, either a user or a programmer. The theme of this paper is that for video, data transforms should be supported by low-level computer vision. High-level reasoning still resides in the human analyst, while part of the low-level perception is handled by the computer. To illustrate this approach, we present Viz-A-Vis, an overhead video capture and access system for activity analysis in natural settings over variable periods of time. Overhead video provides rich opportunities for long-term behavioral and occupancy analysis, but it poses considerable challenges. We present initial steps addressing two challenges. First, overhead video generates volumes of footage too large to analyze manually. Second, automatic video analysis remains an open problem for computer vision.
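The paper argues that low-level computer vision should perform the first data transform, turning raw video into data tables, while the analyst keeps the high-level reasoning. The sketch below shows such a transform using simple frame differencing on a fixed grid as a stand-in; it is not Viz-A-Vis's actual pipeline. It emits one table row per active grid cell per frame. The cell size and threshold are hypothetical parameters.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale overhead frame, rows of pixel intensities


def activity_table(frames: List[Frame], cell: int = 8,
                   threshold: float = 10.0) -> List[Tuple[int, int, int, float]]:
    """Turn raw overhead video into rows of (frame, cell_row, cell_col, activity).

    Activity is the mean absolute difference between consecutive frames inside
    each cell x cell block; only blocks above threshold are emitted. Partial
    blocks at the right/bottom border are ignored.
    """
    rows = []
    for t in range(1, len(frames)):
        prev, cur = frames[t - 1], frames[t]
        h, w = len(cur), len(cur[0])
        for by in range(0, h - h % cell, cell):
            for bx in range(0, w - w % cell, cell):
                diff = sum(abs(cur[y][x] - prev[y][x])
                           for y in range(by, by + cell)
                           for x in range(bx, bx + cell))
                activity = diff / (cell * cell)
                if activity > threshold:
                    rows.append((t, by // cell, bx // cell, activity))
    return rows
```

The analyst can then aggregate the rows, for example counting them per cell to build an occupancy heat map over a day, without watching the footage.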
Westerberg, Andreas Rytter; Schoenau-Fog, Henrik
This paper dives into the subject of video game audio and how it can be categorized in order to deliver a message to the player as precisely as possible. A new categorization, with a new take on the diegetic spaces, can be used as a tool of inspiration for sound and game designers to rethink how they can use audio in video games. The conclusion of this study is that the current models' view of the diegetic spaces, used to categorize video game audio, is not fit to categorize all sounds. This could, however, possibly be changed through a rethinking of how the player interprets audio.
Bavelier, Daphne; Green, C. Shawn; Han, Doug Hyun; Renshaw, Perry F.; Merzenich, Michael M.; Gentile, Douglas A.
The popular press is replete with stories about the effects of video and computer games on the brain. Sensationalist headlines claiming that video games ‘damage the brain’ or ‘boost brain power’ do not do justice to the complexities and limitations of the studies involved, and create a confusing overall picture about the effects of gaming on the brain. Here, six experts in the field shed light on our current understanding of the positive and negative ways in which playing video games can affe...
This book presents a complete pipeline for HDR image and video processing from acquisition, through compression and quality evaluation, to display. At the HDR image and video acquisition stage, specialized HDR sensors or multi-exposure techniques suitable for traditional cameras are discussed. Then, we present a practical solution for calibrating pixel values in terms of photometric or radiometric quantities, which are required in some technically oriented applications. Also, we cover the problem of efficient image and video compression and encoding either for storage or transmission purposes, in
Lucas, Laurent; Loscos, Céline
While 3D vision has existed for many years, the use of 3D cameras and video-based modeling by the film industry has induced an explosion of interest in 3D acquisition technology, 3D content, and 3D displays. As such, 3D video has become one of the new technology trends of this century. The chapters in this book cover a large spectrum of areas connected to 3D video, which are presented both theoretically and technologically, while taking into account both physiological and perceptual aspects. Stepping away from traditional 3D vision, the authors, all currently involved in these areas, provide th
Riad I. Hammoud
We describe two advanced video analysis techniques: video-indexed by voice annotations (VIVA) and multi-media indexing and explorer (MINER). VIVA utilizes analyst call-outs (ACOs) in the form of chat messages (voice-to-text) to associate labels with video target tracks, to designate spatial-temporal activity boundaries, and to augment video tracking in challenging scenarios. Challenging scenarios include low-resolution sensors, moving targets, and target trajectories obscured by natural and man-made clutter. MINER includes: (1) a fusion of graphical track and text data using probabilistic methods; (2) an activity pattern learning framework to support querying an index of activities of interest (AOIs) and targets of interest (TOIs) by movement type and geolocation; and (3) a user interface to support streaming multi-intelligence data processing. We also present an activity pattern learning framework that uses the multi-source associated data as training to index a large archive of full-motion videos (FMV). VIVA and MINER examples are demonstrated for wide aerial/overhead imagery over common data sets, affording an improvement over tracking from video data alone and leading to 84% detection with modest misdetection/false alarm results due to the complexity of the scenario. The novel use of ACOs and chat messages in video tracking paves the way for user interaction, correction, and preparation of situation awareness reports.
King, Daniel L; Ejova, Anastasia; Delfabbro, Paul H
There is a paucity of empirical research examining the possible association between gambling and video game play. In two studies, we examined the association between video game playing, erroneous gambling cognitions, and risky gambling behaviour. One hundred and fifteen participants, including 65 electronic gambling machine (EGM) players and 50 regular video game players, were administered a questionnaire that examined video game play, gambling involvement, problem gambling, and beliefs about gambling. We then assessed each group's performance on a computerised gambling task that involved real money. A post-game survey examined perceptions of the skill and chance involved in the gambling task. The results showed that video game playing itself was not significantly associated with gambling involvement or problem gambling status. However, among those persons who both gambled and played video games, video game playing was uniquely and significantly positively associated with the perception of direct control over chance-based gambling events. Further research is needed to better understand the nature of this association, as it may assist in understanding the impact of emerging digital gambling technologies.
Rubén González Crespo
The present article examines the class hierarchy of a scene built with the Java 3D architecture in order to develop an ontology of a scene from the essential semantic components for the semantic structuring of Web3D. Java was selected because the language recommended by the W3C Consortium for developing Web3D-oriented applications based on the X3D standard is Xj3D, whose schema composition is based on the Java3D architecture. First, the domain and scope of the ontology are identified, defining the classes and subclasses that make up the Java3D architecture and the essential elements of a scene, such as its point of origin, rotation field, translation, scene bounds, and shader definitions. Next, the slots are defined and declared in RDF as a framework for describing the properties of the classes, by identifying the domain and range of each class. The OWL ontology is then composed in SWOOP. Finally, the ontology is instantiated for an Iconosphere object from the defined class expressions.
Alvarez, J.M.; Lumbreras, F.; Lopez, A.M.; Gevers, T.
Road scene understanding is an important computer vision problem, with applications in improving road safety (e.g., advanced driver assistance systems) and in developing autonomous driving systems (e.g., Google's driverless vehicle). Current vision-based approaches rely on the robust combination of
This study introduces new methods that use airborne LiDAR data to classify the key features (power lines, pylons, and buildings) making up a utility corridor scene, and to model power lines in 3D object space. The proposed approach starts with power line (PL) scene segmentation using a Markov Random Field (MRF), which emphasizes the role of the spatial context of linear and planar features in a graphical model. The MRF classifier identifies power line features among the linear features extracted from a given corridor scene. The non-power-line objects are then examined in a planar space to sub-classify them into building and non-building classes. Based on the classification results, individual pylons are precisely localized by exploiting prior knowledge of the contextual relations between power lines and pylons. Once the pylons are localized, a power line span is identified, within which power lines are modelled with catenary curve models in 3D. Once a local catenary curve model is established, this initial model is progressively extended to capture all power line points through model hypothesis and verification. The model parameters are adjusted using a stochastic non-linear least-squares method to produce 3D power line models. The proposed approach is evaluated over an urban PL corridor area that includes a complex PL scene.
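Within a span, the abstract models each power line as a catenary and refines its parameters with a stochastic non-linear least-squares method. The simplified sketch below assumes the LiDAR points of one wire have already been projected onto the vertical plane of the span (horizontal distance x, height z). It replaces the stochastic optimiser with an exhaustive search over the catenary parameter a and the lowest-point position x0. For each candidate pair, the vertical offset c in z = c + a cosh((x - x0) / a) has a closed-form least-squares solution.

```python
import math
from typing import List, Optional, Tuple


def catenary(x: float, a: float, x0: float, c: float) -> float:
    """Wire height at horizontal position x: c + a * cosh((x - x0) / a)."""
    return c + a * math.cosh((x - x0) / a)


def fit_catenary(xs: List[float], zs: List[float], a_values: List[float],
                 x0_values: List[float]) -> Tuple[float, float, float, float]:
    """Least-squares catenary fit by exhaustive search over (a, x0).

    For fixed a and x0 the model is linear in the offset c, so its optimum is
    the mean residual. Returns (a, x0, c, sse) for the best candidate.
    The grid search is a stand-in for the stochastic optimiser in the abstract.
    """
    best: Optional[Tuple[float, float, float, float]] = None
    for a in a_values:
        for x0 in x0_values:
            shape = [a * math.cosh((x - x0) / a) for x in xs]
            c = sum(z - s for z, s in zip(zs, shape)) / len(zs)
            sse = sum((z - s - c) ** 2 for z, s in zip(zs, shape))
            if best is None or sse < best[3]:
                best = (a, x0, c, sse)
    assert best is not None, "a_values and x0_values must be non-empty"
    return best
```

In practice the coarse grid result would only seed a finer local optimiser. The residuals of the best fit can also serve as the verification test when the model is extended to new candidate points.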
Li, Congcong; Kowdle, Adarsh; Saxena, Ashutosh; Chen, Tsuhan
Scene understanding includes many related subtasks, such as scene categorization, depth estimation, object detection, etc. Each of these subtasks is often notoriously hard, and state-of-the-art classifiers already exist for many of them. These classifiers operate on the same raw image and provide correlated outputs. It is desirable to have an algorithm that can capture such correlation without requiring any changes to the inner workings of any classifier. We propose Feedback Enabled Cascaded Classification Models (FE-CCM), that jointly optimizes all the subtasks while requiring only a "black box" interface to the original classifier for each subtask. We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. Our training method involves a feedback step that allows later classifiers to provide earlier classifiers information about which error modes to focus on. We show that our method significantly improves performance in all the subtasks in the domain of scene understanding, where we consider depth estimation, scene categorization, event categorization, object detection, geometric labeling, and saliency detection. Our method also improves performance in two robotic applications: an object-grasping robot and an object-finding robot.
Bond, John W; Hammond, Christine
DNA material is now collected routinely from crime scenes for a wide range of offenses, and its timely processing is acknowledged as a key element of its success in solving crime. An analysis of the processing of approximately 1500 samples of DNA material recovered from the property crime offenses of residential burglary, commercial burglary, and theft of motor vehicle in Northamptonshire, U.K. during 2006 identified saliva and cigarette ends as the main sources of DNA recovered (approximately 63% of samples), with blood, cellular DNA, and chewing gum accounting for the remainder. The conversion of these DNA samples into DNA profiles, and then into matches with offender profiles held on the U.K. National DNA Database, is considered in terms of three factors: the ease with which Crime Scene Examiners can recover DNA-rich samples from different sources, the location of the DNA at the crime scene, and its mobility. A logistic regression of the DNA material recovered revealed a number of predictors, other than timeliness, that greatly influence its conversion into a DNA profile. The most significant predictor was Crime Scene Examiner accreditation, with offense type and DNA sample condition also being relevant. A similar logistic regression of profiled DNA samples that produced a match with an offender on the U.K. National DNA Database showed no significance for any of the predictors considered.
With the advances in electronic and imaging techniques, the production of digital images has rapidly increased, and the extraction and automated annotation of the emotional semantics implied by images have become issues that must be urgently addressed. To better simulate human subjectivity and ambiguity in understanding scene images, the current study proposes an emotional semantic annotation method for scene images based on fuzzy set theory. A fuzzy membership degree was calculated to describe the emotional degree of a scene image and was implemented using the AdaBoost algorithm and a back-propagation (BP) neural network. The automated annotation method was trained and tested using scene images from the SUN Database. The annotation results were then compared with those based on manual annotation. Our method showed an annotation accuracy rate of 91.2% for basic emotional values and 82.4% after extended emotional values were added, which correspond to increases of 5.5% and 8.9%, respectively, compared with the results from using a single BP neural network algorithm. Furthermore, the retrieval accuracy rate based on our method reached approximately 89%. This study attempts to lay a solid foundation for the automated emotional semantic annotation of more types of images and is therefore of practical significance.