Bakircioglu, Hakan; Gelenbe, Erol
Detection and recognition of target signatures in sensory data obtained by synthetic aperture radar (SAR), forward- looking infrared, or laser radar, have received considerable attention in the literature. In this paper, we propose a feature based target classification methodology to detect and classify targets in cluttered SAR images, that makes use of selective signature data from sensory data, together with a neural network technique which uses a set of trained networks based on the Random Neural Network (RNN) model (Gelenbe 89, 90, 91, 93) which is trained to act as a matched filter. We propose and investigate radial features of target shapes that are invariant to rotation, translation, and scale, to characterize target and clutter signatures. These features are then used to train a set of learning RNNs which can be used to detect targets within clutter with high accuracy, and to classify the targets or man-made objects from natural clutter. Experimental data from SAR imagery is used to illustrate and validate the proposed method, and to calculate Receiver Operating Characteristics which illustrate the performance of the proposed algorithm.
Sahar Yousefi; Morteza Zahedi
This paper proposes a robust approach for face detection and gender classification in color images. Previous researches about gender recognition suppose an expensive computational and time-consuming pre-processing step in order to alignment in which face images are aligned so that facial landmarks like eyes, nose, lips, chin are placed in uniform locations in image. In this paper, a novel technique based on mathematical analysis is represented in three stages that eliminates align...
Full Text Available Image scene recognition is a core technology for many aerial remote sensing applications. Different landforms are inputted as different scenes in aerial imaging, and all landform information is regarded as valuable for aerial image scene recognition. However, the conventional features of the Bag-of-Words model are designed using local points or other related information and thus are unable to fully describe landform areas. This limitation cannot be ignored when the aim is to ensure accurate aerial scene recognition. A novel superpixel-based feature is proposed in this study to characterize aerial image scenes. Then, based on the proposed feature, a scene recognition method of the Bag-of-Words model for aerial imaging is designed. The proposed superpixel-based feature that utilizes landform information establishes top-task superpixel extraction of landforms to bottom-task expression of feature vectors. This characterization technique comprises the following steps: simple linear iterative clustering based superpixel segmentation, adaptive filter bank construction, Lie group-based feature quantification, and visual saliency model-based feature weighting. Experiments of image scene recognition are carried out using real image data captured by an unmanned aerial vehicle (UAV. The recognition accuracy of the proposed superpixel-based feature is 95.1%, which is higher than those of scene recognition algorithms based on other local features.
Clemmensen, Line Katrine Harder; Gomez, David Delgado; Ersbøll, Bjarne Kjær
The accuracy of data classification methods depends considerably on the data representation and on the selected features. In this work, the elastic net model selection is used to identify meaningful and important features in face recognition. Modelling the characteristics which distinguish one...... person from another using only subsets of features will both decrease the computational cost and increase the generalization capacity of the face recognition algorithm. Moreover, identifying which are the features that better discriminate between persons will also provide a deeper understanding...... of the face recognition problem. The elastic net model is able to select a subset of features with low computational effort compared to other state-of-the-art feature selection methods. Furthermore, the fact that the number of features usually is larger than the number of images in the data base makes feature...
Full Text Available Handwritten character recognition is a challenging area of research. Lots of research activities in the area of character recognition are already done for Indian languages such as Hindi, Bangla, Kannada, Tamil and Telugu. Literature review on handwritten character recognition indicates that in comparison with other Indian scripts research activities on Gujarati handwritten character recognition are very less. This paper aims to bring Gujarati character recognition in attention. Recognition of isolated Gujarati handwritten characters is proposed using three different kinds of features and their fusion. Chain code based, zone based and projection profiles based features are utilized as individual features. One of the significant contribution of proposed work is towards the generation of large and representative dataset of 88,000 handwritten Gujarati characters. Experiments are carried out on this developed dataset. Artificial Neural Network (ANN, Support Vector Machine (SVM and Naive Bayes (NB classifier based methods are implemented for handwritten Gujarati character recognition. Experimental results show substantial enhancement over state-of-the-art and authenticate our proposals.
Xiong, Yudian; Lu, Tongwei; Jiang, Yongyuan
As an important application in the field of text line recognition and office automation, Chinese character recognition has become an important subject of pattern recognition. However, due to the large number of Chinese characters and the complexity of its structure, there is a great difficulty in the Chinese character recognition. In order to solve this problem, this paper proposes a method of printed Chinese character recognition based on Gabor feature extraction and Convolution Neural Network(CNN). The main steps are preprocessing, feature extraction, training classification. First, the gray-scale Chinese character image is binarized and normalized to reduce the redundancy of the image data. Second, each image is convoluted with Gabor filter with different orientations, and the feature map of the eight orientations of Chinese characters is extracted. Third, the feature map through Gabor filters and the original image are convoluted with learning kernels, and the results of the convolution is the input of pooling layer. Finally, the feature vector is used to classify and recognition. In addition, the generalization capacity of the network is improved by Dropout technology. The experimental results show that this method can effectively extract the characteristics of Chinese characters and recognize Chinese characters.
1 Department of Electronics and Communication, G.L.A. University, 17-km stone, NH#2, Delhi-Mathura Road, .... Based upon these range of values, a decision is taken about the ...... triplet half-band filter bank and flexible k-out-of-n: A post.
Full Text Available In recent years, fire recognition based on image features has become a hotspot in fire monitoring. However, due to the complexity of forest environment, the accuracy of forest fireworks recognition based on image features is low. Based on this, this paper proposes a feature extraction algorithm based on YCrCb color space and K-means clustering. Firstly, the paper prepares and analyzes the color characteristics of a large number of forest fire image samples. Using the K-means clustering algorithm, the forest flame model is obtained by comparing the two commonly used color spaces, and the suspected flame area is discriminated and extracted. The experimental results show that the extraction accuracy of flame area based on YCrCb color model is higher than that of HSI color model, which can be applied in different scene forest fire identification, and it is feasible in practice.
Ren, X; Tian, Q; Zhang, J; Wu, S; Zeng, Y
In iris recognition, feature extraction can be influenced by factors such as illumination and contrast, and thus the features extracted may be unreliable, which can cause a high rate of false results in iris pattern recognition. In order to obtain stable features, an algorithm was proposed in this paper to extract key features of a pattern from multiple images. The proposed algorithm built an iris feature template by extracting key features and performed iris identity enrolment. Simulation results showed that the selected key features have high recognition accuracy on the CASIA Iris Set, where both contrast and illumination variance exist.
Full Text Available To improve the human-computer interaction (HCI to be as good as human-human interaction, building an efficient approach for human emotion recognition is required. These emotions could be fused from several modalities such as facial expression, hand gesture, acoustic data, and biophysiological data. In this paper, we address the frame-based perception of the universal human facial expressions (happiness, surprise, anger, disgust, fear, and sadness, with the help of several geometrical features. Unlike many other geometry-based approaches, the frame-based method does not rely on prior knowledge of a person-specific neutral expression; this knowledge is gained through human intervention and not available in real scenarios. Additionally, we provide a method to investigate the performance of the geometry-based approaches under various facial point localization errors. From an evaluation on two public benchmark datasets, we have found that using eight facial points, we can achieve the state-of-the-art recognition rate. However, this state-of-the-art geometry-based approach exploits features derived from 68 facial points and requires prior knowledge of the person-specific neutral expression. The expression recognition rate using geometrical features is adversely affected by the errors in the facial point localization, especially for the expressions with subtle facial deformations.
Full Text Available This paper proposes a determining algorithm for froth image features based on the amplitude spectrum energy statistics by applying Fast Fourier Transformation to analyze the energy distribution of various-sized froth. The proposed algorithm has been used to do a froth feature analysis of the froth images from the alumina flotation processing site, and the results show that the consistency rate reaches 98.1 % and the usability rate 94.2 %; with its good robustness and high efficiency, the algorithm is quite suitable for flotation processing state recognition.
Xi, Xiaoming; Yang, Gongping; Yin, Yilong; Yang, Lu
The finger vein is a promising biometric pattern for personal identification due to its advantages over other existing biometrics. In finger vein recognition, feature extraction is a critical step, and many feature extraction methods have been proposed to extract the gray, texture, or shape of the finger vein. We treat them as low-level features and present a high-level feature extraction framework. Under this framework, base attribute is first defined to represent the characteristics of a certain subcategory of a subject. Then, for an image, the correlation coefficient is used for constructing the high-level feature, which reflects the correlation between this image and all base attributes. Since the high-level feature can reveal characteristics of more subcategories and contain more discriminative information, we call it hyperinformation feature (HIF). Compared with low-level features, which only represent the characteristics of one subcategory, HIF is more powerful and robust. In order to demonstrate the potential of the proposed framework, we provide a case study to extract HIF. We conduct comprehensive experiments to show the generality of the proposed framework and the efficiency of HIF on our databases, respectively. Experimental results show that HIF significantly outperforms the low-level features.
Li, Baopu; Meng, Max Q-H
Tumor in digestive tract is a common disease and wireless capsule endoscopy (WCE) is a relatively new technology to examine diseases for digestive tract especially for small intestine. This paper addresses the problem of automatic recognition of tumor for WCE images. Candidate color texture feature that integrates uniform local binary pattern and wavelet is proposed to characterize WCE images. The proposed features are invariant to illumination change and describe multiresolution characteristics of WCE images. Two feature selection approaches based on support vector machine, sequential forward floating selection and recursive feature elimination, are further employed to refine the proposed features for improving the detection accuracy. Extensive experiments validate that the proposed computer-aided diagnosis system achieves a promising tumor recognition accuracy of 92.4% in WCE images on our collected data.
Li, Ming; Wang, Zengfu
Recently, dynamic facial expression recognition in videos has attracted growing attention. In this paper, we propose a novel dynamic facial expression recognition method by using geometric and texture features. In our system, the facial landmark movements and texture variations upon pairwise images are used to perform the dynamic facial expression recognition tasks. For one facial expression sequence, pairwise images are created between the first frame and each of its subsequent frames. Integration of both geometric and texture features further enhances the representation of the facial expressions. Finally, Support Vector Machine is used for facial expression recognition. Experiments conducted on the extended Cohn-Kanade database show that our proposed method can achieve a competitive performance with other methods.
In flexible automated manufacturing, robots can perform routine operations as well as recover from atypical events, provided that process-relevant information is available to the robot controller. Real time vision is among the most versatile sensing tools, yet the reliability of machine-based scene interpretation can be questionable. The effort described here is focused on the development of machine-based vision methods to support autonomous nuclear fuel manufacturing operations in hot cells. This thesis presents a method to efficiently recognize 3D objects from 2D images based on feature-based indexing. Object recognition is the identification of correspondences between parts of a current scene and stored views of known objects, using chains of segments or indexing vectors. To create indexed object models, characteristic model image features are extracted during preprocessing. Feature vectors representing model object contours are acquired from several points of view around each object and stored. Recognition is the process of matching stored views with features or patterns detected in a test scene. Two sets of algorithms were developed, one for preprocessing and indexed database creation, and one for pattern searching and matching during recognition. At recognition time, those indexing vectors with the highest match probability are retrieved from the model image database, using a nearest neighbor search algorithm. The nearest neighbor search predicts the best possible match candidates. Extended searches are guided by a search strategy that employs knowledge-base (KB) selection criteria. The knowledge-based system simplifies the recognition process and minimizes the number of iterations and memory usage. Novel contributions include the use of a feature-based indexing data structure together with a knowledge base. Both components improve the efficiency of the recognition process by improved structuring of the database of object features and reducing data base size
Pawan Kumar Singh
Full Text Available Handwritten digit recognition plays a significant role in many user authentication applications in the modern world. As the handwritten digits are not of the same size, thickness, style, and orientation, therefore, these challenges are to be faced to resolve this problem. A lot of work has been done for various non-Indic scripts particularly, in case of Roman, but, in case of Indic scripts, the research is limited. This paper presents a script invariant handwritten digit recognition system for identifying digits written in five popular scripts of Indian subcontinent, namely, Indo-Arabic, Bangla, Devanagari, Roman, and Telugu. A 130-element feature set which is basically a combination of six different types of moments, namely, geometric moment, moment invariant, affine moment invariant, Legendre moment, Zernike moment, and complex moment, has been estimated for each digit sample. Finally, the technique is evaluated on CMATER and MNIST databases using multiple classifiers and, after performing statistical significance tests, it is observed that Multilayer Perceptron (MLP classifier outperforms the others. Satisfactory recognition accuracies are attained for all the five mentioned scripts.
Wu, Jianwei; Wang, Zongyue
Vehicle license plate (VLP) recognition is of great importance to many traffic applications. Though researchers have paid much attention to VLP recognition there has not been a fully operational VLP recognition system yet for many reasons. This paper discusses a valid and practical method for vehicle license plate recognition based on geometry restraints and multi-feature decision including statistical and structural features. In general, the VLP recognition includes the following steps: the location of VLP, character segmentation, and character recognition. This paper discusses the three steps in detail. The characters of VLP are always declining caused by many factors, which makes it more difficult to recognize the characters of VLP, therefore geometry restraints such as the general ratio of length and width, the adjacent edges being perpendicular are used for incline correction. Image Moment has been proved to be invariant to translation, rotation and scaling therefore image moment is used as one feature for character recognition. Stroke is the basic element for writing and hence taking it as a feature is helpful to character recognition. Finally we take the image moment, the strokes and the numbers of each stroke for each character image and some other structural features and statistical features as the multi-feature to match each character image with sample character images so that each character image can be recognized by BP neural net. The proposed method combines statistical and structural features for VLP recognition, and the result shows its validity and efficiency.
Ren, Bo; Liu, Deyin; Qi, Lin
In order to deal with the insufficiency of recently algorithms based on Two Dimensions Fractional Fourier Transform (2D-FrFT), this paper proposes a multiple order features based method for emotion recognition. Most existing methods utilize the feature of single order or a couple of orders of 2D-FrFT. However, different orders of 2D-FrFT have different contributions on the feature extraction of emotion recognition. Combination of these features can enhance the performance of an emotion recognition system. The proposed approach obtains numerous features that extracted in different orders of 2D-FrFT in the directions of x-axis and y-axis, and uses the statistical magnitudes as the final feature vectors for recognition. The Support Vector Machine (SVM) is utilized for the classification and RML Emotion database and Cohn-Kanade (CK) database are used for the experiment. The experimental results demonstrate the effectiveness of the proposed method.
Mahmood, Awais; Alsulaiman, Mansour; Muhammad, Ghulam; Akram, Sheeraz
Local features for any pattern recognition system are based on the information extracted locally. In this paper, a local feature extraction technique was developed. This feature was extracted in the time-frequency plain by taking the moving average on the diagonal directions of the time-frequency plane. This feature captured the time-frequency events producing a unique pattern for each speaker that can be viewed as a voice print of the speaker. Hence, we referred to this technique as voice print-based local feature. The proposed feature was compared to other features including mel-frequency cepstral coefficient (MFCC) for speaker recognition using two different databases. One of the databases used in the comparison is a subset of an LDC database that consisted of two short sentences uttered by 182 speakers. The proposed feature attained 98.35% recognition rate compared to 96.7% for MFCC using the LDC subset.
Zhang, Ming; Xie, Fei; Zhao, Jing; Sun, Rui; Zhang, Lei; Zhang, Yue
The prosperity of license plate recognition technology has made great contribution to the development of Intelligent Transport System (ITS). In this paper, a robust and efficient license plate recognition method is proposed which is based on a combined feature extraction model and BPNN (Back Propagation Neural Network) algorithm. Firstly, the candidate region of the license plate detection and segmentation method is developed. Secondly, a new feature extraction model is designed considering three sets of features combination. Thirdly, the license plates classification and recognition method using the combined feature model and BPNN algorithm is presented. Finally, the experimental results indicate that the license plate segmentation and recognition both can be achieved effectively by the proposed algorithm. Compared with three traditional methods, the recognition accuracy of the proposed method has increased to 95.7% and the consuming time has decreased to 51.4ms.
Full Text Available Robust and fast traffic sign recognition is very important but difficult for safe driving assistance systems. This study addresses fast and robust traffic sign recognition to enhance driving safety. The proposed method includes three stages. First, a typical Hough transformation is adopted to implement coarse-grained location of the candidate regions of traffic signs. Second, a RIBP (Rotation Invariant Binary Pattern based feature in the affine and Gaussian space is proposed to reduce the time of traffic sign detection and achieve robust traffic sign detection in terms of scale, rotation, and illumination. Third, the techniques of ANN (Artificial Neutral Network based feature dimension reduction and classification are designed to reduce the traffic sign recognition time. Compared with the current work, the experimental results in the public datasets show that this work achieves robustness in traffic sign recognition with comparable recognition accuracy and faster processing speed, including training speed and recognition speed.
Liang, Wei; Zhang, Laibin; Mingda, Wang; Jinqiu, Hu [College of Mechanical and Transportation Engineering, China University of Petroleum, Beijing, (China)
The negative wave pressure method is one of the processes used to detect leaks on oil pipelines. The development of new leakage recognition processes is difficult because it is practically impossible to collect leakage pressure samples. The method of leakage feature extraction and the selection of the recognition model are also important in pipeline leakage detection. This study investigated a new feature extraction approach Singular Value Projection (SVP). It projects the singular value to a standard basis. A new pipeline recognition model based on the multi-class Support Vector Machines was also developed. It was found that SVP is a clear and concise recognition feature of the negative pressure wave. Field experiments proved that the model provided a high recognition accuracy rate. This approach to pipeline leakage detection based on the SVP and SVM has a high application value.
Xie, Zhihua; Zhang, Shuai; Liu, Guodong; Xiong, Jinquan
Visible face recognition systems, being vulnerable to illumination, expression, and pose, can not achieve robust performance in unconstrained situations. Meanwhile, near infrared face images, being light- independent, can avoid or limit the drawbacks of face recognition in visible light, but its main challenges are low resolution and signal noise ratio (SNR). Therefore, near infrared and visible fusion face recognition has become an important direction in the field of unconstrained face recognition research. In order to extract the discriminative complementary features between near infrared and visible images, in this paper, we proposed a novel near infrared and visible face fusion recognition algorithm based on DCT and LBP features. Firstly, the effective features in near-infrared face image are extracted by the low frequency part of DCT coefficients and the partition histograms of LBP operator. Secondly, the LBP features of visible-light face image are extracted to compensate for the lacking detail features of the near-infrared face image. Then, the LBP features of visible-light face image, the DCT and LBP features of near-infrared face image are sent to each classifier for labeling. Finally, decision level fusion strategy is used to obtain the final recognition result. The visible and near infrared face recognition is tested on HITSZ Lab2 visible and near infrared face database. The experiment results show that the proposed method extracts the complementary features of near-infrared and visible face images and improves the robustness of unconstrained face recognition. Especially for the circumstance of small training samples, the recognition rate of proposed method can reach 96.13%, which has improved significantly than 92.75 % of the method based on statistical feature fusion.
Sun, Bo; Li, Liandong; Zhou, Guoyan; He, Jun
Facial expression recognition in the wild is a very challenging task. We describe our work in static and continuous facial expression recognition in the wild. We evaluate the recognition results of gray deep features and color deep features, and explore the fusion of multimodal texture features. For the continuous facial expression recognition, we design two temporal-spatial dense scale-invariant feature transform (SIFT) features and combine multimodal features to recognize expression from image sequences. For the static facial expression recognition based on video frames, we extract dense SIFT and some deep convolutional neural network (CNN) features, including our proposed CNN architecture. We train linear support vector machine and partial least squares classifiers for those kinds of features on the static facial expression in the wild (SFEW) and acted facial expression in the wild (AFEW) dataset, and we propose a fusion network to combine all the extracted features at decision level. The final achievement we gained is 56.32% on the SFEW testing set and 50.67% on the AFEW validation set, which are much better than the baseline recognition rates of 35.96% and 36.08%.
Huo, Guang; Liu, Yuanning; Zhu, Xiaodong; Dong, Hongxing
This paper proposes a secondary iris recognition based on local features. The application of the energy-orientation feature (EOF) by two-dimensional Gabor filter to the extraction of the iris goes before the first recognition by the threshold of similarity, which sets the whole iris database into two categories-a correctly recognized class and a class to be recognized. Therefore, the former are accepted and the latter are transformed by histogram to achieve an energy-orientation histogram feature (EOHF), which is followed by a second recognition with the chi-square distance. The experiment has proved that the proposed method, because of its higher correct recognition rate, could be designated as the most efficient and effective among its companion studies in iris recognition algorithms.
The conventional LBP-based feature as represented by the local binary pattern (LBP) histogram still has room for performance improvements. This paper focuses on the dimension reduction of LBP micro-patterns and proposes an improved infrared face recognition method based on LBP histogram representation. To extract the local robust features in infrared face images, LBP is chosen to get the composition of micro-patterns of sub-blocks. Based on statistical test theory, Kruskal-Wallis (KW) feature selection method is proposed to get the LBP patterns which are suitable for infrared face recognition. The experimental results show combination of LBP and KW features selection improves the performance of infrared face recognition, the proposed method outperforms the traditional methods based on LBP histogram, discrete cosine transform(DCT) or principal component analysis(PCA).
Amol D. Rahulkar
Full Text Available The feature extraction plays a very important role in iris recognition. Recent researches on multiscale analysis provide good opportunity to extract more accurate information for iris recognition. In this work, a new directional iris texture features based on 2-D Fast Discrete Curvelet Transform (FDCT is proposed. The proposed approach divides the normalized iris image into six sub-images and the curvelet transform is applied independently on each sub-image. The anisotropic feature vector for each sub-image is derived using the directional energies of the curvelet coefficients. These six feature vectors are combined to create the resultant feature vector. During recognition, the nearest neighbor classifier based on Euclidean distance has been used for authentication. The effectiveness of the proposed approach has been tested on two different databases namely UBIRIS and MMU1. Experimental results show the superiority of the proposed approach.
The iris image is easily polluted by noise and uneven light. This paper proposed an improved extreme learning machine (ELM) based iris recognition algorithm with hybrid feature. 2D-Gabor filters and GLCM is employed to generate a multi-granularity hybrid feature vector. 2D-Gabor filter and GLCM feature work for capturing low-intermediate frequency and high frequency texture information, respectively. Finally, we utilize extreme learning machine for iris recognition. Experimental results reveal our proposed ELM based multi-granularity iris recognition algorithm (ELM-MGIR) has higher accuracy of 99.86%, and lower EER of 0.12% under the premise of real-time performance. The proposed ELM-MGIR algorithm outperforms other mainstream iris recognition algorithms.
Grandidier, F.; Sabourin, R.; Suen, C.Y.; Gilloux, M.
In this paper we introduce a new strategy for improving a discrete HMMbased handwriting recognition system, by integrating several information sources from specialized feature sets. For a given system, the basic idea is to keep the most discriminative features, and to replace the others with new
Islam, Md Rabiul
The aim of this work is to propose a new feature and score fusion based iris recognition approach where voting method on Multiple Classifier Selection technique has been applied. Four Discrete Hidden Markov Model classifiers output, that is, left iris based unimodal system, right iris based unimodal system, left-right iris feature fusion based multimodal system, and left-right iris likelihood ratio score fusion based multimodal system, is combined using voting method to achieve the final recognition result. CASIA-IrisV4 database has been used to measure the performance of the proposed system with various dimensions. Experimental results show the versatility of the proposed system of four different classifiers with various dimensions. Finally, recognition accuracy of the proposed system has been compared with existing N hamming distance score fusion approach proposed by Ma et al., log-likelihood ratio score fusion approach proposed by Schmid et al., and single level feature fusion approach proposed by Hollingsworth et al.
Full Text Available In this paper, we present a Synthetic Aperture Radar (SAR image target recognition algorithm based on multi-feature multiple representation learning classifier fusion. First, it extracts three features from the SAR images, namely principal component analysis, wavelet transform, and Two-Dimensional Slice Zernike Moments (2DSZM features. Second, we harness the sparse representation classifier and the cooperative representation classifier with the above-mentioned features to get six predictive labels. Finally, we adopt classifier fusion to obtain the final recognition decision. We researched three different classifier fusion algorithms in our experiments, and the results demonstrate thatusing Bayesian decision fusion gives thebest recognition performance. The method based on multi-feature multiple representation learning classifier fusion integrates the discrimination of multi-features and combines the sparse and cooperative representation classification performance to gain complementary advantages and to improve recognition accuracy. The experiments are based on the Moving and Stationary Target Acquisition and Recognition (MSTAR database,and they demonstrate the effectiveness of the proposed approach.
Full Text Available This paper proposes an improved performance algorithm of face recognition to identify two face mismatch pairs in cases of incorrect decisions. The primary feature of this method is to deploy the similarity score with respect to Gaussian components between two previously unseen faces. Unlike the conventional classical vector distance measurement, our algorithms also consider the plot of summation of the similarity index versus face feature vector distance. A mixture of Gaussian models of labeled faces is also widely applicable to different biometric system parameters. By comparative evaluations, it has been shown that the efficiency of the proposed algorithm is superior to that of the conventional algorithm by an average accuracy of up to 1.15% and 16.87% when compared with 3x3 Multi-Region Histogram (MRH direct-bag-of-features and Principal Component Analysis (PCA-based face recognition systems, respectively. The experimental results show that similarity score consideration is more discriminative for face recognition compared to feature distance. Experimental results of Labeled Face in the Wild (LFW data set demonstrate that our algorithms are suitable for real applications probe-to-gallery identification of face recognition systems. Moreover, this proposed method can also be applied to other recognition systems and therefore additionally improves recognition scores.
Diaz-Escobar, Julia; Kober, Vitaly
Nowadays most of digital information is obtained using mobile devices specially smartphones. In particular, it brings the opportunity for optical character recognition in camera-captured images. For this reason many recognition applications have been recently developed such as recognition of license plates, business cards, receipts and street signal; document classification, augmented reality, language translator and so on. Camera-captured images are usually affected by geometric distortions, nonuniform illumination, shadow, noise, which make difficult the recognition task with existing systems. It is well known that the Fourier phase contains a lot of important information regardless of the Fourier magnitude. So, in this work we propose a phase-based recognition system exploiting phase-congruency features for illumination/scale invariance. The performance of the proposed system is tested in terms of miss classifications and false alarms with the help of computer simulation.
Luo, Dan; Gao, Hua; Ekenel, Hazim Kemal; Ohya, Jun
The use of gesture as a natural interface plays an utmost important role for achieving intelligent Human Computer Interaction (HCI). Human gestures include different components of visual actions such as motion of hands, facial expression, and torso, to convey meaning. So far, in the field of gesture recognition, most previous works have focused on the manual component of gestures. In this paper, we present an appearance-based multimodal gesture recognition framework, which combines the different groups of features such as facial expression features and hand motion features which are extracted from image frames captured by a single web camera. We refer 12 classes of human gestures with facial expression including neutral, negative and positive meanings from American Sign Languages (ASL). We combine the features in two levels by employing two fusion strategies. At the feature level, an early feature combination can be performed by concatenating and weighting different feature groups, and LDA is used to choose the most discriminative elements by projecting the feature on a discriminative expression space. The second strategy is applied on decision level. Weighted decisions from single modalities are fused in a later stage. A condensation-based algorithm is adopted for classification. We collected a data set with three to seven recording sessions and conducted experiments with the combination techniques. Experimental results showed that facial analysis improve hand gesture recognition, decision level fusion performs better than feature level fusion.
Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali
In the recent years, many research works have been published using speech related features for speech emotion recognition, however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstralcoefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and its glottal waveforms(GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features respectively. Three different emotional speech databases were utilized to gauge the proposed method. Extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature.
J. Del Rio Vera
Full Text Available This paper presents a new supervised classification approach for automated target recognition (ATR in SAS images. The recognition procedure starts with a novel segmentation stage based on the Hilbert transform. A number of geometrical features are then extracted and used to classify observed objects against a previously compiled database of target and non-target features. The proposed approach has been tested on a set of 1528 simulated images created by the NURC SIGMAS sonar model, achieving up to 95% classification accuracy.
Khayat, Omid; Afarideh, Hossein
Track counting algorithms as one of the fundamental principles of nuclear science have been emphasized in the recent years. Accurate measurement of nuclear tracks on solid-state nuclear track detectors is the aim of track counting systems. Commonly track counting systems comprise a hardware system for the task of imaging and software for analysing the track images. In this paper, a track recognition algorithm based on 12 defined textual and shape-based features and a neuro-fuzzy classifier is proposed. Features are defined so as to discern the tracks from the background and small objects. Then, according to the defined features, tracks are detected using a trained neuro-fuzzy system. Features and the classifier are finally validated via 100 Alpha track images and 40 training samples. It is shown that principle textual and shape-based features concomitantly yield a high rate of track detection compared with the single-feature based methods.
Noor Abdalrazak Shnain
Full Text Available Facial recognition is one of the most challenging and interesting problems within the field of computer vision and pattern recognition. During the last few years, it has gained special attention due to its importance in relation to current issues such as security, surveillance systems and forensics analysis. Despite this high level of attention to facial recognition, the success is still limited by certain conditions; there is no method which gives reliable results in all situations. In this paper, we propose an efficient similarity index that resolves the shortcomings of the existing measures of feature and structural similarity. This measure, called the Feature-Based Structural Measure (FSM, combines the best features of the well-known SSIM (structural similarity index measure and FSIM (feature similarity index measure approaches, striking a balance between performance for similar and dissimilar images of human faces. In addition to the statistical structural properties provided by SSIM, edge detection is incorporated in FSM as a distinctive structural feature. Its performance is tested for a wide range of PSNR (peak signal-to-noise ratio, using ORL (Olivetti Research Laboratory, now AT&T Laboratory Cambridge and FEI (Faculty of Industrial Engineering, São Bernardo do Campo, São Paulo, Brazil databases. The proposed measure is tested under conditions of Gaussian noise; simulation results show that the proposed FSM outperforms the well-known SSIM and FSIM approaches in its efficiency of similarity detection and recognition of human faces.
Zhang, Guoliang; Jia, Songmin; Li, Xiuzhi; Zhang, Xiangyin
The majority of human action recognition methods use multifeature fusion strategy to improve the classification performance, where the contribution of different features for specific action has not been paid enough attention. We present an extendible and universal weighted score-level feature fusion method using the Dempster-Shafer (DS) evidence theory based on the pipeline of bag-of-visual-words. First, the partially distinctive samples in the training set are selected to construct the validation set. Then, local spatiotemporal features and pose features are extracted from these samples to obtain evidence information. The DS evidence theory and the proposed rule of survival of the fittest are employed to achieve evidence combination and calculate optimal weight vectors of every feature type belonging to each action class. Finally, the recognition results are deduced via the weighted summation strategy. The performance of the established recognition framework is evaluated on Penn Action dataset and a subset of the joint-annotated human metabolome database (sub-JHMDB). The experiment results demonstrate that the proposed feature fusion method can adequately exploit the complementarity among multiple features and improve upon most of the state-of-the-art algorithms on Penn Action and sub-JHMDB datasets.
Gutta, Sandeep; Cheng, Qi
Traditional biometric recognition systems often utilize physiological traits such as fingerprint, face, iris, etc. Recent years have seen a growing interest in electrocardiogram (ECG)-based biometric recognition techniques, especially in the field of clinical medicine. In existing ECG-based biometric recognition methods, feature extraction and classifier design are usually performed separately. In this paper, a multitask learning approach is proposed, in which feature extraction and classifier design are carried out simultaneously. Weights are assigned to the features within the kernel of each task. We decompose the matrix consisting of all the feature weights into sparse and low-rank components. The sparse component determines the features that are relevant to identify each individual, and the low-rank component determines the common feature subspace that is relevant to identify all the subjects. A fast optimization algorithm is developed, which requires only the first-order information. The performance of the proposed approach is demonstrated through experiments using the MIT-BIH Normal Sinus Rhythm database.
Full Text Available Abstract We present results of a study into the performance of a variety of different image transform-based feature types for speaker-independent visual speech recognition of isolated digits. This includes the first reported use of features extracted using a discrete curvelet transform. The study will show a comparison of some methods for selecting features of each feature type and show the relative benefits of both static and dynamic visual features. The performance of the features will be tested on both clean video data and also video data corrupted in a variety of ways to assess each feature type's robustness to potential real-world conditions. One of the test conditions involves a novel form of video corruption we call jitter which simulates camera and/or head movement during recording.
Full Text Available We present results of a study into the performance of a variety of different image transform-based feature types for speaker-independent visual speech recognition of isolated digits. This includes the first reported use of features extracted using a discrete curvelet transform. The study will show a comparison of some methods for selecting features of each feature type and show the relative benefits of both static and dynamic visual features. The performance of the features will be tested on both clean video data and also video data corrupted in a variety of ways to assess each feature type's robustness to potential real-world conditions. One of the test conditions involves a novel form of video corruption we call jitter which simulates camera and/or head movement during recording.
Miwa, Shotaro; Kage, Hiroshi; Hirai, Takashi; Sumi, Kazuhiko
We propose a probabilistic face recognition algorithm for Access Control System(ACS)s. Comparing with existing ACSs using low cost IC-cards, face recognition has advantages in usability and security that it doesn't require people to hold cards over scanners and doesn't accept imposters with authorized cards. Therefore face recognition attracts more interests in security markets than IC-cards. But in security markets where low cost ACSs exist, price competition is important, and there is a limitation on the quality of available cameras and image control. Therefore ACSs using face recognition are required to handle much lower quality images, such as defocused and poor gain-controlled images than high security systems, such as immigration control. To tackle with such image quality problems we developed a face recognition algorithm based on a probabilistic model which combines a variety of image-difference features trained by Real AdaBoost with their prior probability distributions. It enables to evaluate and utilize only reliable features among trained ones during each authentication, and achieve high recognition performance rates. The field evaluation using a pseudo Access Control System installed in our office shows that the proposed system achieves a constant high recognition performance rate independent on face image qualities, that is about four times lower EER (Equal Error Rate) under a variety of image conditions than one without any prior probability distributions. On the other hand using image difference features without any prior probabilities are sensitive to image qualities. We also evaluated PCA, and it has worse, but constant performance rates because of its general optimization on overall data. Comparing with PCA, Real AdaBoost without any prior distribution performs twice better under good image conditions, but degrades to a performance as good as PCA under poor image conditions.
Wan Min; Xiang Rujian; Wan Yongxing
The characteristics of objects, especially flying objects, are analyzed, which include characteristics of spectrum, image and motion. Feature extraction is also achieved. To improve the speed of object recognition, a feature database is used to simplify the data in the source database. The feature vs. object relationship maps are stored in the feature database. An object recognition model based on the feature database is presented, and the way to achieve object recognition is also explained
Zhu, Rong; Hu, Xueying; Tang, Jiajun; Hu, Sheng
Although image/video based fire recognition has received growing attention, an efficient and robust fire detection strategy is rarely explored. In this paper, we propose a novel approach to automatically identify the flame or smoke regions in an image. It is composed to three stages: (1) a block processing is applied to divide an image into several nonoverlapping image blocks, and these image blocks are identified as suspicious fire regions or not by using two color models and a color histogram-based similarity matching method in the HSV color space, (2) considering that compared to other information, the flame and smoke regions have significant visual characteristics, so that two kinds of image features are extracted for fire recognition, where local features are obtained based on the Scale Invariant Feature Transform (SIFT) descriptor and the Bags of Keypoints (BOK) technique, and texture features are extracted based on the Gray Level Co-occurrence Matrices (GLCM) and the Wavelet-based Analysis (WA) methods, and (3) a manifold learning-based classifier is constructed based on two image manifolds, which is designed via an improve Globular Neighborhood Locally Linear Embedding (GNLLE) algorithm, and the extracted hybrid features are used as input feature vectors to train the classifier, which is used to make decision for fire images or non fire images. Experiments and comparative analyses with four approaches are conducted on the collected image sets. The results show that the proposed approach is superior to the other ones in detecting fire and achieving a high recognition accuracy and a low error rate.
Yang, Jinfeng; Hong, Bofeng
Multimodal biometrics based on the finger identification is a hot topic in recent years. In this paper, a novel fingerprint-vein based biometric method is proposed to improve the reliability and accuracy of the finger recognition system. First, the second order steerable filters are used here to enhance and extract the minutiae features of the fingerprint (FP) and finger-vein (FV). Second, the texture features of fingerprint and finger-vein are extracted by a bank of Gabor filter. Third, a new triangle-region fusion method is proposed to integrate all the fingerprint and finger-vein features in feature-level. Thus, the fusion features contain both the finger texture-information and the minutiae triangular geometry structure. Finally, experimental results performed on the self-constructed finger-vein and fingerprint databases are shown that the proposed method is reliable and precise in personal identification.
Md. Rabiul Islam
Full Text Available The aim of this work is to propose a new feature and score fusion based iris recognition approach where voting method on Multiple Classifier Selection technique has been applied. Four Discrete Hidden Markov Model classifiers output, that is, left iris based unimodal system, right iris based unimodal system, left-right iris feature fusion based multimodal system, and left-right iris likelihood ratio score fusion based multimodal system, is combined using voting method to achieve the final recognition result. CASIA-IrisV4 database has been used to measure the performance of the proposed system with various dimensions. Experimental results show the versatility of the proposed system of four different classifiers with various dimensions. Finally, recognition accuracy of the proposed system has been compared with existing N hamming distance score fusion approach proposed by Ma et al., log-likelihood ratio score fusion approach proposed by Schmid et al., and single level feature fusion approach proposed by Hollingsworth et al.
Hasanuzzaman, Faiz M; Yang, Xiaodong; Tian, YingLi
Camera-based computer vision technology is able to assist visually impaired people to automatically recognize banknotes. A good banknote recognition algorithm for blind or visually impaired people should have the following features: 1) 100% accuracy, and 2) robustness to various conditions in different environments and occlusions. Most existing algorithms of banknote recognition are limited to work for restricted conditions. In this paper we propose a component-based framework for banknote recognition by using Speeded Up Robust Features (SURF). The component-based framework is effective in collecting more class-specific information and robust in dealing with partial occlusion and viewpoint changes. Furthermore, the evaluation of SURF demonstrates its effectiveness in handling background noise, image rotation, scale, and illumination changes. To authenticate the robustness and generalizability of the proposed approach, we have collected a large dataset of banknotes from a variety of conditions including occlusion, cluttered background, rotation, and changes of illumination, scaling, and viewpoints. The proposed algorithm achieves 100% recognition rate on our challenging dataset.
Full Text Available Contact-free palm-vein recognition is one of the most challenging and promising areas in hand biometrics. In view of the existing problems in contact-free palm-vein imaging, including projection transformation, uneven illumination and difficulty in extracting exact ROIs, this paper presents a novel recognition approach for contact-free palm-vein recognition that performs feature extraction and matching on all vein textures distributed over the palm surface, including finger veins and palm veins, to minimize the loss of feature information. First, a hierarchical enhancement algorithm, which combines a DOG filter and histogram equalization, is adopted to alleviate uneven illumination and to highlight vein textures. Second, RootSIFT, a more stable local invariant feature extraction method in comparison to SIFT, is adopted to overcome the projection transformation in contact-free mode. Subsequently, a novel hierarchical mismatching removal algorithm based on neighborhood searching and LBP histograms is adopted to improve the accuracy of feature matching. Finally, we rigorously evaluated the proposed approach using two different databases and obtained 0.996% and 3.112% Equal Error Rates (EERs, respectively, which demonstrate the effectiveness of the proposed approach.
Kang, Wenxiong; Liu, Yang; Wu, Qiuxia; Yue, Xishun
Contact-free palm-vein recognition is one of the most challenging and promising areas in hand biometrics. In view of the existing problems in contact-free palm-vein imaging, including projection transformation, uneven illumination and difficulty in extracting exact ROIs, this paper presents a novel recognition approach for contact-free palm-vein recognition that performs feature extraction and matching on all vein textures distributed over the palm surface, including finger veins and palm veins, to minimize the loss of feature information. First, a hierarchical enhancement algorithm, which combines a DOG filter and histogram equalization, is adopted to alleviate uneven illumination and to highlight vein textures. Second, RootSIFT, a more stable local invariant feature extraction method in comparison to SIFT, is adopted to overcome the projection transformation in contact-free mode. Subsequently, a novel hierarchical mismatching removal algorithm based on neighborhood searching and LBP histograms is adopted to improve the accuracy of feature matching. Finally, we rigorously evaluated the proposed approach using two different databases and obtained 0.996% and 3.112% Equal Error Rates (EERs), respectively, which demonstrate the effectiveness of the proposed approach.
Tian, Fuyang; Cao, Dong; Dong, Xiaoning; Zhao, Xinqiang; Li, Fade; Wang, Zhonghua
Behavioral features recognition was an important effect to detect oestrus and sickness in dairy herds and there is a need for heat detection aid. The detection method was based on the measure of the individual behavioural activity, standing time, and temperature of dairy using vibrational sensor and temperature sensor in this paper. The data of behavioural activity index, standing time, lying time and walking time were sent to computer by lower power consumption wireless communication system. The fast approximate K-means algorithm (FAKM) was proposed to deal the data of the sensor for behavioral features recognition. As a result of technical progress in monitoring cows using computers, automatic oestrus detection has become possible.
Chao, Tien-Hsin; Stoner, William W.
An optical neural network based on the neocognitron paradigm is introduced. A novel aspect of the architecture design is shift-invariant multichannel Fourier optical correlation within each processing layer. Multilayer processing is achieved by feeding back the ouput of the feature correlator interatively to the input spatial light modulator and by updating the Fourier filters. By training the neural net with characteristic features extracted from the target images, successful pattern recognition with intraclass fault tolerance and interclass discrimination is achieved. A detailed system description is provided. Experimental demonstrations of a two-layer neural network for space-object discrimination is also presented.
An optical neural network based upon the Neocognitron paradigm (K. Fukushima et al. 1983) is introduced. A novel aspect of the architectural design is shift-invariant multichannel Fourier optical correlation within each processing layer. Multilayer processing is achieved by iteratively feeding back the output of the feature correlator to the input spatial light modulator and updating the Fourier filters. By training the neural net with characteristic features extracted from the target images, successful pattern recognition with intra-class fault tolerance and inter-class discrimination is achieved. A detailed system description is provided. Experimental demonstration of a two-layer neural network for space objects discrimination is also presented.
The Department of Environmental Analysis at the Kentucky Transportation Cabinet has expressed an interest in feature-recognition capability because it may help analysts identify environmentally sensitive features in the landscape, : including those r...
Full Text Available LiDAR technology can provide very detailed and highly accurate geospatial information on an urban scene for the creation of Virtual Geographic Environments (VGEs for different applications. However, automatic 3D modeling and feature recognition from LiDAR point clouds are very complex tasks. This becomes even more complex when the data is incomplete (occlusion problem or uncertain. In this paper, we propose to build a knowledge base comprising of ontology and semantic rules aiming at automatic feature recognition from point clouds in support of 3D modeling. First, several modules for ontology are defined from different perspectives to describe an urban scene. For instance, the spatial relations module allows the formalized representation of possible topological relations extracted from point clouds. Then, a knowledge base is proposed that contains different concepts, their properties and their relations, together with constraints and semantic rules. Then, instances and their specific relations form an urban scene and are added to the knowledge base as facts. Based on the knowledge and semantic rules, a reasoning process is carried out to extract semantic features of the objects and their components in the urban scene. Finally, several experiments are presented to show the validity of our approach to recognize different semantic features of buildings from LiDAR point clouds.
Jin Jing; Wei Biao; Feng Peng; Tang Yuelin; Zhou Mi
Based on the interdependent relationship between fission neutrons ( 252 Cf) and fission chain ( 235 U system), the paper presents the time-frequency feature analysis and recognition in fission neutron signal based on support vector machine (SVM) through the analysis on signal characteristics and the measuring principle of the 252 Cf fission neutron signal. The time-frequency characteristics and energy features of the fission neutron signal are extracted by using wavelet decomposition and de-noising wavelet packet decomposition, and then applied to training and classification by means of support vector machine based on statistical learning theory. The results show that, it is effective to obtain features of nuclear signal via wavelet decomposition and de-noising wavelet packet decomposition, and the latter can reflect the internal characteristics of the fission neutron system better. With the training accomplished, the SVM classifier achieves an accuracy rate above 70%, overcoming the lack of training samples, and verifying the effectiveness of the algorithm. (authors)
Kee Moe Han
Full Text Available Music is the combination of melody linguistic information and the vocalists emotion. Since music is a work of art analyzing emotion in music by computer is a difficult task. Many approaches have been developed to detect the emotions included in music but the results are not satisfactory because emotion is very complex. In this paper the evaluations of audio features from the music files are presented. The extracted features are used to classify the different emotion classes of the vocalists. Musical features extraction is done by using Music Information Retrieval MIR tool box in this paper. The database of 100 music clips are used to classify the emotions perceived in music clips. Music may contain many emotions according to the vocalists mood such as happy sad nervous bored peace etc. In this paper the audio features related to the emotions of the vocalists are extracted to use in emotion recognition system based on music.
Full Text Available Automatic recognition of arrhythmias is particularly important in the diagnosis of heart diseases. This study presents an electrocardiogram (ECG recognition system based on multi-domain feature extraction to classify ECG beats. An improved wavelet threshold method for ECG signal pre-processing is applied to remove noise interference. A novel multi-domain feature extraction method is proposed; this method employs kernel-independent component analysis in nonlinear feature extraction and uses discrete wavelet transform to extract frequency domain features. The proposed system utilises a support vector machine classifier optimized with a genetic algorithm to recognize different types of heartbeats. An ECG acquisition experimental platform, in which ECG beats are collected as ECG data for classification, is constructed to demonstrate the effectiveness of the system in ECG beat classification. The presented system, when applied to the MIT-BIH arrhythmia database, achieves a high classification accuracy of 98.8%. Experimental results based on the ECG acquisition experimental platform show that the system obtains a satisfactory classification accuracy of 97.3% and is able to classify ECG beats efficiently for the automatic identification of cardiac arrhythmias.
Wang, Xiaohua; Xia, Chen; Hu, Min; Ren, Fuji
Facial expression recognition under partial occlusion is a challenging research. This paper proposes a novel framework for facial expression recognition under occlusion by fusing the global and local features. In global aspect, first, information entropy are employed to locate the occluded region. Second, principal Component Analysis (PCA) method is adopted to reconstruct the occlusion region of image. After that, a replace strategy is applied to reconstruct image by replacing the occluded region with the corresponding region of the best matched image in training set, Pyramid Weber Local Descriptor (PWLD) feature is then extracted. At last, the outputs of SVM are fitted to the probabilities of the target class by using sigmoid function. For the local aspect, an overlapping block-based method is adopted to extract WLD features, and each block is weighted adaptively by information entropy, Chi-square distance and similar block summation methods are then applied to obtain the probabilities which emotion belongs to. Finally, fusion at the decision level is employed for the data fusion of the global and local features based on Dempster-Shafer theory of evidence. Experimental results on the Cohn-Kanade and JAFFE databases demonstrate the effectiveness and fault tolerance of this method.
We present a fully automatic multimodal 2D + 3D feature-based facial expression recognition approach and demonstrate its performance on the BU-3DFE database. Our approach combines multi-order gradient-based local texture and shape descriptors in order to achieve efficiency and robustness. First, a large set of fiducial facial landmarks of 2D face images along with their 3D face scans are localized using a novel algorithm namely incremental Parallel Cascade of Linear Regression (iPar-CLR). Then, a novel Histogram of Second Order Gradients (HSOG) based local image descriptor in conjunction with the widely used first-order gradient based SIFT descriptor are used to describe the local texture around each 2D landmark. Similarly, the local geometry around each 3D landmark is described by two novel local shape descriptors constructed using the first-order and the second-order surface differential geometry quantities, i.e., Histogram of mesh Gradients (meshHOG) and Histogram of mesh Shape index (curvature quantization, meshHOS). Finally, the Support Vector Machine (SVM) based recognition results of all 2D and 3D descriptors are fused at both feature-level and score-level to further improve the accuracy. Comprehensive experimental results demonstrate that there exist impressive complementary characteristics between the 2D and 3D descriptors. We use the BU-3DFE benchmark to compare our approach to the state-of-the-art ones. Our multimodal feature-based approach outperforms the others by achieving an average recognition accuracy of 86.32%. Moreover, a good generalization ability is shown on the Bosphorus database.
Li, Huibin; Ding, Huaxiong; Huang, Di; Wang, Yunhong; Zhao, Xi; Morvan, Jean-Marie; Chen, Liming
We present a fully automatic multimodal 2D + 3D feature-based facial expression recognition approach and demonstrate its performance on the BU-3DFE database. Our approach combines multi-order gradient-based local texture and shape descriptors in order to achieve efficiency and robustness. First, a large set of fiducial facial landmarks of 2D face images along with their 3D face scans are localized using a novel algorithm namely incremental Parallel Cascade of Linear Regression (iPar-CLR). Then, a novel Histogram of Second Order Gradients (HSOG) based local image descriptor in conjunction with the widely used first-order gradient based SIFT descriptor are used to describe the local texture around each 2D landmark. Similarly, the local geometry around each 3D landmark is described by two novel local shape descriptors constructed using the first-order and the second-order surface differential geometry quantities, i.e., Histogram of mesh Gradients (meshHOG) and Histogram of mesh Shape index (curvature quantization, meshHOS). Finally, the Support Vector Machine (SVM) based recognition results of all 2D and 3D descriptors are fused at both feature-level and score-level to further improve the accuracy. Comprehensive experimental results demonstrate that there exist impressive complementary characteristics between the 2D and 3D descriptors. We use the BU-3DFE benchmark to compare our approach to the state-of-the-art ones. Our multimodal feature-based approach outperforms the others by achieving an average recognition accuracy of 86.32%. Moreover, a good generalization ability is shown on the Bosphorus database.
Full Text Available Humans are able to recognize small number of people they know well by the way they walk. This ability represents basic motivation for using human gait as the means for biometric identification. Such biometrics can be captured at public places from a distance without subject's collaboration, awareness, and even consent. Although current approaches give encouraging results, we are still far from effective use in real-life applications. In general, methods set various constraints to circumvent the influence of covariate factors like changes of walking speed, view, clothing, footwear, and object carrying, that have negative impact on recognition performance. In this paper we propose a skeleton model based gait recognition system focusing on modelling gait dynamics and eliminating the influence of subjects appearance on recognition. Furthermore, we tackle the problem of walking speed variation and propose space transformation and feature fusion that mitigates its influence on recognition performance. With the evaluation on OU-ISIR gait dataset, we demonstrate state of the art performance of proposed methods.
Heba Hamdy Ali
Full Text Available Depth Maps-based Human Activity Recognition is the process of categorizing depth sequences with a particular activity. In this problem, some applications represent robust solutions in domains such as surveillance system, computer vision applications, and video retrieval systems. The task is challenging due to variations inside one class and distinguishes between activities of various classes and video recording settings. In this study, we introduce a detailed study of current advances in the depth maps-based image representations and feature extraction process. Moreover, we discuss the state of art datasets and subsequent classification procedure. Also, a comparative study of some of the more popular depth-map approaches has provided in greater detail. The proposed methods are evaluated on three depth-based datasets “MSR Action 3D”, “MSR Hand Gesture”, and “MSR Daily Activity 3D”. Experimental results achieved 100%, 95.83%, and 96.55% respectively. While combining depth and color features on “RGBD-HuDaAct” Dataset, achieved 89.1%. Keywords: Activity recognition, Depth, Feature extraction, Video, Human body detection, Hand gesture
Anthimopoulos, Marios M; Gianola, Lauro; Scarnato, Luca; Diem, Peter; Mougiakakou, Stavroula G
Computer vision-based food recognition could be used to estimate a meal's carbohydrate content for diabetic patients. This study proposes a methodology for automatic food recognition, based on the bag-of-features (BoF) model. An extensive technical investigation was conducted for the identification and optimization of the best performing components involved in the BoF architecture, as well as the estimation of the corresponding parameters. For the design and evaluation of the prototype system, a visual dataset with nearly 5000 food images was created and organized into 11 classes. The optimized system computes dense local features, using the scale-invariant feature transform on the HSV color space, builds a visual dictionary of 10000 visual words by using the hierarchical k-means clustering and finally classifies the food images with a linear support vector machine classifier. The system achieved classification accuracy of the order of 78%, thus proving the feasibility of the proposed approach in a very challenging image dataset.
Fang, Hongqing; He, Lei; Si, Hao; Liu, Peng; Xie, Xiaolei
In this paper, Back-propagation(BP) algorithm has been used to train the feed forward neural network for human activity recognition in smart home environments, and inter-class distance method for feature selection of observed motion sensor events is discussed and tested. And then, the human activity recognition performances of neural network using BP algorithm have been evaluated and compared with other probabilistic algorithms: Naïve Bayes(NB) classifier and Hidden Markov Model(HMM). The results show that different feature datasets yield different activity recognition accuracy. The selection of unsuitable feature datasets increases the computational complexity and degrades the activity recognition accuracy. Furthermore, neural network using BP algorithm has relatively better human activity recognition performances than NB classifier and HMM. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
Chen, Ying; Liu, Yuanning; Zhu, Xiaodong; Chen, Huiling; He, Fei; Pang, Yutong
For building a new iris template, this paper proposes a strategy to fuse different portions of iris based on machine learning method to evaluate local quality of iris. There are three novelties compared to previous work. Firstly, the normalized segmented iris is divided into multitracks and then each track is estimated individually to analyze the recognition accuracy rate (RAR). Secondly, six local quality evaluation parameters are adopted to analyze texture information of each track. Besides, particle swarm optimization (PSO) is employed to get the weights of these evaluation parameters and corresponding weighted coefficients of different tracks. Finally, all tracks' information is fused according to the weights of different tracks. The experimental results based on subsets of three public and one private iris image databases demonstrate three contributions of this paper. (1) Our experimental results prove that partial iris image cannot completely replace the entire iris image for iris recognition system in several ways. (2) The proposed quality evaluation algorithm is a self-adaptive algorithm, and it can automatically optimize the parameters according to iris image samples' own characteristics. (3) Our feature information fusion strategy can effectively improve the performance of iris recognition system.
Chen, Yameng; Sun, Gengxin; Lei, Yiming; Zhang, Jinpeng
Liver disease is one of the main causes of human healthy problem. Cirrhosis, of course, is the critical phase during the development of liver lesion, especially the hepatoma. Many clinical cases are still influenced by the subjectivity of physicians in some degree, and some objective factors such as illumination, scale, edge blurring will affect the judgment of clinicians. Then the subjectivity will affect the accuracy of diagnosis and the treatment of patients. In order to solve the difficulty above and improve the recognition rate of liver cirrhosis, we propose a method of multi-feature fusion to obtain more robust representations of texture in ultrasound liver images, the texture features we extract include local binary pattern(LBP), gray level co-occurrence matrix(GLCM) and histogram of oriented gradient(HOG). In this paper, we firstly make a fusion of multi-feature to recognize cirrhosis and normal liver based on parallel combination concept, and the experimental results shows that the classifier is effective for cirrhosis recognition which is evaluated by the satisfying classification rate, sensitivity and specificity of receiver operating characteristic(ROC), and cost time. Through the method we proposed, it will be helpful to improve the accuracy of diagnosis of cirrhosis and prevent the development of liver lesion towards hepatoma.
Kortelainen, Jukka; Seppänen, Tapio
Emotions are fundamental for everyday life affecting our communication, learning, perception, and decision making. Including emotions into the human-computer interaction (HCI) could be seen as a significant step forward offering a great potential for developing advanced future technologies. While the electrical activity of the brain is affected by emotions, offers electroencephalogram (EEG) an interesting channel to improve the HCI. In this paper, the selection of subject-independent feature set for EEG-based emotion recognition is studied. We investigate the effect of different feature sets in classifying person's arousal and valence while watching videos with emotional content. The classification performance is optimized by applying a sequential forward floating search algorithm for feature selection. The best classification rate (65.1% for arousal and 63.0% for valence) is obtained with a feature set containing power spectral features from the frequency band of 1-32 Hz. The proposed approach substantially improves the classification rate reported in the literature. In future, further analysis of the video-induced EEG changes including the topographical differences in the spectral features is needed.
Fong, Simon; Song, Wei; Cho, Kyungeun; Wong, Raymond; Wong, Kelvin K L
In this paper, a novel training/testing process for building/using a classification model based on human activity recognition (HAR) is proposed. Traditionally, HAR has been accomplished by a classifier that learns the activities of a person by training with skeletal data obtained from a motion sensor, such as Microsoft Kinect. These skeletal data are the spatial coordinates (x, y, z) of different parts of the human body. The numeric information forms time series, temporal records of movement sequences that can be used for training a classifier. In addition to the spatial features that describe current positions in the skeletal data, new features called 'shadow features' are used to improve the supervised learning efficacy of the classifier. Shadow features are inferred from the dynamics of body movements, and thereby modelling the underlying momentum of the performed activities. They provide extra dimensions of information for characterising activities in the classification process, and thereby significantly improve the classification accuracy. Two cases of HAR are tested using a classification model trained with shadow features: one is by using wearable sensor and the other is by a Kinect-based remote sensor. Our experiments can demonstrate the advantages of the new method, which will have an impact on human activity detection research.
da Costa, R M; Gonzaga, A
The human eye is sensitive to visible light. Increasing illumination on the eye causes the pupil of the eye to contract, while decreasing illumination causes the pupil to dilate. Visible light causes specular reflections inside the iris ring. On the other hand, the human retina is less sensitive to near infra-red (NIR) radiation in the wavelength range from 800 nm to 1400 nm, but iris detail can still be imaged with NIR illumination. In order to measure the dynamic movement of the human pupil and iris while keeping the light-induced reflexes from affecting the quality of the digitalized image, this paper describes a device based on the consensual reflex. This biological phenomenon contracts and dilates the two pupils synchronously when illuminating one of the eyes by visible light. In this paper, we propose to capture images of the pupil of one eye using NIR illumination while illuminating the other eye using a visible-light pulse. This new approach extracts iris features called "dynamic features (DFs)." This innovative methodology proposes the extraction of information about the way the human eye reacts to light, and to use such information for biometric recognition purposes. The results demonstrate that these features are discriminating features, and, even using the Euclidean distance measure, an average accuracy of recognition of 99.1% was obtained. The proposed methodology has the potential to be "fraud-proof," because these DFs can only be extracted from living irises.
Full Text Available In this paper, a novel training/testing process for building/using a classification model based on human activity recognition (HAR is proposed. Traditionally, HAR has been accomplished by a classifier that learns the activities of a person by training with skeletal data obtained from a motion sensor, such as Microsoft Kinect. These skeletal data are the spatial coordinates (x, y, z of different parts of the human body. The numeric information forms time series, temporal records of movement sequences that can be used for training a classifier. In addition to the spatial features that describe current positions in the skeletal data, new features called ‘shadow features’ are used to improve the supervised learning efficacy of the classifier. Shadow features are inferred from the dynamics of body movements, and thereby modelling the underlying momentum of the performed activities. They provide extra dimensions of information for characterising activities in the classification process, and thereby significantly improve the classification accuracy. Two cases of HAR are tested using a classification model trained with shadow features: one is by using wearable sensor and the other is by a Kinect-based remote sensor. Our experiments can demonstrate the advantages of the new method, which will have an impact on human activity detection research.
Kakoty, Nayan M; Hazarika, Shyamanta M
With the advancement in machine learning and signal processing techniques, electromyogram (EMG) signals have increasingly gained importance in man-machine interaction. Multifingered hand prostheses using surface EMG for control has appeared in the market. However, EMG based control is still rudimentary, being limited to a few hand postures based on higher number of EMG channels. Moreover, control is non-intuitive, in the sense that the user is required to learn to associate muscle remnants actions to unrelated posture of the prosthesis. Herein lies the promise of a low channel EMG based grasp classification architecture for development of an embedded intelligent prosthetic controller. This paper reports classification of six grasp types used during 70% of daily living activities based on two channel forearm EMG. A feature vector through principal component analysis of discrete wavelet transform coefficients based features of the EMG signal is derived. Classification is through radial basis function kernel based support vector machine following preprocessing and maximum voluntary contraction normalization of EMG signals. 10-fold cross validation is done. We have achieved an average recognition rate of 97.5%. © 2011 IEEE
Full Text Available This study examines the complexities of using netted radar to recognize and resolve ballistic midcourse targets. The application of micro-motion feature extraction to ballistic mid-course targets is analyzed, and the current status of application and research on micro-motion feature recognition is concluded for singlefunction radar networks such as low- and high-resolution imaging radar networks. Advantages and disadvantages of these networks are discussed with respect to target recognition. Hybrid-mode radar networks combine low- and high-resolution imaging radar and provide a specific reference frequency that is the basis for ballistic target recognition. Main research trends are discussed for hybrid-mode networks that apply micromotion feature extraction to ballistic mid-course targets.
Du, C; Zhou, S; Sun, J; Zhao, J
On the basis of manifold learning theory, a new feature extraction method for Synthetic aperture radar (SAR) target recognition is proposed. First, the proposed algorithm estimates the within-class and between-class local neighbourhood surrounding each SAR sample. After computing the local tangent space for each neighbourhood, the proposed algorithm seeks for the optimal projecting matrix by preserving the local within-class property and simultaneously maximizing the local between-class separability. The use of uncorrelated constraint can also enhance the discriminating power of the optimal projecting matrix. Finally, the nearest neighbour classifier is applied to recognize SAR targets in the projected feature subspace. Experimental results on MSTAR datasets demonstrate that the proposed method can provide a higher recognition rate than traditional feature extraction algorithms in SAR target recognition
Rahulkar, Amol D
This book provides the new results in wavelet filter banks based feature extraction, and the classifier in the field of iris image recognition. It provides the broad treatment on the design of separable, non-separable wavelets filter banks, and the classifier. The design techniques presented in the book are applied on iris image analysis for person authentication. This book also brings together the three strands of research (wavelets, iris image analysis, and classifier). It compares the performance of the presented techniques with state-of-the-art available schemes. This book contains the compilation of basic material on the design of wavelets that avoids reading many different books. Therefore, it provide an easier path for the new-comers, researchers to master the contents. In addition, the designed filter banks and classifier can also be effectively used than existing filter-banks in many signal processing applications like pattern classification, data-compression, watermarking, denoising etc. that will...
Echeagaray-Patrón, B. A.; Kober, Vitaly
3D face recognition has attracted attention in the last decade due to improvement of technology of 3D image acquisition and its wide range of applications such as access control, surveillance, human-computer interaction and biometric identification systems. Most research on 3D face recognition has focused on analysis of 3D still data. In this work, a new method for face recognition using dynamic 3D range sequences is proposed. Experimental results are presented and discussed using 3D sequences in the presence of pose variation. The performance of the proposed method is compared with that of conventional face recognition algorithms based on descriptors.
Chidananda, H.; Reddy, T. Hanumantha
This paper presents a natural representation of numerical digit(s) using hand activity analysis based on number of fingers out stretched for each numerical digit in sequence extracted from a video. The analysis is based on determining a set of six features from a hand image. The most important features used from each frame in a video are the first fingertip from top, palm-line, palm-center, valley points between the fingers exists above the palm-line. Using this work user can convey any number of numerical digits using right or left or both the hands naturally in a video. Each numerical digit ranges from 0 to9. Hands (right/left/both) used to convey digits can be recognized accurately using the valley points and with this recognition whether the user is a right / left handed person in practice can be analyzed. In this work, first the hand(s) and face parts are detected by using YCbCr color space and face part is removed by using ellipse based method. Then, the hand(s) are analyzed to recognize the activity that represents a series of numerical digits in a video. This work uses pixel continuity algorithm using 2D coordinate geometry system and does not use regular use of calculus, contours, convex hull and datasets.
Jamal Ahmad Dargham
Full Text Available Face recognition is an important biometric method because of its potential applications in many fields, such as access control, surveillance, and human-computer interaction. In this paper, a face recognition system that fuses the outputs of three face recognition systems based on Gabor jets is presented. The first system uses the magnitude, the second uses the phase, and the third uses the phase-weighted magnitude of the jets. The jets are generated from facial landmarks selected using three selection methods. It was found out that fusing the facial features gives better recognition rate than either facial feature used individually regardless of the landmark selection method.
Full Text Available Forensic speaker recognition is experiencing a remarkable paradigm shift in terms of the evaluation framework and presentation of voice evidence. This paper proposes a new method of forensic automatic speaker recognition using the likelihood ratio framework to quantify the strength of voice evidence. The proposed method uses a reference database to calculate the within- and between-speaker variability. Some acoustic-phonetic features are extracted automatically using the software VoiceSauce. The effectiveness of the approach was tested using two Mandarin databases: A mobile telephone database and a landline database. The experiment's results indicate that these acoustic-phonetic features do have some discriminating potential and are worth trying in discrimination. The automatic acoustic-phonetic features have acceptable discriminative performance and can provide more reliable results in evidence analysis when fused with other kind of voice features.
Betta, G.; Capriglione, D.; Crenna, F.; Rossi, G. B.; Gasparetto, M.; Zappa, E.; Liguori, C.; Paolillo, A.
Security systems based on face recognition through video surveillance systems deserve great interest. Their use is important in several areas including airport security, identification of individuals and access control to critical areas. These systems are based either on the measurement of details of a human face or on a global approach whereby faces are considered as a whole. The recognition is then performed by comparing the measured parameters with reference values stored in a database. The result of this comparison is not deterministic because measurement results are affected by uncertainty due to random variations and/or to systematic effects. In these circumstances the recognition of a face is subject to the risk of a faulty decision. Therefore, a proper metrological characterization is needed to improve the performance of such systems. Suitable methods are proposed for a quantitative metrological characterization of face measurement systems, on which recognition procedures are based. The proposed methods are applied to three different algorithms based either on linear discrimination, on eigenface analysis, or on feature detection.
Betta, G; Capriglione, D; Crenna, F; Rossi, G B; Gasparetto, M; Zappa, E; Liguori, C; Paolillo, A
Security systems based on face recognition through video surveillance systems deserve great interest. Their use is important in several areas including airport security, identification of individuals and access control to critical areas. These systems are based either on the measurement of details of a human face or on a global approach whereby faces are considered as a whole. The recognition is then performed by comparing the measured parameters with reference values stored in a database. The result of this comparison is not deterministic because measurement results are affected by uncertainty due to random variations and/or to systematic effects. In these circumstances the recognition of a face is subject to the risk of a faulty decision. Therefore, a proper metrological characterization is needed to improve the performance of such systems. Suitable methods are proposed for a quantitative metrological characterization of face measurement systems, on which recognition procedures are based. The proposed methods are applied to three different algorithms based either on linear discrimination, on eigenface analysis, or on feature detection
ROH Yong-Wan; KIM Dong-Ju; LEE Woo-Seok; HONG Kwang-Seok
This paper focuses on acoustic features that effectively improve the recognition of emotion in human speech. The novel features in this paper are based on spectral-based entropy parameters such as fast Fourier transform (FFT) spectral entropy, delta FFT spectral entropy, Mel-frequency filter bank (MFB)spectral entropy, and Delta MFB spectral entropy. Spectral-based entropy features are simple. They reflect frequency characteristic and changing characteristic in frequency of speech. We implement an emotion rejection module using the probability distribution of recognized-scores and rejected-scores.This reduces the false recognition rate to improve overall performance. Recognized-scores and rejected-scores refer to probabilities of recognized and rejected emotion recognition results, respectively.These scores are first obtained from a pattern recognition procedure. The pattern recognition phase uses the Gaussian mixture model (GMM). We classify the four emotional states as anger, sadness,happiness and neutrality. The proposed method is evaluated using 45 sentences in each emotion for 30 subjects, 15 males and 15 females. Experimental results show that the proposed method is superior to the existing emotion recognition methods based on GMM using energy, Zero Crossing Rate (ZCR),linear prediction coefficient (LPC), and pitch parameters. We demonstrate the effectiveness of the proposed approach. One of the proposed features, combined MFB and delta MFB spectral entropy improves performance approximately 10% compared to the existing feature parameters for speech emotion recognition methods. We demonstrate a 4% performance improvement in the applied emotion rejection with low confidence score.
Memiş, Abbas; Albayrak, Songül
This paper presents a sign language recognition system that uses spatio-temporal features on RGB video images and depth maps for dynamic gestures of Turkish Sign Language. Proposed system uses motion differences and accumulation approach for temporal gesture analysis. Motion accumulation method, which is an effective method for temporal domain analysis of gestures, produces an accumulated motion image by combining differences of successive video frames. Then, 2D Discrete Cosine Transform (DCT) is applied to accumulated motion images and temporal domain features transformed into spatial domain. These processes are performed on both RGB images and depth maps separately. DCT coefficients that represent sign gestures are picked up via zigzag scanning and feature vectors are generated. In order to recognize sign gestures, K-Nearest Neighbor classifier with Manhattan distance is performed. Performance of the proposed sign language recognition system is evaluated on a sign database that contains 1002 isolated dynamic signs belongs to 111 words of Turkish Sign Language (TSL) in three different categories. Proposed sign language recognition system has promising success rates.
Full Text Available Hand posture recognition is an essential module in applications such as human-computer interaction (HCI, games, and sign language systems, in which performance and robustness are the primary requirements. In this paper, we proposed automatic classification to recognize 21 hand postures that represent letters in Thai finger-spelling based on Histogram of Orientation Gradient (HOG feature (which is applied with more focus on the information within certain region of the image rather than each single pixel and Adaptive Boost (i.e., AdaBoost learning technique to select the best weak classifier and to construct a strong classifier that consists of several weak classifiers to be cascaded in detection architecture. We collected 21 static hand posture images from 10 subjects for testing and training in Thai letters finger-spelling. The parameters for the training process have been adjusted in three experiments, false positive rates (FPR, true positive rates (TPR, and number of training stages (N, to achieve the most suitable training model for each hand posture. All cascaded classifiers are loaded into the system simultaneously to classify different hand postures. A correlation coefficient is computed to distinguish the hand postures that are similar. The system achieves approximately 78% accuracy on average on all classifier experiments.
Nicole A Capela
Full Text Available Human activity recognition (HAR, using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter. The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree. Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.
Capela, Nicole A; Lemaire, Edward D; Baddour, Natalie
Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.
ROH; Yong-Wan; KIM; Dong-Ju; LEE; Woo-Seok; HONG; Kwang-Seok
This paper focuses on acoustic features that effectively improve the recognition of emotion in human speech.The novel features in this paper are based on spectral-based entropy parameters such as fast Fourier transform(FFT) spectral entropy,delta FFT spectral entropy,Mel-frequency filter bank(MFB) spectral entropy,and Delta MFB spectral entropy.Spectral-based entropy features are simple.They reflect frequency characteristic and changing characteristic in frequency of speech.We implement an emotion rejection module using the probability distribution of recognized-scores and rejected-scores.This reduces the false recognition rate to improve overall performance.Recognized-scores and rejected-scores refer to probabilities of recognized and rejected emotion recognition results,respectively.These scores are first obtained from a pattern recognition procedure.The pattern recognition phase uses the Gaussian mixture model(GMM).We classify the four emotional states as anger,sadness,happiness and neutrality.The proposed method is evaluated using 45 sentences in each emotion for 30 subjects,15 males and 15 females.Experimental results show that the proposed method is superior to the existing emotion recognition methods based on GMM using energy,Zero Crossing Rate(ZCR),linear prediction coefficient(LPC),and pitch parameters.We demonstrate the effectiveness of the proposed approach.One of the proposed features,combined MFB and delta MFB spectral entropy improves performance approximately 10% compared to the existing feature parameters for speech emotion recognition methods.We demonstrate a 4% performance improvement in the applied emotion rejection with low confidence score.
Yuan, Xue; Hao, Xiaoli; Chen, Houjin; Wei, Xueye
A new context-aware scale-invariant feature transform (CASIFT) approach is proposed, which is designed for the use in traffic sign recognition (TSR) systems. The following issues remain in previous works in which SIFT is used for matching or recognition: (1) SIFT is unable to provide color information; (2) SIFT only focuses on local features while ignoring the distribution of global shapes; (3) the template with the maximum number of matching points selected as the final result is instable, especially for images with simple patterns; and (4) SIFT is liable to result in errors when different images share the same local features. In order to resolve these problems, a new CASIFT approach is proposed. The contributions of the work are as follows: (1) color angular patterns are used to provide the color distinguishing information; (2) a CASIFT which effectively combines local and global information is proposed; and (3) a method for computing the similarity between two images is proposed, which focuses on the distribution of the matching points, rather than using the traditional SIFT approach of selecting the template with maximum number of matching points as the final result. The proposed approach is particularly effective in dealing with traffic signs which have rich colors and varied global shape distribution. Experiments are performed to validate the effectiveness of the proposed approach in TSR systems, and the experimental results are satisfying even for images containing traffic signs that have been rotated, damaged, altered in color, have undergone affine transformations, or images which were photographed under different weather or illumination conditions.
Xu, Huile; Liu, Jinyi; Hu, Haibo; Zhang, Yi
Wearable sensors-based human activity recognition introduces many useful applications and services in health care, rehabilitation training, elderly monitoring and many other areas of human interaction. Existing works in this field mainly focus on recognizing activities by using traditional features extracted from Fourier transform (FT) or wavelet transform (WT). However, these signal processing approaches are suitable for a linear signal but not for a nonlinear signal. In this paper, we investigate the characteristics of the Hilbert-Huang transform (HHT) for dealing with activity data with properties such as nonlinearity and non-stationarity. A multi-features extraction method based on HHT is then proposed to improve the effect of activity recognition. The extracted multi-features include instantaneous amplitude (IA) and instantaneous frequency (IF) by means of empirical mode decomposition (EMD), as well as instantaneous energy density (IE) and marginal spectrum (MS) derived from Hilbert spectral analysis. Experimental studies are performed to verify the proposed approach by using the PAMAP2 dataset from the University of California, Irvine for wearable sensors-based activity recognition. Moreover, the effect of combining multi-features vs. a single-feature are investigated and discussed in the scenario of a dependent subject. The experimental results show that multi-features combination can further improve the performance measures. Finally, we test the effect of multi-features combination in the scenario of an independent subject. Our experimental results show that we achieve four performance indexes: recall, precision, F-measure, and accuracy to 0.9337, 0.9417, 0.9353, and 0.9377 respectively, which are all better than the achievements of related works.
Full Text Available Wearable sensors-based human activity recognition introduces many useful applications and services in health care, rehabilitation training, elderly monitoring and many other areas of human interaction. Existing works in this field mainly focus on recognizing activities by using traditional features extracted from Fourier transform (FT or wavelet transform (WT. However, these signal processing approaches are suitable for a linear signal but not for a nonlinear signal. In this paper, we investigate the characteristics of the Hilbert-Huang transform (HHT for dealing with activity data with properties such as nonlinearity and non-stationarity. A multi-features extraction method based on HHT is then proposed to improve the effect of activity recognition. The extracted multi-features include instantaneous amplitude (IA and instantaneous frequency (IF by means of empirical mode decomposition (EMD, as well as instantaneous energy density (IE and marginal spectrum (MS derived from Hilbert spectral analysis. Experimental studies are performed to verify the proposed approach by using the PAMAP2 dataset from the University of California, Irvine for wearable sensors-based activity recognition. Moreover, the effect of combining multi-features vs. a single-feature are investigated and discussed in the scenario of a dependent subject. The experimental results show that multi-features combination can further improve the performance measures. Finally, we test the effect of multi-features combination in the scenario of an independent subject. Our experimental results show that we achieve four performance indexes: recall, precision, F-measure, and accuracy to 0.9337, 0.9417, 0.9353, and 0.9377 respectively, which are all better than the achievements of related works.
Full Text Available be recognized simultaneously, and occlusion and clutter (through distracter objects) is common. We propose a representation for object viewpoints using Hough transform based geometric matching features, which are robust in such circumstances. We show how...
Sultana, Maryam; Bhatti, Naeem; Javed, Sajid; Jung, Soon Ki
Facial expression recognition (FER) is an important task for various computer vision applications. The task becomes challenging when it requires the detection and encoding of macro- and micropatterns of facial expressions. We present a two-stage texture feature extraction framework based on the local binary pattern (LBP) variants and evaluate its significance in recognizing posed and nonposed facial expressions. We focus on the parametric limitations of the LBP variants and investigate their effects for optimal FER. The size of the local neighborhood is an important parameter of the LBP technique for its extraction in images. To make the LBP adaptive, we exploit the granulometric information of the facial images to find the local neighborhood size for the extraction of center-symmetric LBP (CS-LBP) features. Our two-stage texture representations consist of an LBP variant and the adaptive CS-LBP features. Among the presented two-stage texture feature extractions, the binarized statistical image features and adaptive CS-LBP features were found showing high FER rates. Evaluation of the adaptive texture features shows competitive and higher performance than the nonadaptive features and other state-of-the-art approaches, respectively.
He, Fei; Liu, Yuanning; Zhu, Xiaodong; Huang, Chun; Han, Ye; Dong, Hongxing
Gabor descriptors have been widely used in iris texture representations. However, fixed basic Gabor functions cannot match the changing nature of diverse iris datasets. Furthermore, a single form of iris feature cannot overcome difficulties in iris recognition, such as illumination variations, environmental conditions, and device variations. This paper provides multiple local feature representations and their fusion scheme based on a support vector regression (SVR) model for iris recognition using optimized Gabor filters. In our iris system, a particle swarm optimization (PSO)- and a Boolean particle swarm optimization (BPSO)-based algorithm is proposed to provide suitable Gabor filters for each involved test dataset without predefinition or manual modulation. Several comparative experiments on JLUBR-IRIS, CASIA-I, and CASIA-V4-Interval iris datasets are conducted, and the results show that our work can generate improved local Gabor features by using optimized Gabor filters for each dataset. In addition, our SVR fusion strategy may make full use of their discriminative ability to improve accuracy and reliability. Other comparative experiments show that our approach may outperform other popular iris systems.
Ghabri, Sawsen; Ouarda, Wael; Alimi, Adel M.
Security and surveillance are vital issues in today's world. The recent acts of terrorism have highlighted the urgent need for efficient surveillance. There is indeed a need for an automated system for video surveillance which can detect identity and activity of person. In this article, we propose a new paradigm to recognize an aggressive human behavior such as boxing action. Our proposed system for human activity detection includes the use of a fusion between Spatio Temporal Interest Point (STIP) and Histogram of Oriented Gradient (HoG) features. The novel feature called Spatio Temporal Histogram Oriented Gradient (STHOG). To evaluate the robustness of our proposed paradigm with a local application of HoG technique on STIP points, we made experiments on KTH human action dataset based on Multi Class Support Vector Machines classification. The proposed scheme outperforms basic descriptors like HoG and STIP to achieve 82.26% us an accuracy value of classification rate.
Full Text Available In this paper, we propose a robust tactile sensing image recognition scheme for automatic robotic assembly. First, an image reprocessing procedure is designed to enhance the contrast of the tactile image. In the second layer, geometric features and Fourier descriptors are extracted from the image. Then, kernel principal component analysis (kernel PCA is applied to transform the features into ones with better discriminating ability, which is the kernel PCA-based feature fusion. The transformed features are fed into the third layer for classification. In this paper, we design a classifier by combining the multiple kernel learning (MKL algorithm and support vector machine (SVM. We also design and implement a tactile sensing array consisting of 10-by-10 sensing elements. Experimental results, carried out on real tactile images acquired by the designed tactile sensing array, show that the kernel PCA-based feature fusion can significantly improve the discriminating performance of the geometric features and Fourier descriptors. Also, the designed MKL-SVM outperforms the regular SVM in terms of recognition accuracy. The proposed recognition scheme is able to achieve a high recognition rate of over 85% for the classification of 12 commonly used metal parts in industrial applications.
Gilbert, Andrew; Illingworth, John; Bowden, Richard
The field of Action Recognition has seen a large increase in activity in recent years. Much of the progress has been through incorporating ideas from single-frame object recognition and adapting them for temporal-based action recognition. Inspired by the success of interest points in the 2D spatial domain, their 3D (space-time) counterparts typically form the basic components used to describe actions, and in action recognition the features used are often engineered to fire sparsely. This is to ensure that the problem is tractable; however, this can sacrifice recognition accuracy as it cannot be assumed that the optimum features in terms of class discrimination are obtained from this approach. In contrast, we propose to initially use an overcomplete set of simple 2D corners in both space and time. These are grouped spatially and temporally using a hierarchical process, with an increasing search area. At each stage of the hierarchy, the most distinctive and descriptive features are learned efficiently through data mining. This allows large amounts of data to be searched for frequently reoccurring patterns of features. At each level of the hierarchy, the mined compound features become more complex, discriminative, and sparse. This results in fast, accurate recognition with real-time performance on high-resolution video. As the compound features are constructed and selected based upon their ability to discriminate, their speed and accuracy increase at each level of the hierarchy. The approach is tested on four state-of-the-art data sets, the popular KTH data set to provide a comparison with other state-of-the-art approaches, the Multi-KTH data set to illustrate performance at simultaneous multiaction classification, despite no explicit localization information provided during training. Finally, the recent Hollywood and Hollywood2 data sets provide challenging complex actions taken from commercial movie sequences. For all four data sets, the proposed hierarchical
Hortos, William S.
Proposed distributed wavelet-based algorithms are a means to compress sensor data received at the nodes forming a wireless sensor network (WSN) by exchanging information between neighboring sensor nodes. Local collaboration among nodes compacts the measurements, yielding a reduced fused set with equivalent information at far fewer nodes. Nodes may be equipped with multiple sensor types, each capable of sensing distinct phenomena: thermal, humidity, chemical, voltage, or image signals with low or no frequency content as well as audio, seismic or video signals within defined frequency ranges. Compression of the multi-source data through wavelet-based methods, distributed at active nodes, reduces downstream processing and storage requirements along the paths to sink nodes; it also enables noise suppression and more energy-efficient query routing within the WSN. Targets are first detected by the multiple sensors; then wavelet compression and data fusion are applied to the target returns, followed by feature extraction from the reduced data; feature data are input to target recognition/classification routines; targets are tracked during their sojourns through the area monitored by the WSN. Algorithms to perform these tasks are implemented in a distributed manner, based on a partition of the WSN into clusters of nodes. In this work, a scheme of collaborative processing is applied for hierarchical data aggregation and decorrelation, based on the sensor data itself and any redundant information, enabled by a distributed, in-cluster wavelet transform with lifting that allows multiple levels of resolution. The wavelet-based compression algorithm significantly decreases RF bandwidth and other resource use in target processing tasks. Following wavelet compression, features are extracted. The objective of feature extraction is to maximize the probabilities of correct target classification based on multi-source sensor measurements, while minimizing the resource expenditures at
Zou, Zheng; Wang, Niannian; Zhao, Peng; Zhao, Xuefeng
Ancient architecture has a very high historical and artistic value. The ancient buildings have a wide variety of textures and decorative paintings, which contain a lot of historical meaning. Therefore, the research and statistics work of these different compositional and decorative features play an important role in the subsequent research. However, until recently, the statistics of those components are mainly by artificial method, which consumes a lot of labor and time, inefficiently. At present, as the strong support of big data and GPU accelerated training, machine vision with deep learning as the core has been rapidly developed and widely used in many fields. This paper proposes an idea to recognize and detect the textures, decorations and other features of ancient building based on machine vision. First, classify a large number of surface textures images of ancient building components manually as a set of samples. Then, using the convolution neural network to train the samples in order to get a classification detector. Finally verify its precision.
Nguyen, Dat Tien; Kim, Ki Wan; Hong, Hyung Gil; Koo, Ja Hyung; Kim, Min Cheol; Park, Kang Ryoung
Extracting powerful image features plays an important role in computer vision systems. Many methods have previously been proposed to extract image features for various computer vision applications, such as the scale-invariant feature transform (SIFT), speed-up robust feature (SURF), local binary patterns (LBP), histogram of oriented gradients (HOG), and weighted HOG. Recently, the convolutional neural network (CNN) method for image feature extraction and classification in computer vision has been used in various applications. In this research, we propose a new gender recognition method for recognizing males and females in observation scenes of surveillance systems based on feature extraction from visible-light and thermal camera videos through CNN. Experimental results confirm the superiority of our proposed method over state-of-the-art recognition methods for the gender recognition problem using human body images. PMID:28335510
Nguyen, Dat Tien; Kim, Ki Wan; Hong, Hyung Gil; Koo, Ja Hyung; Kim, Min Cheol; Park, Kang Ryoung
Extracting powerful image features plays an important role in computer vision systems. Many methods have previously been proposed to extract image features for various computer vision applications, such as the scale-invariant feature transform (SIFT), speed-up robust feature (SURF), local binary patterns (LBP), histogram of oriented gradients (HOG), and weighted HOG. Recently, the convolutional neural network (CNN) method for image feature extraction and classification in computer vision has been used in various applications. In this research, we propose a new gender recognition method for recognizing males and females in observation scenes of surveillance systems based on feature extraction from visible-light and thermal camera videos through CNN. Experimental results confirm the superiority of our proposed method over state-of-the-art recognition methods for the gender recognition problem using human body images.
Dat Tien Nguyen
Full Text Available Extracting powerful image features plays an important role in computer vision systems. Many methods have previously been proposed to extract image features for various computer vision applications, such as the scale-invariant feature transform (SIFT, speed-up robust feature (SURF, local binary patterns (LBP, histogram of oriented gradients (HOG, and weighted HOG. Recently, the convolutional neural network (CNN method for image feature extraction and classification in computer vision has been used in various applications. In this research, we propose a new gender recognition method for recognizing males and females in observation scenes of surveillance systems based on feature extraction from visible-light and thermal camera videos through CNN. Experimental results confirm the superiority of our proposed method over state-of-the-art recognition methods for the gender recognition problem using human body images.
Petridis, Stavros; Pantic, Maja
Deep bottleneck features (DBNFs) have been used successfully in the past for acoustic speech recognition from audio. However, research on extracting DBNFs for visual speech recognition is very limited. In this work, we present an approach to extract deep bottleneck visual features based on deep
Xi, Xiaoming; Yang, Gongping; Yin, Yilong; Meng, Xianjing
Finger veins are a promising biometric pattern for personalized identification in terms of their advantages over existing biometrics. Based on the spatial pyramid representation and the combination of more effective information such as gray, texture and shape, this paper proposes a simple but powerful feature, called Pyramid Histograms of Gray, Texture and Orientation Gradients (PHGTOG). For a finger vein image, PHGTOG can reflect the global spatial layout and local details of gray, texture and shape. To further improve the recognition performance and reduce the computational complexity, we select a personalized subset of features from PHGTOG for each subject by using the sparse weight vector, which is trained by using LASSO and called PFS-PHGTOG. We conduct extensive experiments to demonstrate the promise of the PHGTOG and PFS-PHGTOG, experimental results on our databases show that PHGTOG outperforms the other existing features. Moreover, PFS-PHGTOG can further boost the performance in comparison with PHGTOG.
Full Text Available Finger veins are a promising biometric pattern for personalized identification in terms of their advantages over existing biometrics. Based on the spatial pyramid representation and the combination of more effective information such as gray, texture and shape, this paper proposes a simple but powerful feature, called Pyramid Histograms of Gray, Texture and Orientation Gradients (PHGTOG. For a finger vein image, PHGTOG can reflect the global spatial layout and local details of gray, texture and shape. To further improve the recognition performance and reduce the computational complexity, we select a personalized subset of features from PHGTOG for each subject by using the sparse weight vector, which is trained by using LASSO and called PFS-PHGTOG. We conduct extensive experiments to demonstrate the promise of the PHGTOG and PFS-PHGTOG, experimental results on our databases show that PHGTOG outperforms the other existing features. Moreover, PFS-PHGTOG can further boost the performance in comparison with PHGTOG.
Full Text Available Myo Armband became an immersive technology to help deaf people for communication each other. The problem on Myo sensor is unstable clock rate. It causes the different length data for the same period even on the same gesture. This research proposes Moment Invariant Method to extract the feature of sensor data from Myo. This method reduces the amount of data and makes the same length of data. This research is user-dependent, according to the characteristics of Myo Armband. The testing process was performed by using alphabet A to Z on SIBI, Indonesian Sign Language, with static and dynamic finger movements. There are 26 class of alphabets and 10 variants in each class. We use min-max normalization for guarantying the range of data. We use K-Nearest Neighbor method to classify dataset. Performance analysis with leave-one-out-validation method produced an accuracy of 82.31%. It requires a more advanced method of classification to improve the performance on the detection results.
Full Text Available This paper introduces a novel method for the recognition of human faces in digital images using a new feature extraction method that combines the global and local information in frontal view of facial images. Radial basis function (RBF neural network with a hybrid learning algorithm (HLA has been used as a classifier. The proposed feature extraction method includes human face localization derived from the shape information. An efficient distance measure as facial candidate threshold (FCT is defined to distinguish between face and nonface images. Pseudo-Zernike moment invariant (PZMI with an efficient method for selecting moment order has been used. A newly defined parameter named axis correction ratio (ACR of images for disregarding irrelevant information of face images is introduced. In this paper, the effect of these parameters in disregarding irrelevant information in recognition rate improvement is studied. Also we evaluate the effect of orders of PZMI in recognition rate of the proposed technique as well as RBF neural network learning speed. Simulation results on the face database of Olivetti Research Laboratory (ORL indicate that the proposed method for human face recognition yielded a recognition rate of 99.3%.
Wei, Wei; Jia, Qingxuan
Emotion recognition with weighted feature based on facial expression is a challenging research topic and has attracted great attention in the past few years. This paper presents a novel method, utilizing subregion recognition rate to weight kernel function. First, we divide the facial expression image into some uniform subregions and calculate corresponding recognition rate and weight. Then, we get a weighted feature Gaussian kernel function and construct a classifier based on Support Vector Machine (SVM). At last, the experimental results suggest that the approach based on weighted feature Gaussian kernel function has good performance on the correct rate in emotion recognition. The experiments on the extended Cohn-Kanade (CK+) dataset show that our method has achieved encouraging recognition results compared to the state-of-the-art methods.
Rao, K Sreenivasa
In this brief, the authors discuss recently explored spectral (sub-segmental and pitch synchronous) and prosodic (global and local features at word and syllable levels in different parts of the utterance) features for discerning emotions in a robust manner. The authors also delve into the complementary evidences obtained from excitation source, vocal tract system and prosodic features for the purpose of enhancing emotion recognition performance. Features based on speaking rate characteristics are explored with the help of multi-stage and hybrid models for further improving emotion recognition performance. Proposed spectral and prosodic features are evaluated on real life emotional speech corpus.
This brief presents a comprehensive introduction to feature coding, which serves as a key module for the typical object recognition pipeline. The text offers a rich blend of theory and practice while reflects the recent developments on feature coding, covering the following five aspects: (1) Review the state-of-the-art, analyzing the motivations and mathematical representations of various feature coding methods; (2) Explore how various feature coding algorithms evolve along years; (3) Summarize the main characteristics of typical feature coding algorithms and categorize them accordingly; (4) D
Yan, Yan; Lee, Feifei; Wu, Xueqian; Chen, Qiu
In this paper, we propose a face recognition algorithm based on a combination of vector quantization (VQ) and Markov stationary features (MSF). The VQ algorithm has been shown to be an effective method for generating features; it extracts a codevector histogram as a facial feature representation for face recognition. Still, the VQ histogram features are unable to convey spatial structural information, which to some extent limits their usefulness in discrimination. To alleviate this limitation of VQ histograms, we utilize Markov stationary features (MSF) to extend the VQ histogram-based features so as to add spatial structural information. We demonstrate the effectiveness of our proposed algorithm by achieving recognition results superior to those of several state-of-the-art methods on publicly available face databases.
Full Text Available In this paper, feature extraction based on data-wave is proposed. The concept of data-wave is introduced to describe the rising and falling trends of the data over the long-term which are detected based on ripple and wave filters. Supported by data-wave, a novel symbol identifier with significant structure features is designed and these features are extracted by constructing pixel chains. On this basis, the corresponding recognition and positioning approach is presented. The effectiveness of the proposed approach is verified by experiments.
Wang, Yuehao; Peng, Lingling; Zhe, Fuchuan
In this paper we propose a novel face recognition approach based on slow feature analysis (SFA) in contourlet transform domain. This method firstly use contourlet transform to decompose the face image into low frequency and high frequency part, and then takes technological advantages of slow feature analysis for facial feature extraction. We named the new method combining the slow feature analysis and contourlet transform as CT-SFA. The experimental results on international standard face database demonstrate that the new face recognition method is effective and competitive.
Full Text Available Lately a lot of research effort is devoted for recognition of a human being using his biometric characteristics. Biometric recognition systems are used in various applications, e. g., identification for state border crossing or firearm, which allows only enrolled persons to use it. In this paper biometric characteristics and their properties are reviewed. Development of high accuracy system requires distinctive and permanent characteristics, whereas development of user friendly system requires collectable and acceptable characteristics. It is showed that properties of biometric characteristics do not influence research effort significantly. Properties of biometric characteristic features and their influence are discussed.Article in Lithuanian
Full Text Available Increase in number of elderly people who are living independently needs especial care in the form of healthcare monitoring systems. Recent advancements in depth video technologies have made human activity recognition (HAR realizable for elderly healthcare applications. In this paper, a depth video-based novel method for HAR is presented using robust multi-features and embedded Hidden Markov Models (HMMs to recognize daily life activities of elderly people living alone in indoor environment such as smart homes. In the proposed HAR framework, initially, depth maps are analyzed by temporal motion identification method to segment human silhouettes from noisy background and compute depth silhouette area for each activity to track human movements in a scene. Several representative features, including invariant, multi-view differentiation and spatiotemporal body joints features were fused together to explore gradient orientation change, intensity differentiation, temporal variation and local motion of specific body parts. Then, these features are processed by the dynamics of their respective class and learned, modeled, trained and recognized with specific embedded HMM having active feature values. Furthermore, we construct a new online human activity dataset by a depth sensor to evaluate the proposed features. Our experiments on three depth datasets demonstrated that the proposed multi-features are efficient and robust over the state of the art features for human action and activity recognition.
Odinokikh, G.; Fartukov, A.; Korobkin, M.; Yoo, J.
One of the basic stages of iris recognition pipeline is iris feature vector construction procedure. The procedure represents the extraction of iris texture information relevant to its subsequent comparison. Thorough investigation of feature vectors obtained from iris showed that not all the vector elements are equally relevant. There are two characteristics which determine the vector element utility: fragility and discriminability. Conventional iris feature extraction methods consider the concept of fragility as the feature vector instability without respect to the nature of such instability appearance. This work separates sources of the instability into natural and encodinginduced which helps deeply investigate each source of instability independently. According to the separation concept, a novel approach of iris feature vector construction is proposed. The approach consists of two steps: iris feature extraction using Gabor filtering with optimal parameters and quantization with separated preliminary optimized fragility thresholds. The proposed method has been tested on two different datasets of iris images captured under changing environmental conditions. The testing results show that the proposed method surpasses all the methods considered as a prior art by recognition accuracy on both datasets.
Full Text Available This paper introduces a method for human action recognition based on optical flow motion features extraction. Automatic spatial and temporal alignments are combined together in order to encourage the temporal consistence on each action by an enhanced dynamic time warping (DTW algorithm. At the same time, a fast method based on coarse-to-fine DTW constraint to improve computational performance without reducing accuracy is induced. The main contributions of this study include (1 a joint spatial-temporal multiresolution optical flow computation method which can keep encoding more informative motion information than recent proposed methods, (2 an enhanced DTW method to improve temporal consistence of motion in action recognition, and (3 coarse-to-fine DTW constraint on motion features pyramids to speed up recognition performance. Using this method, high recognition accuracy is achieved on different action databases like Weizmann database and KTH database.
Full Text Available The classification and recognition technology of underwater acoustic signal were always an important research content in the field of underwater acoustic signal processing. Currently, wavelet transform, Hilbert-Huang transform, and Mel frequency cepstral coefficients are used as a method of underwater acoustic signal feature extraction. In this paper, a method for feature extraction and identification of underwater noise data based on CNN and ELM is proposed. An automatic feature extraction method of underwater acoustic signals is proposed using depth convolution network. An underwater target recognition classifier is based on extreme learning machine. Although convolution neural networks can execute both feature extraction and classification, their function mainly relies on a full connection layer, which is trained by gradient descent-based; the generalization ability is limited and suboptimal, so an extreme learning machine (ELM was used in classification stage. Firstly, CNN learns deep and robust features, followed by the removing of the fully connected layers. Then ELM fed with the CNN features is used as the classifier to conduct an excellent classification. Experiments on the actual data set of civil ships obtained 93.04% recognition rate; compared to the traditional Mel frequency cepstral coefficients and Hilbert-Huang feature, recognition rate greatly improved.
Full Text Available Long-term place recognition in outdoor environments remains a challenge due to high appearance changes in the environment. The problem becomes even more difficult when the matching between two scenes has to be made with information coming from different visual sources, particularly with different spectral ranges. For instance, an infrared camera is helpful for night vision in combination with a visible camera. In this paper, we emphasize our work on testing usual feature point extractors under both constraints: repeatability across spectral ranges and long-term appearance. We develop a new feature extraction method dedicated to improve the repeatability across spectral ranges. We conduct an evaluation of feature robustness on long-term datasets coming from different imaging sources (optics, sensors size and spectral ranges with a Bag-of-Words approach. The tests we perform demonstrate that our method brings a significant improvement on the image retrieval issue in a visual place recognition context, particularly when there is a need to associate images from various spectral ranges such as infrared and visible: we have evaluated our approach using visible, Near InfraRed (NIR, Short Wavelength InfraRed (SWIR and Long Wavelength InfraRed (LWIR.
Tao, Dacheng; Li, Xuelong; Wu, Xindong; Maybank, Stephen J
The traditional image representations are not suited to conventional classification methods, such as the linear discriminant analysis (LDA), because of the under sample problem (USP): the dimensionality of the feature space is much higher than the number of training samples. Motivated by the successes of the two dimensional LDA (2DLDA) for face recognition, we develop a general tensor discriminant analysis (GTDA) as a preprocessing step for LDA. The benefits of GTDA compared with existing preprocessing methods, e.g., principal component analysis (PCA) and 2DLDA, include 1) the USP is reduced in subsequent classification by, for example, LDA; 2) the discriminative information in the training tensors is preserved; and 3) GTDA provides stable recognition rates because the alternating projection optimization algorithm to obtain a solution of GTDA converges, while that of 2DLDA does not. We use human gait recognition to validate the proposed GTDA. The averaged gait images are utilized for gait representation. Given the popularity of Gabor function based image decompositions for image understanding and object recognition, we develop three different Gabor function based image representations: 1) the GaborD representation is the sum of Gabor filter responses over directions, 2) GaborS is the sum of Gabor filter responses over scales, and 3) GaborSD is the sum of Gabor filter responses over scales and directions. The GaborD, GaborS and GaborSD representations are applied to the problem of recognizing people from their averaged gait images.A large number of experiments were carried out to evaluate the effectiveness (recognition rate) of gait recognition based on first obtaining a Gabor, GaborD, GaborS or GaborSD image representation, then using GDTA to extract features and finally using LDA for classification. The proposed methods achieved good performance for gait recognition based on image sequences from the USF HumanID Database. Experimental comparisons are made with nine
Vivek Shrivastava; Navdeep Sharma
Optical Character Recognition deals in recognition and classification of characters from an image. For the recognition to be accurate, certain topological and geometrical properties are calculated, based on which a character is classified and recognized. Also, the Human psychology perceives characters by its overall shape and features such as strokes, curves, protrusions, enclosures etc. These properties, also called Features are extracted from the image by means of spatial pixel-...
Prevost, Donald; Doucet, Michel; Bergeron, Alain; Veilleux, Luc; Chevrette, Paul C.; Gingras, Denis J.
A rotation, scale and translation invariant pattern recognition technique is proposed.It is based on Fourier- Mellin Descriptors (FMD). Each FMD is taken as an independent feature of the object, and a set of those features forms a signature. FMDs are naturally rotation invariant. Translation invariance is achieved through pre- processing. A proper normalization of the FMDs gives the scale invariance property. This approach offers the double advantage of providing invariant signatures of the objects, and a dramatic reduction of the amount of data to process. The compressed invariant feature signature is next presented to a multi-layered perceptron neural network. This final step provides some robustness to the classification of the signatures, enabling good recognition behavior under anamorphically scaled distortion. We also present an original feature extraction technique, adapted to optical calculation of the FMDs. A prototype optical set-up was built, and experimental results are presented.
Sébastien M Crouzet
Full Text Available Research progress in machine vision has been very significant in recent years. Robust face detection and identification algorithms are already readily available to consumers, and modern computer vision algorithms for generic object recognition are now coping with the richness and complexity of natural visual scenes. Unlike early vision models of object recognition that emphasized the role of figure-ground segmentation and spatial information between parts, recent successful approaches are based on the computation of loose collections of image features without prior segmentation or any explicit encoding of spatial relations. While these models remain simplistic models of visual processing, they suggest that, in principle, bottom-up activation of a loose collection of image features could support the rapid recognition of natural object categories and provide an initial coarse visual representation before more complex visual routines and attentional mechanisms take place. Focusing on biologically-plausible computational models of (bottom-up pre-attentive visual recognition, we review some of the key visual features that have been described in the literature. We discuss the consistency of these feature-based representations with classical theories from visual psychology and test their ability to account for human performance on a rapid object categorization task.
Full Text Available Sparse representation based on compressed sensing theory has been widely used in the field of face recognition, and has achieved good recognition results. but the face feature extraction based on sparse representation is too simple, and the sparse coefficient is not sparse. In this paper, we improve the classification algorithm based on the fusion of sparse representation and Gabor feature, and then improved algorithm for Gabor feature which overcomes the problem of large dimension of the vector dimension, reduces the computation and storage cost, and enhances the robustness of the algorithm to the changes of the environment.The classification efficiency of sparse representation is determined by the collaborative representation,we simplify the sparse constraint based on L1 norm to the least square constraint, which makes the sparse coefficients both positive and reduce the complexity of the algorithm. Experimental results show that the proposed method is robust to illumination, facial expression and pose variations of face recognition, and the recognition rate of the algorithm is improved.
Full Text Available Automatic recognition of mature fruits in a complex agricultural environment is still a challenge for an autonomous harvesting robot due to various disturbances existing in the background of the image. The bottleneck to robust fruit recognition is reducing influence from two main disturbances: illumination and overlapping. In order to recognize the tomato in the tree canopy using a low-cost camera, a robust tomato recognition algorithm based on multiple feature images and image fusion was studied in this paper. Firstly, two novel feature images, the a*-component image and the I-component image, were extracted from the L*a*b* color space and luminance, in-phase, quadrature-phase (YIQ color space, respectively. Secondly, wavelet transformation was adopted to fuse the two feature images at the pixel level, which combined the feature information of the two source images. Thirdly, in order to segment the target tomato from the background, an adaptive threshold algorithm was used to get the optimal threshold. The final segmentation result was processed by morphology operation to reduce a small amount of noise. In the detection tests, 93% target tomatoes were recognized out of 200 overall samples. It indicates that the proposed tomato recognition method is available for robotic tomato harvesting in the uncontrolled environment with low cost.
Zhao, Yuanshen; Gong, Liang; Huang, Yixiang; Liu, Chengliang
Automatic recognition of mature fruits in a complex agricultural environment is still a challenge for an autonomous harvesting robot due to various disturbances existing in the background of the image. The bottleneck to robust fruit recognition is reducing influence from two main disturbances: illumination and overlapping. In order to recognize the tomato in the tree canopy using a low-cost camera, a robust tomato recognition algorithm based on multiple feature images and image fusion was studied in this paper. Firstly, two novel feature images, the a*-component image and the I-component image, were extracted from the L*a*b* color space and luminance, in-phase, quadrature-phase (YIQ) color space, respectively. Secondly, wavelet transformation was adopted to fuse the two feature images at the pixel level, which combined the feature information of the two source images. Thirdly, in order to segment the target tomato from the background, an adaptive threshold algorithm was used to get the optimal threshold. The final segmentation result was processed by morphology operation to reduce a small amount of noise. In the detection tests, 93% target tomatoes were recognized out of 200 overall samples. It indicates that the proposed tomato recognition method is available for robotic tomato harvesting in the uncontrolled environment with low cost.
Hildebrandt, Mario; Kiltz, Stefan; Dittmann, Jana; Vielhauer, Claus
In crime scene forensics latent fingerprints are found on various substrates. Nowadays primarily physical or chemical preprocessing techniques are applied for enhancing the visibility of the fingerprint trace. In order to avoid altering the trace it has been shown that contact-less sensors offer a non-destructive acquisition approach. Here, the exploitation of fingerprint or substrate properties and the utilization of signal processing techniques are an essential requirement to enhance the fingerprint visibility. However, especially the optimal sensory is often substrate-dependent. An enhanced generic pattern recognition based contrast enhancement approach for scans of a chromatic white light sensor is introduced in Hildebrandt et al.1 using statistical, structural and Benford's law2 features for blocks of 50 micron. This approach achieves very good results for latent fingerprints on cooperative, non-textured, smooth substrates. However, on textured and structured substrates the error rates are very high and the approach thus unsuitable for forensic use cases. We propose the extension of the feature set with semantic features derived from known Gabor filter based exemplar fingerprint enhancement techniques by suggesting an Epsilon-neighborhood of each block in order to achieve an improved accuracy (called fingerprint ridge orientation semantics). Furthermore, we use rotation invariant Hu moments as an extension of the structural features and two additional preprocessing methods (separate X- and Y Sobel operators). This results in a 408-dimensional feature space. In our experiments we investigate and report the recognition accuracy for eight substrates, each with ten latent fingerprints: white furniture surface, veneered plywood, brushed stainless steel, aluminum foil, "Golden-Oak" veneer, non-metallic matte car body finish, metallic car body finish and blued metal. In comparison to Hildebrandt et al.,1 our evaluation shows a significant reduction of the error rates
Full Text Available Speech recognition is about what is being said, irrespective of who is saying. Speech recognition is a growing field. Major progress is taking place on the technology of automatic speech recognition (ASR. Still, there are lots of barriers in this field in terms of recognition rate, background noise, speaker variability, speaking rate, accent etc. Speech recognition rate mainly depends on the selection of features and feature extraction methods. This paper outlines the feature extraction techniques for speaker dependent speech recognition for isolated words. A brief survey of different feature extraction techniques like Mel-Frequency Cepstral Coefficients (MFCC, Linear Predictive Coding Coefficients (LPCC, Perceptual Linear Prediction (PLP, Relative Spectra Perceptual linear Predictive (RASTA-PLP analysis are presented and evaluation is done. Speech recognition has various applications from daily use to commercial use. We have made a speaker dependent system and this system can be useful in many areas like controlling a patient vehicle using simple commands.
GADH,RAJIT; LU,YONG; TAUTGES,TIMOTHY J.
Considerable progress has been made on automatic hexahedral mesh generation in recent years. Several automatic meshing algorithms have proven to be very reliable on certain classes of geometry. While it is always worth pursuing general algorithms viable on more general geometry, a combination of the well-established algorithms is ready to take on classes of complicated geometry. By partitioning the entire geometry into meshable pieces matched with appropriate meshing algorithm the original geometry becomes meshable and may achieve better mesh quality. Each meshable portion is recognized as a meshing feature. This paper, which is a part of the feature based meshing methodology, presents the work on shape recognition and volume decomposition to automatically decompose a CAD model into meshable volumes. There are four phases in this approach: (1) Feature Determination to extinct decomposition features, (2) Cutting Surfaces Generation to form the ''tailored'' cutting surfaces, (3) Body Decomposition to get the imprinted volumes; and (4) Meshing Algorithm Assignment to match volumes decomposed with appropriate meshing algorithms. The feature determination procedure is based on the CLoop feature recognition algorithm that is extended to be more general. Results are demonstrated over several parts with complicated topology and geometry.
HUYue-li; CAOJia-lin; ZHAOQian; FENGXu
Automatic recognition of skin micro-image symptom is important in skin diagnosis and treatment. Feature selection is to improve the classification performance of skin micro-image symptom.This paper proposes a hybrid approach based on the support vector machine (SVM) technique and genetic algorithm (GA) to select an optimum feature subset from the feature group extracted from the skin micro-images. An adaptive GA is introduced for maintaining the convergence rate. With the proposed method, the average cross validation accuracy is increased from 88.25% using all features to 96.92% using only selected features provided by a classifier for classification of 5 classes of skin symptoms. The experimental results are satisfactory.
HU Yue-li; CAO Jia-lin; ZHAO Qian; FENG Xu
Automatic recognition of skin micro-image symptom is important in skin diagnosis and treatment. Feature selection is to improve the classification performance of skin micro-image symptom.This paper proposes a hybrid approach based on the support vector machine (SVM) technique and genetic algorithm (GA) to select an optimum feature subset from the feature group extracted from the skin micro-images. An adaptive GA is introduced for maintaining the convergence rate. With the proposed method, the average cross validation accuracy is increased from 88.25% using all features to 96.92 % using only selected features provided by a classifier for classification of 5 classes of skin symptoms. The experimental results are satisfactory.
Pisharady, Pramod Kumar; Poh, Loh Ai
This book presents a collection of computational intelligence algorithms that addresses issues in visual pattern recognition such as high computational complexity, abundance of pattern features, sensitivity to size and shape variations and poor performance against complex backgrounds. The book has 3 parts. Part 1 describes various research issues in the field with a survey of the related literature. Part 2 presents computational intelligence based algorithms for feature selection and classification. The algorithms are discriminative and fast. The main application area considered is hand posture recognition. The book also discusses utility of these algorithms in other visual as well as non-visual pattern recognition tasks including face recognition, general object recognition and cancer / tumor classification. Part 3 presents biologically inspired algorithms for feature extraction. The visual cortex model based features discussed have invariance with respect to appearance and size of the hand, and provide good...
Nasrollahi, Kamal; Moeslund, Thomas B.
Biometric recognition is still a very difficult task in real-world scenarios wherein unforeseen changes in degradations factors like noise, occlusion, blurriness and illumination can drastically affect the extracted features from the biometric signals. Very recently Haar-like rectangular features...... which have usually been used for object detection were introduced for biometric recognition resulting in systems that are robust against most of the mentioned degradations . The problem with these features is that one can define many different such features for a given biometric signal...... and it is not clear whether all of these features are required for the actual recognition or not. This is exactly what we are dealing with in this paper: How can an initial set of Haar-like rectangular features, that have been used for biometric recognition, be reduced to a set of most influential features...
Wong, Wai Keung; Lai, Zhihui; Xu, Yong; Wen, Jiajun; Ho, Chu Po
Tensor-based object recognition has been widely studied in the past several years. This paper focuses on the issue of joint feature selection from the tensor data and proposes a novel method called joint tensor feature analysis (JTFA) for tensor feature extraction and recognition. In order to obtain a set of jointly sparse projections for tensor feature extraction, we define the modified within-class tensor scatter value and the modified between-class tensor scatter value for regression. The k-mode optimization technique and the L(2,1)-norm jointly sparse regression are combined together to compute the optimal solutions. The convergent analysis, computational complexity analysis and the essence of the proposed method/model are also presented. It is interesting to show that the proposed method is very similar to singular value decomposition on the scatter matrix but with sparsity constraint on the right singular value matrix or eigen-decomposition on the scatter matrix with sparse manner. Experimental results on some tensor datasets indicate that JTFA outperforms some well-known tensor feature extraction and selection algorithms.
Laukka, Petri; Elfenbein, Hillary Anger; Thingujam, Nutankumar S; Rockstuhl, Thomas; Iraki, Frederick K; Chui, Wanda; Althoff, Jean
This study extends previous work on emotion communication across cultures with a large-scale investigation of the physical expression cues in vocal tone. In doing so, it provides the first direct test of a key proposition of dialect theory, namely that greater accuracy of detecting emotions from one's own cultural group-known as in-group advantage-results from a match between culturally specific schemas in emotional expression style and culturally specific schemas in emotion recognition. Study 1 used stimuli from 100 professional actors from five English-speaking nations vocally conveying 11 emotional states (anger, contempt, fear, happiness, interest, lust, neutral, pride, relief, sadness, and shame) using standard-content sentences. Detailed acoustic analyses showed many similarities across groups, and yet also systematic group differences. This provides evidence for cultural accents in expressive style at the level of acoustic cues. In Study 2, listeners evaluated these expressions in a 5 × 5 design balanced across groups. Cross-cultural accuracy was greater than expected by chance. However, there was also in-group advantage, which varied across emotions. A lens model analysis of fundamental acoustic properties examined patterns in emotional expression and perception within and across groups. Acoustic cues were used relatively similarly across groups both to produce and judge emotions, and yet there were also subtle cultural differences. Speakers appear to have a culturally nuanced schema for enacting vocal tones via acoustic cues, and perceivers have a culturally nuanced schema in judging them. Consistent with dialect theory's prediction, in-group judgments showed a greater match between these schemas used for emotional expression and perception. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Irhebhude, Martins E.; Edirisinghe, Eran A.
This paper presents an automatic, machine vision based, military personnel identification and classification system. Classification is done using a Support Vector Machine (SVM) on sets of Army, Air Force and Navy camouflage uniform personnel datasets. In the proposed system, the arm of service of personnel is recognised by the camouflage of a persons uniform, type of cap and the type of badge/logo. The detailed analysis done include; camouflage cap and plain cap differentiation using gray level co-occurrence matrix (GLCM) texture feature; classification on Army, Air Force and Navy camouflaged uniforms using GLCM texture and colour histogram bin features; plain cap badge classification into Army, Air Force and Navy using Speed Up Robust Feature (SURF). The proposed method recognised camouflage personnel arm of service on sets of data retrieved from google images and selected military websites. Correlation-based Feature Selection (CFS) was used to improve recognition and reduce dimensionality, thereby speeding the classification process. With this method success rates recorded during the analysis include 93.8% for camouflage appearance category, 100%, 90% and 100% rates of plain cap and camouflage cap categories for Army, Air Force and Navy categories, respectively. Accurate recognition was recorded using SURF for the plain cap badge category. Substantial analysis has been carried out and results prove that the proposed method can correctly classify military personnel into various arms of service. We show that the proposed method can be integrated into a face recognition system, which will recognise personnel in addition to determining the arm of service which the personnel belong. Such a system can be used to enhance the security of a military base or facility.
Surinta, Olarik; Karaaba, Mahir F.; Schomaker, Lambert R.B.; Wiering, Marco A.
Abstract In this paper we propose to use local gradient feature descriptors, namely the scale invariant feature transform keypoint descriptor and the histogram of oriented gradients, for handwritten character recognition. The local gradient feature descriptors are used to extract feature vectors
Duan, Xiaodong; Tan, Zheng-Hua
In this paper, we present a local feature learning method for face recognition to deal with varying poses. As opposed to the commonly used approaches of recovering frontal face images from profile views, the proposed method extracts the subject related part from a local feature by removing the pose...... related part in it on the basis of a pose feature. The method has a closed-form solution, hence being time efficient. For performance evaluation, cross pose face recognition experiments are conducted on two public face recognition databases FERET and FEI. The proposed method shows a significant...... recognition improvement under varying poses over general local feature approaches and outperforms or is comparable with related state-of-the-art pose invariant face recognition approaches. Copyright ©2015 by IEEE....
Ming, Guan; Fang, Lv
Biometric identification technology replaces traditional security technology, which has become a trend, and gait recognition also has become a hot spot of research because its feature is difficult to imitate and theft. This paper presents a gait recognition system based on integral outline of human body. The system has three important aspects: the preprocessing of gait image, feature extraction and classification. Finally, using a method of polling to evaluate the performance of the system, and summarizing the problems existing in the gait recognition and the direction of development in the future.
Full Text Available A discriminative and robust feature—kernel enhanced informative Gabor feature—is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and nonredundant Gabor features, which are then further enhanced by kernel methods for recognition. Compared with one of the top performing methods in the 2004 Face Verification Competition (FVC2004, our methods demonstrate a clear advantage over existing methods in accuracy, computation efficiency, and memory cost. The proposed method has been fully tested on the FERET database using the FERET evaluation protocol. Significant improvements on three of the test data sets are observed. Compared with the classical Gabor wavelet-based approaches using a huge number of features, our method requires less than 4 milliseconds to retrieve a few hundreds of features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images. The paper also unified different Gabor filter definitions and proposed a training sample generation algorithm to reduce the effects caused by unbalanced number of samples available in different classes.
Shen, Linlin; Bai, Li
A discriminative and robust feature—kernel enhanced informative Gabor feature—is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and nonredundant Gabor features, which are then further enhanced by kernel methods for recognition. Compared with one of the top performing methods in the 2004 Face Verification Competition (FVC2004), our methods demonstrate a clear advantage over existing methods in accuracy, computation efficiency, and memory cost. The proposed method has been fully tested on the FERET database using the FERET evaluation protocol. Significant improvements on three of the test data sets are observed. Compared with the classical Gabor wavelet-based approaches using a huge number of features, our method requires less than 4 milliseconds to retrieve a few hundreds of features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images. The paper also unified different Gabor filter definitions and proposed a training sample generation algorithm to reduce the effects caused by unbalanced number of samples available in different classes.
Greene, W. Nicholas; Zhang, Yuhan; Lu, Thomas T.; Chao, Tien-Hsin
Several feature extraction and selection methods for an existing automatic target recognition (ATR) system using JPLs Grayscale Optical Correlator (GOC) and Optimal Trade-Off Maximum Average Correlation Height (OT-MACH) filter were tested using MATLAB. The ATR system is composed of three stages: a cursory region of-interest (ROI) search using the GOC and OT-MACH filter, a feature extraction and selection stage, and a final classification stage. Feature extraction and selection concerns transforming potential target data into more useful forms as well as selecting important subsets of that data which may aide in detection and classification. The strategies tested were built around two popular extraction methods: Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Performance was measured based on the classification accuracy and free-response receiver operating characteristic (FROC) output of a support vector machine(SVM) and a neural net (NN) classifier.
Dong, Pei; Li, Jie; Dong, Junyu; Qi, Lin
Typical human actions last several seconds and exhibit characteristic spatio-temporal structure. The challenge for action recognition is to capture and fuse the multi-dimension information in video data. In order to take into account these characteristics simultaneously, we present a novel method that fuses multiple dimensional features, such as chromatic images, depth and optical flow fields. We built our model based on the multi-stream deep convolutional networks with the help of temporal segment networks and extract discriminative spatial and temporal features by fusing ConvNets towers multi-dimension, in which different feature weights are assigned in order to take full advantage of this multi-dimension information. Our architecture is trained and evaluated on the currently largest and most challenging benchmark NTU RGB-D dataset. The experiments demonstrate that the performance of our method outperforms the state-of-the-art methods.
Full Text Available The feature fusion from separate source is the current technical difficulties of cross-corpus speech emotion recognition. The purpose of this paper is to, based on Deep Belief Nets (DBN in Deep Learning, use the emotional information hiding in speech spectrum diagram (spectrogram as image features and then implement feature fusion with the traditional emotion features. First, based on the spectrogram analysis by STB/Itti model, the new spectrogram features are extracted from the color, the brightness, and the orientation, respectively; then using two alternative DBN models they fuse the traditional and the spectrogram features, which increase the scale of the feature subset and the characterization ability of emotion. Through the experiment on ABC database and Chinese corpora, the new feature subset compared with traditional speech emotion features, the recognition result on cross-corpus, distinctly advances by 8.8%. The method proposed provides a new idea for feature fusion of emotion recognition.
Goltsev, Alexander; Gritsenko, Vladimir
In the paper, effective and simple features for image recognition (named LiRA-features) are investigated in the task of handwritten digit recognition. Two neural network classifiers are considered-a modified 3-layer perceptron LiRA and a modular assembly neural network. A method of feature selection is proposed that analyses connection weights formed in the preliminary learning process of a neural network classifier. In the experiments using the MNIST database of handwritten digits, the feature selection procedure allows reduction of feature number (from 60 000 to 7000) preserving comparable recognition capability while accelerating computations. Experimental comparison between the LiRA perceptron and the modular assembly neural network is accomplished, which shows that recognition capability of the modular assembly neural network is somewhat better. Copyright © 2011 Elsevier Ltd. All rights reserved.
P. C. PANCHARIYA
Full Text Available This paper describes an approach for extraction of features from data generated from an electronic tongue based on large amplitude pulse voltammetry. In this approach statistical features of the meaningful selected variables from current response signals are extracted and used for recognition of beverage samples. The proposed feature extraction approach not only reduces the computational complexity but also reduces the computation time and requirement of storage of data for the development of E-tongue for field applications. With the reduced information, a probabilistic neural network (PNN was trained for qualitative analysis of different beverages. Before the qualitative analysis of the beverages, the methodology has been tested for the basic artificial taste solutions i.e. sweet, sour, salt, bitter, and umami. The proposed procedure was compared with the more conventional and linear feature extraction technique employing principal component analysis combined with PNN. Using the extracted feature vectors, highly correct classification by PNN was achieved for eight types of juices and six types of soft drinks. The results indicated that the electronic tongue based on large amplitude pulse voltammetry with reduced feature was capable of discriminating not only basic artificial taste solutions but also the various sorts of the same type of natural beverages (fruit juices, vegetable juices, soft drinks, etc..
Silapachote, Piyanuch; Karuppiah, Deepak R; Hanson, Allen R
We propose a classification technique for face expression recognition using AdaBoost that learns by selecting the relevant global and local appearance features with the most discriminating information...
Rao, K Sreenivasa
This book discusses the contribution of articulatory and excitation source information in discriminating sound units. The authors focus on excitation source component of speech -- and the dynamics of various articulators during speech production -- for enhancement of speech recognition (SR) performance. Speech recognition is analyzed for read, extempore, and conversation modes of speech. Five groups of articulatory features (AFs) are explored for speech recognition, in addition to conventional spectral features. Each chapter provides the motivation for exploring the specific feature for SR task, discusses the methods to extract those features, and finally suggests appropriate models to capture the sound unit specific knowledge from the proposed features. The authors close by discussing various combinations of spectral, articulatory and source features, and the desired models to enhance the performance of SR systems.
Full Text Available This paper proposes a boosted linear discriminant analysis (LDA solution on features extracted by the multilinear principal component analysis (MPCA to enhance gait recognition performance. Three-dimensional gait objects are projected in the MPCA space first to obtain low-dimensional tensorial features. Then, lower-dimensional vectorial features are obtained through discriminative feature selection. These feature vectors are then fed into an LDA-style booster, where several regularized and weakened LDA learners work together to produce a strong learner through a novel feature weighting and sampling process. The LDA learner employs a simple nearest-neighbor classifier with a weighted angle distance measure for classification. The experimental results on the NIST/USF “Gait Challenge” data-sets show that the proposed solution has successfully improved the gait recognition performance and outperformed several state-of-the-art gait recognition algorithms.
.... (4) Invariants: both geometric and other types. (5) Human faces: Analysis of images of human faces, including feature extraction, face recognition, compression, and recognition of facial expressions...
.... (4) Invariants -- both geometric and other types. (5) Human faces: Analysis of images of human faces, including feature extraction, face recognition, compression, and recognition of facial expressions...
Full Text Available Both static features and motion features have shown promising performance in human activities recognition task. However, the information included in these features is insufficient for complex human activities. In this paper, we propose extracting relational information of static features and motion features for human activities recognition. The videos are represented by a classical Bag-of-Word (BoW model which is useful in many works. To get a compact and discriminative codebook with small dimension, we employ the divisive algorithm based on KL-divergence to reconstruct the codebook. After that, to further capture strong relational information, we construct a bipartite graph to model the relationship between words of different feature set. Then we use a k-way partition to create a new codebook in which similar words are getting together. With this new codebook, videos can be represented by a new BoW vector with strong relational information. Moreover, we propose a method to compute new clusters from the divisive algorithm’s projective function. We test our work on the several datasets and obtain very promising results.
Mala, S; Latha, K
Activity recognition is needed in different requisition, for example, reconnaissance system, patient monitoring, and human-computer interfaces. Feature selection plays an important role in activity recognition, data mining, and machine learning. In selecting subset of features, an efficient evolutionary algorithm Differential Evolution (DE), a very efficient optimizer, is used for finding informative features from eye movements using electrooculography (EOG). Many researchers use EOG signals in human-computer interactions with various computational intelligence methods to analyze eye movements. The proposed system involves analysis of EOG signals using clearness based features, minimum redundancy maximum relevance features, and Differential Evolution based features. This work concentrates more on the feature selection algorithm based on DE in order to improve the classification for faultless activity recognition.
Oikonomopoulos, A.; Pantic, Maja
In this paper we address the problem of human activity modelling and recognition by means of a hierarchical representation of mined dense spatiotemporal features. At each level of the hierarchy, the proposed method selects feature constellations that are increasingly discriminative and
feature space . . . . . . . . . . . . . . . 85 5.3.1 Preserving neighborhood relationships during coding . . . . . . 86 5.3.2 Letting only neighbors vote ...Letting only neighbors vote during pooling Pooling involves extracting an ensemble statistic from a potentially large group of in- puts. However...element. For slicing the 4D tensor S we adopt the MATLAB notation for simplicity of notation. function ConvCoD(x,D, α) Set: S = DT ∗ D Initialize: z = 0; β
Turmon, Michael; Jones, Harrison P.; Malanushenko, Olena V.; Pap, Judit M.
A maximum a posteriori (MAP) technique is developed to identify solar features in cotemporal and cospatial images of line-of-sight magnetic flux, continuum intensity, and equivalent width observed with the NASA/National Solar Observatory (NSO) Spectromagnetograph (SPM). The technique facilitates human understanding of patterns in large data sets and enables systematic studies of feature characteristics for comparison with models and observations of long-term solar activity and variability. The method uses Bayes’ rule to compute the posterior probability of any feature segmentation of a trio of observed images from per-pixel, class-conditional probabilities derived from independently-segmented training images. Simulated annealing is used to find the most likely segmentation. New algorithms for computing class-conditional probabilities from three-dimensional Gaussian mixture models and interpolated histogram densities are described and compared. A new extension to the spatial smoothing in the Bayesian prior model is introduced, which can incorporate a spatial dependence such as center-to-limb variation. How the spatial scale of training segmentations affects the results is discussed, and a new method for statistical separation of quiet Sun and quiet network is presented.
Full Text Available A fast pedestrian recognition algorithm based on multisensor fusion is presented in this paper. Firstly, potential pedestrian locations are estimated by laser radar scanning in the world coordinates, and then their corresponding candidate regions in the image are located by camera calibration and the perspective mapping model. For avoiding time consuming in the training and recognition process caused by large numbers of feature vector dimensions, region of interest-based integral histograms of oriented gradients (ROI-IHOG feature extraction method is proposed later. A support vector machine (SVM classifier is trained by a novel pedestrian sample dataset which adapt to the urban road environment for online recognition. Finally, we test the validity of the proposed approach with several video sequences from realistic urban road scenarios. Reliable and timewise performances are shown based on our multisensor fusing method.
Karadogan, Seliz; Larsen, Jan
The recognition of affect in speech has attracted a lot of interest recently; especially in the area of cognitive and computer sciences. Most of the previous studies focused on the recognition of basic emotions (such as happiness, sadness and anger) using categorical approach. Recently, the focus...... has been shifting towards dimensional affect recognition based on the idea that emotional states are not independent from one another but related in a systematic manner. In this paper, we design a continuous dimensional speech affect recognition model that combines acoustic and semantic features. We...... show that combining semantic and acoustic information for dimensional speech recognition improves the results. Moreover, we show that valence is better estimated using semantic features while arousal is better estimated using acoustic features....
Full Text Available Automatic face recognition remains an interesting but challenging computer vision open problem. Poor illumination is considered as one of the major issue, since illumination changes cause large variation in the facial features. To resolve this, illumination normalization preprocessing techniques are employed in this paper to enhance the face recognition rate. The methods such as Histogram Equalization (HE, Gamma Intensity Correction (GIC, Normalization chain and Modified Homomorphic Filtering (MHF are used for preprocessing. Owing to great success, the texture features are commonly used for face recognition. But these features are severely affected by lighting changes. Hence texture based models Local Binary Pattern (LBP, Local Derivative Pattern (LDP, Local Texture Pattern (LTP and Local Tetra Patterns (LTrPs are experimented under different lighting conditions. In this paper, illumination invariant face recognition technique is developed based on the fusion of illumination preprocessing with local texture descriptors. The performance has been evaluated using YALE B and CMU-PIE databases containing more than 1500 images. The results demonstrate that MHF based normalization gives significant improvement in recognition rate for the face images with large illumination conditions.
Monro, Donald M; Rakshit, Soumyadip; Zhang, Dexin
This paper presents a novel iris coding method based on differences of discrete cosine transform (DCT) coefficients of overlapped angular patches from normalized iris images. The feature extraction capabilities of the DCT are optimized on the two largest publicly available iris image data sets, 2,156 images of 308 eyes from the CASIA database and 2,955 images of 150 eyes from the Bath database. On this data, we achieve 100 percent Correct Recognition Rate (CRR) and perfect Receiver-Operating Characteristic (ROC) Curves with no registered false accepts or rejects. Individual feature bit and patch position parameters are optimized for matching through a product-of-sum approach to Hamming distance calculation. For verification, a variable threshold is applied to the distance metric and the False Acceptance Rate (FAR) and False Rejection Rate (FRR) are recorded. A new worst-case metric is proposed for predicting practical system performance in the absence of matching failures, and the worst case theoretical Equal Error Rate (EER) is predicted to be as low as 2.59 x 10(-4) on the available data sets.
Kim, Jonghwa; André, Elisabeth
This paper investigates the potential of physiological signals as a reliable channel for automatic recognition of user's emotial state. For the emotion recognition, little attention has been paid so far to physiological signals compared to audio-visual emotion channels such as facial expression or speech. All essential stages of automatic recognition system using biosignals are discussed, from recording physiological dataset up to feature-based multiclass classification. Four-channel biosensors are used to measure electromyogram, electrocardiogram, skin conductivity and respiration changes. A wide range of physiological features from various analysis domains, including time/frequency, entropy, geometric analysis, subband spectra, multiscale entropy, etc., is proposed in order to search the best emotion-relevant features and to correlate them with emotional states. The best features extracted are specified in detail and their effectiveness is proven by emotion recognition results.
Full Text Available The difference between adjacent frames of human walking contains useful information for human gait identification. Based on the previous idea a silhouettes difference based human gait recognition method named as average gait differential image (AGDI is proposed in this paper. The AGDI is generated by the accumulation of the silhouettes difference between adjacent frames. The advantage of this method lies in that as a feature image it can preserve both the kinetic and static information of walking. Comparing to gait energy image (GEI, AGDI is more fit to representation the variation of silhouettes during walking. Two-dimensional principal component analysis (2DPCA is used to extract features from the AGDI. Experiments on CASIA dataset show that AGDI has better identification and verification performance than GEI. Comparing to PCA, 2DPCA is a more efficient and less memory storage consumption feature extraction method in gait based recognition.
Nie, Aiqing; Jiang, Jingguo; Fu, Qiao
Previous research has found that conjunction faces (whose internal features, e.g. eyes, nose, and mouth, and external features, e.g. hairstyle and ears, are from separate studied faces) and feature faces (partial features of these are studied) can produce higher false alarms than both old and new faces (i.e. those that are exactly the same as the studied faces and those that have not been previously presented) in recognition. The event-related potentials (ERPs) that relate to conjunction and feature faces at recognition, however, have not been described as yet; in addition, the contributions of different facial features toward ERPs have not been differentiated. To address these issues, the present study compared the ERPs elicited by old faces, conjunction faces (the internal and the external features were from two studied faces), old internal feature faces (whose internal features were studied), and old external feature faces (whose external features were studied) with those of new faces separately. The results showed that old faces not only elicited an early familiarity-related FN400, but a more anterior distributed late old/new effect that reflected recollection. Conjunction faces evoked similar late brain waveforms as old internal feature faces, but not to old external feature faces. These results suggest that, at recognition, old faces hold higher familiarity than compound faces in the profiles of ERPs and internal facial features are more crucial than external ones in triggering the brain waveforms that are characterized as reflecting the result of familiarity.
Zhang, Shijun; Jing, Zhongliang; Li, Jianxun
The rotation invariant feature of the target is obtained using the multi-direction feature extraction property of the steerable filter. Combining the morphological operation top-hat transform with the self-organizing feature map neural network, the adaptive topological region is selected. Using the erosion operation, the topological region shrinkage is achieved. The steerable filter based morphological self-organizing feature map neural network is applied to automatic target recognition of binary standard patterns and real-world infrared sequence images. Compared with Hamming network and morphological shared-weight networks respectively, the higher recognition correct rate, robust adaptability, quick training, and better generalization of the proposed method are achieved.
Turati, Chiara; Macchi Cassia, Viola; Simion, Francesca; Leo, Irene
Existing data indicate that newborns are able to recognize individual faces, but little is known about what perceptual cues drive this ability. The current study showed that either the inner or outer features of the face can act as sufficient cues for newborns' face recognition (Experiment 1), but the outer part of the face enjoys an advantage…
Iqtait, M.; Mohamad, F. S.; Mamat, M.
Biometric is a pattern recognition system which is used for automatic recognition of persons based on characteristics and features of an individual. Face recognition with high recognition rate is still a challenging task and usually accomplished in three phases consisting of face detection, feature extraction, and expression classification. Precise and strong location of trait point is a complicated and difficult issue in face recognition. Cootes proposed a Multi Resolution Active Shape Models (ASM) algorithm, which could extract specified shape accurately and efficiently. Furthermore, as the improvement of ASM, Active Appearance Models algorithm (AAM) is proposed to extracts both shape and texture of specified object simultaneously. In this paper we give more details about the two algorithms and give the results of experiments, testing their performance on one dataset of faces. We found that the ASM is faster and gains more accurate trait point location than the AAM, but the AAM gains a better match to the texture.
Partila, Pavol; Voznak, Miroslav; Tovarek, Jaromir
The impact of the classification method and features selection for the speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the complexity of system computing. This step is necessary especially for systems that will be deployed in real-time applications. The reason for the development and improvement of speech emotion recognition systems is wide usability in nowadays automatic voice controlled systems. Berlin database of emotional recordings was used in this experiment. Classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture model is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of the speech emotion recognition system due to its accuracy and efficiency.
Full Text Available The impact of the classification method and features selection for the speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the complexity of system computing. This step is necessary especially for systems that will be deployed in real-time applications. The reason for the development and improvement of speech emotion recognition systems is wide usability in nowadays automatic voice controlled systems. Berlin database of emotional recordings was used in this experiment. Classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture model is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of the speech emotion recognition system due to its accuracy and efficiency.
Full Text Available The most popular features for speaker recognition are Mel frequency cepstral coefficients (MFCCs and linear prediction cepstral coefficients (LPCCs. These features are used extensively because they characterize the vocal tract configuration which is known to be highly speaker-dependent. In this work, several features are introduced that can characterize the vocal system in order to complement the traditional features and produce better speaker recognition models. The spectral centroid (SC, spectral bandwidth (SBW, spectral band energy (SBE, spectral crest factor (SCF, spectral flatness measure (SFM, Shannon entropy (SE, and Renyi entropy (RE were utilized for this purpose. This work demonstrates that these features are robust in noisy conditions by simulating some common distortions that are found in the speakers' environment and a typical telephone channel. Babble noise, additive white Gaussian noise (AWGN, and a bandpass channel with 1Ã¢Â€Â‰dB of ripple were used to simulate these noisy conditions. The results show significant improvements in classification performance for all noise conditions when these features were used to complement the MFCC and ÃŽÂ”MFCC features. In particular, the SC and SCF improved performance in almost all noise conditions within the examined SNR range (10Ã¢Â€Â“40Ã¢Â€Â‰dB. For example, in cases where there was only one source of distortion, classification improvements of up to 8% and 10% were achieved under babble noise and AWGN, respectively, using the SCF feature.
Khotimah, C.; Juniati, D.
Biometrics is a science that is now growing rapidly. Iris recognition is a biometric modality which captures a photo of the eye pattern. The markings of the iris are distinctive that it has been proposed to use as a means of identification, instead of fingerprints. Iris recognition was chosen for identification in this research because every human has a special feature that each individual is different and the iris is protected by the cornea so that it will have a fixed shape. This iris recognition consists of three step: pre-processing of data, feature extraction, and feature matching. Hough transformation is used in the process of pre-processing to locate the iris area and Daugman’s rubber sheet model to normalize the iris data set into rectangular blocks. To find the characteristics of the iris, it was used box counting method to get the fractal dimension value of the iris. Tests carried out by used k-fold cross method with k = 5. In each test used 10 different grade K of K-Nearest Neighbor (KNN). The result of iris recognition was obtained with the best accuracy was 92,63 % for K = 3 value on K-Nearest Neighbor (KNN) method.
Nasrollahi, Kamal; Moeslund, Thomas B.; Rashidi, Maryam
Developing a reliable, fast, and robust biometric recognition system is still a challenging task. This is because the inputs to these systems can be noisy, occluded, poorly illuminated, rotated, and of very low-resolutions. This paper proposes a probabilistic classifier using Haar-like features......, which mostly have been used for detection, for biometric recognition. The proposed system has been tested for three different biometrics: ear, iris, and hand vein patterns and it is shown that it is robust against most of the mentioned degradations and it outperforms state-of-the-art systems...
Full Text Available Because a number of image feature data to store, complex calculation to execute during the face recognition, therefore the face recognition process was realized only by PCs with high performance. In this paper, the OpenCV facial Haar-like features were used to identify face region; the Principal Component Analysis (PCA was employed in quick extraction of face features and the Euclidean Distance was also adopted in face recognition; as thus, data amount and computational complexity would be reduced effectively in face recognition, and the face recognition could be carried out on embedded platform. Finally, based on Tiny6410 embedded platform, a set of embedded face recognition systems was constructed. The test results showed that the system has stable operation and high recognition rate can be used in portable and mobile identification and authentication.
Kawewong, Aram; Tangruamsub, Sirinart; Hasegawa, Osamu
A novel Position-Invariant Robust Feature, designated as PIRF, is presented to address the problem of highly dynamic scene recognition. The PIRF is obtained by identifying existing local features (i.e. SIFT) that have a wide baseline visibility within a place (one place contains more than one sequential images). These wide-baseline visible features are then represented as a single PIRF, which is computed as an average of all descriptors associated with the PIRF. Particularly, PIRFs are robust against highly dynamical changes in scene: a single PIRF can be matched correctly against many features from many dynamical images. This paper also describes an approach to using these features for scene recognition. Recognition proceeds by matching an individual PIRF to a set of features from test images, with subsequent majority voting to identify a place with the highest matched PIRF. The PIRF system is trained and tested on 2000+ outdoor omnidirectional images and on COLD datasets. Despite its simplicity, PIRF offers a markedly better rate of recognition for dynamic outdoor scenes (ca. 90%) than the use of other features. Additionally, a robot navigation system based on PIRF (PIRF-Nav) can outperform other incremental topological mapping methods in terms of time (70% less) and memory. The number of PIRFs can be reduced further to reduce the time while retaining high accuracy, which makes it suitable for long-term recognition and localization.
Yuan, Baohua; Li, Shijin; Li, Ning
The features extracted from deep convolutional neural networks (CNNs) have shown their promise as generic descriptors for land-use scene recognition. However, most of the work directly adopts the deep features for the classification of remote sensing images, and does not encode the deep features for improving their discriminative power, which can affect the performance of deep feature representations. To address this issue, we propose an effective framework, LASC-CNN, obtained by locality-constrained affine subspace coding (LASC) pooling of a CNN filter bank. LASC-CNN obtains more discriminative deep features than directly extracted from CNNs. Furthermore, LASC-CNN builds on the top convolutional layers of CNNs, which can incorporate multiscale information and regions of arbitrary resolution and sizes. Our experiments have been conducted using two widely used remote sensing image databases, and the results show that the proposed method significantly improves the performance when compared to other state-of-the-art methods.
Cleary, Anne M; Ryals, Anthony J; Wagner, Samantha R
Research suggests that a feature-matching process underlies cue familiarity-detection when cued recall with graphemic cues fails. When a test cue (e.g., potchbork) overlaps in graphemic features with multiple unrecalled studied items (e.g., patchwork, pitchfork, pocketbook, pullcork), higher cue familiarity ratings are given during recall failure of all of the targets than when the cue overlaps in graphemic features with only one studied target and that target fails to be recalled (e.g., patchwork). The present study used semantic feature production norms (McRae et al., Behavior Research Methods, Instruments, & Computers, 37, 547-559, 2005) to examine whether the same holds true when the cues are semantic in nature (e.g., jaguar is used to cue cheetah). Indeed, test cues (e.g., cedar) that overlapped in semantic features (e.g., a_tree, has_bark, etc.) with four unretrieved studied items (e.g., birch, oak, pine, willow) received higher cue familiarity ratings during recall failure than test cues that overlapped in semantic features with only two (also unretrieved) studied items (e.g., birch, oak), which in turn received higher familiarity ratings during recall failure than cues that did not overlap in semantic features with any studied items. These findings suggest that the feature-matching theory of recognition during recall failure can accommodate recognition of semantic cues during recall failure, providing a potential mechanism for conceptually-based forms of cue recognition during target retrieval failure. They also provide converging evidence for the existence of the semantic features envisaged in feature-based models of semantic knowledge representation and for those more concretely specified by the production norms of McRae et al. (Behavior Research Methods, Instruments, & Computers, 37, 547-559, 2005).
Full Text Available Gait recognition aims to identify people by the way they walk. In this paper, a simple but e ective gait recognition method based on Outermost Contour is proposed. For each gait image sequence, an adaptive silhouette extraction algorithm is firstly used to segment the frames of the sequence and a series of postprocessing is applied to obtain the normalized silhouette images with less noise. Then a novel feature extraction method based on Outermost Contour is performed. Principal Component Analysis (PCA is adopted to reduce the dimensionality of the distance signals derived from the Outermost Contours of silhouette images. Then Multiple Discriminant Analysis (MDA is used to optimize the separability of gait features belonging to di erent classes. Nearest Neighbor (NN classifier and Nearest Neighbor classifier with respect to class Exemplars (ENN are used to classify the final feature vectors produced by MDA. In order to verify the e ectiveness and robustness of our feature extraction algorithm, we also use two other classifiers: Backpropagation Neural Network (BPNN and Support Vector Machine (SVM for recognition. Experimental results on a gait database of 100 people show that the accuracy of using MDA, BPNN and SVM can achieve 97.67%, 94.33% and 94.67%, respectively.
Sanpachai, H.; Settapong, M.
Biometrics is a promising technique that is used to identify individual traits and characteristics. Iris recognition is one of the most reliable biometric methods. As iris texture and color is fully developed within a year of birth, it remains unchanged throughout a person's life. Contrary to fingerprint, which can be altered due to several aspects including accidental damage, dry or oily skin and dust. Although iris recognition has been studied for more than a decade, there are limited commercial products available due to its arduous requirement such as camera resolution, hardware size, expensive equipment and computational complexity. However, at the present time, technology has overcome these obstacles. Iris recognition can be done through several sequential steps which include pre-processing, features extractions, post-processing, and matching stage. In this paper, we adopted the directional high-low pass filter for feature extraction. A box-counting fractal dimension and Iris code have been proposed as feature representations. Our approach has been tested on CASIA Iris Image database and the results are considered successful.
Boom, B.J.; Beumer, G.M.; Spreeuwers, Lieuwe Jan; Veldhuis, Raymond N.J.
Accurate face registration is of vital importance to the performance of a face recognition algorithm. We propose a new method: matching score based face registration, which searches for optimal alignment by maximizing the matching score output of a classifier as a function of the different
Rudovic, Ognjen; Patras, Ioannis; Pantic, Maja
We present a regression-based scheme for multi-view facial expression recognition based on 2蚠D geometric features. We address the problem by mapping facial points (e.g. mouth corners) from non-frontal to frontal view where further recognition of the expressions can be performed using a
Wu, Yao; Qiu, Weigen
In order to improve the robustness of facial expression recognition, a method of face expression recognition based on Local Binary Pattern (LBP) combined with improved deep belief networks (DBNs) is proposed. This method uses LBP to extract the feature, and then uses the improved deep belief networks as the detector and classifier to extract the LBP feature. The combination of LBP and improved deep belief networks is realized in facial expression recognition. In the JAFFE (Japanese Female Facial Expression) database on the recognition rate has improved significantly.
Zhang, Xuesong; Zhuang, Yan; Wang, Wei; Pedrycz, Witold
In this paper, we introduce a new research problem termed online feature transformation learning in the context of multiclass object category recognition. The learning of a feature transformation is viewed as learning a global similarity metric function in an online manner. We first consider the problem of online learning a feature transformation matrix expressed in the original feature space and propose an online passive aggressive feature transformation algorithm. Then these original features are mapped to kernel space and an online single kernel feature transformation (OSKFT) algorithm is developed to learn a nonlinear feature transformation. Based on the OSKFT and the existing Hedge algorithm, a novel online multiple kernel feature transformation algorithm is also proposed, which can further improve the performance of online feature transformation learning in large-scale application. The classifier is trained with k nearest neighbor algorithm together with the learned similarity metric function. Finally, we experimentally examined the effect of setting different parameter values in the proposed algorithms and evaluate the model performance on several multiclass object recognition data sets. The experimental results demonstrate the validity and good performance of our methods on cross-domain and multiclass object recognition application.
Full Text Available In this paper, we present a new framework for face recognition with varying illumination based on DCT total variation minimization (DTV, a Gabor filter, a sub-micro-pattern analysis (SMP and discriminated accumulative feature transform (DAFT. We first suppress the illumination effect by using the DCT with the help of TV as a tool for face normalization. The DTV image is then emphasized by the Gabor filter. The facial features are encoded by our proposed method - the SMP. The SMP image is then transformed to the 2D histogram using DAFT. Our system is verified with experiments on the AR and the Yale face database B.
Full Text Available The intensive research of speech emotion recognition introduced a huge collection of speech emotion features. Large feature sets complicate the speech emotion recognition task. Among various feature selection and transformation techniques for one-stage classification, multiple classifier systems were proposed. The main idea of multiple classifiers is to arrange the emotion classification process in stages. Besides parallel and serial cases, the hierarchical arrangement of multi-stage classification is most widely used for speech emotion recognition. In this paper, we present a sequential-forward-feature-selection-based multi-stage classification scheme. The Sequential Forward Selection (SFS and Sequential Floating Forward Selection (SFFS techniques were employed for every stage of the multi-stage classification scheme. Experimental testing of the proposed scheme was performed using the German and Lithuanian emotional speech datasets. Sequential-feature-selection-based multi-stage classification outperformed the single-stage scheme by 12–42 % for different emotion sets. The multi-stage scheme has shown higher robustness to the growth of emotion set. The decrease in recognition rate with the increase in emotion set for multi-stage scheme was lower by 10–20 % in comparison with the single-stage case. Differences in SFS and SFFS employment for feature selection were negligible.
Li, Hong; Wei, Yantao; Li, Luoqing; Chen, C L P
In this paper, a hierarchical feature extraction method is proposed for image recognition. The key idea of the proposed method is to extract an effective feature, called local neural response (LNR), of the input image with nontrivial discrimination and invariance properties by alternating between local coding and maximum pooling operation. The local coding, which is carried out on the locally linear manifold, can extract the salient feature of image patches and leads to a sparse measure matrix on which maximum pooling is carried out. The maximum pooling operation builds the translation invariance into the model. We also show that other invariant properties, such as rotation and scaling, can be induced by the proposed model. In addition, a template selection algorithm is presented to reduce computational complexity and to improve the discrimination ability of the LNR. Experimental results show that our method is robust to local distortion and clutter compared with state-of-the-art algorithms.
Zubair, A. F.; Abu Mansor, M. S.
Computer Aided Process Planning (CAPP) is the bridge between CAD and CAM and pre-processing of the CAD data in the CAPP system is essential. For CNC turning part, conical faces of part model is inevitable to be recognised beside cylindrical and planar faces. As the sinus cosines of the cone radius structure differ according to different models, face identification in automatic feature recognition of the part model need special intention. This paper intends to focus hole on feature on conical faces that can be detected by CAD solid modeller ACIS via. SAT file. Detection algorithm of face topology were generated and compared. The study shows different faces setup for similar conical part models with different hole type features. Three types of holes were compared and different between merge faces and unmerge faces were studied.
Schreier, E. M.; Uslan, M. M.
The review examines six personal computer-based optical character recognition (OCR) systems designed for use by blind and visually impaired people. Considered are OCR components and terms, documentation, scanning and reading, command structure, conversion, unique features, accuracy of recognition, scanning time, speed, and cost. (DB)
Schafer, Phillip B; Jin, Dezhe Z
Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences--one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
Oztel, Ismail; Yolcu, Gozde; Oz, Cemil; Kazan, Serap; Bunyak, Filiz
Facial expressions have an important role in interpersonal communications and estimation of emotional states or intentions. Automatic recognition of facial expressions has led to many practical applications and became one of the important topics in computer vision. We present a facial expression recognition system that relies on geometry-based features extracted from eye and eyebrow regions of the face. The proposed system detects keypoints on frontal face images and forms a feature set using geometric relationships among groups of detected keypoints. Obtained feature set is refined and reduced using the sequential forward selection (SFS) algorithm and fed to a support vector machine classifier to recognize five facial expression classes. The proposed system, iFER (eye-eyebrow only facial expression recognition), is robust to lower face occlusions that may be caused by beards, mustaches, scarves, etc. and lower face motion during speech production. Preliminary experiments on benchmark datasets produced promising results outperforming previous facial expression recognition studies using partial face features, and comparable results to studies using whole face information, only slightly lower by ˜ 2.5 % compared to the best whole face facial recognition system while using only ˜ 1 / 3 of the facial region.
Wang, Liqiang; Wang, Xin; Xi, Fubiao; Dong, Jian
One of the important part of object target recognition is the feature extraction, which can be classified into feature extraction and automatic feature extraction. The traditional neural network is one of the automatic feature extraction methods, while it causes high possibility of over-fitting due to the global connection. The deep learning algorithm used in this paper is a hierarchical automatic feature extraction method, trained with the layer-by-layer convolutional neural network (CNN), which can extract the features from lower layers to higher layers. The features are more discriminative and it is beneficial to the object target recognition.
Das, Nibaran; Mollah, Ayatullah Faruk; Sarkar, Ram; Basu, Subhadip
The work presents a comparative assessment of seven different feature sets for recognition of handwritten Arabic numerals using a Multi Layer Perceptron (MLP) based classifier. The seven feature sets employed here consist of shadow features, octant centroids, longest runs, angular distances, effective spans, dynamic centers of gravity, and some of their combinations. On experimentation with a database of 3000 samples, the maximum recognition rate of 95.80% is observed with both of two separat...
Full Text Available In order to improve the recognition rate of face recognition, face recognition algorithm based on histogram equalization, PCA and BP neural network is proposed. First, the face image is preprocessed by histogram equalization. Then, the classical PCA algorithm is used to extract the features of the histogram equalization image, and extract the principal component of the image. And then train the BP neural network using the trained training samples. This improved BP neural network weight adjustment method is used to train the network because the conventional BP algorithm has the disadvantages of slow convergence, easy to fall into local minima and training process. Finally, the BP neural network with the test sample input is trained to classify and identify the face images, and the recognition rate is obtained. Through the use of ORL database face image simulation experiment, the analysis results show that the improved BP neural network face recognition method can effectively improve the recognition rate of face recognition.
Li, Xuan; Li, Dehua
Face recognition as an important biometric identification method, with its friendly, natural, convenient advantages, has obtained more and more attention. This paper intends to research a face recognition system including face detection, feature extraction and face recognition, mainly through researching on related theory and the key technology of various preprocessing methods in face detection process, using KPCA method, focuses on the different recognition results in different preprocessing methods. In this paper, we choose YCbCr color space for skin segmentation and choose integral projection for face location. We use erosion and dilation of the opening and closing operation and illumination compensation method to preprocess face images, and then use the face recognition method based on kernel principal component analysis method for analysis and research, and the experiments were carried out using the typical face database. The algorithms experiment on MATLAB platform. Experimental results show that integration of the kernel method based on PCA algorithm under certain conditions make the extracted features represent the original image information better for using nonlinear feature extraction method, which can obtain higher recognition rate. In the image preprocessing stage, we found that images under various operations may appear different results, so as to obtain different recognition rate in recognition stage. At the same time, in the process of the kernel principal component analysis, the value of the power of the polynomial function can affect the recognition result.
Luts, J.; Heerschap, A.; Suykens, J.A.; Huffel, S. van
OBJECTIVE: This study investigates the use of automated pattern recognition methods on magnetic resonance data with the ultimate goal to assist clinicians in the diagnosis of brain tumours. Recently, the combined use of magnetic resonance imaging (MRI) and magnetic resonance spectroscopic imaging
Full Text Available In recent years wavelet transform has been found to be an effective tool for time–frequency analysis. Wavelet transform has been used as feature extraction in speech recognition applications and it has proved to be an effective technique for unvoiced phoneme classification. In this paper a new filter structure using admissible wavelet packet is analyzed for English phoneme recognition. These filters have the benefit of having frequency bands spacing similar to the auditory Equivalent Rectangular Bandwidth (ERB scale. Central frequencies of ERB scale are equally distributed along the frequency response of human cochlea. A new sets of features are derived using wavelet packet transform's multi-resolution capabilities and found to be better than conventional features for unvoiced phoneme problems. Some of the noises from NOISEX-92 database has been used for preparing the artificial noisy database to test the robustness of wavelet based features.
Zhang, Yu; Zhou, Guoxu; Jin, Jing; Wang, Xingyu; Cichocki, Andrzej
Canonical correlation analysis (CCA) has been successfully applied to steady-state visual evoked potential (SSVEP) recognition for brain-computer interface (BCI) application. Although the CCA method outperforms the traditional power spectral density analysis through multi-channel detection, it requires additionally pre-constructed reference signals of sine-cosine waves. It is likely to encounter overfitting in using a short time window since the reference signals include no features from training data. We consider that a group of electroencephalogram (EEG) data trials recorded at a certain stimulus frequency on a same subject should share some common features that may bear the real SSVEP characteristics. This study therefore proposes a common feature analysis (CFA)-based method to exploit the latent common features as natural reference signals in using correlation analysis for SSVEP recognition. Good performance of the CFA method for SSVEP recognition is validated with EEG data recorded from ten healthy subjects, in contrast to CCA and a multiway extension of CCA (MCCA). Experimental results indicate that the CFA method significantly outperformed the CCA and the MCCA methods for SSVEP recognition in using a short time window (i.e., less than 1s). The superiority of the proposed CFA method suggests it is promising for the development of a real-time SSVEP-based BCI. Copyright © 2014 Elsevier B.V. All rights reserved.
Myagmarbayar, Nergui; Yuki, Yoshida; Imamoglu, Nevrez; Gonzalez, Jose; Otake, Mihoko; Yu, Wenwei
This research work is aimed to develop autonomous bio-monitoring mobile robots, which are capable of tracking and measuring patients' motions, recognizing the patients' behavior based on observation data, and providing calling for medical personnel in emergency situations in home environment. The robots to be developed will bring about cost-effective, safe and easier at-home rehabilitation to most motor-function impaired patients (MIPs). In our previous research, a full framework was established towards this research goal. In this research, we aimed at improving the human activity recognition by using contour data of the tracked human subject extracted from the depth images as the signal source, instead of the lower limb joint angle data used in the previous research, which are more likely to be affected by the motion of the robot and human subjects. Several geometric parameters, such as, the ratio of height to weight of the tracked human subject, and distance (pixels) between centroid points of upper and lower parts of human body, were calculated from the contour data, and used as the features for the activity recognition. A Hidden Markov Model (HMM) is employed to classify different human activities from the features. Experimental results showed that the human activity recognition could be achieved with a high correct rate.
Zhang, Daming; Zhang, Xueyong; Li, Lu; Liu, Huayong
This paper investigates a face recognition approach based on Scale Invariant Feature Transform (SIFT) feature and sparse representation. The approach takes advantage of SIFT which is local feature other than holistic feature in classical Sparse Representation based Classification (SRC) algorithm and possesses strong robustness to expression, pose and illumination variations. Since hexagonal image has more inherit merits than square image to make recognition process more efficient, we extract SIFT keypoint in hexagonal-sampling image. Instead of matching SIFT feature, firstly the sparse representation of each SIFT keypoint is given according the constructed dictionary; secondly these sparse vectors are quantized according dictionary; finally each face image is represented by a histogram and these so-called Bag-of-Words vectors are classified by SVM. Due to use of local feature, the proposed method achieves better result even when the number of training sample is small. In the experiments, the proposed method gave higher face recognition rather than other methods in ORL and Yale B face databases; also, the effectiveness of the hexagonal-sampling in the proposed method is verified.
This accessible text/reference presents a coherent overview of the emerging field of non-Euclidean similarity learning. The book presents a broad range of perspectives on similarity-based pattern analysis and recognition methods, from purely theoretical challenges to practical, real-world applications. The coverage includes both supervised and unsupervised learning paradigms, as well as generative and discriminative models. Topics and features: explores the origination and causes of non-Euclidean (dis)similarity measures, and how they influence the performance of traditional classification alg
Li, Chunyong; Xue, Jiguo; Quan, Cheng; Yue, Jingwei; Zhang, Chenggang
Biometric recognition technology based on eye-movement dynamics has been in development for more than ten years. Different visual tasks, feature extraction and feature recognition methods are proposed to improve the performance of eye movement biometric system. However, the correct identification and verification rates, especially in long-term experiments, as well as the effects of visual tasks and eye trackers' temporal and spatial resolution are still the foremost considerations in eye movement biometrics. With a focus on these issues, we proposed a new visual searching task for eye movement data collection and a new class of eye movement features for biometric recognition. In order to demonstrate the improvement of this visual searching task being used in eye movement biometrics, three other eye movement feature extraction methods were also tested on our eye movement datasets. Compared with the original results, all three methods yielded better results as expected. In addition, the biometric performance of these four feature extraction methods was also compared using the equal error rate (EER) and Rank-1 identification rate (Rank-1 IR), and the texture features introduced in this paper were ultimately shown to offer some advantages with regard to long-term stability and robustness over time and spatial precision. Finally, the results of different combinations of these methods with a score-level fusion method indicated that multi-biometric methods perform better in most cases.
Policy and Goal Recognizer (PaGR), a case- based system for multiagent keyhole recognition. PaGR is a knowledge recognition component within a decision...However, unlike our agent in the BVR domain, these recognition agents have access to perfect information. Single-agent keyhole plan recognition can be...listed below: 1. Facing Target 2. Closing on Target 3. Target Range 4. Within a Target’s Weapon Range 5. Has Target within Weapon Range 6. Is in Danger
Tsai, Chung-Chih; Lin, Heng-Yi; Taur, Jinshiuh; Tao, Chin-Wang
In this paper, we propose a novel possibilistic fuzzy matching strategy with invariant properties, which can provide a robust and effective matching scheme for two sets of iris feature points. In addition, the nonlinear normalization model is adopted to provide more accurate position before matching. Moreover, an effective iris segmentation method is proposed to refine the detected inner and outer boundaries to smooth curves. For feature extraction, the Gabor filters are adopted to detect the local feature points from the segmented iris image in the Cartesian coordinate system and to generate a rotation-invariant descriptor for each detected point. After that, the proposed matching algorithm is used to compute a similarity score for two sets of feature points from a pair of iris images. The experimental results show that the performance of our system is better than those of the systems based on the local features and is comparable to those of the typical systems.
Full Text Available This review article surveys extensively the current progresses made toward video-based human activity recognition. Three aspects for human activity recognition are addressed including core technology, human activity recognition systems, and applications from low-level to high-level representation. In the core technology, three critical processing stages are thoroughly discussed mainly: human object segmentation, feature extraction and representation, activity detection and classification algorithms. In the human activity recognition systems, three main types are mentioned, including single person activity recognition, multiple people interaction and crowd behavior, and abnormal activity recognition. Finally the domains of applications are discussed in detail, specifically, on surveillance environments, entertainment environments and healthcare systems. Our survey, which aims to provide a comprehensive state-of-the-art review of the field, also addresses several challenges associated with these systems and applications. Moreover, in this survey, various applications are discussed in great detail, specifically, a survey on the applications in healthcare monitoring systems.
Zhu, Jianwei; Zhang, Haicang; Li, Shuai Cheng; Wang, Chao; Kong, Lupeng; Sun, Shiwei; Zheng, Wei-Mou; Bu, Dongbo
Accurate recognition of protein fold types is a key step for template-based prediction of protein structures. The existing approaches to fold recognition mainly exploit the features derived from alignments of query protein against templates. These approaches have been shown to be successful for fold recognition at family level, but usually failed at superfamily/fold levels. To overcome this limitation, one of the key points is to explore more structurally informative features of proteins. Although residue-residue contacts carry abundant structural information, how to thoroughly exploit these information for fold recognition still remains a challenge. In this study, we present an approach (called DeepFR) to improve fold recognition at superfamily/fold levels. The basic idea of our approach is to extract fold-specific features from predicted residue-residue contacts of proteins using deep convolutional neural network (DCNN) technique. Based on these fold-specific features, we calculated similarity between query protein and templates, and then assigned query protein with fold type of the most similar template. DCNN has showed excellent performance in image feature extraction and image recognition; the rational underlying the application of DCNN for fold recognition is that contact likelihood maps are essentially analogy to images, as they both display compositional hierarchy. Experimental results on the LINDAHL dataset suggest that even using the extracted fold-specific features alone, our approach achieved success rate comparable to the state-of-the-art approaches. When further combining these features with traditional alignment-related features, the success rate of our approach increased to 92.3%, 82.5% and 78.8% at family, superfamily and fold levels, respectively, which is about 18% higher than the state-of-the-art approach at fold level, 6% higher at superfamily level and 1% higher at family level. An independent assessment on SCOP_TEST dataset showed consistent
Full Text Available Feature extraction is a key step in radar target recognition. The quality of the extracted features determines the performance of target recognition. However, obtaining the deep nature of the data is difficult using the traditional method. The autoencoder can learn features by making use of data and can obtain feature expressions at different levels of data. To eliminate the influence of noise, the method of radar target recognition based on stacked denoising sparse autoencoder is proposed in this paper. This method can extract features directly and efficiently by setting different hidden layers and numbers of iterations. Experimental results show that the proposed method is superior to the K-nearest neighbor method and the traditional stacked autoencoder.
Chherawala, Youssouf; Roy, Partha Pratim; Cheriet, Mohamed
The performance of handwriting recognition systems is dependent on the features extracted from the word image. A large body of features exists in the literature, but no method has yet been proposed to identify the most promising of these, other than a straightforward comparison based on the recognition rate. In this paper, we propose a framework for feature set evaluation based on a collaborative setting. We use a weighted vote combination of recurrent neural network (RNN) classifiers, each trained with a particular feature set. This combination is modeled in a probabilistic framework as a mixture model and two methods for weight estimation are described. The main contribution of this paper is to quantify the importance of feature sets through the combination weights, which reflect their strength and complementarity. We chose the RNN classifier because of its state-of-the-art performance. Also, we provide the first feature set benchmark for this classifier. We evaluated several feature sets on the IFN/ENIT and RIMES databases of Arabic and Latin script, respectively. The resulting combination model is competitive with state-of-the-art systems.
Guo, Dongwei; Wang, Zhe
Convolutional neural networks (CNN) achieve great success in computer vision, it can learn hierarchical representation from raw pixels and has outstanding performance in various image recognition tasks . However, CNN is easy to be fraudulent in terms of it is possible to produce images totally unrecognizable to human eyes that CNNs believe with near certainty are familiar objects. . In this paper, an associative memory model based on multiple features is proposed. Within this model, feature extraction and classification are carried out by CNN, T-SNE and exponential bidirectional associative memory neural network (EBAM). The geometric features extracted from CNN and the digital features extracted from T-SNE are associated by EBAM. Thus we ensure the recognition of robustness by a comprehensive assessment of the two features. In our model, we can get only 8% error rate with fraudulent data. In systems that require a high safety factor or some key areas, strong robustness is extremely important, if we can ensure the image recognition robustness, network security will be greatly improved and the social production efficiency will be extremely enhanced.
Sukhadan, Kalyani; Vijayarajan, V.; Krishnamoorthi, A.; Bessie Amali, D. Geraldine
In this we are developing a graphical user interface using MATLAB for the users to check the information related to books in real time. We are taking the photos of the book cover using GUI, then by using MSER algorithm it will automatically detect all the features from the input image, after this it will filter bifurcate non-text features which will be based on morphological difference between text and non-text regions. We implemented a text character alignment algorithm which will improve the accuracy of the original text detection. We will also have a look upon the built in MATLAB OCR recognition algorithm and an open source OCR which is commonly used to perform better detection results, post detection algorithm is implemented and natural language processing to perform word correction and false detection inhibition. Finally, the detection result will be linked to internet to perform online matching. More than 86% accuracy can be obtained by this algorithm.
Cui, Chen; Asari, Vijayan K.
Biometric features such as fingerprints, iris patterns, and face features help to identify people and restrict access to secure areas by performing advanced pattern analysis and matching. Face recognition is one of the most promising biometric methodologies for human identification in a non-cooperative security environment. However, the recognition results obtained by face recognition systems are a affected by several variations that may happen to the patterns in an unrestricted environment. As a result, several algorithms have been developed for extracting different facial features for face recognition. Due to the various possible challenges of data captured at different lighting conditions, viewing angles, facial expressions, and partial occlusions in natural environmental conditions, automatic facial recognition still remains as a difficult issue that needs to be resolved. In this paper, we propose a novel approach to tackling some of these issues by analyzing the local textural descriptions for facial feature representation. The textural information is extracted by an enhanced local binary pattern (ELBP) description of all the local regions of the face. The relationship of each pixel with respect to its neighborhood is extracted and employed to calculate the new representation. ELBP reconstructs a much better textural feature extraction vector from an original gray level image in different lighting conditions. The dimensionality of the texture image is reduced by principal component analysis performed on each local face region. Each low dimensional vector representing a local region is now weighted based on the significance of the sub-region. The weight of each sub-region is determined by employing the local variance estimate of the respective region, which represents the significance of the region. The final facial textural feature vector is obtained by concatenating the reduced dimensional weight sets of all the modules (sub-regions) of the face image
Full Text Available Abstract A new noncooperative iris recognition method is proposed. In this method, the iris features are extracted using a Gabor descriptor. The feature extraction and comparison are scale, deformation, rotation, and contrast-invariant. It works with off-angle and low-resolution iris images. The Gabor wavelet is incorporated with scale-invariant feature transformation (SIFT for feature extraction to better extract the iris features. Both the phase and magnitude of the Gabor wavelet outputs were used in a novel way for local feature point description. Two feature region maps were designed to locally and globally register the feature points and each subregion in the map is locally adjusted to the dilation/contraction/deformation. We also developed a video-based non-cooperative iris recognition system by integrating video-based non-cooperative segmentation, segmentation evaluation, and score fusion units. The proposed method shows good performance for frontal and off-angle iris matching. Video-based recognition methods can improve non-cooperative iris recognition accuracy.
Full Text Available A new noncooperative iris recognition method is proposed. In this method, the iris features are extracted using a Gabor descriptor. The feature extraction and comparison are scale, deformation, rotation, and contrast-invariant. It works with off-angle and low-resolution iris images. The Gabor wavelet is incorporated with scale-invariant feature transformation (SIFT for feature extraction to better extract the iris features. Both the phase and magnitude of the Gabor wavelet outputs were used in a novel way for local feature point description. Two feature region maps were designed to locally and globally register the feature points and each subregion in the map is locally adjusted to the dilation/contraction/deformation. We also developed a video-based non-cooperative iris recognition system by integrating video-based non-cooperative segmentation, segmentation evaluation, and score fusion units. The proposed method shows good performance for frontal and off-angle iris matching. Video-based recognition methods can improve non-cooperative iris recognition accuracy.
Full Text Available Spoken language recognition (SLR has been of increasing interest in multilingual speech recognition for identifying the languages of speech utterances. Most existing SLR approaches apply statistical modeling techniques with acoustic and phonotactic features. Among the popular approaches, the acoustic approach has become of greater interest than others because it does not require any prior language-specific knowledge. Previous research on the acoustic approach has shown less interest in applying linguistic knowledge; it was only used as supplementary features, while the current state-of-the-art system assumes independency among features. This paper proposes an SLR system based on the latent-dynamic conditional random field (LDCRF model using phonological features (PFs. We use PFs to represent acoustic characteristics and linguistic knowledge. The LDCRF model was employed to capture the dynamics of the PFs sequences for language classification. Baseline systems were conducted to evaluate the features and methods including Gaussian mixture model (GMM based systems using PFs, GMM using cepstral features, and the CRF model using PFs. Evaluated on the NIST LRE 2007 corpus, the proposed method showed an improvement over the baseline systems. Additionally, it showed comparable result with the acoustic system based on i-vector. This research demonstrates that utilizing PFs can enhance the performance.
Liu, Yahui; Zhang, Bob; Lu, Guangming; Zhang, David
The three-dimensional shape of the ear has been proven to be a stable candidate for biometric authentication because of its desirable properties such as universality, uniqueness, and permanence. In this paper, a special laser scanner designed for online three-dimensional ear acquisition was described. Based on the dataset collected by our scanner, two novel feature classes were defined from a three-dimensional ear image: the global feature class (empty centers and angles) and local feature class (points, lines, and areas). These features are extracted and combined in an optimal way for three-dimensional ear recognition. Using a large dataset consisting of 2,000 samples, the experimental results illustrate the effectiveness of fusing global and local features, obtaining an equal error rate of 2.2%.
Full Text Available The three-dimensional shape of the ear has been proven to be a stable candidate for biometric authentication because of its desirable properties such as universality, uniqueness, and permanence. In this paper, a special laser scanner designed for online three-dimensional ear acquisition was described. Based on the dataset collected by our scanner, two novel feature classes were defined from a three-dimensional ear image: the global feature class (empty centers and angles and local feature class (points, lines, and areas. These features are extracted and combined in an optimal way for three-dimensional ear recognition. Using a large dataset consisting of 2,000 samples, the experimental results illustrate the effectiveness of fusing global and local features, obtaining an equal error rate of 2.2%.
Yang, Guang; Yin, Yafeng; Park, Jeanrok; Man, Hong
As a uncommon biometric modality, human gait recognition has a great advantage of identify people at a distance without high resolution images. It has attracted much attention in recent years, especially in the fields of computer vision and remote sensing. In this paper, we propose a human gait recognition framework that consists of a reliable background subtraction method followed by the pyramid of Histogram of Gradient (pHOG) feature extraction on the silhouette image, and a Hidden Markov Model (HMM) based classifier. Through background subtraction, the silhouette of human gait in each frame is extracted and normalized from the raw video sequence. After removing the shadow and noise in each region of interest (ROI), pHOG feature is computed on the silhouettes images. Then the pHOG features of each gait class will be used to train a corresponding HMM. In the test stage, pHOG feature will be extracted from each test sequence and used to calculate the posterior probability toward each trained HMM model. Experimental results on the CASIA Gait Dataset B1 demonstrate that with our proposed method can achieve very competitive recognition rate.
Lin, Yuan-Pin; Chen, Jyh-Horng; Duann, Jeng-Ren; Lin, Chin-Teng; Jung, Tzyy-Ping
Electroencephalogram (EEG)-based emotion recognition has been an intensely growing field. Yet, how to achieve acceptable accuracy on a practical system with as fewer electrodes as possible is less concerned. This study evaluates a set of subject-independent features, based on differential power asymmetry of symmetric electrode pairs , with emphasis on its applicability to subject variability in music-induced emotion classification problem. Results of this study have evidently validated the feasibility of using subject-independent EEG features to classify four emotional states with acceptable accuracy in second-scale temporal resolution. These features could be generalized across subjects to detect emotion induced by music excerpts not limited to the music database that was used to derive the emotion-specific features.
Full Text Available In this paper, a novel approach for identifying normal and obscene videos is proposed. In order to classify different episodes of a video independently and discard the need to process all frames, first, key frames are extracted and skin regions are detected for groups of video frames starting with key frames. In the second step, three different features including 1- structural features based on single frame information, 2- features based on spatiotemporal volume and 3-motion-based features, are extracted for each episode of video. The PCA-LDA method is then applied to reduce the size of structural features and select more distinctive features. For the final step, we use fuzzy or a Weighted Support Vector Machine (WSVM classifier to identify video episodes. We also employ a multilayer Kohonen network as an initial clustering algorithm to increase the ability to discriminate between the extracted features into two classes of videos. Features based on motion and periodicity characteristics increase the efficiency of the proposed algorithm in videos with bad illumination and skin colour variation. The proposed method is evaluated using 1100 videos in different environmental and illumination conditions. The experimental results show a correct recognition rate of 94.2% for the proposed algorithm.
Jayanti Yusmah Sari
Full Text Available In recent years, palm vein recognition has been studied to overcome problems in conventional systems in biometrics technology (finger print, face, and iris. Those problems in biometrics includes convenience and performance. However, due to the clarity of the palm vein image, the veins could not be segmented properly. To overcome this problem, we propose a palm vein recognition system using Local Line Binary Pattern (LLBP method that can extract robust features from the palm vein images that has unclear veins. LLBP is an advanced method of Local Binary Pattern (LBP, a texture descriptor based on the gray level comparison of a neighborhood of pixels. There are four major steps in this paper, Region of Interest (ROI detection, image preprocessing, features extraction using LLBP method, and matching using Fuzzy k-NN classifier. The proposed method was applied on the CASIA Multi-Spectral Image Database. Experimental results showed that the proposed method using LLBP has a good performance with recognition accuracy of 97.3%. In the future, experiments will be conducted to observe which parameter that could affect processing time and recognition accuracy of LLBP is needed
Sang, Ru; Yu, Guiying; Fan, Weijun; Guo, Tiantai
As the main objects, imagoes have been researched in quarantine pest recognition in these days. However, pests in their larval stage are latent, and the larvae spread abroad much easily with the circulation of agricultural and forest products. It is presented in this paper that, as the new research objects, larvae are recognized by means of machine vision, image processing and pattern recognition. More visional information is reserved and the recognition rate is improved as color image segmentation is applied to images of larvae. Along with the characteristics of affine invariance, perspective invariance and brightness invariance, scale invariant feature transform (SIFT) is adopted for the feature extraction. The neural network algorithm is utilized for pattern recognition, and the automatic identification of larvae images is successfully achieved with satisfactory results.
Wu, Yao; Qiu, Weigen
In order to enhance the robustness of facial expression recognition, we propose a method of facial expression recognition based on improved Local Ternary Pattern (LTP) combined with Stacked Auto-Encoder (SAE). This method uses the improved LTP extraction feature, and then uses the improved depth belief network as the detector and classifier to extract the LTP feature. The combination of LTP and improved deep belief network is realized in facial expression recognition. The recognition rate on CK+ databases has improved significantly.
Olsen, Søren I.
This paper presents an approach of visual shape recognition based on exemplars of attributed keypoints. Training is performed by storing exemplars of keypoints detected in labeled training images. Recognition is made by keypoint matching and voting according to the labels for the matched keypoint....... The matching is insensitive to rotations, limited scalings and small deformations. The recognition is robust to noise, background clutter and partial occlusion. Recognition is possible from few training images and improve with the number of training images.......This paper presents an approach of visual shape recognition based on exemplars of attributed keypoints. Training is performed by storing exemplars of keypoints detected in labeled training images. Recognition is made by keypoint matching and voting according to the labels for the matched keypoints...
Alexander Mikhailovich Alyushin
Full Text Available The algorithms fordynamic spectrograms images recognition, processing and soundspeech signature (SS weredeveloped. The software for mobile phones, thatcan recognize speech signatureswas prepared. The investigation of the SS recognition speed on its boundarytypes was conducted. Recommendations on the boundary types choice in the optimal ratio of recognitionspeed and required space were given.
He, Y.; He, Y.
Urban shanty towns are communities that has contiguous old and dilapidated houses with more than 2000 square meters built-up area or more than 50 households. This study makes attempts to extract shanty towns in Nanning City using the product of Census and TripleSat satellite images. With 0.8-meter high-resolution remote sensing images, five texture characteristics (energy, contrast, maximum probability, and inverse difference moment) of shanty towns are trained and analyzed through GLCM. In this study, samples of shanty town are well classified with 98.2 % producer accuracy of unsupervised classification and 73.2 % supervised classification correctness. Low-rise and mid-rise residential blocks in Nanning City are classified into 4 different types by using k-means clustering and nearest neighbour classification respectively. This study initially establish texture feature descriptions of different types of residential areas, especially low-rise and mid-rise buildings, which would help city administrator evaluate residential blocks and reconstruction shanty towns.
Wolfrum, Philipp; Wolff, Christian; Lücke, Jörg; von der Malsburg, Christoph
Our aim here is to create a fully neural, functionally competitive, and correspondence-based model for invariant face recognition. By recurrently integrating information about feature similarities, spatial feature relations, and facial structure stored in memory, the system evaluates face identity ("what"-information) and face position ("where"-information) using explicit representations for both. The network consists of three functional layers of processing, (1) an input layer for image representation, (2) a middle layer for recurrent information integration, and (3) a gallery layer for memory storage. Each layer consists of cortical columns as functional building blocks that are modeled in accordance with recent experimental findings. In numerical simulations we apply the system to standard benchmark databases for face recognition. We find that recognition rates of our biologically inspired approach lie in the same range as recognition rates of recent and purely functionally motivated systems.
Haindl, Michal; Somol, Petr; Ververidis, D.; Kotropoulos, C.
Roč. 19, č. 4225 (2006), s. 569-577 ISSN 0302-9743. [Iberoamerican Congress on Pattern Recognition. CIARP 2006 /11./. Cancun, 14.11.2006-17.11.2006] R&D Projects: GA AV ČR 1ET400750407; GA MŠk 1M0572; GA AV ČR IAA2075302 EU Projects: European Commission(XE) 507752 - MUSCLE Institutional research plan: CEZ:AV0Z10750506 Keywords : feature selection Subject RIV: BD - Theory of Information Impact factor: 0.402, year: 2005 http://library.utia.cas.cz/separaty/historie/haindl-feature selection based on mutual correlation.pdf
Komeili, Majid; Louis, Wael; Armanfard, Narges; Hatzinakos, Dimitrios
Electrocardiogram (ECG) and transient evoked otoacoustic emission (TEOAE) are among the physiological signals that have attracted significant interest in biometric community due to their inherent robustness to replay and falsification attacks. However, they are time-dependent signals and this makes them hard to deal with in across-session human recognition scenario where only one session is available for enrollment. This paper presents a novel feature selection method to address this issue. It is based on an auxiliary dataset with multiple sessions where it selects a subset of features that are more persistent across different sessions. It uses local information in terms of sample margins while enforcing an across-session measure. This makes it a perfect fit for aforementioned biometric recognition problem. Comprehensive experiments on ECG and TEOAE variability due to time lapse and body posture are done. Performance of the proposed method is compared against seven state-of-the-art feature selection algorithms as well as another six approaches in the area of ECG and TEOAE biometric recognition. Experimental results demonstrate that the proposed method performs noticeably better than other algorithms.
Barros, Pablo; Jirak, Doreen; Weber, Cornelius; Wermter, Stefan
Emotional state recognition has become an important topic for human-robot interaction in the past years. By determining emotion expressions, robots can identify important variables of human behavior and use these to communicate in a more human-like fashion and thereby extend the interaction possibilities. Human emotions are multimodal and spontaneous, which makes them hard to be recognized by robots. Each modality has its own restrictions and constraints which, together with the non-structured behavior of spontaneous expressions, create several difficulties for the approaches present in the literature, which are based on several explicit feature extraction techniques and manual modality fusion. Our model uses a hierarchical feature representation to deal with spontaneous emotions, and learns how to integrate multiple modalities for non-verbal emotion recognition, making it suitable to be used in an HRI scenario. Our experiments show that a significant improvement of recognition accuracy is achieved when we use hierarchical features and multimodal information, and our model improves the accuracy of state-of-the-art approaches from 82.5% reported in the literature to 91.3% for a benchmark dataset on spontaneous emotion expressions. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
Full Text Available the employment of other algorithms and commands so as to better present and demonstrate the obtained results. Edge detection and enhancing images for use in an iris recognition system allow for efficient recognition and extraction of iris patterns. REFERENCES... Gonzalez, R.C. and Woods, R.E. 2002. Digital Image Processing 2nd Edition, Instructor?s manual .Englewood Cliffs, Prentice Hall, pp 17-36. Proen?a, H. and Alexandre, L.A. 2007. Toward Noncooperative Iris Recognition: A classification approach using...
Olsen, Søren I.
An approach to exemplar based recognition of visual shapes is presented. The shape information is described by attributed interest points (keys) detected by an end-stop operator. The attributes describe the statistics of lines and edges local to the interest point, the position of neighboring int...... interest points, and (in the training phase) a list of recognition names. Recognition is made by a simple voting procedure. Preliminary experiments indicate that the recognition is robust to noise, small deformations, background clutter and partial occlusion....
Jorge, Carlos A.F.; Aghina, Mauricio A.C.; Mol, Antonio C.A.; Pereira, Claudio M.N.A.
This work reports the development of a Man Machine Interface based on speech recognition. The system must recognize spoken commands, and execute the desired tasks, without manual interventions of operators. The range of applications goes from the execution of commands in an industrial plant's control room, to navigation and interaction in virtual environments. Results are reported for isolated word recognition, the isolated words corresponding to the spoken commands. For the pre-processing stage, relevant parameters are extracted from the speech signals, using the cepstral analysis technique, that are used for isolated word recognition, and corresponds to the inputs of an artificial neural network, that performs recognition tasks. (author)
Full Text Available Event recognition is the most fundamental and critical task in event-based natural language processing systems. Existing event recognition methods based on rules and shallow neural networks have certain limitations. For example, extracting features using methods based on rules is difficult; methods based on shallow neural networks converge too quickly to a local minimum, resulting in low recognition precision. To address these problems, we propose the Chinese emergency event recognition model based on deep learning (CEERM. Firstly, we use a word segmentation system to segment sentences. According to event elements labeled in the CEC 2.0 corpus, we classify words into five categories: trigger words, participants, objects, time and location. Each word is vectorized according to the following six feature layers: part of speech, dependency grammar, length, location, distance between trigger word and core word and trigger word frequency. We obtain deep semantic features of words by training a feature vector set using a deep belief network (DBN, then analyze those features in order to identify trigger words by means of a back propagation neural network. Extensive testing shows that the CEERM achieves excellent recognition performance, with a maximum F-measure value of 85.17%. Moreover, we propose the dynamic-supervised DBN, which adds supervised fine-tuning to a restricted Boltzmann machine layer by monitoring its training performance. Test analysis reveals that the new DBN improves recognition performance and effectively controls the training time. Although the F-measure increases to 88.11%, the training time increases by only 25.35%.
Zhang, Yajun; Liu, Zongtian; Zhou, Wen
Event recognition is the most fundamental and critical task in event-based natural language processing systems. Existing event recognition methods based on rules and shallow neural networks have certain limitations. For example, extracting features using methods based on rules is difficult; methods based on shallow neural networks converge too quickly to a local minimum, resulting in low recognition precision. To address these problems, we propose the Chinese emergency event recognition model based on deep learning (CEERM). Firstly, we use a word segmentation system to segment sentences. According to event elements labeled in the CEC 2.0 corpus, we classify words into five categories: trigger words, participants, objects, time and location. Each word is vectorized according to the following six feature layers: part of speech, dependency grammar, length, location, distance between trigger word and core word and trigger word frequency. We obtain deep semantic features of words by training a feature vector set using a deep belief network (DBN), then analyze those features in order to identify trigger words by means of a back propagation neural network. Extensive testing shows that the CEERM achieves excellent recognition performance, with a maximum F-measure value of 85.17%. Moreover, we propose the dynamic-supervised DBN, which adds supervised fine-tuning to a restricted Boltzmann machine layer by monitoring its training performance. Test analysis reveals that the new DBN improves recognition performance and effectively controls the training time. Although the F-measure increases to 88.11%, the training time increases by only 25.35%.
Full Text Available In order to overcome the limitation of single mode emotion recognition. This paper describes a novel multimodal emotion recognition algorithm, and takes speech signal and facial expression signal as the research subjects. First, fuse the speech signal feature and facial expression signal feature, get sample sets by putting back sampling, and then get classifiers by BP neural network (BPNN. Second, measure the difference between two classifiers by double error difference selection strategy. Finally, get the final recognition result by the majority voting rule. Experiments show the method improves the accuracy of emotion recognition by giving full play to the advantages of decision level fusion and feature level fusion, and makes the whole fusion process close to human emotion recognition more, with a recognition rate 90.4%.
Andrews, Timothy J; Baseler, Heidi; Jenkins, Rob; Burton, A Mike; Young, Andrew W
A full understanding of face recognition will involve identifying the visual information that is used to discriminate different identities and how this is represented in the brain. The aim of this study was to explore the importance of shape and surface properties in the recognition and neural representation of familiar faces. We used image morphing techniques to generate hybrid faces that mixed shape properties (more specifically, second order spatial configural information as defined by feature positions in the 2D-image) from one identity and surface properties from a different identity. Behavioural responses showed that recognition and matching of these hybrid faces was primarily based on their surface properties. These behavioural findings contrasted with neural responses recorded using a block design fMRI adaptation paradigm to test the sensitivity of Haxby et al.'s (2000) core face-selective regions in the human brain to the shape or surface properties of the face. The fusiform face area (FFA) and occipital face area (OFA) showed a lower response (adaptation) to repeated images of the same face (same shape, same surface) compared to different faces (different shapes, different surfaces). From the behavioural data indicating the critical contribution of surface properties to the recognition of identity, we predicted that brain regions responsible for familiar face recognition should continue to adapt to faces that vary in shape but not surface properties, but show a release from adaptation to faces that vary in surface properties but not shape. However, we found that the FFA and OFA showed an equivalent release from adaptation to changes in both shape and surface properties. The dissociation between the neural and perceptual responses suggests that, although they may play a role in the process, these core face regions are not solely responsible for the recognition of facial identity. Copyright © 2016 Elsevier Ltd. All rights reserved.
representations. For representing objects, we derive global descriptors encoding shape using viewpoint-invariant features obtained from multiple sensors observing the scene. Objects are also described using color independently. This allows for combining color and shape when it is required for the task. For more...... robust color description, color calibration is performed. The framework was used in three recognition tasks: object instance recognition, object category recognition, and object spatial relationship recognition. For the object instance recognition task, we present a system that utilizes color and scale...
Full Text Available Logo is a graphical symbol that is the identity of an organization, institution, or company. Logo is generally used to introduce to the public the existence of an organization, institution, or company. Through the existence of an agency logo can be seen by the public. Feature recognition is one of the processes that exist within an augmented reality system. One of uses augmented reality is able to recognize the identity of the logo through a camera.The first step to make a process of feature recognition is through the corner detection. Incorporation of several method such as FAST, SURF, and FLANN TREE for the feature detection process based corner detection feature matching up process, will have the better ability to detect the presence of a logo. Additionally when running the feature extraction process there are several issues that arise as scale invariant feature and rotation invariant feature. In this study the research object in the form of logo to the priority to make the process of feature recognition. FAST, SURF, and FLANN TREE method will detection logo with scale invariant feature and rotation invariant feature conditions. Obtained from this study will demonstration the accuracy from FAST, SURF, and FLANN TREE methods to solve the scale invariant and rotation invariant feature problems.
This paper presents a computer vision-based methodology for human action recognition. First, the shape based pose features are constructed based on area ratios to identify the human silhouette in images. The proposed features are invariance to translation and scaling. Once the human body features are extracted from videos, different human actions are learned individually on the training frames of each class. Then, we apply the Adaboost algorithm for the classification process. We assessed the proposed approach using the UR Fall Detection dataset. In this study six classes of activities are considered namely: walking, standing, bending, lying, squatting, and sitting. Results demonstrate the efficiency of the proposed methodology.
Zerrouki, Nabil; Harrou, Fouzi; Sun, Ying; Houacine, Amrane
This paper presents a computer vision-based methodology for human action recognition. First, the shape based pose features are constructed based on area ratios to identify the human silhouette in images. The proposed features are invariance to translation and scaling. Once the human body features are extracted from videos, different human actions are learned individually on the training frames of each class. Then, we apply the Adaboost algorithm for the classification process. We assessed the proposed approach using the UR Fall Detection dataset. In this study six classes of activities are considered namely: walking, standing, bending, lying, squatting, and sitting. Results demonstrate the efficiency of the proposed methodology.
El Moubtahij Hicham
Full Text Available This paper presents an analytical approach of an offline handwritten Arabic text recognition system. It is based on the Hidden Markov Models (HMM Toolkit (HTK without explicit segmentation. The first phase is preprocessing, where the data is introduced in the system after quality enhancements. Then, a set of characteristics (features of local densities and features statistics are extracted by using the technique of sliding windows. Subsequently, the resulting feature vectors are injected to the Hidden Markov Model Toolkit (HTK. The simple database âArabic-Numbersâ and IFN/ENIT are used to evaluate the performance of this system. Keywords: Hidden Markov Models (HMM Toolkit (HTK, Sliding windows
Full Text Available This paper presents an innovative SIFT-based method for rigid video object recognition (hereafter called RVO-SIFT. Just like what happens in the vision system of human being, this method makes the object recognition and feature updating process organically unify together, using both trajectory and feature matching, and thereby it can learn new features not only in the training stage but also in the recognition stage, which can improve greatly the completeness of the video object’s features automatically and, in turn, increases the ratio of correct recognition drastically. The experimental results on real video sequences demonstrate its surprising robustness and efficiency.
Babu, D. R. Ramesh; Ravishankar, M.; Kumar, Manish; Wadera, Kevin; Raj, Aakash
Degraded character recognition is a challenging problem in the field of Optical Character Recognition (OCR). The performance of an optical character recognition depends upon printed quality of the input documents. Many OCRs have been designed which correctly identifies the fine printed documents. But, very few reported work has been found on the recognition of the degraded documents. The efficiency of the OCRs system decreases if the input image is degraded. In this paper, a novel approach based on gradient pattern for recognizing degraded printed character is proposed. The approach makes use of gradient pattern of an individual character for recognition. Experiments were conducted on character image that is either digitally written or a degraded character extracted from historical documents and the results are found to be satisfactory.
Easley, Glenn R.; Colonna, Flavia
We introduce a method for classifying objects based on special cases of the generalized discrete Radon transform. We adjust the transform and the corresponding ridgelet transform by means of circular shifting and a singular value decomposition (SVD) to obtain a translation, rotation and scaling invariant set of feature vectors. We then use a back-propagation neural network to classify the input feature vectors. We conclude with experimental results and compare these with other invariant recognition methods.
The current era of standards and accountability in U.S. public schooling narrows recognition and assessment to an almost exclusive focus on the production of test scores as legitimate markers of student achievement. This climate prevents rather than encourages democratic forms of exchange within and across social worlds. Via a case study of one…
Nguyen, Phuong Giang; Andersen, Hans Jørgen
The vast growth of image databases creates many challenges for computer vision applications, for instance image retrieval and object recognition. Large variation in imaging conditions such as illumination and geometrical properties (including scale, rotation, and viewpoint) gives rise to the need...
Full Text Available To improve the performance of phoneme based Automatic Speech Recognition (ASR in noisy environment; we developed a new technique that could add robustness to clean phonemes features. These robust features are obtained from Complex Wavelet Packet Transform (CWPT coefficients. Since the CWPT coefficients represent all different frequency bands of the input signal, decomposing the input signal into complete CWPT tree would also cover all frequencies involved in recognition process. For time overlapping signals with different frequency contents, e. g. phoneme signal with noises, its CWPT coefficients are the combination of CWPT coefficients of phoneme signal and CWPT coefficients of noises. The CWPT coefficients of phonemes signal would be changed according to frequency components contained in noises. Since the numbers of phonemes in every language are relatively small (limited and already well known, one could easily derive principal component vectors from clean training dataset using Principal Component Analysis (PCA. These principal component vectors could be used then to add robustness and minimize noises effects in testing phase. Simulation results, using Alpha Numeric 4 (AN4 from Carnegie Mellon University and NOISEX-92 examples from Rice University, showed that this new technique could be used as features extractor that improves the robustness of phoneme based ASR systems in various adverse noisy conditions and still preserves the performance in clean environments.
Omara, Ibrahim; Xiao, Gang; Amrani, Moussa; Yan, Zifei; Zuo, Wangmeng
Recently, multimodal biometric systems have received considerable research interest in many applications especially in the fields of security. Multimodal systems can increase the resistance to spoof attacks, provide more details and flexibility, and lead to better performance and lower error rate. In this paper, we present a multimodal biometric system based on face and ear, and propose how to exploit the extracted deep features from Convolutional Neural Networks (CNNs) on the face and ear images to introduce more powerful discriminative features and robust representation ability for them. First, the deep features for face and ear images are extracted based on VGG-M Net. Second, the extracted deep features are fused by using a traditional concatenation and a Discriminant Correlation Analysis (DCA) algorithm. Third, multiclass support vector machine is adopted for matching and classification. The experimental results show that the proposed multimodal system based on deep features is efficient and achieves a promising recognition rate up to 100 % by using face and ear. In addition, the results indicate that the fusion based on DCA is superior to traditional fusion.
Karn, Pradeep; He, Xiao Hai; Yang, Shuai; Wu, Xiao Hong
Iris images acquired under different conditions often suffer from blur, occlusion due to eyelids and eyelashes, specular reflection, and other artifacts. Existing iris recognition systems do not perform well on these types of images. To overcome these problems, we propose an iris recognition method based on robust principal component analysis. The proposed method decomposes all training images into a low-rank matrix and a sparse error matrix, where the low-rank matrix is used for feature extraction. The sparsity concentration index approach is then applied to validate the recognition result. Experimental results using CASIA V4 and IIT Delhi V1iris image databases showed that the proposed method achieved competitive performances in both recognition accuracy and computational efficiency.
Yang, Gongping; Xiao, Rongyang; Yin, Yilong; Yang, Lu
Finger vein recognition is a promising biometric recognition technology, which verifies identities via the vein patterns in the fingers. Binary pattern based methods were thoroughly studied in order to cope with the difficulties of extracting the blood vessel network. However, current binary pattern based finger vein matching methods treat every bit of feature codes derived from different image of various individuals as equally important and assign the same weight value to them. In this paper, we propose a finger vein recognition method based on personalized weight maps (PWMs). The different bits have different weight values according to their stabilities in a certain number of training samples from an individual. Firstly we present the concept of PWM, and then propose the finger vein recognition framework, which mainly consists of preprocessing, feature extraction, and matching. Finally, we design extensive experiments to evaluate the effectiveness of our proposal. Experimental results show that PWM achieves not only better performance, but also high robustness and reliability. In addition, PWM can be used as a general framework for binary pattern based recognition. PMID:24025556
Full Text Available Finger vein recognition is a promising biometric recognition technology, which verifies identities via the vein patterns in the fingers. Binary pattern based methods were thoroughly studied in order to cope with the difficulties of extracting the blood vessel network. However, current binary pattern based finger vein matching methods treat every bit of feature codes derived from different image of various individuals as equally important and assign the same weight value to them. In this paper, we propose a finger vein recognition method based on personalized weight maps (PWMs. The different bits have different weight values according to their stabilities in a certain number of training samples from an individual. Firstly we present the concept of PWM, and then propose the finger vein recognition framework, which mainly consists of preprocessing, feature extraction, and matching. Finally, we design extensive experiments to evaluate the effectiveness of our proposal. Experimental results show that PWM achieves not only better performance, but also high robustness and reliability. In addition, PWM can be used as a general framework for binary pattern based recognition.
Full Text Available According to the features of Chinese personal name, we present an approach for Chinese personal name recognition based on conditional random fields (CRF and knowledge base in this paper. The method builds multiple features of CRF model by adopting Chinese character as processing unit, selects useful features based on selection algorithm of knowledge base and incremental feature template, and finally implements the automatic recognition of Chinese personal name from Chinese document. The experimental results on open real corpus demonstrated the effectiveness of our method and obtained high accuracy rate and high recall rate of recognition.
Reniers, Dennie; Telea, Alexandru
Tolerance-based feature transforms (TFTs) assign to each pixel in an image not only the nearest feature pixels on the boundary (origins), but all origins from the minimum distance up to a user-defined tolerance. In this paper, we compare four simple-to-implement methods for computing TFTs on binary
Full Text Available The paper describes the FPGA-based implementation of Lithuanian isolated word recognition algorithm. FPGA is selected for parallel process implementation using VHDL to ensure fast signal processing at low rate clock signal. Cepstrum analysis was applied to features extraction in voice. The dynamic time warping algorithm was used to compare the vectors of cepstrum coefficients. A library of 100 words features was created and stored in the internal FPGA BRAM memory. Experimental testing with speaker dependent records demonstrated the recognition rate of 94%. The recognition rate of 58% was achieved for speaker-independent records. Calculation of cepstrum coefficients lasted for 8.52 ms at 50 MHz clock, while 100 DTWs took 66.56 ms at 25 MHz clock.Article in Lithuanian
Full Text Available Biometric Authentication Technology has been widely used in this information age. As one of the most important technology of authentication, finger vein recognition attracts our attention because of its high security, reliable accuracy and excellent performance. However, the current finger vein recognition system is difficult to be applied widely because its complicated image pre-processing and not representative feature vectors. To solve this problem, a finger vein recognition method based on the convolution neural network (CNN is proposed in the paper. The image samples are directly input into the CNN model to extract its feature vector so that we can make authentication by comparing the Euclidean distance between these vectors. Finally, the Deep Learning Framework Caffe is adopted to verify this method. The result shows that there are great improvements in both speed and accuracy rate compared to the previous research. And the model has nice robustness in illumination and rotation.
All PROVE-IT(FRiV) project reports are listed below. 1. E. Granger, P. Radtke , and D. Gorodnichy, “Survey of academic research and prototypes for face...recognition in video”, Border Technology Division, Division Report 2014-25 (TR). 2. D. Gorodnichy, E.Granger, and P. Radtke , “Survey of commercial...Gorodnichy, E. Choy, W. Khreich, P. Radtke , J. Bergeron, and D. Bissessar, “Results from evaluation of three commercial off-the-shelf face
Zuo, F.; With, de P.H.N.; Ebrahimi, T.; Sikora, T.
In a home environment, video surveillance employing face detection and recognition is attractive for new applications. Facial feature (e.g. eyes and mouth) localization in the face is an essential task for face recognition because it constitutes an indispensable step for face geometry normalization.
Takacs, Gabriel; Chandrasekhar, Vijay; Tsai, Sam; Chen, David; Grzeszczuk, Radek; Girod, Bernd
We present an end-to-end feature description pipeline which uses a novel interest point detector and Rotation- Invariant Fast Feature (RIFF) descriptors. The proposed RIFF algorithm is 15× faster than SURF1 while producing large-scale retrieval results that are comparable to SIFT.2 Such high-speed features benefit a range of applications from Mobile Augmented Reality (MAR) to web-scale image retrieval and analysis.
Full Text Available Finger knuckle print is considered as one of the emerging hand biometric traits due to its potentiality toward the identification of individuals. This paper contributes a new method for personal recognition using finger knuckle print based on two approaches namely, geometric and texture analyses. In the first approach, the shape oriented features of the finger knuckle print are extracted by means of angular geometric analysis and then integrated to achieve better precision rate. Whereas, the knuckle texture feature analysis is carried out by means of multi-resolution transform known as Curvelet transform. This Curvelet transform has the ability to approximate curved singularities with minimum number of Curvelet coefficients. Since, finger knuckle patterns mainly consist of lines and curves, Curvelet transform is highly suitable for its representation. Further, the Curvelet transform decomposes the finger knuckle image into Curvelet sub-bands which are termed as ‘Curvelet knuckle’. Finally, principle component analysis is applied on each Curvelet knuckle for extracting its feature vector through the covariance matrix derived from their Curvelet coefficients. Extensive experiments were conducted using PolyU database and IIT finger knuckle database. The experimental results confirm that, our proposed method shows a high recognition rate of 98.72% with lower false acceptance rate of 0.06%.
Kevin J.Y. Lam
Full Text Available Embodied theories of language postulate that language meaning is stored in modality-specific brain areas generally involved in perception and action in the real world. However, the temporal dynamics of the interaction between modality-specific information and lexical-semantic processing remain unclear. We investigated the relative timing at which two types of modality-specific information (action-based and visual-form information contribute to lexical-semantic comprehension. To this end, we applied a behavioral priming paradigm in which prime and target words were related with respect to (1 action features, (2 visual features, or (3 semantically associative information. Using a Go/No-Go lexical decision task, priming effects were measured across four different inter-stimulus intervals (ISI = 100 ms, 250 ms, 400 ms, and 1,000 ms to determine the relative time course of the different features . Notably, action priming effects were found in ISIs of 100 ms, 250 ms, and 1,000 ms whereas a visual priming effect was seen only in the ISI of 1,000 ms. Importantly, our data suggest that features follow different time courses of activation during word recognition. In this regard, feature activation is dynamic, measurable in specific time windows but not in others. Thus the current study (1 demonstrates how multiple ISIs can be used within an experiment to help chart the time course of feature activation and (2 provides new evidence for embodied theories of language.
Li, Frédéric; Shirahama, Kimiaki; Nisar, Muhammad Adeel; Köping, Lukas; Grzegorzek, Marcin
Getting a good feature representation of data is paramount for Human Activity Recognition (HAR) using wearable sensors. An increasing number of feature learning approaches-in particular deep-learning based-have been proposed to extract an effective feature representation by analyzing large amounts of data. However, getting an objective interpretation of their performances faces two problems: the lack of a baseline evaluation setup, which makes a strict comparison between them impossible, and the insufficiency of implementation details, which can hinder their use. In this paper, we attempt to address both issues: we firstly propose an evaluation framework allowing a rigorous comparison of features extracted by different methods, and use it to carry out extensive experiments with state-of-the-art feature learning approaches. We then provide all the codes and implementation details to make both the reproduction of the results reported in this paper and the re-use of our framework easier for other researchers. Our studies carried out on the OPPORTUNITY and UniMiB-SHAR datasets highlight the effectiveness of hybrid deep-learning architectures involving convolutional and Long-Short-Term-Memory (LSTM) to obtain features characterising both short- and long-term time dependencies in the data.
Ibrahim M. El-Henawy
A comparison between different features of speech is given. The features based on the Cepstrum give accuracy of 94% for speech recognition while the features based on the short time energy in time domain give accuracy of 92%. The features based on formant frequencies give accuracy of 95.5%. It is clear that the features based on MFCCs with accuracy of 98% give the best accuracy rate. So the features depend on MFCCs with HMMs may be recommended for recognition of the spoken Arabic digits.
Veacheslav L. Perju
Full Text Available The new image complexity informative feature is proposed. The experimental estimation of the image complexity is carried out. There are elaborated two optical-electronic processors for image complexity calculation. The determination of the necessary number of the image's digitization elements depending on the image complexity was carried out. The accuracy of the image complexity feature calculation was made.
Full Text Available Representation based classification methods, such as Sparse Representation Classification (SRC and Linear Regression Classification (LRC have been developed for face recognition problem successfully. However, most of these methods use the original face images without any preprocessing for recognition. Thus, their performances may be affected by some problematic factors (such as illumination and expression variances in the face images. In order to overcome this limitation, a novel supervised filter learning algorithm is proposed for representation based face recognition in this paper. The underlying idea of our algorithm is to learn a filter so that the within-class representation residuals of the faces' Local Binary Pattern (LBP features are minimized and the between-class representation residuals of the faces' LBP features are maximized. Therefore, the LBP features of filtered face images are more discriminative for representation based classifiers. Furthermore, we also extend our algorithm for heterogeneous face recognition problem. Extensive experiments are carried out on five databases and the experimental results verify the efficacy of the proposed algorithm.
Yang, Gongping; Xi, Xiaoming; Yin, Yilong
Finger vein patterns have recently been recognized as an effective biometric identifier. In this paper, we propose a finger vein recognition method based on a personalized best bit map (PBBM). Our method is rooted in a local binary pattern based method and then inclined to use the best bits only for matching. We first present the concept of PBBM and the generating algorithm. Then we propose the finger vein recognition framework, which consists of preprocessing, feature extraction, and matching. Finally, we design extensive experiments to evaluate the effectiveness of our proposal. Experimental results show that PBBM achieves not only better performance, but also high robustness and reliability. In addition, PBBM can be used as a general framework for binary pattern based recognition. PMID:22438735
German Ignacio Parisi
Full Text Available The visual recognition of complex, articulated human movements is fundamental for a wide range of artificial systems oriented towards human-robot communication, action classification, and action-driven perception. These challenging tasks may generally involve the processing of a huge amount of visual information and learning-based mechanisms for generalizing a set of training actions and classifying new samples. To operate in natural environments, a crucial property is the efficient and robust recognition of actions, also under noisy conditions caused by, for instance, systematic sensor errors and temporarily occluded persons. Studies of the mammalian visual system and its outperforming ability to process biological motion information suggest separate neural pathways for the distinct processing of pose and motion features at multiple levels and the subsequent integration of these visual cues for action perception. We present a neurobiologically-motivated approach to achieve noise-tolerant action recognition in real time. Our model consists of self-organizing Growing When Required (GWR networks that obtain progressively generalized representations of sensory inputs and learn inherent spatiotemporal dependencies. During the training, the GWR networks dynamically change their topological structure to better match the input space. We first extract pose and motion features from video sequences and then cluster actions in terms of prototypical pose-motion trajectories. Multi-cue trajectories from matching action frames are subsequently combined to provide action dynamics in the joint feature space. Reported experiments show that our approach outperforms previous results on a dataset of full-body actions captured with a depth sensor, and ranks among the best 21 results for a public benchmark of domestic daily actions.
Full Text Available In this paper, an automatic radar waveform recognition system in a high noise environment is proposed. Signal waveform recognition techniques are widely applied in the field of cognitive radio, spectrum management and radar applications, etc. We devise a system to classify the modulating signals widely used in low probability of intercept (LPI radar detection systems. The radar signals are divided into eight types of classifications, including linear frequency modulation (LFM, BPSK (Barker code modulation, Costas codes and polyphase codes (comprising Frank, P1, P2, P3 and P4. The classifier is Elman neural network (ENN, and it is a supervised classification based on features extracted from the system. Through the techniques of image filtering, image opening operation, skeleton extraction, principal component analysis (PCA, image binarization algorithm and Pseudo–Zernike moments, etc., the features are extracted from the Choi–Williams time-frequency distribution (CWD image of the received data. In order to reduce the redundant features and simplify calculation, the features selection algorithm based on mutual information between classes and features vectors are applied. The superiority of the proposed classification system is demonstrated by the simulations and analysis. Simulation results show that the overall ratio of successful recognition (RSR is 94.7% at signal-to-noise ratio (SNR of −2 dB.
Al Mahafzah, Harbi; Imran, Mohammad; Supreetha Gowda H., D.
In this paper, we have developed Biometric recognition system adopting hand based modality Handvein,which has the unique pattern for each individual and it is impossible to counterfeit and fabricate as it is an internal feature. We have opted in choosing feature extraction algorithms such as LBP-visual descriptor, LPQ-blur insensitive texture operator, Log-Gabor-Texture descriptor. We have chosen well known classifiers such as KNN and SVM for classification. We have experimented and tabulated results of single algorithm recognition rate for Handvein under different distance measures and kernel options. The feature level fusion is carried out which increased the performance level.
Dielmann, Alfred; Renals, Steve
Joint Dialogue Act segmentation and classification of the new AMI meeting corpus has been performed through an integrated framework based on a switching dynamic Bayesian network and a set of continuous features and language models. The recognition process is based on a dictionary of 15 DA classes tailored for group decision-making. Experimental results show that a novel interpolated Factored Language Model results in a low error rate on the automatic segmentation task, an...
Full Text Available owadays, there have been so many development of robot that can receive command and do speech recognition and face recognition. In this research, we develop a humanoid robot system with a controller that based on Raspberry Pi 2. The methods we used are based on Audio recognition and detection, and also face recognition using PCA (Principal Component Analysis with OpenCV and Python. PCA is one of the algorithms to do face detection by doing reduction to the number of dimension of the image possessed. The result of this reduction process is then known as eigenface to do face recognition process. In this research, we still find a false recognition. It can be caused by many things, like database condition, maybe the images are too dark or less varied, blur test image, etc. The accuracy from 3 tests on different people is about 93% (28 correct recognitions out of 30.
Meng, Xianjing; Yang, Gongping; Yin, Yilong; Xiao, Rongyang
Finger vein patterns are considered as one of the most promising biometric authentication methods for its security and convenience. Most of the current available finger vein recognition methods utilize features from a segmented blood vessel network. As an improperly segmented network may degrade the recognition accuracy, binary pattern based methods are proposed, such as Local Binary Pattern (LBP), Local Derivative Pattern (LDP) and Local Line Binary Pattern (LLBP). However, the rich directional information hidden in the finger vein pattern has not been fully exploited by the existing local patterns. Inspired by the Webber Local Descriptor (WLD), this paper represents a new direction based local descriptor called Local Directional Code (LDC) and applies it to finger vein recognition. In LDC, the local gradient orientation information is coded as an octonary decimal number. Experimental results show that the proposed method using LDC achieves better performance than methods using LLBP. PMID:23202194
Full Text Available Finger vein patterns are considered as one of the most promising biometric authentication methods for its security and convenience. Most of the current available finger vein recognition methods utilize features from a segmented blood vessel network. As an improperly segmented network may degrade the recognition accuracy, binary pattern based methods are proposed, such as Local Binary Pattern (LBP, Local Derivative Pattern (LDP and Local Line Binary Pattern (LLBP. However, the rich directional information hidden in the finger vein pattern has not been fully exploited by the existing local patterns. Inspired by the Webber Local Descriptor (WLD, this paper represents a new direction based local descriptor called Local Directional Code (LDC and applies it to finger vein recognition. In LDC, the local gradient orientation information is coded as an octonary decimal number. Experimental results show that the proposed method using LDC achieves better performance than methods using LLBP.
Liu, Shuai; Liu, Yuan-Ning; Zhu, Xiao-Dong; Huo, Guang; Liu, Wen-Tao; Feng, Jia-Kai
Aiming at multicategory iris recognition under illumination and noise interference, this paper proposes a method of iris double recognition based on a modified evolutionary neural network. An equalization histogram and Laplace of Gaussian operator are used to process the iris to suppress illumination and noise interference and Haar wavelet to convert the iris feature to binary feature encoding. Calculate the Hamming distance for the test iris and template iris , and compare with classification threshold, determine the type of iris. If the iris cannot be identified as a different type, there needs to be a secondary recognition. The connection weights in back-propagation (BP) neural network use modified evolutionary neural network to adaptively train. The modified neural network is composed of particle swarm optimization with mutation operator and BP neural network. According to different iris libraries in different circumstances of experimental results, under illumination and noise interference, the correct recognition rate of this algorithm is higher, the ROC curve is closer to the coordinate axis, the training and recognition time is shorter, and the stability and the robustness are better.
Full Text Available -optimal classification accuracy. Therefore to improve the classification accuracy, a new feature vector, combining joint angles and the relative position of the arm joints with respect to the head, is proposed. A k-means classifier is used to cluster each gesture. New...
Teulings, Hans-leo L; Schomaker, L R; Impedovo, S.
A handwriting pattern is considered as a sequence of ballistic strokes. Replications of a pattern may be generated from a single, higher-level memory representation, acting as a motor program. Therefore, those stroke features which show the most invariant pattern are probably related to the
Choe, Howard C.; Karlsen, Robert E.; Gerhart, Grant R.; Meitzler, Thomas J.
We present, in this paper, a wavelet-based acoustic signal analysis to remotely recognize military vehicles using their sound intercepted by acoustic sensors. Since expedited signal recognition is imperative in many military and industrial situations, we developed an algorithm that provides an automated, fast signal recognition once implemented in a real-time hardware system. This algorithm consists of wavelet preprocessing, feature extraction and compact signal representation, and a simple but effective statistical pattern matching. The current status of the algorithm does not require any training. The training is replaced by human selection of reference signals (e.g., squeak or engine exhaust sound) distinctive to each individual vehicle based on human perception. This allows a fast archiving of any new vehicle type in the database once the signal is collected. The wavelet preprocessing provides time-frequency multiresolution analysis using discrete wavelet transform (DWT). Within each resolution level, feature vectors are generated from statistical parameters and energy content of the wavelet coefficients. After applying our algorithm on the intercepted acoustic signals, the resultant feature vectors are compared with the reference vehicle feature vectors in the database using statistical pattern matching to determine the type of vehicle from where the signal originated. Certainly, statistical pattern matching can be replaced by an artificial neural network (ANN); however, the ANN would require training data sets and time to train the net. Unfortunately, this is not always possible for many real world situations, especially collecting data sets from unfriendly ground vehicles to train the ANN. Our methodology using wavelet preprocessing and statistical pattern matching provides robust acoustic signal recognition. We also present an example of vehicle recognition using acoustic signals collected from two different military ground vehicles. In this paper, we will
Prima Dewi Purnamasari
Full Text Available The development of automatic emotion detection systems has recently gained significant attention due to the growing possibility of their implementation in several applications, including affective computing and various fields within biomedical engineering. Use of the electroencephalograph (EEG signal is preferred over facial expression, as people cannot control the EEG signal generated by their brain; the EEG ensures a stronger reliability in the psychological signal. However, because of its uniqueness between individuals and its vulnerability to noise, use of EEG signals can be rather complicated. In this paper, we propose a methodology to conduct EEG-based emotion recognition by using a filtered bispectrum as the feature extraction subsystem and an artificial neural network (ANN as the classifier. The bispectrum is theoretically superior to the power spectrum because it can identify phase coupling between the nonlinear process components of the EEG signal. In the feature extraction process, to extract the information contained in the bispectrum matrices, a 3D pyramid filter is used for sampling and quantifying the bispectrum value. Experiment results show that the mean percentage of the bispectrum value from 5 × 5 non-overlapped 3D pyramid filters produces the highest recognition rate. We found that reducing the number of EEG channels down to only eight in the frontal area of the brain does not significantly affect the recognition rate, and the number of data samples used in the training process is then increased to improve the recognition rate of the system. We have also utilized a probabilistic neural network (PNN as another classifier and compared its recognition rate with that of the back-propagation neural network (BPNN, and the results show that the PNN produces a comparable recognition rate and lower computational costs. Our research shows that the extracted bispectrum values of an EEG signal using 3D filtering as a feature extraction
Lv, Xiong; Wang, Shuang; Li, Xiangyang; Jiang, Shuqiang
Object recognition has wide applications in the area of human-machine interaction and multimedia retrieval. However, due to the problem of visual polysemous and concept polymorphism, it is still a great challenge to obtain reliable recognition result for the 2D images. Recently, with the emergence and easy availability of RGB-D equipment such as Kinect, this challenge could be relieved because the depth channel could bring more information. A very special and important case of object recognition is hand-held object recognition, as hand is a straight and natural way for both human-human interaction and human-machine interaction. In this paper, we study the problem of 3D object recognition by combining heterogenous features with different modalities and extraction techniques. For hand-craft feature, although it reserves the low-level information such as shape and color, it has shown weakness in representing hiconvolutionalgh-level semantic information compared with the automatic learned feature, especially deep feature. Deep feature has shown its great advantages in large scale dataset recognition but is not always robust to rotation or scale variance compared with hand-craft feature. In this paper, we propose a method to combine hand-craft point cloud features and deep learned features in RGB and depth channle. First, hand-held object segmentation is implemented by using depth cues and human skeleton information. Second, we combine the extracted hetegerogenous 3D features in different stages using linear concatenation and multiple kernel learning (MKL). Then a training model is used to recognize 3D handheld objects. Experimental results validate the effectiveness and gerneralization ability of the proposed method.
Kristensen, Rasmus Lyngby; Tan, Zheng-Hua; Ma, Zhanyu
This paper conducts a survey of modern binary pattern flavored feature extractors applied to the Facial Expression Recognition (FER) problem. In total, 26 different feature extractors are included, of which six are selected for in depth description. In addition, the paper unifies important FER...
Kamminga, Jacob Wilhelm; Le Viet Duc, Duc Viet; Meijers, Jan Pieter; Bisby, Helena C.; Meratnia, Nirvana; Havinga, Paul J.M.
Fundamental challenges faced by real-time animal activity recognition include variation in motion data due to changing sensor orientations, numerous features, and energy and processing constraints of animal tags. This paper aims at finding small optimal feature sets that are lightweight and robust
Freire, Alejo; Lee, Kang
Tested in two studies 4- to 7-year-olds' face recognition by manipulating the faces' configural and featural information. Found that even with only a single 5-second exposure, most children could use configural and featural cues to make identity judgments. Repeated exposure and feedback improved others' performance. Even proficient memories were…
Reşit Kavsaoğlu, A; Polat, Kemal; Recep Bozkurt, M
This study is intended for describing the application of the Photoplethysmography (PPG) signal and the time domain features acquired from its first and second derivatives for biometric identification. For this purpose, a sum of 40 features has been extracted and a feature-ranking algorithm is proposed. This proposed algorithm calculates the contribution of each feature to biometric recognition and collocates the features, the contribution of which is from great to small. While identifying the contribution of the features, the Euclidean distance and absolute distance formulas are used. The efficiency of the proposed algorithms is demonstrated by the results of the k-NN (k-nearest neighbor) classifier applications of the features. During application, each 15-period-PPG signal belonging to two different durations from each of the thirty healthy subjects were used with a PPG data acquisition card. The first PPG signals recorded from the subjects were evaluated as the 1st configuration; the PPG signals recorded later at a different time as the 2nd configuration and the combination of both were evaluated as the 3rd configuration. When the results were evaluated for the k-NN classifier model created along with the proposed algorithm, an identification of 90.44% for the 1st configuration, 94.44% for the 2nd configuration, and 87.22% for the 3rd configuration has successfully been attained. The obtained results showed that both the proposed algorithm and the biometric identification model based on this developed PPG signal are very promising for contactless recognizing the people with the proposed method. Copyright © 2014 Elsevier Ltd. All rights reserved.
Full Text Available Facial expression has many applications in human-computer interaction. Although feature extraction and selection have been well studied, the speciﬁcity of each expression variation is not fully explored in state-of-the-art works. In this work, the problem of multiclass expression recognition is converted into triplet-wise expression recognition. For each expression triplet, a new feature optimization model based on action unit (AU weighting and patch weight optimization is proposed to represent the speciﬁcity of the expression triplet. The sparse representation-based approach is then proposed to detect the active AUs of the testing sample for better generalization. The algorithm achieved competitive accuracies of 89.67% and 94.09% for the Jaffe and Cohn–Kanade (CK+ databases, respectively. Better cross-database performance has also been observed.
Khare, Manish; Jeon, Moongu
Providing accurate recognition of human activities is a challenging problem for visual surveillance applications. In this paper, we present a simple and efficient algorithm for human activity recognition based on a wavelet transform. We adopt discrete wavelet transform (DWT) coefficients as a feature of human objects to obtain advantages of its multiresolution approach. The proposed method is tested on multiple levels of DWT. Experiments are carried out on different standard action datasets including KTH and i3D Post. The proposed method is compared with other state-of-the-art methods in terms of different quantitative performance measures. The proposed method is found to have better recognition accuracy in comparison to the state-of-the-art methods.
Abboud, Ali J.; Sellahewa, Harin; Jassim, Sabah A.
Recent advances in biometric technology have pushed towards more robust and reliable systems. We aim to build systems that have low recognition errors and are less affected by variation in recording conditions. Recognition errors are often attributed to the usage of low quality biometric samples. Hence, there is a need to develop new intelligent techniques and strategies to automatically measure/quantify the quality of biometric image samples and if necessary restore image quality according to the need of the intended application. In this paper, we present no-reference image quality measures in the spatial domain that have impact on face recognition. The first is called symmetrical adaptive local quality index (SALQI) and the second is called middle halve (MH). Also, an adaptive strategy has been developed to select the best way to restore the image quality, called symmetrical adaptive histogram equalization (SAHE). The main benefits of using quality measures for adaptive strategy are: (1) avoidance of excessive unnecessary enhancement procedures that may cause undesired artifacts, and (2) reduced computational complexity which is essential for real time applications. We test the success of the proposed measures and adaptive approach for a wavelet-based face recognition system that uses the nearest neighborhood classifier. We shall demonstrate noticeable improvements in the performance of adaptive face recognition system over the corresponding non-adaptive scheme.
Nasrollahi, Kamal; Moeslund, Thomas B.
Face recognition is still a very challenging task when the input face image is noisy, occluded by some obstacles, of very low-resolution, not facing the camera, and not properly illuminated. These problems make the feature extraction and consequently the face recognition system unstable....... The proposed system in this paper introduces the novel idea of using Haar-like features, which have commonly been used for object detection, along with a probabilistic classifier for face recognition. The proposed system is simple, real-time, effective and robust against most of the mentioned problems....... Experimental results on public databases show that the proposed system indeed outperforms the state-of-the-art face recognition systems....
Keong Chen Wong; Yusof Yusri
This paper presents an algorithm for efficiently recognizing and determining the convexity of an edge blend feature. The algorithm first recognizes all of the edge blend features from the Boundary Representation of a part; then a series of convexity test have been run on the recognized edge blend features. The novelty of the presented algorithm lies in, instead of each recognized blend feature is suppressed as most of researchers did, the recognized blend features of this research are gone th...
Sadeghi, Zahra; Testolin, Alberto
In humans, efficient recognition of written symbols is thought to rely on a hierarchical processing system, where simple features are progressively combined into more abstract, high-level representations. Here, we present a computational model of Persian character recognition based on deep belief networks, where increasingly more complex visual features emerge in a completely unsupervised manner by fitting a hierarchical generative model to the sensory data. Crucially, high-level internal representations emerging from unsupervised deep learning can be easily read out by a linear classifier, achieving state-of-the-art recognition accuracy. Furthermore, we tested the hypothesis that handwritten digits and letters share many common visual features: A generative model that captures the statistical structure of the letters distribution should therefore also support the recognition of written digits. To this aim, deep networks trained on Persian letters were used to build high-level representations of Persian digits, which were indeed read out with high accuracy. Our simulations show that complex visual features, such as those mediating the identification of Persian symbols, can emerge from unsupervised learning in multilayered neural networks and can support knowledge transfer across related domains.
Gong, Wenjuan; Gonzàlez, Jordi; Roca, Francesc Xavier
We present a novel method for human action recognition (HAR) based on estimated poses from image sequences. We use 3D human pose data as additional information and propose a compact human pose representation, called a weak pose, in a low-dimensional space while still keeping the most discriminative information for a given pose. With predicted poses from image features, we map the problem from image feature space to pose space, where a Bag of Poses (BOP) model is learned for the final goal of HAR. The BOP model is a modified version of the classical bag of words pipeline by building the vocabulary based on the most representative weak poses for a given action. Compared with the standard k-means clustering, our vocabulary selection criteria is proven to be more efficient and robust against the inherent challenges of action recognition. Moreover, since for action recognition the ordering of the poses is discriminative, the BOP model incorporates temporal information: in essence, groups of consecutive poses are considered together when computing the vocabulary and assignment. We tested our method on two well-known datasets: HumanEva and IXMAS, to demonstrate that weak poses aid to improve action recognition accuracies. The proposed method is scene-independent and is comparable with the state-of-art method.
Erlandson, Erik J.; Trenkle, John M.; Vogt, Robert C., III
Many text recognition systems recognize text imagery at the character level and assemble words from the recognized characters. An alternative approach is to recognize text imagery at the word level, without analyzing individual characters. This approach avoids the problem of individual character segmentation, and can overcome local errors in character recognition. A word-level recognition system for machine-printed Arabic text has been implemented. Arabic is a script language, and is therefore difficult to segment at the character level. Character segmentation has been avoided by recognizing text imagery of complete words. The Arabic recognition system computes a vector of image-morphological features on a query word image. This vector is matched against a precomputed database of vectors from a lexicon of Arabic words. Vectors from the database with the highest match score are returned as hypotheses for the unknown image. Several feature vectors may be stored for each word in the database. Database feature vectors generated using multiple fonts and noise models allow the system to be tuned to its input stream. Used in conjunction with database pruning techniques, this Arabic recognition system has obtained promising word recognition rates on low-quality multifont text imagery.
Severcan, M.; Uzunalioglu, H.
A model-based object recognition system is developed for recognition of polyhedral objects. The system consists of feature extraction, modelling and matching stages. Linear features are used for object descriptions. Lines are obtained from edges using rotation transform. For modelling and recognition process, geometric hashing method is utilized. Each object is modelled using 2-D views taken from the viewpoints on the viewing sphere. A hidden line elimination algorithm is used to find these views from the wire frame model of the objects. The recognition experiments yielded satisfactory results. (author). 8 refs, 5 figs
Feng, Guang; Li, Hengjian; Dong, Jiwen; Chen, Xi; Yang, Huiru
In this paper, we proposed a joint and collaborative representation with Volterra kernel convolution feature (JCRVK) for face recognition. Firstly, the candidate face images are divided into sub-blocks in the equal size. The blocks are extracted feature using the two-dimensional Voltera kernels discriminant analysis, which can better capture the discrimination information from the different faces. Next, the proposed joint and collaborative representation is employed to optimize and classify the local Volterra kernels features (JCR-VK) individually. JCR-VK is very efficiently for its implementation only depending on matrix multiplication. Finally, recognition is completed by using the majority voting principle. Extensive experiments on the Extended Yale B and AR face databases are conducted, and the results show that the proposed approach can outperform other recently presented similar dictionary algorithms on recognition accuracy.
Xie, Zhenzhu; Yang, Fumeng; Zhang, Yuming; Wu, Congzhong
In this paper,we address the face sketch recognition problem. Firstly, we utilize the eigenface algorithm to convert a sketch image into a synthesized sketch face image. Subsequently, considering the low-level vision problem in synthesized face sketch image .Super resolution reconstruction algorithm based on CNN(convolutional neural network) is employed to improve the visual effect. To be specific, we uses a lightweight super-resolution structure to learn a residual mapping instead of directly mapping the feature maps from the low-level space to high-level patch representations, which making the networks are easier to optimize and have lower computational complexity. Finally, we adopt LDA(Linear Discriminant Analysis) algorithm to realize face sketch recognition on synthesized face image before super resolution and after respectively. Extensive experiments on the face sketch database(CUFS) from CUHK demonstrate that the recognition rate of SVM(Support Vector Machine) algorithm improves from 65% to 69% and the recognition rate of LDA(Linear Discriminant Analysis) algorithm improves from 69% to 75%.What'more,the synthesized face image after super resolution can not only better describer image details such as hair ,nose and mouth etc, but also improve the recognition accuracy effectively.
Lee, Jen-Chun; Chang, Chien-Ping; Chen, Wei-Kuei
Directional empirical mode decomposition (DEMD) has recently been proposed to make empirical mode decomposition suitable for the processing of texture analysis. Using DEMD, samples are decomposed into a series of images, referred to as two-dimensional intrinsic mode functions (2-D IMFs), from finer to large scale. A DEMD-based 2 linear discriminant analysis (LDA) for palm vein recognition is proposed. The proposed method progresses through three steps: (i) a set of 2-D IMF features of various scale and orientation are extracted using DEMD, (ii) the 2LDA method is then applied to reduce the dimensionality of the feature space in both the row and column directions, and (iii) the nearest neighbor classifier is used for classification. We also propose two strategies for using the set of 2-D IMF features: ensemble DEMD vein representation (EDVR) and multichannel DEMD vein representation (MDVR). In experiments using palm vein databases, the proposed MDVR-based 2LDA method achieved recognition accuracy of 99.73%, thereby demonstrating its feasibility for palm vein recognition.
Yao, Shoukui; Qin, Xiaojuan
Since the resolution of remote sensing infrared images is low, the features of ship targets become unstable. The issue of how to recognize ships with fuzzy features is an open problem. In this paper, we propose a novel ship target recognition algorithm based on Gaussian mixture models (GMMs). In the proposed algorithm, there are mainly two steps. At the first step, the Hu moments of these ship target images are calculated, and the GMMs are trained on the moment features of ships. At the second step, the moment feature of each ship image is assigned to the trained GMMs for recognition. Because of the scale, rotation, translation invariance property of Hu moments and the power feature-space description ability of GMMs, the GMMs-based ship target recognition algorithm can recognize ship reliably. Experimental results of a large simulating image set show that our approach is effective in distinguishing different ship types, and obtains a satisfactory ship recognition performance.
Zhang, Yu; Wu, Jianxin; Cai, Jianfei
In large-scale visual recognition and image retrieval tasks, feature vectors, such as Fisher vector (FV) or the vector of locally aggregated descriptors (VLAD), have achieved state-of-the-art results. However, the combination of the large numbers of examples and high-dimensional vectors necessitates dimensionality reduction, in order to reduce its storage and CPU costs to a reasonable range. In spite of the popularity of various feature compression methods, this paper shows that the feature (dimension) selection is a better choice for high-dimensional FV/VLAD than the feature (dimension) compression methods, e.g., product quantization. We show that strong correlation among the feature dimensions in the FV and the VLAD may not exist, which renders feature selection a natural choice. We also show that, many dimensions in FV/VLAD are noise. Throwing them away using feature selection is better than compressing them and useful dimensions altogether using feature compression methods. To choose features, we propose an efficient importance sorting algorithm considering both the supervised and unsupervised cases, for visual recognition and image retrieval, respectively. Combining with the 1-bit quantization, feature selection has achieved both higher accuracy and less computational cost than feature compression methods, such as product quantization, on the FV and the VLAD image representations.
Alim, M. Affan; Baig, Misbah M.; Mehboob, Shahzain; Naseem, Imran
In this paper, we propose a framework for low cost secure electronic voting system based on face recognition. Essentially Local Binary Pattern (LBP) is used for face feature characterization in texture format followed by chi-square distribution is used for image classification. Two parallel systems are developed based on smart phone and web applications for face learning and verification modules. The proposed system has two tire security levels by using person ID followed by face verification. Essentially class specific threshold is associated for controlling the security level of face verification. Our system is evaluated three standard databases and one real home based database and achieve the satisfactory recognition accuracies. Consequently our propose system provides secure, hassle free voting system and less intrusive compare with other biometrics.
Babloyantz, A.; Ivanov, V.V.; Zrelov, P.V.
A new approach for the detection of slight changes in the form of the ECG signal is proposed. It is based on the approximation of raw ECG data inside each RR-interval by the expansion in polynomials of special type and on the classification of samples represented by sets of expansion coefficients using a layered feed-forward neural network. The transformation applied provides significantly simpler data structure, stability to noise and to other accidental factors. A by-product of the method is the compression of ECG data with factor 5
Petar S. Aleksic
Full Text Available We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs supported by the MPEG-4 standard for the visual representation of speech. We also describe a robust and automatic algorithm we have developed to extract FAPs from visual data, which does not require hand labeling or extensive training procedures. The principal component analysis (PCA was performed on the FAPs in order to decrease the dimensionality of the visual feature vectors, and the derived projection weights were used as visual features in the audio-visual automatic speech recognition (ASR experiments. Both single-stream and multistream hidden Markov models (HMMs were used to model the ASR system, integrate audio and visual information, and perform a relatively large vocabulary (approximately 1000 words speech recognition experiments. The experiments performed use clean audio data and audio data corrupted by stationary white Gaussian noise at various SNRs. The proposed system reduces the word error rate (WER by 20% to 23% relatively to audio-only speech recognition WERs, at various SNRs (0Ã¢Â€Â“30 dB with additive white Gaussian noise, and by 19% relatively to audio-only speech recognition WER under clean audio conditions.
In this dissertation, we focus on several aspects of models that aim to predict performance of a face recognition system. Performance prediction models are commonly based on the following two types of performance predictor features: a) image quality features; and b) features derived solely from
Full Text Available Speech is the basic mean of communication among human beings.With the increase of transaction between human and machine, necessity of automatic dialogue and removing human factor has been considered. The aim of this study was to determine a set of affective features the speech signal is based on emotions. In this study system was designs that include three mains sections, features extraction, features selection and classification. After extraction of useful features such as, mel frequency cepstral coefficient (MFCC, linear prediction cepstral coefficients (LPC, perceptive linear prediction coefficients (PLP, ferment frequency, zero crossing rate, cepstral coefficients and pitch frequency, Mean, Jitter, Shimmer, Energy, Minimum, Maximum, Amplitude, Standard Deviation, at a later stage with filter methods such as Pearson Correlation Coefficient, t-test, relief and information gain, we came up with a method to rank and select effective features in emotion recognition. Then Result, are given to the classification system as a subset of input. In this classification stage, multi support vector machine are used to classify seven type of emotion. According to the results, that method of relief, together with multi support vector machine, has the most classification accuracy with emotion recognition rate of 93.94%.
Full Text Available The traditional classification methods for limb motion recognition based on sEMG have been deeply researched and shown promising results. However, information loss during feature extraction reduces the recognition accuracy. To obtain higher accuracy, the deep learning method was introduced. In this paper, we propose a parallel multiple-scale convolution architecture. Compared with the state-of-art methods, the proposed architecture fully considers the characteristics of the sEMG signal. Larger sizes of kernel filter than commonly used in other CNN-based hand recognition methods are adopted. Meanwhile, the characteristics of the sEMG signal, that is, muscle independence, is considered when designing the architecture. All the classification methods were evaluated on the NinaPro database. The results show that the proposed architecture has the highest recognition accuracy. Furthermore, the results indicate that parallel multiple-scale convolution architecture with larger size of kernel filter and considering muscle independence can significantly increase the classification accuracy.
Bartko, Susan J.; Winters, Boyer D.; Cowell, Rosemary A.; Saksida, Lisa M.; Bussey, Timothy J.
The perirhinal cortex (PRh) has a well-established role in object recognition memory. More recent studies suggest that PRh is also important for two-choice visual discrimination tasks. Specifically, it has been suggested that PRh contains conjunctive representations that help resolve feature ambiguity, which occurs when a task cannot easily be…
Ribeiro, Alexandra B.; Nielsen, Allan Aasbjerg
qualitative microprobe results: present elements Al, Si, Cr, Fe, As (associated with others). Selected groups of calibrated images (same light conditions and magnification) submitted to discriminant analysis, in order to find a pattern of recognition in the soil features corresponding to contamination already...
V. K. Hohlov
Full Text Available The article forms the rationale for selecting the informative features of the helicopter and aircraft acoustic signals to solve a problem of their recognition and shows that the most informative ones are the counts of extrema in the energy spectra of the input signals, which represent non-centered random variables. An apparatus of the multiple initial regression coefficients was selected as a mathematical tool of research. The application of digital re-circulators with positive and negative feedbacks, which have the comb-like frequency characteristics, solves the problem of selecting informative features. A distinguishing feature of such an approach is easy agility of the comb frequency characteristics just through the agility of a delay value of digital signal in the feedback circuit. Adding an adaptation block to the selection block of the informative features enables us to ensure the invariance of used informative feature and counts of local extrema of the spectral power density to the airspeed of a helicopter. The paper gives reasons for the principle of adaptation and the structure of the adaptation block. To form the discriminator characteristics are used the cross-correlation statistical characteristics of the quadrature components of acoustic signal realizations, obtained by Hilbert transform. The paper proposes to provide signal recognition using a regression algorithm that allows handling the non-centered informative features and using a priori information about coefficients of initial regression of signal and noise.The simulation in Matlab Simulink has shown that selected informative features of signals in regressive processing of signal realizations of 0.5 s duration have good separability, and based on a set of 100 acoustic signal realizations in each class in full-scale conditions, has proved ensuring a correct recognition probability of 0.975, at least. The considered principles of informative features selection and adaptation can
Jeon, Woojay; Juang, Biing-Hwang
To address the limitations of traditional cepstrum or LPC based front-end processing methods for automatic speech recognition, more elaborate methods based on physiological models of the human auditory system may be used to achieve more robust speech recognition in adverse environments. For this purpose, a modified version of a model of the primary auditory cortex featuring a three dimensional mapping of auditory spectra [Wang and Shamma, IEEE Trans. Speech Audio Process. 3, 382-395 (1995)] is adopted and investigated for its use as an improved front-end processing method. The study is conducted in two ways: first, by relating the model's redundant representation to traditional spectral representations and showing that the former not only encompasses information provided by the latter, but also reveals more relevant information that makes it superior in describing the identifying features of speech signals; and second, by observing the statistical features of the representation for various classes of sound to show how different identifying features manifest themselves as specific patterns on the cortical map, thereby becoming a place-coded data set on which detection theory could be applied to simulate auditory perception and cognition.
Paleari, Marco; Luciani, Riccardo; Ariano, Paolo
This work reports on preliminary results about on hand movement recognition with Near InfraRed Spectroscopy (NIRS) and surface ElectroMyoGraphy (sEMG). Either basing on physical contact (touchscreens, data-gloves, etc.), vision techniques (Microsoft Kinect, Sony PlayStation Move, etc.), or other modalities, hand movement recognition is a pervasive function in today environment and it is at the base of many gaming, social, and medical applications. Albeit, in recent years, the use of muscle information extracted by sEMG has spread out from the medical applications to contaminate the consumer world, this technique still falls short when dealing with movements of the hand. We tested NIRS as a technique to get another point of view on the muscle phenomena and proved that, within a specific movements selection, NIRS can be used to recognize movements and return information regarding muscles at different depths. Furthermore, we propose here three different multimodal movement recognition approaches and compare their performances.
The presented work is aimed at designing a system that will model and recognize actions and its interaction with objects. Such a system is aimed at facilitating robot task learning. Activity modeling and recognition is very important for its potential applications in surveillance, human-machine i......The presented work is aimed at designing a system that will model and recognize actions and its interaction with objects. Such a system is aimed at facilitating robot task learning. Activity modeling and recognition is very important for its potential applications in surveillance, human......-machine interface, entertainment, biomechanics etc. Recent developments in neuroscience suggest that all actions are a compositions of smaller units called primitives. Current works based on primitives for action recognition uses a supervised framework for specifying the primitives. We propose a method to extract...... primitives automatically. These primitives are to be used to generate actions based on certain rules for combining. These rules are expressed as a stochastic context free grammar. A model merging approach is adopted to learn a Hidden Markov Model to t the observed data sequences. The states of the HMM...
Ma, Yucong; Yang, Yang; Yao, Yuan; Li, Shengyuan; Zhao, Xuefeng
Ship structures are subjected to corrosion inevitably in service. Existed image-based methods are influenced by the noises in images because they recognize corrosion by extracting features. In this paper, a novel method of image-based corrosion recognition for ship steel structures is proposed. The method utilizes convolutional neural networks (CNN) and will not be affected by noises in images. A CNN used to recognize corrosion was designed through fine-turning an existing CNN architecture and trained by datasets built using lots of images. Combining the trained CNN classifier with a sliding window technique, the corrosion zone in an image can be recognized.
R, Elakkiya; K, Selvamani
Subunit segmenting and modelling in medical sign language is one of the important studies in linguistic-oriented and vision-based Sign Language Recognition (SLR). Many efforts were made in the precedent to focus the functional subunits from the view of linguistic syllables but the problem is implementing such subunit extraction using syllables is not feasible in real-world computer vision techniques. And also, the present recognition systems are designed in such a way that it can detect the signer dependent actions under restricted and laboratory conditions. This research paper aims at solving these two important issues (1) Subunit extraction and (2) Signer independent action on visual sign language recognition. Subunit extraction involved in the sequential and parallel breakdown of sign gestures without any prior knowledge on syllables and number of subunits. A novel Bayesian Parallel Hidden Markov Model (BPaHMM) is introduced for subunit extraction to combine the features of manual and non-manual parameters to yield better results in classification and recognition of signs. Signer independent action aims in using a single web camera for different signer behaviour patterns and for cross-signer validation. Experimental results have proved that the proposed signer independent subunit level modelling for sign language classification and recognition has shown improvement and variations when compared with other existing works.
Gordon, Gaile G.
This paper explores the representation of the human face by features based on the curvature of the face surface. Curature captures many features necessary to accurately describe the face, such as the shape of the forehead, jawline, and cheeks, which are not easily detected from standard intensity images. Moreover, the value of curvature at a point on the surface is also viewpoint invariant. Until recently range data of high enough resolution and accuracy to perform useful curvature calculations on the scale of the human face had been unavailable. Although several researchers have worked on the problem of interpreting range data from curved (although usually highly geometrically structured) surfaces, the main approaches have centered on segmentation by signs of mean and Gaussian curvature which have not proved sufficient in themselves for the case of the human face. This paper details the calculation of principal curvature for a particular data set, the calculation of general surface descriptors based on curvature, and the calculation of face specific descriptors based both on curvature features and a priori knowledge about the structure of the face. These face specific descriptors can be incorporated into many different recognition strategies. A system that implements one such strategy, depth template comparison, giving recognition rates between 80% and 90% is described.
Kim, J. Y.; Kim, C. H.; Kim, B. H.
In this study, the researches classifying the artificial and natural flaws in welding parts are performed using the pattern recognition technology. For this purpose the signal pattern recognition package including the user defined function was developed and the total procedure including the digital signal processing, feature extraction, feature selection and classifier selection is treated by bulk. Specially it is composed with and discussed using the statistical classifier such as the linear discriminant function classifier, the empirical Bayesian classifier. Also, the pattern recognition technology is applied to classification problem of natural flaw(i.e multiple classification problem-crack, lack of penetration, lack of fusion, porosity, and slag inclusion, the planar and volumetric flaw classification problem). According to this results, if appropriately teamed the neural network classifier is better than stastical classifier in the classification problem of natural flaw. And it is possible to acquire the recognition rate of 80% above through it is different a little according to domain extracting the feature and the classifier.
Full Text Available This paper presents a novel and effective method for facial expression recognition including happiness, disgust, fear, anger, sadness, surprise, and neutral state. The proposed method utilizes a regularized discriminant analysis-based boosting algorithm (RDAB with effective Gabor features to recognize the facial expressions. Entropy criterion is applied to select the effective Gabor feature which is a subset of informative and nonredundant Gabor features. The proposed RDAB algorithm uses RDA as a learner in the boosting algorithm. The RDA combines strengths of linear discriminant analysis (LDA and quadratic discriminant analysis (QDA. It solves the small sample size and ill-posed problems suffered from QDA and LDA through a regularization technique. Additionally, this study uses the particle swarm optimization (PSO algorithm to estimate optimal parameters in RDA. Experiment results demonstrate that our approach can accurately and robustly recognize facial expressions.
Zhao, Shouwei; Zhang, Yong; Zhou, Bin; Ma, Dongxi
Interaction is one of the key techniques of augmented reality (AR) maintenance guiding system. Because of the complexity of the maintenance guiding system's image background and the high dimensionality of gesture characteristics, the whole process of gesture recognition can be divided into three stages which are gesture segmentation, gesture characteristic feature modeling and trick recognition. In segmentation stage, for solving the misrecognition of skin-like region, a segmentation algorithm combing background mode and skin color to preclude some skin-like regions is adopted. In gesture characteristic feature modeling of image attributes stage, plenty of characteristic features are analyzed and acquired, such as structure characteristics, Hu invariant moments features and Fourier descriptor. In trick recognition stage, a classifier based on Support Vector Machine (SVM) is introduced into the augmented reality maintenance guiding process. SVM is a novel learning method based on statistical learning theory, processing academic foundation and excellent learning ability, having a lot of issues in machine learning area and special advantages in dealing with small samples, non-linear pattern recognition at high dimension. The gesture recognition of augmented reality maintenance guiding system is realized by SVM after the granulation of all the characteristic features. The experimental results of the simulation of number gesture recognition and its application in augmented reality maintenance guiding system show that the real-time performance and robustness of gesture recognition of AR maintenance guiding system can be greatly enhanced by improved SVM.
Zhou, Daoxiang; Yang, Dan; Zhang, Xiaohong; Huang, Sheng; Feng, Shu
Currently, considerable efforts have been devoted to devise image representation. However, handcrafted methods need strong domain knowledge and show low generalization ability, and conventional feature learning methods require enormous training data and rich parameters tuning experience. A lightened feature learner is presented to solve these problems with application to face recognition, which shares similar topology architecture as a convolutional neural network. Our model is divided into three components: cascaded convolution filters bank learning layer, nonlinear processing layer, and feature pooling layer. Specifically, in the filters learning layer, we use K-means to learn convolution filters. Features are extracted via convoluting images with the learned filters. Afterward, in the nonlinear processing layer, hyperbolic tangent is employed to capture the nonlinear feature. In the feature pooling layer, to remove the redundancy information and incorporate the spatial layout, we exploit multilevel spatial pyramid second-order pooling technique to pool the features in subregions and concatenate them together as the final representation. Extensive experiments on four representative datasets demonstrate the effectiveness and robustness of our model to various variations, yielding competitive recognition results on extended Yale B and FERET. In addition, our method achieves the best identification performance on AR and labeled faces in the wild datasets among the comparative methods.
Sun, Junding; Wu, Xiaosheng
This paper presents a simple, efficient, yet robust approach, named joint orthogonal combination of local ternary pattern, for automatic forward-looking infrared target recognition. It gives more advantages to describe the macroscopic textures and microscopic textures by fusing variety of scales than the traditional LBP-based methods. In addition, it can effectively reduce the feature dimensionality. Further, the rotation invariant and uniform scheme, the robust LTP, and soft concave-convex partition are introduced to enhance its discriminative power. Experimental results demonstrate that the proposed method can achieve competitive results compared with the state-of-the-art methods.
Li, Jing; Hong, Wenxue
The feature extraction and feature selection are the important issues in pattern recognition. Based on the geometric algebra representation of vector, a new feature extraction method using blade coefficient of geometric algebra was proposed in this study. At the same time, an improved differential evolution (DE) feature selection method was proposed to solve the elevated high dimension issue. The simple linear discriminant analysis was used as the classifier. The result of the 10-fold cross-validation (10 CV) classification of public breast cancer biomedical dataset was more than 96% and proved superior to that of the original features and traditional feature extraction method.
Pan, Hong; Olsen, Søren Ingvor; Zhu, Yaping
Feature extraction and learning is critical for object recognition and detection. By embedding context cue of image attributes into the kernel descriptors, we propose a set of novel kernel descriptors called context kernel descriptors (CKD). The motivation of CKD is to use the spatial consistency...... even in high-dimensional space. In addition, the latent connection between Rényi quadratic entropy and the mapping data in kernel feature space further facilitates us to capture the geometric structure as well as the information about the underlying labels of the CKD using CSQMI. Thus the resulting...... codebook and reduced CKD are discriminative. We report superior performance of our algorithm for object recognition on benchmark datasets like Caltech-101 and CIFAR-10, as well as for detection on a challenging chicken feet dataset....
Andersen, Hans Jørgen; Nguyen, Phuong Giang
In image recognition, the common approach for extracting local features using a scale-space representation has usually three main steps; first interest points are extracted at different scales, next from a patch around each interest point the rotation is calculated with corresponding orientation...... and compensation, and finally a descriptor is computed for the derived patch (i.e. feature of the patch). To avoid the memory and computational intensive process of constructing the scale-space, we use a method where no scale-space is required This is done by dividing the given image into a number of triangles...... with sizes dependent on the content of the image, at the location of each triangle. In this paper, we will demonstrate that by rotation of the interest regions at the triangles it is possible in grey scale images to achieve a recognition precision comparable with that of MOPS. The test of the proposed method...
Yu, Jason S.; Dagli, Cihan H.
The invariant image preprocessing of moment invariants generates an invariant representation of object features which are insensitive to position, orientation, size, illusion, and contrast change. In this study ARTMAP is used for 3-D object recognition of manufacturing parts through these invariant characteristics. The analog of moment invariants created through the image preprocessing is interpreted by a binary code which is used to predict the manufacturing part through ARTMAP.
Full Text Available As a method of representing the test sample with few training samples from an overcomplete dictionary, sparse representation classification (SRC has attracted much attention in synthetic aperture radar (SAR automatic target recognition (ATR recently. In this paper, we develop a novel SAR vehicle recognition method based on sparse representation classification along with aspect information (SRCA, in which the correlation between the vehicle’s aspect angle and the sparse representation vector is exploited. The detailed procedure presented in this paper can be summarized as follows. Initially, the sparse representation vector of a test sample is solved by sparse representation algorithm with a principle component analysis (PCA feature-based dictionary. Then, the coefficient vector is projected onto a sparser one within a certain range of the vehicle’s aspect angle. Finally, the vehicle is classified into a certain category that minimizes the reconstruction error with the novel sparse representation vector. Extensive experiments are conducted on the moving and stationary target acquisition and recognition (MSTAR dataset and the results demonstrate that the proposed method performs robustly under the variations of depression angle and target configurations, as well as incomplete observation.
Full Text Available Conventional acoustic modeling involves estimating many parameters to effectively model feature distributions. The sparseness of speech and text data, however, degrades the reliability of the estimation process and makes speech recognition a...
Wahab, Abdul; Quek, Chai; Tan, Chin Keong; Takeda, Kazuya
Advancements in biometrics-based authentication have led to its increasing prominence and are being incorporated into everyday tasks. Existing vehicle security systems rely only on alarms or smart card as forms of protection. A biometric driver recognition system utilizing driving behaviors is a highly novel and personalized approach and could be incorporated into existing vehicle security system to form a multimodal identification system and offer a greater degree of multilevel protection. In this paper, detailed studies have been conducted to model individual driving behavior in order to identify features that may be efficiently and effectively used to profile each driver. Feature extraction techniques based on Gaussian mixture models (GMMs) are proposed and implemented. Features extracted from the accelerator and brake pedal pressure were then used as inputs to a fuzzy neural network (FNN) system to ascertain the identity of the driver. Two fuzzy neural networks, namely, the evolving fuzzy neural network (EFuNN) and the adaptive network-based fuzzy inference system (ANFIS), are used to demonstrate the viability of the two proposed feature extraction techniques. The performances were compared against an artificial neural network (NN) implementation using the multilayer perceptron (MLP) network and a statistical method based on the GMM. Extensive testing was conducted and the results show great potential in the use of the FNN for real-time driver identification and verification. In addition, the profiling of driver behaviors has numerous other potential applications for use by law enforcement and companies dealing with buses and truck drivers.
Some materials feel colder to the touch than others, and we can use this difference in perceived coldness for material recognition. This review focuses on the mechanisms underlying material recognition based on thermal cues. It provides an overview of the physical, perceptual, and cognitive processes involved in material recognition. It also describes engineering domains in which material recognition based on thermal cues have been applied. This includes haptic interfaces that seek to reproduce the sensations associated with contact in virtual environments and tactile sensors aim for automatic material recognition. The review concludes by considering the contributions of this line of research in both science and engineering.
Han, Eun Jung; Kang, Byung Jun; Park, Kang Ryoung; Lee, Sangyoun
Facial expression recognition can be widely used for various applications, such as emotion-based human-machine interaction, intelligent robot interfaces, face recognition robust to expression variation, etc. Previous studies have been classified as either shape- or appearance-based recognition. The shape-based method has the disadvantage that the individual variance of facial feature points exists irrespective of similar expressions, which can cause a reduction of the recognition accuracy. The appearance-based method has a limitation in that the textural information of the face is very sensitive to variations in illumination. To overcome these problems, a new facial-expression recognition method is proposed, which combines both shape and appearance information, based on the support vector machine (SVM). This research is novel in the following three ways as compared to previous works. First, the facial feature points are automatically detected by using an active appearance model. From these, the shape-based recognition is performed by using the ratios between the facial feature points based on the facial-action coding system. Second, the SVM, which is trained to recognize the same and different expression classes, is proposed to combine two matching scores obtained from the shape- and appearance-based recognitions. Finally, a single SVM is trained to discriminate four different expressions, such as neutral, a smile, anger, and a scream. By determining the expression of the input facial image whose SVM output is at a minimum, the accuracy of the expression recognition is much enhanced. The experimental results showed that the recognition accuracy of the proposed method was better than previous researches and other fusion methods.
Salomons, O.W.; van Houten, Frederikus J.A.M.; Kals, H.J.J.
Research in feature-based design is reviewed. Feature-based design is regarded as a key factor towards CAD/CAPP integration from a process planning point of view. From a design point of view, feature-based design offers possibilities for supporting the design process better than current CAD systems
Carlos E. Galván-Tejada
Full Text Available This work presents a human activity recognition (HAR model based on audio features. The use of sound as an information source for HAR models represents a challenge because sound wave analyses generate very large amounts of data. However, feature selection techniques may reduce the amount of data required to represent an audio signal sample. Some of the audio features that were analyzed include Mel-frequency cepstral coefficients (MFCC. Although MFCC are commonly used in voice and instrument recognition, their utility within HAR models is yet to be confirmed, and this work validates their usefulness. Additionally, statistical features were extracted from the audio samples to generate the proposed HAR model. The size of the information is necessary to conform a HAR model impact directly on the accuracy of the model. This problem also was tackled in the present work; our results indicate that we are capable of recognizing a human activity with an accuracy of 85% using the HAR model proposed. This means that minimum computational costs are needed, thus allowing portable devices to identify human activities using audio as an information source.
Köping, Lukas; Shirahama, Kimiaki; Grzegorzek, Marcin
Today's wearable devices like smartphones, smartwatches and intelligent glasses collect a large amount of data from their built-in sensors like accelerometers and gyroscopes. These data can be used to identify a person's current activity and in turn can be utilised for applications in the field of personal fitness assistants or elderly care. However, developing such systems is subject to certain restrictions: (i) since more and more new sensors will be available in the future, activity recognition systems should be able to integrate these new sensors with a small amount of manual effort and (ii) such systems should avoid high acquisition costs for computational power. We propose a general framework that achieves an effective data integration based on the following two characteristics: Firstly, a smartphone is used to gather and temporally store data from different sensors and transfer these data to a central server. Thus, various sensors can be integrated into the system as long as they have programming interfaces to communicate with the smartphone. The second characteristic is a codebook-based feature learning approach that can encode data from each sensor into an effective feature vector only by tuning a few intuitive parameters. In the experiments, the framework is realised as a real-time activity recognition system that integrates eight sensors from a smartphone, smartwatch and smartglasses, and its effectiveness is validated from different perspectives such as accuracies, sensor combinations and sampling rates. Copyright © 2018 Elsevier Ltd. All rights reserved.
Fachinger, Uwe; Schöpke, Birte
AAL systems require, in addition to sophisticated and reliable technology, adequate business models for their launch and sustainable establishment. This paper presents the basic features of alternative business models for a sensor-based fall recognition system which was developed within the context of the "Lower Saxony Research Network Design of Environments for Ageing" (GAL). The models were developed parallel to the R&D process with successive adaptation and concretization. An overview of the basic features (i.e. nine partial models) of the business model is given and the mutual exclusive alternatives for each partial model are presented. The partial models are interconnected and the combinations of compatible alternatives lead to consistent alternative business models. However, in the current state, only initial concepts of alternative business models can be deduced. The next step will be to gather additional information to work out more detailed models.
Full Text Available Since human-wildlife conflicts are increasing, the development of cost-effective methods for reducing damage or conflict levels is important in wildlife management. A wide range of devices to detect and deter animals causing conflict are used for this purpose, although their effectiveness is often highly variable, due to habituation to disruptive or disturbing stimuli. Automated recognition of behaviours could form a critical component of a system capable of altering the disruptive stimuli to avoid this. In this paper we present a novel method to automatically recognise goose behaviour based on vocalisations from flocks of free-living barnacle geese (Branta leucopsis. The geese were observed and recorded in a natural environment, using a shielded shotgun microphone. The classification used Support Vector Machines (SVMs, which had been trained with labeled data. Greenwood Function Cepstral Coefficients (GFCC were used as features for the pattern recognition algorithm, as they can be adjusted to the hearing capabilities of different species. Three behaviours are classified based in this approach, and the method achieves a good recognition of foraging behaviour (86–97% sensitivity, 89–98% precision and a reasonable recognition of flushing (79–86%, 66–80% and landing behaviour(73–91%, 79–92%. The Support Vector Machine has proven to be a robust classifier for this kind of classification, as generality and non-linearcapabilities are important. We conclude that vocalisations can be used to automatically detect behaviour of conflict wildlife species, and as such, may be used as an integrated part of awildlife management system.
Full Text Available In optical character recognition applications, the feature extraction method(s used to recognize document images play an important role. The features are the properties of the pattern that can be statistical, structural and/or transforms or series expansion. The structural features are difficult to compute particularly from hand-printed images. The structure of the strokes present inside the hand-printed images can be estimated using statistical means. In this paper three features have been purposed, those are based on the distribution of B/W pixels on the neighborhood of a pixel in an image. We name these features as Spiral Neighbor Density, Layer Pixel Density and Ray Density. The recognition performance of these features has been compared with two more features Neighborhood Pixels Weight and Total Distances in Four Directions already studied in our work. We have used more than 20000 Devanagari handwritten character images for conducting experiments. The experiments are conducted with two classifiers i.e. PNN and k-NN.
As a typical deep-learning model, Convolutional Neural Networks (CNNs) can be exploited to automatically extract features from images using the hierarchical structure inspired by mammalian visual system. For image classification tasks, traditional CNN models employ the softmax function for classification. However, owing to the limited capacity of the softmax function, there are some shortcomings of traditional CNN models in image classification. To deal with this problem, a new method combining Biomimetic Pattern Recognition (BPR) with CNNs is proposed for image classification. BPR performs class recognition by a union of geometrical cover sets in a high-dimensional feature space and therefore can overcome some disadvantages of traditional pattern recognition. The proposed method is evaluated on three famous image classification benchmarks, that is, MNIST, AR, and CIFAR-10. The classification accuracies of the proposed method for the three datasets are 99.01%, 98.40%, and 87.11%, respectively, which are much higher in comparison with the other four methods in most cases. PMID:28316614
Sormaz, Mladen; Young, Andrew W; Andrews, Timothy J
Theoretical accounts of face processing often emphasise feature shapes as the primary visual cue to the recognition of facial expressions. However, changes in facial expression also affect the surface properties of the face. In this study, we investigated whether this surface information can also be used in the recognition of facial expression. First, participants identified facial expressions (fear, anger, disgust, sadness, happiness) from images that were manipulated such that they varied mainly in shape or mainly in surface properties. We found that the categorization of facial expression is possible in either type of image, but that different expressions are relatively dependent on surface or shape properties. Next, we investigated the relative contributions of shape and surface information to the categorization of facial expressions. This employed a complementary method that involved combining the surface properties of one expression with the shape properties from a different expression. Our results showed that the categorization of facial expressions in these hybrid images was equally dependent on the surface and shape properties of the image. Together, these findings provide a direct demonstration that both feature shape and surface information make significant contributions to the recognition of facial expressions. Copyright © 2016 Elsevier Ltd. All rights reserved.
Full Text Available Modal frequency is an important indicator for structural health assessment. Previous studies have shown that this indicator is substantially affected by the fluctuation of ambient conditions, such as temperature and humidity. Therefore, recognizing the pattern between modal frequency and ambient conditions is necessary for reliable long-term structural health assessment. In this article, a novel machine-learning algorithm is proposed to automatically select relevance features in modal frequency-ambient condition pattern recognition based on structural dynamic response and ambient condition measurement. In contrast to the traditional feature selection approaches by examining a large number of combinations of extracted features, the proposed algorithm conducts continuous relevance feature selection by introducing a sophisticated hyperparameterization on the weight parameter vector controlling the relevancy of different features in the prediction model. The proposed algorithm is then utilized for structural health assessment for a reinforced concrete building based on 1-year daily measurements. It turns out that the optimal model class including the relevance features for each vibrational mode is capable to capture the pattern between the corresponding modal frequency and the ambient conditions.
Ma, Lin; Zhang, D; Li, Naimin; Cai, Yan; Zuo, Wangmeng; Wang, Kuanguan
Iris analysis studies the relationship between human health and changes in the anatomy of the iris. Apart from the fact that iris recognition focuses on modeling the overall structure of the iris, iris diagnosis emphasizes the detecting and analyzing of local variations in the characteristics of irises. This paper focuses on studying the geometrical structure changes in irises that are caused by gastrointestinal diseases, and on measuring the observable deformations in the geometrical structures of irises that are related to roundness, diameter and other geometric forms of the pupil and the collarette. Pupil and collarette based features are defined and extracted. A series of experiments are implemented on our experimental pathological iris database, including manual clustering of both normal and pathological iris images, manual classification by non-specialists, manual classification by individuals with a medical background, classification ability verification for the proposed features, and disease recognition by applying the proposed features. The results prove the effectiveness and clinical diagnostic significance of the proposed features and a reliable recognition performance for automatic disease diagnosis. Our research results offer a novel systematic perspective for iridology studies and promote the progress of both theoretical and practical work in iris diagnosis.
Hasanuzzaman, Faiz M; Yang, Xiaodong; Tian, Yingli
We develop a novel camera-based computer vision technology to automatically recognize banknotes for assisting visually impaired people. Our banknote recognition system is robust and effective with the following features: 1) high accuracy: high true recognition rate and low false recognition rate, 2) robustness: handles a variety of currency designs and bills in various conditions, 3) high efficiency: recognizes banknotes quickly, and 4) ease of use: helps blind users to aim the target for image capture. To make the system robust to a variety of conditions including occlusion, rotation, scaling, cluttered background, illumination change, viewpoint variation, and worn or wrinkled bills, we propose a component-based framework by using Speeded Up Robust Features (SURF). Furthermore, we employ the spatial relationship of matched SURF features to detect if there is a bill in the camera view. This process largely alleviates false recognition and can guide the user to correctly aim at the bill to be recognized. The robustness and generalizability of the proposed system is evaluated on a dataset including both positive images (with U.S. banknotes) and negative images (no U.S. banknotes) collected under a variety of conditions. The proposed algorithm, achieves 100% true recognition rate and 0% false recognition rate. Our banknote recognition system is also tested by blind users.
Full Text Available The High Resolution Range Profile (HRRP recognition has attracted great concern in the field of Radar Automatic Target Recognition (RATR. However, traditional HRRP recognition methods failed to model high dimensional sequential data efficiently and have a poor anti-noise ability. To deal with these problems, a novel stochastic neural network model named Attention-based Recurrent Temporal Restricted Boltzmann Machine (ARTRBM is proposed in this paper. RTRBM is utilized to extract discriminative features and the attention mechanism is adopted to select major features. RTRBM is efficient to model high dimensional HRRP sequences because it can extract the information of temporal and spatial correlation between adjacent HRRPs. The attention mechanism is used in sequential data recognition tasks including machine translation and relation classification, which makes the model pay more attention to the major features of recognition. Therefore, the combination of RTRBM and the attention mechanism makes our model effective for extracting more internal related features and choose the important parts of the extracted features. Additionally, the model performs well with the noise corrupted HRRP data. Experimental results on the Moving and Stationary Target Acquisition and Recognition (MSTAR dataset show that our proposed model outperforms other traditional methods, which indicates that ARTRBM extracts, selects, and utilizes the correlation information between adjacent HRRPs effectively and is suitable for high dimensional data or noise corrupted data.
Buddharaju, Pradeep; Pavlidis, Ioannis T; Tsiamyrtzis, Panagiotis; Bazakos, Mike
The current dominant approaches to face recognition rely on facial characteristics that are on or over the skin. Some of these characteristics have low permanency can be altered, and their phenomenology varies significantly with environmental factors (e.g., lighting). Many methodologies have been developed to address these problems to various degrees. However, the current framework of face recognition research has a potential weakness due to its very nature. We present a novel framework for face recognition based on physiological information. The motivation behind this effort is to capitalize on the permanency of innate characteristics that are under the skin. To establish feasibility, we propose a specific methodology to capture facial physiological patterns using the bioheat information contained in thermal imagery. First, the algorithm delineates the human face from the background using the Bayesian framework. Then, it localizes the superficial blood vessel network using image morphology. The extracted vascular network produces contour shapes that are characteristic to each individual. The branching points of the skeletonized vascular network are referred to as Thermal Minutia Points (TMPs) and constitute the feature database. To render the method robust to facial pose variations, we collect for each subject to be stored in the database five different pose images (center, midleft profile, left profile, midright profile, and right profile). During the classification stage, the algorithm first estimates the pose of the test image. Then, it matches the local and global TMP structures extracted from the test image with those of the corresponding pose images in the database. We have conducted experiments on a multipose database of thermal facial images collected in our laboratory, as well as on the time-gap database of the University of Notre Dame. The good experimental results show that the proposed methodology has merit, especially with respect to the problem of
Ben Jmaa, Ahmed; Mahdi, Walid; Ben Jemaa, Yousra; Ben Hamadou, Abdelmajid
We present in this paper a new approach for Arabic sign language (ArSL) alphabet recognition using hand gesture analysis. This analysis consists in extracting a histogram of oriented gradient (HOG) features from a hand image and then using them to generate an SVM Models. Which will be used to recognize the ArSL alphabet in real-time from hand gesture using a Microsoft Kinect camera. Our approach involves three steps: (i) Hand detection and localization using a Microsoft Kinect camera, (ii) hand segmentation and (iii) feature extraction using Arabic alphabet recognition. One each input image first obtained by using a depth sensor, we apply our method based on hand anatomy to segment hand and eliminate all the errors pixels. This approach is invariant to scale, to rotation and to translation of the hand. Some experimental results show the effectiveness of our new approach. Experiment revealed that the proposed ArSL system is able to recognize the ArSL with an accuracy of 90.12%.
Haiam A. Abdul-Azim
Full Text Available Recognizing human actions in video sequences has been a challenging problem in the last few years due to its real-world applications. A lot of action representation approaches have been proposed to improve the action recognition performance. Despite the popularity of local features-based approaches together with “Bag-of-Words” model for action representation, it fails to capture adequate spatial or temporal relationships. In an attempt to overcome this problem, a trajectory-based local representation approaches have been proposed to capture the temporal information. This paper introduces an improvement of trajectory-based human action recognition approaches to capture discriminative temporal relationships. In our approach, we extract trajectories by tracking the detected spatio-temporal interest points named “cuboid features” with matching its SIFT descriptors over the consecutive frames. We, also, propose a linking and exploring method to obtain efficient trajectories for motion representation in realistic conditions. Then the volumes around the trajectories’ points are described to represent human actions based on the Bag-of-Words (BOW model. Finally, a support vector machine is used to classify human actions. The effectiveness of the proposed approach was evaluated on three popular datasets (KTH, Weizmann and UCF sports. Experimental results showed that the proposed approach yields considerable performance improvement over the state-of-the-art approaches.
Dutta, A.; van Rootseler, R.T.A.; Veldhuis, Raymond N.J.; Spreeuwers, Lieuwe Jan
Face recognition is a challenging problem for surveillance view images commonly encountered in a forensic face recognition case. One approach to deal with a non-frontal test image is to synthesize the corresponding frontal view image and compare it with frontal view reference images. However, it is
Full Text Available Most motion recognition research has required tight-fitting suits for precise sensing. However, tight-suit systems have difficulty adapting to real applications, because people normally wear loose clothes. In this paper, we propose a gait recognition system with flexible piezoelectric sensors in loose clothing. The gait recognition system does not directly sense lower-body angles. It does, however, detect the transition between standing and walking. Specifically, we use the signals from the flexible sensors attached to the knee and hip parts on loose pants. We detect the periodic motion component using the discrete time Fourier series from the signal during walking. We adapt the gait detection method to a real-time patient motion and posture monitoring system. In the monitoring system, the gait recognition operates well. Finally, we test the gait recognition system with 10 subjects, for which the proposed system successfully detects walking with a success rate over 93 %.
Xie, Zhihua; Zhang, Shuai; Liu, Guodong; Xiong, Jinquan
Visible face recognition systems, being vulnerable to illumination, expression, and pose, can not achieve robust performance in unconstrained situations. Meanwhile, near infrared face images, being light- independent, can avoid or limit the drawbacks of face recognition in visible light, but its main challenges are low resolution and signal noise ratio (SNR). Therefore, near infrared and visible fusion face recognition has become an important direction in the field of unconstrained face recognition research. In this paper, a novel fusion algorithm in non-subsampled contourlet transform (NSCT) domain is proposed for Infrared and visible face fusion recognition. Firstly, NSCT is used respectively to process the infrared and visible face images, which exploits the image information at multiple scales, orientations, and frequency bands. Then, to exploit the effective discriminant feature and balance the power of high-low frequency band of NSCT coefficients, the local Gabor binary pattern (LGBP) and Local Binary Pattern (LBP) are applied respectively in different frequency parts to obtain the robust representation of infrared and visible face images. Finally, the score-level fusion is used to fuse the all the features for final classification. The visible and near infrared face recognition is tested on HITSZ Lab2 visible and near infrared face database. Experiments results show that the proposed method extracts the complementary features of near-infrared and visible-light images and improves the robustness of unconstrained face recognition.
Hong, Hyung Gil; Lee, Min Beom; Park, Kang Ryoung
Conventional finger-vein recognition systems perform recognition based on the finger-vein lines extracted from the input images or image enhancement, and texture feature extraction from the finger-vein images. In these cases, however, the inaccurate detection of finger-vein lines lowers the recognition accuracy. In the case of texture feature extraction, the developer must experimentally decide on a form of the optimal filter for extraction considering the characteristics of the image database. To address this problem, this research proposes a finger-vein recognition method that is robust to various database types and environmental changes based on the convolutional neural network (CNN). In the experiments using the two finger-vein databases constructed in this research and the SDUMLA-HMT finger-vein database, which is an open database, the method proposed in this research showed a better performance compared to the conventional methods.
Anzures, Gizelle; Kelly, David J.; Pascalis, Olivier; Quinn, Paul C.; Slater, Alan M.; de Viviés, Xavier; Lee, Kang
We used a matching-to-sample task and manipulated facial pose and feature composition to examine the other-race effect (ORE) in face identity recognition between 5 and 10 years of age. Overall, the present findings provide a genuine measure of own- and other-race face identity recognition in children that is independent of photographic and image…
Wang Xiaojia; Mao Qirong; Zhan Yongzhao
There are many emotion features. If all these features are employed to recognize emotions, redundant features may be existed. Furthermore, recognition result is unsatisfying and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on contribution analysis algorithm of NN is presented. The emotion features are selected by using contribution analysis algorithm of NN from the 95 extracted features. Cluster analysis is applied to analyze the effectiveness for the features selected, and the time of feature extraction is evaluated. Finally, 24 emotion features selected are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and the time of feature extraction
Abbas, Qaisar; Fondon, Irene; Sarmiento, Auxiliadora; Jiménez, Soledad; Alemany, Pedro
Diabetic retinopathy (DR) is leading cause of blindness among diabetic patients. Recognition of severity level is required by ophthalmologists to early detect and diagnose the DR. However, it is a challenging task for both medical experts and computer-aided diagnosis systems due to requiring extensive domain expert knowledge. In this article, a novel automatic recognition system for the five severity level of diabetic retinopathy (SLDR) is developed without performing any pre- and post-processing steps on retinal fundus images through learning of deep visual features (DVFs). These DVF features are extracted from each image by using color dense in scale-invariant and gradient location-orientation histogram techniques. To learn these DVF features, a semi-supervised multilayer deep-learning algorithm is utilized along with a new compressed layer and fine-tuning steps. This SLDR system was evaluated and compared with state-of-the-art techniques using the measures of sensitivity (SE), specificity (SP) and area under the receiving operating curves (AUC). On 750 fundus images (150 per category), the SE of 92.18%, SP of 94.50% and AUC of 0.924 values were obtained on average. These results demonstrate that the SLDR system is appropriate for early detection of DR and provide an effective treatment for prediction type of diabetes.
Full Text Available Getting a good feature representation of data is paramount for Human Activity Recognition (HAR using wearable sensors. An increasing number of feature learning approaches—in particular deep-learning based—have been proposed to extract an effective feature representation by analyzing large amounts of data. However, getting an objective interpretation of their performances faces two problems: the lack of a baseline evaluation setup, which makes a strict comparison between them impossible, and the insufficiency of implementation details, which can hinder their use. In this paper, we attempt to address both issues: we firstly propose an evaluation framework allowing a rigorous comparison of features extracted by different methods, and use it to carry out extensive experiments with state-of-the-art feature learning approaches. We then provide all the codes and implementation details to make both the reproduction of the results reported in this paper and the re-use of our framework easier for other researchers. Our studies carried out on the OPPORTUNITY and UniMiB-SHAR datasets highlight the effectiveness of hybrid deep-learning architectures involving convolutional and Long-Short-Term-Memory (LSTM to obtain features characterising both short- and long-term time dependencies in the data.
Creating and maintaining accurate bindings of elementary features (e.g., color and shape) in visual short-term memory (VSTM) is fundamental for veridical perception. How are low-level features bound in memory? The present work harnessed a multivariate model of perception - the General Recognition Theory (GRT) - to unravel the internal representations underlying feature binding in VSTM. On each trial, preview and target colored shapes were presented in succession, appearing in either repeated or altered spatial locations. Participants gave two same/different responses: one with respect to color and one with respect to shape. Converging GRT analyses on the accuracy confusion matrices provided substantial evidence for binding in the form of violations of perceptual independence at the level of the individual stimulus, such that positive correlations were obtained when both features repeated or alternated together, while negative correlations were obtained when one feature repeated and the other alternated. This "cloverleaf" GRT pattern of binding was similar whether the spatial location of the preview and target repeated or altered. The current results are consistent with: (a) the discrete memory "slots" model of VSTM, and (b) the notion that spatial location is not necessary for the formation of "object files." The GRT approach presented here offers a viable quantitative model for testing various questions regarding feature binding in VSTM.
Full Text Available Emotion recognition from speech may play a crucial role in many applications related to human–computer interaction or understanding the affective state of users in certain tasks, where other modalities such as video or physiological parameters are unavailable. In general, a human’s emotions may be recognized using several modalities such as analyzing facial expressions, speech, physiological parameters (e.g., electroencephalograms, electrocardiograms etc. However, measuring of these modalities may be difficult, obtrusive or require expensive hardware. In that context, speech may be the best alternative modality in many practical applications. In this work we present an approach that uses a Convolutional Neural Network (CNN functioning as a visual feature extractor and trained using raw speech information. In contrast to traditional machine learning approaches, CNNs are responsible for identifying the important features of the input thus, making the need of hand-crafted feature engineering optional in many tasks. In this paper no extra features are required other than the spectrogram representations and hand-crafted features were only extracted for validation purposes of our method. Moreover, it does not require any linguistic model and is not specific to any particular language. We compare the proposed approach using cross-language datasets and demonstrate that it is able to provide superior results vs. traditional ones that use hand-crafted features.
Full Text Available Scale-Invariant Feature Transform (SIFT is being investigated more and more to realize a less-constrained hand vein recognition system. Contrast enhancement (CE, compensating for deficient dynamic range aspects, is a must for SIFT based framework to improve the performance. However, evidence of negative influence on SIFT matching brought by CE is analysed by our experiments. We bring evidence that the number of extracted keypoints resulting by gradient based detectors increases greatly with different CE methods, while on the other hand the matching result of extracted invariant descriptors is negatively influenced in terms of Precision-Recall (PR and Equal Error Rate (EER. Rigorous experiments with state-of-the-art and other CE adopted in published SIFT based hand vein recognition system demonstrate the influence. What is more, an improved SIFT model by importing the kernel of RootSIFT and Mirror Match Strategy into a unified framework is proposed to make use of the positive keypoints change and make up for the negative influence brought by CE.
van Huis, Jasper R.; Bouma, Henri; Baan, Jan; Burghouts, Gertjan J.; Eendebak, Pieter T.; den Hollander, Richard J. M.; Dijk, Judith; van Rest, Jeroen H.
Automatic detection of abnormal behavior in CCTV cameras is important to improve the security in crowded environments, such as shopping malls, airports and railway stations. This behavior can be characterized at different time scales, e.g., by small-scale subtle and obvious actions or by large-scale walking patterns and interactions between people. For example, pickpocketing can be recognized by the actual snatch (small scale), when he follows the victim, or when he interacts with an accomplice before and after the incident (longer time scale). This paper focusses on event recognition by detecting large-scale track-based patterns. Our event recognition method consists of several steps: pedestrian detection, object tracking, track-based feature computation and rule-based event classification. In the experiment, we focused on single track actions (walk, run, loiter, stop, turn) and track interactions (pass, meet, merge, split). The experiment includes a controlled setup, where 10 actors perform these actions. The method is also applied to all tracks that are generated in a crowded shopping mall in a selected time frame. The results show that most of the actions can be detected reliably (on average 90%) at a low false positive rate (1.1%), and that the interactions obtain lower detection rates (70% at 0.3% FP). This method may become one of the components that assists operators to find threatening behavior and enrich the selection of videos that are to be observed.
Vargas, Iliana M; Voss, Joel L; Paller, Ken A
In some circumstances, accurate recognition of repeated images in an explicit memory test is driven by implicit memory. We propose that this "implicit recognition" results from perceptual fluency that influences responding without awareness of memory retrieval. Here we examined whether recognition would vary if images appeared in the same or different visual hemifield during learning and testing. Kaleidoscope images were briefly presented left or right of fixation during divided-attention encoding. Presentation in the same visual hemifield at test produced higher recognition accuracy than presentation in the opposite visual hemifield, but only for guess responses. These correct guesses likely reflect a contribution from implicit recognition, given that when the stimulated visual hemifield was the same at study and test, recognition accuracy was higher for guess responses than for responses with any level of confidence. The dramatic difference in guessing accuracy as a function of lateralized perceptual overlap between study and test suggests that implicit recognition arises from memory storage in visual cortical networks that mediate repetition-induced fluency increments.
Wei-Jong Yang; Wei-Hau Du; Pau-Choo Chang; Jar-Ferr Yang; Pi-Hsia Hung
The demands of smart visual thing recognition in various devices have been increased rapidly for daily smart production, living and learning systems in recent years. This paper proposed a visual thing recognition system, which combines binary scale-invariant feature transform (SIFT), bag of words model (BoW), and support vector machine (SVM) by using color information. Since the traditional SIFT features and SVM classifiers only use the gray information, color information is still an importan...
Ribaric, Slobodan; Fratric, Ivan
This paper presents a multimodal biometric identification system based on the features of the human hand. We describe a new biometric approach to personal identification using eigenfinger and eigenpalm features, with fusion applied at the matching-score level. The identification process can be divided into the following phases: capturing the image; preprocessing; extracting and normalizing the palm and strip-like finger subimages; extracting the eigenpalm and eigenfinger features based on the K-L transform; matching and fusion; and, finally, a decision based on the (k, l)-NN classifier and thresholding. The system was tested on a database of 237 people (1,820 hand images). The experimental results showed the effectiveness of the system in terms of the recognition rate (100 percent), the equal error rate (EER = 0.58 percent), and the total error rate (TER = 0.72 percent).
Nikisins, Olegs; Nasrollahi, Kamal; Greitans, Modris
Facial images are of critical importance in many real-world applications from gaming to surveillance. The current literature on facial image analysis, from face detection to face and facial expression recognition, are mainly performed in either RGB, Depth (D), or both of these modalities. But......, such analyzes have rarely included Thermal (T) modality. This paper paves the way for performing such facial analyzes using synchronized RGB-D-T facial images by introducing a database of 51 persons including facial images of different rotations, illuminations, and expressions. Furthermore, a face recognition...... algorithm has been developed to use these images. The experimental results show that face recognition using such three modalities provides better results compared to face recognition in any of such modalities in most of the cases....
Oliver, Lindsay D; Virani, Karim; Finger, Elizabeth C; Mitchell, Derek G V
Frontotemporal dementia (FTD) is a debilitating neurodegenerative disorder characterized by severely impaired social and emotional behaviour, including emotion recognition deficits. Though fear recognition impairments seen in particular neurological and developmental disorders can be ameliorated by reallocating attention to critical facial features, the possibility that similar benefits can be conferred to patients with FTD has yet to be explored. In the current study, we examined the impact of presenting distinct regions of the face (whole face, eyes-only, and eyes-removed) on the ability to recognize expressions of anger, fear, disgust, and happiness in 24 patients with FTD and 24 healthy controls. A recognition deficit was demonstrated across emotions by patients with FTD relative to controls. Crucially, removal of diagnostic facial features resulted in an appropriate decline in performance for both groups; furthermore, patients with FTD demonstrated a lack of disproportionate improvement in emotion recognition accuracy as a result of isolating critical facial features relative to controls. Thus, unlike some neurological and developmental disorders featuring amygdala dysfunction, the emotion recognition deficit observed in FTD is not likely driven by selective inattention to critical facial features. Patients with FTD also mislabelled negative facial expressions as happy more often than controls, providing further evidence for abnormalities in the representation of positive affect in FTD. This work suggests that the emotional expression recognition deficit associated with FTD is unlikely to be rectified by adjusting selective attention to diagnostic features, as has proven useful in other select disorders. Copyright © 2014 Elsevier Ltd. All rights reserved.
The paper describes a knowledge-based approach for the recognition of PSL strokes. Information about location and the direction of the starting point and ﬁnal point of strokes are considered the knowledge base for recognition of strokes. The work comprises preprocessing, determination of starting and ﬁnal points, ...
Guo, Baobao; Wu, Qiang; Sun, Jiande; Yan, Hua
Eye movement is a new kind of feature for biometrical recognition, it has many advantages compared with other features such as fingerprint, face, and iris. It is not only a sort of static characteristics, but also a combination of brain activity and muscle behavior, which makes it effective to prevent spoofing attack. In addition, eye movements can be incorporated with faces, iris and other features recorded from the face region into multimode systems. In this paper, we do an exploring study on eye movement identification based on the eye movement datasets provided by Komogortsev et al. in 2011 with different classification methods. The time of saccade and fixation are extracted from the eye movement data as the eye movement features. Furthermore, the performance analysis was conducted on different classification methods such as the BP, RBF, ELMAN and SVM in order to provide a reference to the future research in this field.
Vargas, Iliana M.; Voss, Joel L.; Paller, Ken A.
In some circumstances, accurate recognition of repeated images in an explicit memory test is driven by implicit memory. We propose that this “implicit recognition” results from perceptual fluency that influences responding without awareness of memory retrieval. Here we examined whether recognition would vary if images appeared in the same or different visual hemifield during learning and testing. Kaleidoscope images were briefly presented left or right of fixation during divided-attention enc...
Dat Tien Nguyen
Full Text Available With higher demand from users, surveillance systems are currently being designed to provide more information about the observed scene, such as the appearance of objects, types of objects, and other information extracted from detected objects. Although the recognition of gender of an observed human can be easily performed using human perception, it remains a difficult task when using computer vision system images. In this paper, we propose a new human gender recognition method that can be applied to surveillance systems based on quality assessment of human areas in visible light and thermal camera images. Our research is novel in the following two ways: First, we utilize the combination of visible light and thermal images of the human body for a recognition task based on quality assessment. We propose a quality measurement method to assess the quality of image regions so as to remove the effects of background regions in the recognition system. Second, by combining the features extracted using the histogram of oriented gradient (HOG method and the measured qualities of image regions, we form a new image features, called the weighted HOG (wHOG, which is used for efficient gender recognition. Experimental results show that our method produces more accurate estimation results than the state-of-the-art recognition method that uses human body images.
Nguyen, Dat Tien; Park, Kang Ryoung
With higher demand from users, surveillance systems are currently being designed to provide more information about the observed scene, such as the appearance of objects, types of objects, and other information extracted from detected objects. Although the recognition of gender of an observed human can be easily performed using human perception, it remains a difficult task when using computer vision system images. In this paper, we propose a new human gender recognition method that can be applied to surveillance systems based on quality assessment of human areas in visible light and thermal camera images. Our research is novel in the following two ways: First, we utilize the combination of visible light and thermal images of the human body for a recognition task based on quality assessment. We propose a quality measurement method to assess the quality of image regions so as to remove the effects of background regions in the recognition system. Second, by combining the features extracted using the histogram of oriented gradient (HOG) method and the measured qualities of image regions, we form a new image features, called the weighted HOG (wHOG), which is used for efficient gender recognition. Experimental results show that our method produces more accurate estimation results than the state-of-the-art recognition method that uses human body images.
Iliana M. Vargas
Full Text Available In some circumstances, accurate recognition of repeated images in an explicit memory test is driven by implicit memory. We propose that this “implicit recognition” results from perceptual fluency that influences responding without awareness of memory retrieval. Here we examined whether recognition would vary if images appeared in the same or different visual hemifield during learning and testing. Kaleidoscope images were briefly presented left or right of fixation during divided-attention encoding. Presentation in the same visual hemifield at test produced higher recognition accuracy than presentation in the opposite visual hemifield, but only for guess responses. These correct guesses likely reflect a contribution from implicit recognition, given that when the stimulated visual hemifield was the same at study and test, recognition accuracy was higher for guess responses than for responses with any level of confidence. The dramatic difference in guessing accuracy as a function of lateralized perceptual overlap between study and test suggests that implicit recognition arises from memory storage in visual cortical networks that mediate repetition-induced fluency increments.
Full Text Available We proposed a face recognition algorithm based on both the multilinear principal component analysis (MPCA and linear discriminant analysis (LDA. Compared with current traditional existing face recognition methods, our approach treats face images as multidimensional tensor in order to find the optimal tensor subspace for accomplishing dimension reduction. The LDA is used to project samples to a new discriminant feature space, while the K nearest neighbor (KNN is adopted for sample set classification. The results of our study and the developed algorithm are validated with face databases ORL, FERET, and YALE and compared with PCA, MPCA, and PCA + LDA methods, which demonstrates an improvement in face recognition accuracy.
Chen, Guangyi; Xie, Wenfang
Moment invariants have received a lot of attention as features for identification and inspection of two-dimensional shapes. In this paper, two sets of novel moments are proposed by using the auto-correlation of wavelet functions and the dual-tree complex wavelet functions. It is well known that the wavelet transform lacks the property of shift invariance. A little shift in the input signal will cause very different output wavelet coefficients. The autocorrelation of wavelet functions and the dual-tree complex wavelet functions, on the other hand, are shift-invariant, which is very important in pattern recognition. Rotation invariance is the major concern in this paper, while translation invariance and scale invariance can be achieved by standard normalization techniques. The Gaussian white noise is added to the noise-free images and the noise levels vary with different signal-to-noise ratios. Experimental results conducted in this paper show that the proposed wavelet-based moments outperform Zernike's moments and the Fourier-wavelet descriptor for pattern recognition under different rotation angles and different noise levels. It can be seen that the proposed wavelet-based moments can do an excellent job even when the noise levels are very high.
Gao, Jun; Bao, Jie; Chen, Dingguo; Yang, Youqing; Yang, Xuedong
This paper presents a novel optical-electronic shape recognition system based on synergetic associative memory. Our shape recognition system is composed of two parts: the first one is feature extraction system; the second is synergetic pattern recognition system. Hough transform is proposed for feature extraction of unrecognized object, with the effects of reducing dimensions and filtering for object distortion and noise, synergetic neural network is proposed for realizing associative memory in order to eliminate spurious states. Then we adopt an approach of optical- electronic realization to our system that can satisfy the demands of real time, high speed and parallelism. In order to realize fast algorithm, we replace the dynamic evolution circuit with adjudge circuit according to the relationship between attention parameters and order parameters, then implement the recognition of some simple images and its validity is proved.
Ye, Xianyi; Min, Feng
The manual feature extraction of the traditional method for vehicle license plates has no good robustness to change in diversity. And the high feature dimension that is extracted with Principal Component Analysis Network (PCANet) leads to low classification efficiency. For solving these problems, a method of vehicle license plate recognition based on PCANet and compressive sensing is proposed. First, PCANet is used to extract the feature from the images of characters. And then, the sparse measurement matrix which is a very sparse matrix and consistent with Restricted Isometry Property (RIP) condition of the compressed sensing is used to reduce the dimensions of extracted features. Finally, the Support Vector Machine (SVM) is used to train and recognize the features whose dimension has been reduced. Experimental results demonstrate that the proposed method has better performance than Convolutional Neural Network (CNN) in the recognition and time. Compared with no compression sensing, the proposed method has lower feature dimension for the increase of efficiency.
Full Text Available In this paper, a noble nonintrusive three-dimensional (3D face modeling system for random-profile-based 3D face recognition is presented. Although recent two-dimensional (2D face recognition systems can achieve a reliable recognition rate under certain conditions, their performance is limited by internal and external changes, such as illumination and pose variation. To address these issues, 3D face recognition, which uses 3D face data, has recently received much attention. However, the performance of 3D face recognition highly depends on the precision of acquired 3D face data, while also requiring more computational power and storage capacity than 2D face recognition systems. In this paper, we present a developed nonintrusive 3D face modeling system composed of a stereo vision system and an invisible near-infrared line laser, which can be directly applied to profile-based 3D face recognition. We further propose a novel random-profile-based 3D face recognition method that is memory-efficient and pose-invariant. The experimental results demonstrate that the reconstructed 3D face data consists of more than 50 k 3D point clouds and a reliable recognition rate against pose variation.
Full Text Available The consumer behavior has been observed to be largely influenced by image data with increasing familiarity of smart phones and World Wide Web. Traditional technique of browsing through product varieties in the Internet with text keywords has been gradually replaced by the easy accessible image data. The importance of image data has portrayed a steady growth in application orientation for business domain with the advent of different image capturing devices and social media. The paper has described a methodology of feature extraction by image binarization technique for enhancing identification and retrieval of information using content based image recognition. The proposed algorithm was tested on two public datasets, namely, Wang dataset and Oliva and Torralba (OT-Scene dataset with 3688 images on the whole. It has outclassed the state-of-the-art techniques in performance measure and has shown statistical significance.
Full Text Available Facial expression recognition have a wide range of applications in human-machine interaction, pattern recognition, image understanding, machine vision and other fields. Recent years, it has gradually become a hot research. However, different people have different ways of expressing their emotions, and under the influence of brightness, background and other factors, there are some difficulties in facial expression recognition. In this paper, based on the Inception-v3 model of TensorFlow platform, we use the transfer learning techniques to retrain facial expression dataset (The Extended Cohn-Kanade dataset, which can keep the accuracy of recognition and greatly reduce the training time.
Oliu Simon, Marc; Corneanu, Ciprian; Nasrollahi, Kamal
years. At the same time a multimodal facial recognition is a promising approach. This paper combines the latest successes in both directions by applying deep learning Convolutional Neural Networks (CNN) to the multimodal RGB-D-T based facial recognition problem outperforming previously published results......Reliable facial recognition systems are of crucial importance in various applications from entertainment to security. Thanks to the deep-learning concepts introduced in the field, a significant improvement in the performance of the unimodal facial recognition systems has been observed in the recent...
diverse. In ants, social interactions are regulated by at least three levels of recognition. Nestmate recognition occurs between colonies, is very effective, and involves fast processing. Within a colony, division of labor is enhanced by recognition of different classes of individuals. Ultimately......, in particular circumstances, such as cooperative colony founding with stable dominance hierarchies, ants are capable of individual recognition. The underlying recognition cues and mechanisms appear to be specific to each recognition level, and their integrated understanding could contribute...
Dai, Jiangyan; Guo, Changlu; Zhou, Wei; Shi, Yanjiao; Cong, Lin; Yi, Yugen
In this paper, we present a Sub-pattern based Multi-manifold Discriminant Analysis (SpMMDA) algorithm for face recognition. Unlike existing Multi-manifold Discriminant Analysis (MMDA) approach which is based on holistic information of face image for recognition, SpMMDA operates on sub-images partitioned from the original face image and then extracts the discriminative local feature from the sub-images separately. Moreover, the structure information of different sub-images from the same face image is considered in the proposed method with the aim of further improve the recognition performance. Extensive experiments on three standard face databases (Extended YaleB, CMU PIE and AR) demonstrate that the proposed method is effective and outperforms some other sub-pattern based face recognition methods.
Full Text Available Face detection and recognition is the first step for many applications in various fields such as identification and is used as a key to enter into the various electronic devices, video surveillance, and human computer interface and image database management. This paper focuses on feature extraction in an image using Gabor filter and the extracted image feature vector is then given as an input to the neural network. The neural network is trained with the input data. The Gabor wavelet concentrates on the important components of the face including eye, mouth, nose, cheeks. The main requirement of this technique is the threshold, which gives privileged sensitivity. The threshold values are the feature vectors taken from the faces. These feature vectors are given into the feed forward neural network to train the network. Using the feed forward neural network as a classifier, the recognized and unrecognized faces are classified. This classifier attains a higher face deduction rate. By training more input vectors the system proves to be effective. The effectiveness of the proposed method is demonstrated by the experimental results.
Reyes-Galaviz, Orion Fausto; Reyes-García, Carlos Alberto
Data compression is always advisable when it comes to handling and processing information quickly and efficiently. There are two main problems that need to be solved when it comes to handling data; store information in smaller spaces and processes it in the shortest possible time. When it comes to infant cry analysis (ICA), there is always the need to construct large sound repositories from crying babies. Samples that have to be analyzed and be used to train and test pattern recognition algorithms; making this a time consuming task when working with uncompressed feature vectors. In this work, we show a simple, but efficient, method that uses Fuzzy Relational Product (FRP) to compresses the information inside a feature vector, building with this a compressed matrix that will help us recognize two kinds of pathologies in infants; Asphyxia and Deafness. We describe the sound analysis, which consists on the extraction of Mel Frequency Cepstral Coefficients that generate vectors which will later be compressed by using FRP. There is also a description of the infant cry database used in this work, along with the training and testing of a Time Delay Neural Network with the compressed features, which shows a performance of 96.44% with our proposed feature vector compression.
Coisy, C.; Belaid, A.
In this paper, a method for analytic handwritten word recognition based on causal Markov random fields is described. The words models are HMMs where each state corresponds to a letter; each letter is modelled by a NSHPHMM (Markov field). Global models are build dynamically, and used for recognition
Khalafi, Lida; Kashani, Samira; Karimi, Javad
A laboratory experiment is described in which students measure the amount of cetirizine in allergy-treatment tablets based on molecular recognition. The basis of recognition is competition of cetirizine with phenolphthalein to form an inclusion complex with ß-cyclodextrin. Phenolphthalein is pinkish under basic condition, whereas it's complex form…
Full Text Available Hand-based biometrics plays a significant role in establishing security for real-time environments involving human interaction and is found to be more successful in terms of high speed and accuracy. This paper investigates on an integrated approach for personal authentication using Finger Back Knuckle Surface (FBKS based on two methodologies viz., Angular Geometric Analysis based Feature Extraction Method (AGFEM and Contourlet Transform based Feature Extraction Method (CTFEM. Based on these methods, this personal authentication system simultaneously extracts shape oriented feature information and textural pattern information of FBKS for authenticating an individual. Furthermore, the proposed geometric and textural analysis methods extract feature information from both proximal phalanx and distal phalanx knuckle regions (FBKS, while the existing works of the literature concentrate only on the features of proximal phalanx knuckle region. The finger joint region found nearer to the tip of the finger is called distal phalanx region of FBKS, which is a unique feature and has greater potentiality toward identification. Extensive experiments conducted using newly created database with 5400 FBKS images and the obtained results infer that the integration of shape oriented features with texture feature information yields excellent accuracy rate of 99.12% with lowest equal error rate of 1.04%.
Full Text Available Personal recognition using palm-vein patterns has emerged as a promising alternative for human recognition because of its uniqueness, stability, live body identification, flexibility, and difficulty to cheat. With the expanding application of palm-vein pattern recognition, the corresponding growth of the database has resulted in a long response time. To shorten the response time of identification, this paper proposes a simple and useful classification for palm-vein identification based on principal direction features. In the registration process, the Gaussian-Radon transform is adopted to extract the orientation matrix and then compute the principal direction of a palm-vein image based on the orientation matrix. The database can be classified into six bins based on the value of the principal direction. In the identification process, the principal direction of the test sample is first extracted to ascertain the corresponding bin. One-by-one matching with the training samples is then performed in the bin. To improve recognition efficiency while maintaining better recognition accuracy, two neighborhood bins of the corresponding bin are continuously searched to identify the input palm-vein image. Evaluation experiments are conducted on three different databases, namely, PolyU, CASIA, and the database of this study. Experimental results show that the searching range of one test sample in PolyU, CASIA and our database by the proposed method for palm-vein identification can be reduced to 14.29%, 14.50%, and 14.28%, with retrieval accuracy of 96.67%, 96.00%, and 97.71%, respectively. With 10,000 training samples in the database, the execution time of the identification process by the traditional method is 18.56 s, while that by the proposed approach is 3.16 s. The experimental results confirm that the proposed approach is more efficient than the traditional method, especially for a large database.
Smith, Carolynn L; Taubert, Jessica; Weldon, Kimberly; Evans, Christopher S
Correctly directing social behaviour towards a specific individual requires an ability to discriminate between conspecifics. The mechanisms of individual recognition include phenotype matching and familiarity-based recognition. Communication-based recognition is a subset of familiarity-based recognition wherein the classification is based on behavioural or distinctive signalling properties. Male fowl (Gallus gallus) produce a visual display (tidbitting) upon finding food in the presence of a female. Females typically approach displaying males. However, males may tidbit without food. We used the distinctiveness of the visual display and the unreliability of some males to test for communication-based recognition in female fowl. We manipulated the prior experience of the hens with the males to create two classes of males: S(+) wherein the tidbitting signal was paired with a food reward to the female, and S (-) wherein the tidbitting signal occurred without food reward. We then conducted a sequential discrimination test with hens using a live video feed of a familiar male. The results of the discrimination tests revealed that hens discriminated between categories of males based on their signalling behaviour. These results suggest that fowl possess a communication-based recognition system. This is the first demonstration of live-to-video transfer of recognition in any species of bird. Copyright © 2016 Elsevier B.V. All rights reserved.
Zawistowski, Jacek; Kurzejamski, Grzegorz; Garbat, Piotr; Naruniec, Jacek
This paper presents a system designed for the multi-object detection purposes and adjusted for the application of product search on the market shelves. System uses well known binary keypoint detection algorithms for finding characteristic points in the image. One of the main idea is object recognition based on Implicit Shape Model method. Authors of the article proposed many improvements of the algorithm. Originally fiducial points are matched with a very simple function. This leads to the limitations in the number of objects parts being success- fully separated, while various methods of classification may be validated in order to achieve higher performance. Such an extension implies research on training procedure able to deal with many objects categories. Proposed solution opens a new possibilities for many algorithms demanding fast and robust multi-object recognition.
Ai, Qingsong; Liu, Quan; Lu, Ying; Yuan, Tingting
Wavelet analysis is a time–frequency, non-stationary method while the largest Lyapunov exponent (LLE) is used to judge the non-linear characteristic of systems. Because surface electromyography signal (SEMGS) is a complex signal that is characterized by non-stationary and non-linear properties. This paper combines wavelet coefficient and LLE together as the new feature of SEMGS. The proposed method not only reflects the non-stationary and non-linear characteristics of SEMGS, but also is suitable for its classification. Then, the BP (back propagation) neural network is employed to implement the identification of six gestures (fist clench, fist extension, wrist extension, wrist flexion, radial deviation, ulnar deviation). The experimental results indicate that based on the proposed method, the identification of these six gestures can reach an average rate of 97.71 %.
Full Text Available In this paper a tracking algorithm for SIFT features in image sequences is developed. For each point feature extracted using SIFT algorithm a descriptor is computed using information from its neighborhood. Using an algorithm based on minimizing the distance between two descriptors tracking point features throughout image sequences is engaged. Experimental results, obtained from image sequences that capture scaling of different geometrical type object, reveal the performances of the tracking algorithm.
Ahmad, Foysal; Roy, Kaushik
A commercially available iris recognition system uses only a narrow band of the near infrared spectrum (700-900 nm) while iris images captured in the wide range of 405 nm to 1550 nm offer potential benefits to enhance recognition performance of an iris biometric system. The novelty of this research is that a group selection algorithm based on coalition game theory is explored to select the best patch subsets. In this algorithm, patches are divided into several groups based on their maximum contribution in different groups. Shapley values are used to evaluate the contribution of patches in different groups. Results show that this group selection based iris recognition
Stentiford, F W
An automatic evolutionary search is applied to the problem of feature extraction in an OCR application. A performance measure based on feature independence is used to generate features which do not appear to suffer from peaking effects . Features are extracted from a training set of 30 600 machine printed 34 class alphanumeric characters derived from British mail. Classification results on the training set and a test set of 10 200 characters are reported for an increasing number of features. A 1.01 percent forced decision error rate is obtained on the test data using 316 features. The hardware implementation should be cheap and fast to operate. The performance compares favorably with current low cost OCR page readers.
Brulin, Damien; Benezeth, Yannick; Courtial, Estelle
We propose in this paper a computer vision-based posture recognition method for home monitoring of the elderly. The proposed system performs human detection prior to the posture analysis; posture recognition is performed only on a human silhouette. The human detection approach has been designed to be robust to different environmental stimuli. Thus, posture is analyzed with simple and efficient features that are not designed to manage constraints related to the environment but only designed to describe human silhouettes. The posture recognition method, based on fuzzy logic, identifies four static postures and is robust to variation in the distance between the camera and the person, and to the person's morphology. With an accuracy of 74.29% of satisfactory posture recognition, this approach can detect emergency situations such as a fall within a health smart home.
Full Text Available Automatic target recognition (ATR in synthetic aperture radar (SAR images plays an important role in both national defense and civil applications. Although many methods have been proposed, SAR ATR is still very challenging due to the complex application environment. Feature extraction and classification are key points in SAR ATR. In this paper, we first design a novel feature, which is a histogram of oriented gradients (HOG-like feature for SAR ATR (called SAR-HOG. Then, we propose a supervised discriminative dictionary learning (SDDL method to learn a discriminative dictionary for SAR ATR and propose a strategy to simplify the optimization problem. Finally, we propose a SAR ATR classifier based on SDDL and sparse representation (called SDDLSR, in which both the reconstruction error and the classification error are considered. Extensive experiments are performed on the MSTAR database under standard operating conditions and extended operating conditions. The experimental results show that SAR-HOG can reliably capture the structures of targets in SAR images, and SDDL can further capture subtle differences among the different classes. By virtue of the SAR-HOG feature and SDDLSR, the proposed method achieves the state-of-the-art performance on MSTAR database. Especially for the extended operating conditions (EOC scenario “Training 17 ∘ —Testing 45 ∘ ”, the proposed method improves remarkably with respect to the previous works.
Full Text Available This paper addresses the problems arising from the use of data acquired with two different remote sensing techniques—high-resolution satellite imagery (HRSI and terrestrial laser scanning (TLS—for the extraction of digital elevation models (DEMs used in the geomorphological analysis and recognition of landslides, taking into account the uncertainties associated with DEM production. In order to obtain a georeferenced and edited point cloud, the two data sets require quite different processes, which are more complex for satellite images than for TLS data. The differences between the two processes are highlighted. The point clouds are interpolated on a DEM with a 1 m grid size using kriging. Starting from these DEMs, a number of contour, slope, and aspect maps are extracted, together with their associated uncertainty maps. Comparative analysis of selected landslide features drawn from the two data sources allows recognition and classification of hierarchical and multiscale landslide components. Taking into account the uncertainty related to the map enables areas to be located for which one data source was able to give more reliable results than another. Our case study is located in Southern Italy, in an area known for active landslides.
Tu, Zhigang; Xie, Wei; Qin, Qianqing; Poppe, R.W.; Veltkamp, R.C.; Li, Baoxin; Yuan, Junsong
The most successful video-based human action recognition methods rely on feature representations extracted using Convolutional Neural Networks (CNNs). Inspired by the two-stream network (TS-Net), we propose a multi-stream Convolutional Neural Network (CNN) architecture to recognize human actions. We
Zhou, Hongjun; Chen, Pei; Shen, Wei
In this research, we present a framework for multi-view face detect and recognition system based on cascade face detector and improved Dlib. This method is aimed to solve the problems of low efficiency and low accuracy in multi-view face recognition, to build a multi-view face recognition system, and to discover a suitable monitoring scheme. For face detection, the cascade face detector is used to extracted the Haar-like feature from the training samples, and Haar-like feature is used to train a cascade classifier by combining Adaboost algorithm. Next, for face recognition, we proposed an improved distance model based on Dlib to improve the accuracy of multiview face recognition. Furthermore, we applied this proposed method into recognizing face images taken from different viewing directions, including horizontal view, overlooks view, and looking-up view, and researched a suitable monitoring scheme. This method works well for multi-view face recognition, and it is also simulated and tested, showing satisfactory experimental results.
Chien, Sung Il; Kim, In Chul; Baek, Yung Mok; Kim, Dong Su; Jeong, Jee Won; Shin, Kug [Kyungpook National University, Taegu (Korea)
In this research, we defined a gesture set needed for remote monitoring and control of a manless system in atomic power station environments. Here, we define a command as the loci of a gesture. We aim at the development of an algorithm using a vision sensor and glove sensors in order to implement the gesture recognition system. The gesture recognition system based on computer vision tracks a hand by using cross correlation of PDOE image. To recognize the gesture word, the 8 direction code is employed as the input symbol for discrete HMM. Another gesture recognition based on sensor has introduced Pinch glove and Polhemus sensor as an input device. The extracted feature through preprocessing now acts as an input signal of the recognizer. For recognition 3D loci of Polhemus sensor, discrete HMM is also adopted. The alternative approach of two foregoing recognition systems uses the vision and and glove sensors together. The extracted mesh feature and 8 direction code from the locus tracking are introduced for further enhancing recognition performance. MLP trained by backpropagation is introduced here and its performance is compared to that of discrete HMM. (author). 32 refs., 44 figs., 21 tabs.
Biederman, I; Kalocsai, P
A number of behavioural phenomena distinguish the recognition of faces and objects, even when members of a set of objects are highly similar. Because faces have the same parts in approximately the same relations, individuation of faces typically requires specification of the metric variation in a holistic and integral representation of the facial surface. The direct mapping of a hypercolumn-like pattern of activation onto a representation layer that preserves relative spatial filter values in...
Reyes Ortiz, Jorge Luis
Cotutela Universitat Politècnica de Catalunya i Università degli Studi di Genova Human Activity Recognition (HAR) is a multidisciplinary research field that aims to gather data regarding people's behavior and their interaction with the environment in order to deliver valuable context-aware information. It has nowadays contributed to develop human-centered areas of study such as Ambient Intelligence and Ambient Assisted Living, which concentrate on the improvement of people's Quality of Lif...
Kramer, K M; Hedin, D S; Rolkosky, D J
The inability to identify people during group meetings is a disadvantage for blind people in many professional and educational situations. To explore the efficacy of face recognition using smartphones in these settings, we have prototyped and tested a face recognition tool for blind users. The tool utilizes Smartphone technology in conjunction with a wireless network to provide audio feedback of the people in front of the blind user. Testing indicated that the face recognition technology can tolerate up to a 40 degree angle between the direction a person is looking and the camera's axis and a 96% success rate with no false positives. Future work will be done to further develop the technology for local face recognition on the smartphone in addition to remote server based face recognition.
Aradhana D.; Girish H.; Karibasappa K.; Reddy A. Chennakeshava
This paper presents a new method for feature extraction from the facial image by using bunch graph method. These extracted geometric features of the face are used subsequently for face recognition by utilizing the group based adaptive neural network. This method is suitable, when the facial images are rotation and translation invariant. Further the technique also free from size invariance of facial image and is capable of identifying the facial images correctly when corrupted w...
Yang Hong; Jia Weimin
Many features are extracted to improve the identified rate and reliability of underground nuclear explosion and natural earthquake. But how to synthesize these characters is the key of pattern recognition. Based on the improved Delta algorithm, features of underground nuclear explosion and natural earthquake are inputted into BP neural network, and friendship functions are constructed to identify the output values. The identified rate is up to 92.0%, which shows that: the way is feasible
Christiansen, Thomas Ulrich; Greenberg, Steven
The perceptual basis of consonant recognition was experimentally investigated through a study of how information associated with phonetic features (Voicing, Manner, and Place of Articulation) combines across the acoustic-frequency spectrum. The speech signals, 11 Danish consonants embedded...... in Consonant + Vowel + Liquid syllables, were partitioned into 3/4-octave bands (“slits”) centered at 750 Hz, 1500 Hz, and 3000 Hz, and presented individually and in two- or three-slit combinations. The amount of information transmitted (IT) was calculated from consonant- confusion matrices for each feature...... the bands are essentially independent in terms of decoding this feature. Because consonant recognition and Place decoding are highly correlated (correlation coefficient r2 = 0.99), these results imply that the auditory processes underlying consonant recognition are not strictly linear. This may account...
Kalliatakis, Grigorios; Triantafyllidis, Georgios
This article presents an image-based application aiming at simple image classification of well-known monuments in the area of Heraklion, Crete, Greece. This classification takes place by utilizing Graph Based Visual Saliency (GBVS) and employing Scale Invariant Feature Transform (SIFT) or Speeded......, the images have been previously processed according to the Graph Based Visual Saliency model in order to keep either SIFT or SURF features corresponding to the actual monuments while the background “noise” is minimized. The application is then able to classify these images, helping the user to better...
Full Text Available Feature extraction is a very important part in speech emotion recognition, and in allusion to feature extraction in speech emotion recognition problems, this paper proposed a new method of feature extraction, using DBNs in DNN to extract emotional features in speech signal automatically. By training a 5 layers depth DBNs, to extract speech emotion feature and incorporate multiple consecutive frames to form a high dimensional feature. The features after training in DBNs were the input of nonlinear SVM classifier, and finally speech emotion recognition multiple classifier system was achieved. The speech emotion recognition rate of the system reached 86.5%, which was 7% higher than the original method.
Full Text Available In this paper, we propose an effective human and nonhuman pyroelectric infrared (PIR signal recognition method to reduce PIR detector false alarms. First, using the mathematical model of the PIR detector, we analyze the physical characteristics of the human and nonhuman PIR signals; second, based on the analysis results, we propose an empirical mode decomposition (EMD-based symbolic dynamic analysis method for the recognition of human and nonhuman PIR signals. In the proposed method, first, we extract the detailed features of a PIR signal into five symbol sequences using an EMD-based symbolization method, then, we generate five feature descriptors for each PIR signal through constructing five probabilistic finite state automata with the symbol sequences. Finally, we use a weighted voting classification strategy to classify the PIR signals with their feature descriptors. Comparative experiments show that the proposed method can effectively classify the human and nonhuman PIR signals and reduce PIR detector’s false alarms.
Zhao, Jiaduo; Gong, Weiguo; Tang, Yuzhen; Li, Weihong
In this paper, we propose an effective human and nonhuman pyroelectric infrared (PIR) signal recognition method to reduce PIR detector false alarms. First, using the mathematical model of the PIR detector, we analyze the physical characteristics of the human and nonhuman PIR signals; second, based on the analysis results, we propose an empirical mode decomposition (EMD)-based symbolic dynamic analysis method for the recognition of human and nonhuman PIR signals. In the proposed method, first, we extract the detailed features of a PIR signal into five symbol sequences using an EMD-based symbolization method, then, we generate five feature descriptors for each PIR signal through constructing five probabilistic finite state automata with the symbol sequences. Finally, we use a weighted voting classification strategy to classify the PIR signals with their feature descriptors. Comparative experiments show that the proposed method can effectively classify the human and nonhuman PIR signals and reduce PIR detector's false alarms.
Schuh, Michael A.; Angryk, Rafal A.; Martens, Petrus C.
The massive repository of images of the Sun captured by the Solar Dynamics Observatory (SDO) mission has ushered in the era of Big Data for Solar Physics. In this work, we investigate the entire public collection of events reported to the Heliophysics Event Knowledgebase (HEK) from automated solar feature recognition modules operated by the SDO Feature Finding Team (FFT). With the SDO mission recently surpassing five years of operations, and over 280,000 event reports for seven types of solar phenomena, we present the broadest and most comprehensive large-scale dataset of the SDO FFT modules to date. We also present numerous statistics on these modules, providing valuable contextual information for better understanding and validating of the individual event reports and the entire dataset as a whole. After extensive data cleaning through exploratory data analysis, we highlight several opportunities for knowledge discovery from data (KDD). Through these important prerequisite analyses presented here, the results of KDD from Solar Big Data will be overall more reliable and better understood. As the SDO mission remains operational over the coming years, these datasets will continue to grow in size and value. Future versions of this dataset will be analyzed in the general framework established in this work and maintained publicly online for easy access by the community.
Zhang, Qiang; Li, Jiafeng; Zhuo, Li; Zhang, Hui; Li, Xiaoguang
Color is one of the most stable attributes of vehicles and often used as a valuable cue in some important applications. Various complex environmental factors, such as illumination, weather, noise and etc., result in the visual characteristics of the vehicle color being obvious diversity. Vehicle color recognition in complex environments has been a challenging task. The state-of-the-arts methods roughly take the whole image for color recognition, but many parts of the images such as car windows; wheels and background contain no color information, which will have negative impact on the recognition accuracy. In this paper, a novel vehicle color recognition method using local vehicle-color saliency detection and dual-orientational dimensionality reduction of convolutional neural network (CNN) deep features has been proposed. The novelty of the proposed method includes two parts: (1) a local vehicle-color saliency detection method has been proposed to determine the vehicle color region of the vehicle image and exclude the influence of non-color regions on the recognition accuracy; (2) dual-orientational dimensionality reduction strategy has been designed to greatly reduce the dimensionality of deep features that are learnt from CNN, which will greatly mitigate the storage and computational burden of the subsequent processing, while improving the recognition accuracy. Furthermore, linear support vector machine is adopted as the classifier to train the dimensionality reduced features to obtain the recognition model. The experimental results on public dataset demonstrate that the proposed method can achieve superior recognition performance over the state-of-the-arts methods.
Liu, Tong; Han, Li; Ma, Liangkai; Guo, Dongwei
As the rapid development of multimedia networking, more and more songs are issued through the Internet and stored in large digital music libraries. However, music information retrieval on these libraries can be really hard, and the recognition of musical emotion is especially challenging. In this paper, we report a strategy to recognize the emotion contained in songs by classifying their spectrograms, which contain both the time and frequency information, with a convolutional neural network (CNN). The experiments conducted on the l000-song dataset indicate that the proposed model outperforms traditional machine learning method.
Ryals, Anthony J.; Cleary, Anne M.
Among cues that fail to elicit successful recall, participants can still discriminate between cues that do and do not resemble studied items. This ability is referred to as recognition without cued recall (RWCR). We hypothesized that whereas recognition with cued recall is at least partly based on recalled studied information, RWCR results from a…
In this article, I shall examine the cognitive, heuristic and theoretical functions of the concept of recognition. To evaluate both the explanatory power and the limitations of a sociological concept, the theory construction must be analysed and its actual productivity for sociological theory mus...
Full Text Available Automatic extraction of time-frequency spectral image of mechanical faults can be achieved and faults can be identified consequently when rotating machinery spectral image processing technology is applied to fault diagnosis, which is an advantage. Acquired mechanical vibration signals can be converted into color time-frequency spectrum images by the processing of pseudo Wigner-Ville distribution. Then a feature extraction method based on quaternion invariant moment was proposed, combining image processing technology and multiweight neural network technology. The paper adopted quaternion invariant moment feature extraction method and gray level-gradient cooccurrence matrix feature extraction method and combined them with geometric learning algorithm and probabilistic neural network algorithm, respectively, and compared the recognition rates of rolling bearing faults. The experimental results show that the recognition rates of quaternion invariant moment are higher than gray level-gradient cooccurrence matrix in the same recognition method. The recognition rates of geometric learning algorithm are higher than probabilistic neural network algorithm in the same feature extraction method. So the method based on quaternion invariant moment geometric learning and multiweight neural network is superior. What is more, this algorithm has preferable generalization performance under the condition of fewer samples, and it has practical value and acceptation on the field of fault diagnosis for rotating machinery as well.
Sengupta, Sailes Kumar
An essential component in global climate research is accurate cloud cover and type determination. Of the two approaches to texture-based classification (statistical and textural), only the former is effective in the classification of natural scenes such as land, ocean, and atmosphere. In the statistical approach that was adopted, parameters characterizing the stochastic properties of the spatial distribution of grey levels in an image are estimated and then used as features for cloud classification. Two types of textural measures were used. One is based on the distribution of the grey level difference vector (GLDV), and the other on a set of textural features derived from the MaxMin cooccurrence matrix (MMCM). The GLDV method looks at the difference D of grey levels at pixels separated by a horizontal distance d and computes several statistics based on this distribution. These are then used as features in subsequent classification. The MaxMin tectural features on the other hand are based on the MMCM, a matrix whose (I,J)th entry give the relative frequency of occurrences of the grey level pair (I,J) that are consecutive and thresholded local extremes separated by a given pixel distance d. Textural measures are then computed based on this matrix in much the same manner as is done in texture computation using the grey level cooccurrence matrix. The database consists of 37 cloud field scenes from LANDSAT imagery using a near IR visible channel. The classification algorithm used is the well known Stepwise Discriminant Analysis. The overall accuracy was estimated by the percentage or correct classifications in each case. It turns out that both types of classifiers, at their best combination of features, and at any given spatial resolution give approximately the same classification accuracy. A neural network based classifier with a feed forward architecture and a back propagation training algorithm is used to increase the classification accuracy, using these two classes
Zheng, Bo; Li, Yan-Feng; Huang, Hong-Zhong
For the recognition principle based optimized single center, one important issue is that the data with nonlinear separatrix cannot be recognized accurately. In order to solve this problem, a novel recognition strategy based on adaptive optimized multiple centers is proposed in this paper. This strategy recognizes the data sets with nonlinear separatrix by the multiple centers. Meanwhile, the priority levels are introduced into the multi-objective optimization, including recognition accuracy, the quantity of optimized centers, and distance relationship. According to the characteristics of various data, the priority levels are adjusted to ensure the quantity of optimized centers adaptively and to keep the original accuracy. The proposed method is compared with other methods, including support vector machine (SVM), neural network, and Bayesian classifier. The results demonstrate that the proposed strategy has the same or even better recognition ability on different distribution characteristics of data.
Bo, G.; Caviglia, D.; Valle, M.
In this paper a real time Optical Character Recognition system is presented: it is based on a feature extraction module and a neural network classifier which have been designed and fabricated in analog VLSI technology. Experimental results validate the circuit functionality. The results obtained from a validation based on a mixed approach (i.e., an approach based on both experimental and simulation results) confirm the soundness and reliability of the system
Bo, G.; Caviglia, D.; Valle, M. [Genoa Univ. (Italy). Dip. of Biophysical and Electronic Engineering
In this paper a real time Optical Character Recognition system is presented: it is based on a feature extraction module and a neural network classifier which have been designed and fabricated in analog VLSI technology. Experimental results validate the circuit functionality. The results obtained from a validation based on a mixed approach (i.e., an approach based on both experimental and simulation results) confirm the soundness and reliability of the system.
Full Text Available An activity recognition system essentially processes raw sensor data and maps them into latent activity classes. Most of the previous systems are built with supervised learning techniques and pre-defined data sources, and result in static models. However, in realistic and dynamic environments, original data sources may fail and new data sources become available, a robust activity recognition system should be able to perform evolution automatically with dynamic sensor availability in dynamic environments. In this paper, we propose methods that automatically incorporate dynamically available data sources to adapt and refine the recognition system at run-time. The system is built upon ensemble classifiers which can automatically choose the features with the most discriminative power. Extensive experimental results with publicly available datasets demonstrate the effectiveness of our methods.
Petridis, Stavros; Nijholt, Antinus; Nijholt, A.; Pantic, M.; Pantic, Maja; Poel, Mannes; Poel, M.; Hondorp, G.H.W.
Previous research on automatic laughter detection has mainly been focused on audio-based detection. In this study we present an audiovisual approach to distinguishing laughter from speech based on temporal features and we show that the integration of audio and visual information leads to improved
Dat Tien Nguyen
Full Text Available Although face recognition systems have wide application, they are vulnerable to presentation attack samples (fake samples. Therefore, a presentation attack detection (PAD method is required to enhance the security level of face recognition systems. Most of the previously proposed PAD methods for face recognition systems have focused on using handcrafted image features, which are designed by expert knowledge of designers, such as Gabor filter, local binary pattern (LBP, local ternary pattern (LTP, and histogram of oriented gradients (HOG. As a result, the extracted features reflect limited aspects of the problem, yielding a detection accuracy that is low and varies with the characteristics of presentation attack face images. The deep learning method has been developed in the computer vision research community, which is proven to be suitable for automatically training a feature extractor that can be used to enhance the ability of handcrafted features. To overcome the limitations of previously proposed PAD methods, we propose a new PAD method that uses a combination of deep and handcrafted features extracted from the images by visible-light camera sensor. Our proposed method uses the convolutional neural network (CNN method to extract deep image features and the multi-level local binary pattern (MLBP method to extract skin detail features from face images to discriminate the real and presentation attack face images. By combining the two types of image features, we form a new type of image features, called hybrid features, which has stronger discrimination ability than single image features. Finally, we use the support vector machine (SVM method to classify the image features into real or presentation attack class. Our experimental results indicate that our proposed method outperforms previous PAD methods by yielding the smallest error rates on the same image databases.
Nguyen, Dat Tien; Pham, Tuyen Danh; Baek, Na Rae; Park, Kang Ryoung
Although face recognition systems have wide application, they are vulnerable to presentation attack samples (fake samples). Therefore, a presentation attack detection (PAD) method is required to enhance the security level of face recognition systems. Most of the previously proposed PAD methods for face recognition systems have focused on using handcrafted image features, which are designed by expert knowledge of designers, such as Gabor filter, local binary pattern (LBP), local ternary pattern (LTP), and histogram of oriented gradients (HOG). As a result, the extracted features reflect limited aspects of the problem, yielding a detection accuracy that is low and varies with the characteristics of presentation attack face images. The deep learning method has been developed in the computer vision research community, which is proven to be suitable for automatically training a feature extractor that can be used to enhance the ability of handcrafted features. To overcome the limitations of previously proposed PAD methods, we propose a new PAD method that uses a combination of deep and handcrafted features extracted from the images by visible-light camera sensor. Our proposed method uses the convolutional neural network (CNN) method to extract deep image features and the multi-level local binary pattern (MLBP) method to extract skin detail features from face images to discriminate the real and presentation attack face images. By combining the two types of image features, we form a new type of image features, called hybrid features, which has stronger discrimination ability than single image features. Finally, we use the support vector machine (SVM) method to classify the image features into real or presentation attack class. Our experimental results indicate that our proposed method outperforms previous PAD methods by yielding the smallest error rates on the same image databases.
Nguyen, Dat Tien; Pham, Tuyen Danh; Baek, Na Rae; Park, Kang Ryoung
Although face recognition systems have wide application, they are vulnerable to presentation attack samples (fake samples). Therefore, a presentation attack detection (PAD) method is required to enhance the security level of face recognition systems. Most of the previously proposed PAD methods for face recognition systems have focused on using handcrafted image features, which are designed by expert knowledge of designers, such as Gabor filter, local binary pattern (LBP), local ternary pattern (LTP), and histogram of oriented gradients (HOG). As a result, the extracted features reflect limited aspects of the problem, yielding a detection accuracy that is low and varies with the characteristics of presentation attack face images. The deep learning method has been developed in the computer vision research community, which is proven to be suitable for automatically training a feature extractor that can be used to enhance the ability of handcrafted features. To overcome the limitations of previously proposed PAD methods, we propose a new PAD method that uses a combination of deep and handcrafted features extracted from the images by visible-light camera sensor. Our proposed method uses the convolutional neural network (CNN) method to extract deep image features and the multi-level local binary pattern (MLBP) method to extract skin detail features from face images to discriminate the real and presentation attack face images. By combining the two types of image features, we form a new type of image features, called hybrid features, which has stronger discrimination ability than single image features. Finally, we use the support vector machine (SVM) method to classify the image features into real or presentation attack class. Our experimental results indicate that our proposed method outperforms previous PAD methods by yielding the smallest error rates on the same image databases. PMID:29495417
SONG Le; LIN Yuchi; HAO Liguo
Based on a comprehensive study of various algorithms, the automatic recognition of traditional ocular optical measuring instruments is realized. Taking a universal tools microscope (UTM) lens view image as an example, a 2-layer automatic recognition model for data reading is established after adopting a series of pre-processing algorithms. This model is an optimal combination of the correlation-based template matching method and a concurrent back propagation (BP) neural network. Multiple complementary feature extraction is used in generating the eigenvectors of the concurrent network. In order to improve fault-tolerance capacity, rotation invariant features based on Zernike moments are extracted from digit characters and a 4-dimensional group of the outline features is also obtained. Moreover, the operating time and reading accuracy can be adjusted dynamically by setting the threshold value. The experimental result indicates that the newly developed algorithm has optimal recognition precision and working speed. The average reading ratio can achieve 97.23%. The recognition method can automatically obtain the results of optical measuring instruments rapidly and stably without modifying their original structure, which meets the application requirements.
Tang, Jin; Luo, Jian; Tjahjadi, Tardi; Gao, Yan
This paper presents a method for modeling a 2.5-dimensional (2.5D) human body and extracting the gait features for identifying the human subject. To achieve view-invariant gait recognition, a multi-view synthesizing method based on point cloud registration (MVSM) to generate multi-view training galleries is proposed. The concept of a density and curvature-based Color Gait Curvature Image is introduced to map 2.5D data onto a 2D space to enable data dimension reduction by discrete cosine transform and 2D principle component analysis. Gait recognition is achieved via a 2.5D view-invariant gait recognition method based on point cloud registration. Experimental results on the in-house database captured by a Microsoft Kinect camera show a significant performance gain when using MVSM. PMID:24686727
Full Text Available This article deals with a recognition system using an algorithm based on the Principal Component Analysis (PCA technique. The recognition system consists only of a PC and an integrated video camera. The algorithm is developed in MATLAB language and calculates the eigenfaces considered as features of the face. The PCA technique is based on the matching between the facial test image and the training prototype vectors. The mathcing score between the facial test image and the training prototype vectors is calculated between their coefficient vectors. If the matching is high, we have the best recognition. The results of the algorithm based on the PCA technique are very good, even if the person looks from one side at the video camera.
Full Text Available Logo recognition is an important issue in document image, advertisement, and intelligent transportation. Although there are many approaches to study logos in these fields, logo recognition is an essential subprocess. Among the methods of logo recognition, the descriptor is very vital. The results of moments as powerful descriptors were not discussed before in terms of logo recognition. So it is unclear which moments are more appropriate to recognize which kind of logos. In this paper we find out the relations between logos with different transforms and moments, which moments are fit for logos with different transforms. The open datasets are employed from the University of Maryland. The comparisons based on moments are carried out from the aspects of logos with noise, and rotation, scaling, rotation and scaling.
Tickle, A J; Harvey, P K; Smith, J S; Wu, F
Object recognition is an image processing task of finding a given object in a selected image or video sequence. Object recognition can be divided into two areas: one of these is decision-theoretic and deals with patterns described by quantitative descriptors, for example such as length, area, shape and texture. With this Graphical User Interface Circuitry (GUIC) methodology employed here being relatively new for object recognition systems, the aim of this work is to identify if the developed circuitry can detect certain shapes or strings within the target image. A much smaller reference image feeds the preset data for identification, tests are conducted for both binary and greyscale and the additional mathematical morphology to highlight the area within the target image with the object(s) are located is also presented. This then provides proof that basic recognition methods are valid and would allow the progression to developing decision-theoretical and learning based approaches using GUICs for use in multidisciplinary tasks.
Full Text Available Iris recognition system is one of biometric based recognition/identification systems. Numerous techniques have been implemented to achieve a good recognition rate, including the ones based on Phase Only Correlation (POC. Significant and higher correlation peaks suggest that the system recognizes iris images of the same subject (person, while lower and unsignificant peaks correspond to recognition of those of difference subjects. Current POC methods have not investigated minimum iris point that can be used to achieve higher correlation peaks. This paper proposed a method that used only one-fourth of full normalized iris size to achieve higher (or at least the same recognition rate. Simulation on CASIA version 1.0 iris image database showed that averaged recognition rate of the proposed method achieved 67%, higher than that of using one-half (56% and full (53% iris point. Furthermore, all (100% POC peak values of the proposed method was higher than that of the method with full iris points.
Joao Ricardo Sato
Full Text Available Attention-Deficit/Hyperactivity Disorder is a neurodevelopmental disorder, being one of the most prevalent psychiatric disorders in childhood. The neural substrates associated with this condition, both from structural and functional perspectives, are not yet well established . Recent studies have highlighted the relevance of neuroimaging not only to provide a more solid understanding about the disorder but also for possible clinical support. The ADHD-200 Consortium organized the ADHD-200 global competition making publicly available, hundreds of structural magnetic resonance imaging (MRI and functional MRI (fMRI datasets of both ADHD patients and typically developing controls for research use. In the current study, we evaluate the predictive power of a set of three different feature extraction methods and 10 different pattern recognition methods. The features tested were regional homogeneity (ReHo, amplitude of low frequency fluctuations (ALFF and independent components analysis maps (RSN. Our findings suggest that the combination ALFF+ReHo maps contain relevant information to discriminate ADHD patients from typically developing controls, but with limited accuracy. All classifiers provided almost the same performance in this case. In addition, the combination ALFF+ReHo+RSN was relevant in combined vs inattentive ADHD classification, achieving a score accuracy of 67%. In this latter case, the performances of the classifiers were not equivalent and L2-regularized logistic regression (both in primal and dual space provided the most accurate predictions. The analysis of brain regions containing most discriminative information suggested that in both classifications (ADHD vs typically developing controls and combined vs inattentive, the relevant information is not confined only to a small set of regions but it is spatially distributed across the whole brain.
Wan, Jianwei; Wang, Ling; Huang, Fukan; Zhou, Liangzhu
This paper first give a comprehensive discussion about the concept of artificial neural network its research methods and the relations with information processing. On the basis of such a discussion, we expound the mathematical similarity of artificial neural network and information processing. Then, the paper presents a new method of image recognition based on invariant features and neural network by using image Zernike transform. The method not only has the invariant properties for rotation, shift and scale of image object, but also has good fault tolerance and robustness. Meanwhile, it is also compared with statistical classifier and invariant moments recognition method.
Jyoti Dalal*1 & Sumiran Daiya2
Character recognition techniques associate a symbolic identity with the image of character. In a typical OCR systems input characters are digitized by an optical scanner. Each character is then located and segmented, and the resulting character image is fed into a pre-processor for noise reduction and normalization. Certain characteristics are the extracted from the character for classification. The feature extraction is critical and many different techniques exist, each having its strengths ...
Wang, Hui; Li, Jingchao; Guo, Lili; Dou, Zheng; Lin, Yun; Zhou, Ruolin
How to analyze and identify the characteristics of radiation sources and estimate the threat level by means of detecting, intercepting and locating has been the central issue of electronic support in the electronic warfare, and communication signal recognition is one of the key points to solve this issue. Aiming at accurately extracting the individual characteristics of the radiation source for the increasingly complex communication electromagnetic environment, a novel feature extraction algorithm for individual characteristics of the communication radiation source based on the fractal complexity of the signal is proposed. According to the complexity of the received signal and the situation of environmental noise, use the fractal dimension characteristics of different complexity to depict the subtle characteristics of the signal to establish the characteristic database, and then identify different broadcasting station by gray relation theory system. The simulation results demonstrate that the algorithm can achieve recognition rate of 94% even in the environment with SNR of -10dB, and this provides an important theoretical basis for the accurate identification of the subtle features of the signal at low SNR in the field of information confrontation.
Galih Hendra Wibowo
Full Text Available Javanese character is one of Indonesia's noble culture, especially in Java. However, the number of Javanese people who are able to read the letter has decreased so that there need to be conservation efforts in the form of a system that is able to recognize the characters. One solution to these problem lies in Optical Character Recognition (OCR studies, where one of its heaviest points lies in feature extraction which is to distinguish each character. Shape Energy is one of feature extraction method with the basic idea of how the character can be distinguished simply through its skeleton. Based on the basic idea, then the development of feature extraction is done based on its components to produce an angular histogram with various variations of multiples angle. Furthermore, the performance test of this method and its basic method is performed in Javanese character dataset, which has been obtained from various images, is 240 data with 19 labels by using K-Nearest Neighbors as its classification method. Performance values were obtained based on the accuracy which is generated through the Cross-Validation process of 80.83% in the angular histogram with an angle of 20 degrees, 23% better than Shape Energy. In addition, other test results show that this method is able to recognize rotated character with the lowest performance value of 86% at 180-degree rotation and the highest performance value of 96.97% at 90-degree rotation. It can be concluded that this method is able to improve the performance of Shape Energy in the form of recognition of Javanese characters as well as robust to the rotation.
Mounîm A. El-Yacoubi
Full Text Available We present an autonomous assistive robotic system for human activity recognition from video sequences. Due to the large variability inherent to video capture from a non-fixed robot (as opposed to a fixed camera, as well as the robot's limited computing resources, implementation has been guided by robustness to this variability and by memory and computing speed efficiency. To accommodate motion speed variability across users, we encode motion using dense interest point trajectories. Our recognition model harnesses the dense interest point bag-of-words representation through an intersection kernel-based SVM that better accommodates the large intra-class variability stemming from a robot operating in different locations and conditions. To contextually assess the engine as implemented in the robot, we compare it with the most recent approaches of human action recognition performed on public datasets (non-robot-based, including a novel approach of our own that is based on a two-layer SVM-hidden conditional random field sequential recognition model. The latter's performance is among the best within the recent state of the art. We show that our robot-based recognition engine, while less accurate than the sequential model, nonetheless shows good performances, especially given the adverse test conditions of the robot, relative to those of a fixed camera.
Goedemé, Toon; Tuytelaars, Tinne; Van Gool, Luc; Vanacker, Gerolf; Nuttin, Marnix
Goedemé T., Tuytelaars T., Van Gool L., Vanacker G., Nuttin M., ''Feature based omnidirectional sparse visual path following'', Proceedings IEEE/RSJ international conference on intelligent robots and systems - IROS2005, pp. 1003-1008, August 2-6, 2005, Edmonton, Alberta, Canada.
Feng, G.; Wu, Ye-qing
With the enlargement of mankind's activity range, the significance for person's status identity is becoming more and more important. So many different techniques for person's status identity were proposed for this practical usage. Conventional person's status identity methods like password and identification card are not always reliable. A wide variety of biometrics has been developed for this challenge. Among those biologic characteristics, iris pattern gains increasing attention for its stability, reliability, uniqueness, noninvasiveness and difficult to counterfeit. The distinct merits of the iris lead to its high reliability for personal identification. So the iris identification technique had become hot research point in the past several years. This paper presents an efficient algorithm for iris recognition using gray-level co-occurrence matrix(GLCM) and Discrete Cosine transform(DCT). To obtain more representative iris features, features from space and DCT transformation domain are extracted. Both GLCM and DCT are applied on the iris image to form the feature sequence in this paper. The combination of GLCM and DCT makes the iris feature more distinct. Upon GLCM and DCT the eigenvector of iris extracted, which reflects features of spatial transformation and frequency transformation. Experimental results show that the algorithm is effective and feasible with iris recognition.
Shin, Kwang Yong; Kim, Yeong Gon; Park, Kang Ryoung
For the purpose of biometric person identification, iris recognition uses the unique characteristics of the patterns of the iris; that is, the eye region between the pupil and the sclera. When obtaining an iris image, the iris's image is frequently rotated because of the user's head roll toward the left or right shoulder. As the rotation of the iris image leads to circular shifting of the iris features, the accuracy of iris recognition is degraded. To solve this problem, conventional iris recognition methods use shifting of the iris feature codes to perform the matching. However, this increases the computational complexity and level of false acceptance error. To solve these problems, we propose a novel iris recognition method based on multi-unit iris images. Our method is novel in the following five ways compared with previous methods. First, to detect both eyes, we use Adaboost and a rapid eye detector (RED) based on the iris shape feature and integral imaging. Both eyes are detected using RED in the approximate candidate region that consists of the binocular region, which is determined by the Adaboost detector. Second, we classify the detected eyes into the left and right eyes, because the iris patterns in the left and right eyes in the same person are different, and they are therefore considered as different classes. We can improve the accuracy of iris recognition using this pre-classification of the left and right eyes. Third, by measuring the angle of head roll using the two center positions of the left and right pupils, detected by two circular edge detectors, we obtain the information of the iris rotation angle. Fourth, in order to reduce the error and processing time of iris recognition, adaptive bit-shifting based on the measured iris rotation angle is used in feature matching. Fifth, the recognition accuracy is enhanced by the score fusion of the left and right irises. Experimental results on the iris open database of low-resolution images showed that the
Sharma, Mahesh Kumar; Sharma, Shashikant; Leeprechanon, Nopbhorn; Ranjan, Aashish
For the realization of face recognition systems in the static as well as in the real time frame, algorithms such as principal component analysis, independent component analysis, linear discriminate analysis, neural networks and genetic algorithms are used for decades. This paper discusses an approach which is a wavelet decomposition based principal component analysis for face recognition. Principal component analysis is chosen over other algorithms due to its relative simplicity, efficiency, and robustness features. The term face recognition stands for identifying a person from his facial gestures and having resemblance with factor analysis in some sense, i.e. extraction of the principal component of an image. Principal component analysis is subjected to some drawbacks, mainly the poor discriminatory power and the large computational load in finding eigenvectors, in particular. These drawbacks can be greatly reduced by combining both wavelet transform decomposition for feature extraction and principal component analysis for pattern representation and classification together, by analyzing the facial gestures into space and time domain, where, frequency and time are used interchangeably. From the experimental results, it is envisaged that this face recognition method has made a significant percentage improvement in recognition rate as well as having a better computational efficiency.
Bullee, Jan-Willem; Veldhuis, Raymond N.J.
What information is available in biometric features besides that needed for the biometric recognition process? What if a biometric feature contains Personally Identifiable Information? Will the whole biometric system become a threat to privacy? This paper is an attempt to quantifiy the link between
as the recognition of characters using rule based post-pro- cessing algorithm. ... ods in their work in order to recognize handwriting with pen-based devices. ..... Centernew is the average y-coordinate value of new stroke and denotes the center ...
Zhou, Zhi; Cao, Zongjie; Pi, Yiming
The frequency of terahertz radar ranges from 0.1 THz to 10 THz, which is higher than that of microwaves. Multi-modal signals, including high-resolution range profile (HRRP) and Doppler signatures, can be acquired by the terahertz radar system. These two kinds of information are commonly used in automatic target recognition; however, dynamic gesture recognition is rarely discussed in the terahertz regime. In this paper, a dynamic gesture recognition system using a terahertz radar is proposed, based on multi-modal signals. The HRRP sequences and Doppler signatures were first achieved from the radar echoes. Considering the electromagnetic scattering characteristics, a feature extraction model is designed using location parameter estimation of scattering centers. Dynamic Time Warping (DTW) extended to multi-modal signals is used to accomplish the classifications. Ten types of gesture signals, collected from a terahertz radar, are applied to validate the analysis and the recognition system. The results of the experiment indicate that the recognition rate reaches more than 91%. This research verifies the potential applications of dynamic gesture recognition using a terahertz radar.
Ariffin Abdul Razak
This paper presents an object-oriented, feature-based design system which supports the integration of design and manufacture by ensuring that part descriptions fully account for any feature interactions. Manufacturing information is extracted from the feature descriptions in the form of volumes and Tool Access Directions, TADs. When features interact, both volumes and TADs are updated. This methodology has been demonstrated by developing a prototype system in which ACIS attributes are used to record feature information within the data structure of the solid model. The system implemented in the C++ programming language and embedded in a menu-driven X-windows user interface to the ACIS 3D Toolkit. (author)
Handwritten character recognition plays an important role in transforming raw visual image data obtained from handwritten documents using for example scanners to a format which is understandable by a computer. It is an important application in the field of pattern recognition, machine learning and
Winda, A.; E Byan, W. R.; Sofyan; Armansyah; Zariantin, D. L.; Josep, B. G.
Current mechanical key in the motorcycle is prone to bulgary, being stolen or misplaced. Intelligent biometric voice recognition as means to replace this mechanism is proposed as an alternative. The proposed system will decide whether the voice is belong to the user or not and the word utter by the user is ‘On’ or ‘Off’. The decision voice will be sent to Arduino in order to start or stop the engine. The recorded voice is processed in order to get some features which later be used as input to the proposed system. The Mel-Frequency Ceptral Coefficient (MFCC) is adopted as a feature extraction technique. The extracted feature is the used as input to the SVM-based identifier. Experimental results confirm the effectiveness of the proposed intelligent voice recognition and word recognition system. It show that the proposed method produces a good training and testing accuracy, 99.31% and 99.43%, respectively. Moreover, the proposed system shows the performance of false rejection rate (FRR) and false acceptance rate (FAR) accuracy of 0.18% and 17.58%, respectively. In the intelligent word recognition shows that the training and testing accuracy are 100% and 96.3%, respectively.
Full Text Available Intelligent recognition of traffic police command gestures increases authenticity and interactivity in virtual urban scenes. To actualize real-time traffic gesture recognition, a novel spatiotemporal convolution neural network (ST-CNN model is presented. We utilized Kinect 2.0 to construct a traffic police command gesture skeleton (TPCGS dataset collected from 10 volunteers. Subsequently, convolution operations on the locational change of each skeletal point were performed to extract temporal features, analyze the relative positions of skeletal points, and extract spatial features. After temporal and spatial features based on the three-dimensional positional information of traffic police skeleton points were extracted, the ST-CNN model classified positional information into eight types of Chinese traffic police gestures. The test accuracy of the ST-CNN model was 96.67%. In addition, a virtual urban traffic scene in which real-time command tests were carried out was set up, and a real-time test accuracy rate of 93.0% was achieved. The proposed ST-CNN model ensured a high level of accuracy and robustness. The ST-CNN model recognized traffic command gestures, and such recognition was found to control vehicles in virtual traffic environments, which enriches the interactive mode of the virtual city scene. Traffic command gesture recognition contributes to smart city construction.
Li, Weihong; Liu, Lijuan; Gong, Weiguo
Support vector machine (SVM) has been proved to be a powerful tool for face recognition. The generalization capacity of SVM depends on the model with optimal hyperparameters. The computational cost of SVM model selection results in application difficulty in face recognition. In order to overcome the shortcoming, we utilize the advantage of uniform design--space filling designs and uniformly scattering theory to seek for optimal SVM hyperparameters. Then we propose a face recognition scheme based on SVM with optimal model which obtained by replacing the grid and gradient-based method with uniform design. The experimental results on Yale and PIE face databases show that the proposed method significantly improves the efficiency of SVM model selection.
Azmi, Mohd Sanusi; Omar, Khairuddin; Nasrudin, Mohamad Faidzul; Idrus, Bahari; Wan Mohd Ghazali, Khadijah
A novel method is proposed to recognize the Arab/Jawi and Roman digits. This new method is based on features from the triangle geometry, normalized into nine features. The features are used for zoning which results in five and 25 zones. The algorithm is validated by using three standard datasets which are publicly available and used by researchers in this field. The first dataset is HODA that contains 60,000 images for training and 20,000 images for testing. The second dataset is IFHCDB. This dataset has 52,380 isolated characters and 17,740 digits. Only the 17,740 images of digits are used for this research. For the roman digit, MNIST are chosen. MNIST dataset has 60,000 images for training and 10,000 images for testing. Supervised (SML) and Unsupervised Machine Learning (UML) are used to test the nine features. The SML used are Neural Network (NN) and Support Vector Machine (SVM). Whereas the UML uses Euclidean Distance Method with data mining algorithms; namely Mean Average Precision (eMAP) and Frequency Based (eFB). Results for SML testing for HODA dataset are 98.07% accuracy for SVM, and 96.73% for NN. For IFHCDB and MNIST the accuracy are 91.75% and 93.095% respectively. For the UML tests, HODA dataset is 93.91%, IFHCDB 85.94% and MNIST 86.61%. The train and test images are selected using both random and the original dataset's distribution. The results show that the accuracy of proposed algorithm is over 90% for each SML trained datasets where the highest result is the one that uses 25 zones features.
Full Text Available When extracting discriminative features from multimodal data, current methods rarely concern themselves with the data distribution. In this paper, we present an assumption that is consistent with the viewpoint of discrimination, that is, a person’s overall biometric data should be regarded as one class in the input space, and his different biometric data can form different Gaussians distributions, i.e., different subclasses. Hence, we propose a novel multimodal feature extraction and recognition approach based on subclass discriminant analysis (SDA. Specifically, one person’s different bio-data are treated as different subclasses of one class, and a transformed space is calculated, where the difference among subclasses belonging to different persons is maximized, and the difference within each subclass is minimized. Then, the obtained multimodal features are used for classification. Two solutions are presented to overcome the singularity problem encountered in calculation, which are using PCA preprocessing, and employing the generalized singular value decomposition (GSVD technique, respectively. Further, we provide nonlinear extensions of SDA based multimodal feature extraction, that is, the feature fusion based on KPCA-SDA and KSDA-GSVD. In KPCA-SDA, we first apply Kernel PCA on each single modal before performing SDA. While in KSDA-GSVD, we directly perform Kernel SDA to fuse multimodal data by applying GSVD to avoid the singular problem. For simplicity two typical types of biometric data are considered in this paper, i.e., palmprint data and face data. Compared with several representative multimodal biometrics recognition methods, experimental results show that our approaches outperform related multimodal recognition methods and KSDA-GSVD achieves the best recognition performance.
Jing, Xiao-Yuan; Li, Sheng; Li, Wen-Qian; Yao, Yong-Fang; Lan, Chao; Lu, Jia-Sen; Yang, Jing-Yu
When extracting discriminative features from multimodal data, current methods rarely concern themselves with the data distribution. In this paper, we present an assumption that is consistent with the viewpoint of discrimination, that is, a person's overall biometric data should be regarded as one class in the input space, and his different biometric data can form different Gaussians distributions, i.e., different subclasses. Hence, we propose a novel multimodal feature extraction and recognition approach based on subclass discriminant analysis (SDA). Specifically, one person's different bio-data are treated as different subclasses of one class, and a transformed space is calculated, where the difference among subclasses belonging to different persons is maximized, and the difference within each subclass is minimized. Then, the obtained multimodal features are used for classification. Two solutions are presented to overcome the singularity problem encountered in calculation, which are using PCA preprocessing, and employing the generalized singular value decomposition (GSVD) technique, respectively. Further, we provide nonlinear extensions of SDA based multimodal feature extraction, that is, the feature fusion based on KPCA-SDA and KSDA-GSVD. In KPCA-SDA, we first apply Kernel PCA on each single modal before performing SDA. While in KSDA-GSVD, we directly perform Kernel SDA to fuse multimodal data by applying GSVD to avoid the singular problem. For simplicity two typical types of biometric data are considered in this paper, i.e., palmprint data and face data. Compared with several representative multimodal biometrics recognition methods, experimental results show that our approaches outperform related multimodal recognition methods and KSDA-GSVD achieves the best recognition performance.
Jing, Xiao-Yuan; Li, Sheng; Li, Wen-Qian; Yao, Yong-Fang; Lan, Chao; Lu, Jia-Sen; Yang, Jing-Yu
When extracting discriminative features from multimodal data, current methods rarely concern themselves with the data distribution. In this paper, we present an assumption that is consistent with the viewpoint of discrimination, that is, a person's overall biometric data should be regarded as one class in the input space, and his different biometric data can form different Gaussians distributions, i.e., different subclasses. Hence, we propose a novel multimodal feature extraction and recognition approach based on subclass discriminant analysis (SDA). Specifically, one person's different bio-data are treated as different subclasses of one class, and a transformed space is calculated, where the difference among subclasses belonging to different persons is maximized, and the difference within each subclass is minimized. Then, the obtained multimodal features are used for classification. Two solutions are presented to overcome the singularity problem encountered in calculation, which are using PCA preprocessing, and employing the generalized singular value decomposition (GSVD) technique, respectively. Further, we provide nonlinear extensions of SDA based multimodal feature extraction, that is, the feature fusion based on KPCA-SDA and KSDA-GSVD. In KPCA-SDA, we first apply Kernel PCA on each single modal before performing SDA. While in KSDA-GSVD, we directly perform Kernel SDA to fuse multimodal data by applying GSVD to avoid the singular problem. For simplicity two typical types of biometric data are considered in this paper, i.e., palmprint data and face data. Compared with several representative multimodal biometrics recognition methods, experimental results show that our approaches outperform related multimodal recognition methods and KSDA-GSVD achieves the best recognition performance. PMID:22778600
Full Text Available The accuracy of underwater acoustic targets recognition via limited ship radiated noise can be improved by a deep neural network trained with a large number of unlabeled samples. However, redundant features learned by deep neural network have negative effects on recognition accuracy and efficiency. A compressed deep competitive network is proposed to learn and extract features from ship radiated noise. The core idea of the algorithm includes: (1 Competitive learning: By integrating competitive learning into the restricted Boltzmann machine learning algorithm, the hidden units could share the weights in each predefined group; (2 Network pruning: The pruning based on mutual information is deployed to remove the redundant parameters and further compress the network. Experiments based on real ship radiated noise show that the network can increase recognition accuracy with fewer informative features. The compressed deep competitive network can achieve a classification accuracy of 89.1 % , which is 5.3 % higher than deep competitive network and 13.1 % higher than the state-of-the-art signal processing feature extraction methods.
Zhang Fang Hu
Full Text Available As the pixel information of depth image is derived from the distance information, when implementing SURF algorithm with KINECT sensor for static sign language recognition, there can be some mismatched pairs in palm area. This paper proposes a feature point selection algorithm, by filtering the SURF feature points step by step based on the number of feature points within adaptive radius r and the distance between the two points, it not only greatly improves the recognition rate, but also ensures the robustness under the environmental factors, such as skin color, illumination intensity, complex background, angle and scale changes. The experiment results show that the improved SURF algorithm can effectively improve the recognition rate, has a good robustness.
Kim, Woong-Ki; Park, Soon-Yong; Lee, Yong-Bum; Kim, Seung-Ho; Lee, Jong-Min; Chien, Sung-Il.
The nuclear fuel rods should be discriminated and managed systematically by numeric characters which are printed at the end part of each rod in the process of producing fuel assembly. The characters are used to examine manufacturing process of the fuel rods in the inspection process of irradiated fuel rod. Therefore automatic character recognition is one of the most important technologies to establish automatic manufacturing process of fuel assembly. In the developed character recognition system, mesh feature set extracted from each character written in the fuel rod is employed to train a neural network based on back-propagation algorithm as a classifier for character recognition system. Performance evaluation has been achieved on a test set which is not included in a training character set. (author)
Full Text Available A good Arabic handwritten recognition system must consider the characteristics of Arabic letters which can be explicit such as the presence of diacritics or implicit such as the baseline information (a virtual line on which cursive text are aligned and/join. In order to find an adequate method of features extraction, we have taken into consideration the nature of the Arabic characters. The paper investigate two methods based on two different visions: one describes the image in terms of the distribution of pixels, and the other describes it in terms of local patterns. Spatial Distribution of Pixels (SDP is used according to the first vision; whereas Local Binary Patterns (LBP are used for the second one. Tested on the Arabic portion of the Isolated Farsi Handwritten Character Database (IFHCDB and using neural networks as a classifier, SDP achieve a recognition rate around 94% while LBP achieve a recognition rate of about 96%.
Full Text Available This paper proposes combining the biometric fractal pattern and particle swarm optimization (PSO-based classifier for fingerprint recognition. Fingerprints have arch, loop, whorl, and accidental morphologies, and embed singular points, resulting in the establishment of fingerprint individuality. An automatic fingerprint identification system consists of two stages: digital image processing (DIP and pattern recognition. DIP is used to convert to binary images, refine out noise, and locate the reference point. For binary images, Katz's algorithm is employed to estimate the fractal dimension (FD from a two-dimensional (2D image. Biometric features are extracted as fractal patterns using different FDs. Probabilistic neural network (PNN as a classifier performs to compare the fractal patterns among the small-scale database. A PSO algorithm is used to tune the optimal parameters and heighten the accuracy. For 30 subjects in the laboratory, the proposed classifier demonstrates greater efficiency and higher accuracy in fingerprint recognition.
Full Text Available In Geographical Information Systems, geo-coding is used for the task of mapping from implicitly geo-referenced data to explicitly geo-referenced coordinates. At present, an enormous amount of implicitly geo-referenced information is hidden in unstructured text, e.g., Wikipedia, social data and news. Toponym recognition is the foundation of mining this useful geo-referenced information by identifying words as toponyms in text. In this paper, we propose an adapted toponym recognition approach based on deep belief network (DBN by exploring two key issues: word representation and model interpretation. A Skip-Gram model is used in the word representation process to represent words with contextual information that are ignored by current word representation models. We then determine the core hyper-parameters of the DBN model by illustrating the relationship between the performance and the hyper-parameters, e.g., vector dimensionality, DBN structures and probability thresholds. The experiments evaluate the performance of the Skip-Gram model implemented by the Word2Vec open-source tool, determine stable hyper-parameters and compare our approach with a conditional random field (CRF based approach. The experimental results show that the DBN model outperforms the CRF model with smaller corpus. When the corpus size is large enough, their statistical metrics become approaching. However, their recognition results express differences and complementarity on different kinds of toponyms. More importantly, combining their results can directly improve the performance of toponym recognition relative to their individual performances. It seems that the scale of the corpus has an obvious effect on the performance of toponym recognition. Generally, there is no adequate tagged corpus on specific toponym recognition tasks, especially in the era of Big Data. In conclusion, we believe that the DBN-based approach is a promising and powerful method to extract geo
Chinese wines can be classification or graded by the micrographs. Micrographs of Chinese wines show floccules, stick and granule of variant shape and size. Different wines have variant microstructure and micrographs, we study the classification of Chinese wines based on the micrographs. Shape and structure of wines' particles in microstructure is the most important feature for recognition and classification of wines. So we introduce a feature extraction method which can describe the structure and region shape of micrograph efficiently. First, the micrographs are enhanced using total variation denoising, and segmented using a modified Otsu's method based on the Rayleigh Distribution. Then features are extracted using proposed method in the paper based on area, perimeter and traditional shape feature. Eight kinds total 26 features are selected. Finally, Chinese wine classification system based on micrograph using combination of shape and structure features and BP neural network have been presented. We compare the recognition results for different choices of features (traditional shape features or proposed features). The experimental results show that the better classification rate have been achieved using the combinational features proposed in this paper.
Rizal Isnanto, R.
Human iris has a very unique pattern which is possible to be used as a biometric recognition. To identify texture in an image, texture analysis method can be used. One of method is wavelet that extract the image feature based on energy. Wavelet transforms used are Haar, Daubechies, Coiflets, Symlets, and Biorthogonal. In the research, iris recognition based on five mentioned wavelets was done and then comparison analysis was conducted for which some conclusions taken. Some steps have to be done in the research. First, the iris image is segmented from eye image then enhanced with histogram equalization. The features obtained is energy value. The next step is recognition using normalized Euclidean distance. Comparison analysis is done based on recognition rate percentage with two samples stored in database for reference images. After finding the recognition rate, some tests are conducted using Energy Compaction for all five types of wavelets above. As the result, the highest recognition rate is achieved using Haar, whereas for coefficients cutting for C(i) < 0.1, Haar wavelet has a highest percentage, therefore the retention rate or significan coefficient retained for Haaris lower than other wavelet types (db5, coif3, sym4, and bior2.4)
Huo, Guang; Liu, Yuanning; Zhu, Xiaodong; Dong, Hongxing; He, Fei
Unlike score level fusion, feature level fusion demands all the features extracted from unimodal traits with high distinguishability, as well as homogeneity and compatibility, which is difficult to achieve. Therefore, most multimodal biometric research focuses on score level fusion, whereas few investigate feature level fusion. We propose a face-iris recognition method based on feature level fusion. We build a special two-dimensional-Gabor filter bank to extract local texture features from face and iris images, and then transform them by histogram statistics into an energy-orientation variance histogram feature with lower dimensions and higher distinguishability. Finally, through a fusion-recognition strategy based on principal components analysis and support vector machine (FRSPS), feature level fusion and one-to-n identification are accomplished. The experimental results demonstrate that this method can not only effectively extract face and iris features but also provide higher recognition accuracy. Compared with some state-of-the-art fusion methods, the proposed method has a significant performance advantage.
Full Text Available The EMG signal indicates the electrophysiological response to daily living of activities, particularly to lower-limb knee exercises. Literature reports have shown numerous benefits of the Wavelet analysis in EMG feature extraction for pattern recognition. However, its application to typical knee exercises when using only a single EMG channel is limited. In this study, three types of knee exercises, i.e., flexion of the leg up (standing, hip extension from a sitting position (sitting and gait (walking are investigated from 14 healthy untrained subjects, while EMG signals from the muscle group of vastus medialis and the goniometer on the knee joint of the detected leg are synchronously monitored and recorded. Four types of lower-limb motions including standing, sitting, stance phase of walking, and swing phase of walking, are segmented. The Wavelet Transform (WT based Singular Value Decomposition (SVD approach is proposed for the classification of four lower-limb motions using a single-channel EMG signal from the muscle group of vastus medialis. Based on lower-limb motions from all subjects, the combination of five-level wavelet decomposition and SVD is used to comprise the feature vector. The Support Vector Machine (SVM is then configured to build a multiple-subject classifier for which the subject independent accuracy will be given across all subjects for the classification of four types of lower-limb motions. In order to effectively indicate the classification performance, EMG features from time-domain (e.g., Mean Absolute Value (MAV, Root-Mean-Square (RMS, integrated EMG (iEMG, Zero Crossing (ZC and frequency-domain (e.g., Mean Frequency (MNF and Median Frequency (MDF are also used to classify lower-limb motions. The five-fold cross validation is performed and it repeats fifty times in order to acquire the robust subject independent accuracy. Results show that the proposed WT-based SVD approach has the classification accuracy of 91.85%±0
In past manned lunar landing missions, such as Apollo 14, spatial disorientation of astronauts substantially compromised the productivities of astronauts, and caused safety and mission success problems. The non-GPS lunar environment has micro-gravity field, and lacks both spatial recognition cues and reference objects which are familiar to the human biological sensors related to spatial recognition (e.g. eyes). Such an environment causes misperceptions of the locations of astronauts and targets and their spatial relations, as well as misperceptions of the heading direction and travel distances of astronauts. These spatial disorientation effects can reduce productivity and cause life risks in lunar manned missions. A navigation system, which is capable of locating astronauts and tracking the movements of them on the lunar surface, is critical for future lunar manned missions where multiple astronauts will traverse more than 100km from the lander or the base station with the assistance from roving vehicle, and need real-time navigation support for effective collaborations among them. Our earlier research to solve these problems dealt with developing techniques to enable a precise, flexible and reliable Lunar Astronaut Spatial Orientation and Information System (LASOIS) capable of delivering real-time navigation information to astronauts on the lunar surface. The LASOIS hardware was a sensor network composed of orbital, ground and on-suit sensors: the Lunar Reconnaissance Orbiter Camera (LROC), radio beacons, the on-suit cameras, and shoe-mounted Inertial Measurement Unit (IMU). The LASOIS software included efficient and robust algorithms for estimating trajectory from IMU signals, generating heading information from imagery acquired from on-suit cameras, and an Extended Kalman Filter (EKF) based approach for integrating these spatial information components to generate the trajectory of an astronaut with meter-level accuracy. Moreover, LASOIS emphasized multi
Tomizuka, Chiaki; Takeuchi, Yutaka
In a nuclear facility, the maintenance and repair activities must be done remotely in a radioactive environment. Fuji Electric Systems Co., Ltd. has developed a remote handling system based on 3-D recognition technique. The system recognizes the pose and position of the target to manipulate, and visualizes the scene with the target in 3-D, enabling an operator to handle it easily. This paper introduces the concept and the key features of this system. (author)
M. Favorskaya; A. Nosov; A. Popov
Generally, the dynamic hand gestures are captured in continuous video sequences, and a gesture recognition system ought to extract the robust features automatically. This task involves the highly challenging spatio-temporal variations of dynamic hand gestures. The proposed method is based on two-level manifold classifiers including the trajectory classifiers in any time instants and the posture classifiers of sub-gestures in selected time instants. The trajectory classifiers contain skin dete...
Min Beom Lee
Full Text Available In recent years, the iris recognition system has been gaining increasing acceptance for applications such as access control and smartphone security. When the images of the iris are obtained under unconstrained conditions, an issue of undermined quality is caused by optical and motion blur, off-angle view (the user’s eyes looking somewhere else, not into the front of the camera, specular reflection (SR and other factors. Such noisy iris images increase intra-individual variations and, as a result, reduce the accuracy of iris recognition. A typical iris recognition system requires a near-infrared (NIR illuminator along with an NIR camera, which are larger and more expensive than fingerprint recognition equipment. Hence, many studies have proposed methods of using iris images captured by a visible light camera without the need for an additional illuminator. In this research, we propose a new recognition method for noisy iris and ocular images by using one iris and two periocular regions, based on three convolutional neural networks (CNNs. Experiments were conducted by using the noisy iris challenge evaluation-part II (NICE.II training dataset (selected from the university of Beira iris (UBIRIS.v2 database, mobile iris challenge evaluation (MICHE database, and institute of automation of Chinese academy of sciences (CASIA-Iris-Distance database. As a result, the method proposed by this study outperformed previous methods.
Lee, Min Beom; Hong, Hyung Gil; Park, Kang Ryoung
In recent years, the iris recognition system has been gaining increasing acceptance for applications such as access control and smartphone security. When the images of the iris are obtained under unconstrained conditions, an issue of undermined quality is caused by optical and motion blur, off-angle view (the user's eyes looking somewhere else, not into the front of the camera), specular reflection (SR) and other factors. Such noisy iris images increase intra-individual variations and, as a result, reduce the accuracy of iris recognition. A typical iris recognition system requires a near-infrared (NIR) illuminator along with an NIR camera, which are larger and more expensive than fingerprint recognition equipment. Hence, many studies have proposed methods of using iris images captured by a visible light camera without the need for an additional illuminator. In this research, we propose a new recognition method for noisy iris and ocular images by using one iris and two periocular regions, based on three convolutional neural networks (CNNs). Experiments were conducted by using the noisy iris challenge evaluation-part II (NICE.II) training dataset (selected from the university of Beira iris (UBIRIS).v2 database), mobile iris challenge evaluation (MICHE) database, and institute of automation of Chinese academy of sciences (CASIA)-Iris-Distance database. As a result, the method proposed by this study outperformed previous methods.
Braunecker, B; Hauck, R; Lohmann, A W
The essence of character recognition is a comparison between the unknown character and a set of reference patterns. Usually, these reference patterns are all possible characters themselves, the whole alphabet in the case of letter characters. Obviously, N analog measurements are highly redundant, since only K = log(2)N binary decisions are enough to identify one out of N characters. Therefore, we devised K reference patterns accordingly. These patterns, called principal components, are found by digital image processing, but used in an optical analog computer. We will explain the concept of principal components, and we will describe experiments with several optical character recognition systems, based on this concept.
Dutta, A.; Veldhuis, Raymond N.J.; Spreeuwers, Lieuwe Jan
In this paper, we propose a non-frontal model based approach which ensures that a face recognition system always gets to compare images having similar view (or pose). This requires a virtual suspect reference set that consists of non-frontal suspect images having pose similar to the surveillance
Nasrollahi, Kamal; Guerrero, Sergio Escalera; Rasti, Pejman
with results of a state-of- the-art deep learning-based super-resolution algorithm, through an alpha-blending approach. The experimental results obtained on down-sampled version of a large subset of Hoolywood2 benchmark database show the importance of the proposed system in increasing the recognition rate...
Veldhuis, Raymond N.J.; Bazen, A.M.; Booij, W.D.T.; Hendrikse, A.J.; Jain, A.K.; Ratha, N.K.
This paper demonstrates the feasibility of a new method of hand-geometry recognition based on parameters derived from the contour of the hand. The contour is completely determined by the black-and-white image of the hand and can be derived from it by means of simple image-processing techniques. It
Veldhuis, Raymond N.J.; Bazen, A.M.; Kauffman, J.A.; Hartel, Pieter H.; Delp, Edward J.; Wong, Ping W.
This paper describes the design, implementation and evaluation of a user-verification system for a smart gun, which is based on grip-pattern recognition. An existing pressure sensor consisting of an array of 44 x 44 piezoresistive elements is used to measure the grip pattern. An interface has been
Poppe, Ronald Walter
Vision-based human action recognition is the process of labeling image sequences with action labels. Robust solutions to this problem have applications in domains such as visual surveillance, video retrieval and human–computer interaction. The task is challenging due to variations in motion
Jensen, Rune Fisker; Carstensen, Jens Michael
We propose a general scheme for object localization and recognition based on a deformable model. The model combines shape and image properties by warping a arbitrary prototype intensity template according to the deformation in shape. The shape deformations are constrained by a probabilistic distr...
Veldhuis, Raymond N.J.; Bazen, A.M.; Kauffman, J.A.; Hartel, Pieter H.
This paper describes the design, implementation and evaluation of a user-verification system for a smart gun, which is based on grip-pattern recognition. An existing pressure sensor consisting of an array of 44 £ 44 piezoresistive elements is used to measure the grip pattern. An interface has been
In this paper, we propose three discriminative feature selection strategies and weighted subregion matching method to improve the performance of iris recognition system. Firstly, we introduce the process of feature extraction and representation based on scale invariant feature transformation (SIFT) in detail. Secondly, three strategies are described, which are orientation probability distribution function (OPDF) based strategy to delete some redundant feature keypoints, magnitude probability distribution function (MPDF) based strategy to reduce dimensionality of feature element, and compounded strategy combined OPDF and MPDF to further select optimal subfeature. Thirdly, to make matching more effective, this paper proposes a novel matching method based on weighted sub-region matching fusion. Particle swarm optimization is utilized to accelerate achieve different sub-region's weights and then weighted different subregions' matching scores to generate the final decision. The experimental results, on three public and renowned iris databases (CASIA-V3 Interval, Lamp, andMMU-V1), demonstrate that our proposed methods outperform some of the existing methods in terms of correct recognition rate, equal error rate, and computation complexity. PMID:24683317
Full Text Available In this paper, we propose three discriminative feature selection strategies and weighted subregion matching method to improve the performance of iris recognition system. Firstly, we introduce the process of feature extraction and representation based on scale invariant feature transformation (SIFT in detail. Secondly, three strategies are described, which are orientation probability distribution function (OPDF based strategy to delete some redundant feature keypoints, magnitude probability distribution function (MPDF based strategy to reduce dimensionality of feature element, and compounded strategy combined OPDF and MPDF to further select optimal subfeature. Thirdly, to make matching more effective, this paper proposes a novel matching method based on weighted sub-region matching fusion. Particle swarm optimization is utilized to accelerate achieve different sub-region’s weights and then weighted different subregions’ matching scores to generate the final decision. The experimental results, on three public and renowned iris databases (CASIA-V3 Interval, Lamp, andMMU-V1, demonstrate that our proposed methods outperform some of the existing methods in terms of correct recognition rate, equal error rate, and computation complexity.
Chen, Ying; Liu, Yuanning; Zhu, Xiaodong; He, Fei; Wang, Hongye; Deng, Ning
In this paper, we propose three discriminative feature selection strategies and weighted subregion matching method to improve the performance of iris recognition system. Firstly, we introduce the process of feature extraction and representation based on scale invariant feature transformation (SIFT) in detail. Secondly, three strategies are described, which are orientation probability distribution function (OPDF) based strategy to delete some redundant feature keypoints, magnitude probability distribution function (MPDF) based strategy to reduce dimensionality of feature element, and compounded strategy combined OPDF and MPDF to further select optimal subfeature. Thirdly, to make matching more effective, this paper proposes a novel matching method based on weighted sub-region matching fusion. Particle swarm optimization is utilized to accelerate achieve different sub-region's weights and then weighted different subregions' matching scores to generate the final decision. The experimental results, on three public and renowned iris databases (CASIA-V3 Interval, Lamp, and MMU-V1), demonstrate that our proposed methods outperform some of the existing methods in terms of correct recognition rate, equal error rate, and computation complexity.
Full Text Available Most of the existing gait recognition methods rely on a single view, usually the side view, of the walking person. This paper investigates the case in which several views are available for gait recognition. It is shown that each view has unequal discrimination power and, therefore, should have unequal contribution in the recognition process. In order to exploit the availability of multiple views, several methods for the combination of the results that are obtained from the individual views are tested and evaluated. A novel approach for the combination of the results from several views is also proposed based on the relative importance of each view. The proposed approach generates superior results, compared to those obtained by using individual views or by using multiple views that are combined using other combination methods.
Cognitive agents are expected to interact with and adapt to a nonstationary dynamic environment. As an initial process of decision making in a real-world agent interaction, familiarity judgment leads the following processes for intelligence. Familiarity judgment includes knowing previously encoded data as well as completing original patterns from partial information, which are fundamental functions of recognition memory. Although previous computational memory models have attempted to reflect human behavioral properties on the recognition memory, they have been focused on static conditions without considering temporal changes in terms of lifelong learning. To provide temporal adaptability to an agent, in this paper, we suggest a computational model for recognition memory that enables lifelong learning. The proposed model is based on a hypergraph structure, and thus it allows a high-order relationship between contextual nodes and enables incremental learning. Through a simulated experiment, we investigate the optimal conditions of the memory model and validate the consistency of memory performance for lifelong learning. PMID:25371665
The last two decades has seen a rapid increase in the application of AIS (Artificial Immune Systems) modeled after the human immune system to a wide range of areas including network intrusion detection, job shop scheduling, classification, pattern recognition, and robot control. JPL (Jet Propulsion Laboratory) has developed an integrated pattern recognition/classification system called AISLE (Artificial Immune System for Learning and Exploration) based on biologically inspired models of B-cell dynamics in the immune system. When used for unsupervised or supervised classification, the method scales linearly with the number of dimensions, has performance that is relatively independent of the total size of the dataset, and has been shown to perform as well as traditional clustering methods. When used for pattern recognition, the method efficiently isolates the appropriate matches in the data set. The paper presents the underlying structure of AISLE and the results from a number of experimental studies.
Baby, Sanmohan; Krüger, Volker
a sequential and statistical learning algorithm for automatic detection of the action primitives and the action grammar based on these primitives. We model a set of actions using a single HMM whose structure is learned incrementally as we observe new types. Actions are modeled with sufficient...
Full Text Available The control of prosthetic limb would be more effective if it is based on Surface Electromyogram (SEMG signals from remnant muscles. The analysis of SEMG signals depend on a number of factors, such as amplitude as well as time- and frequency-domain properties. Time series analysis using Auto Regressive (AR model and Mean frequency which is tolerant to white Gaussian noise are used as feature extraction techniques. EMG Histogram is used as another feature vector that was seen to give more distinct classification. The work was done with SEMG dataset obtained from the NINAPRO DATABASE, a resource for bio robotics community. Eight classes of hand movements hand open, hand close, Wrist extension, Wrist flexion, Pointing index, Ulnar deviation, Thumbs up, Thumb opposite to little finger are taken into consideration and feature vectors are extracted. The feature vectors can be given to an artificial neural network for further classification in controlling the prosthetic arm which is not dealt in this paper.
Full Text Available The identification and classification of interchange structures in OSM data can provide important information for the construction of multi-scale model, navigation and location services, congestion analysis, etc. The traditional method of interchange identification relies on the low-level characteristics of artificial design, and cannot distinguish the complex interchange structure with interference section effectively. In this paper, a new method based on convolutional neural network for identification of the interchange is proposed. The method combines vector data with raster image, and uses neural network to learn the fuzzy characteristics of the interchange, and classifies the complex interchange structure in OSM. Experiments show that this method has strong anti-interference, and has achieved good results in the classification of complex interchange shape, and there is room for further improvement with the expansion of the case base and the optimization of neural network model.
Full Text Available Human action recognition is an important recent challenging task. Projecting depth images onto three depth motion maps (DMMs and extracting deep convolutional neural network (DCNN features are discriminant descriptor features to characterize the spatiotemporal information of a specific action from a sequence of depth images. In this paper, a unified improved collaborative representation framework is proposed in which the probability that a test sample belongs to the collaborative subspace of all classes can be well defined and calculated. The improved collaborative representation classifier (ICRC based on l2-regularized for human action recognition is presented to maximize the likelihood that a test sample belongs to each class, then theoretical investigation into ICRC shows that it obtains a final classification by computing the likelihood for each class. Coupled with the DMMs and DCNN features, experiments on depth image-based action recognition, including MSRAction3D and MSRGesture3D datasets, demonstrate that the proposed approach successfully using a distance-based representation classifier achieves superior performance over the state-of-the-art methods, including SRC, CRC, and SVM.
Full Text Available This paper incorporates principal component analysis (PCA with support vector machine-particle swarm optimization (SVM-PSO for developing real-time face recognition systems. The integrated scheme aims to adopt the SVM-PSO method to improve the validity of PCA based image recognition systems on dynamically visual perception. The face recognition for most human-robot interaction applications is accomplished by PCA based method because of its dimensionality reduction. However, PCA based systems are only suitable for processing the faces with the same face expressions and/or under the same view directions. Since the facial feature selection process can be considered as a problem of global combinatorial optimization in machine learning, the SVM-PSO is usually used as an optimal classifier of the system. In this paper, the PSO is used to implement a feature selection, and the SVMs serve as fitness functions of the PSO for classification problems. Experimental results demonstrate that the proposed method simplifies features effectively and obtains higher classification accuracy.
Mioulet, L.; Bideault, G.; Chatelain, C.; Paquet, T.; Brunessaux, S.
The BLSTM-CTC is a novel recurrent neural network architecture that has outperformed previous state of the art algorithms in tasks such as speech recognition or handwriting recognition. It has the ability to process long term dependencies in temporal signals in order to label unsegmented data. This paper describes different ways of combining features using a BLSTM-CTC architecture. Not only do we explore the low level combination (feature space combination) but we also explore high level combination (decoding combination) and mid-level (internal system representation combination). The results are compared on the RIMES word database. Our results show that the low level combination works best, thanks to the powerful data modeling of the LSTM neurons.
Sprager, Sebastijan; Juric, Matjaz B.
With the recent development of microelectromechanical systems (MEMS), inertial sensors have become widely used in the research of wearable gait analysis due to several factors, such as being easy-to-use and low-cost. Considering the fact that each individual has a unique way of walking, inertial sensors can be applied to the problem of gait recognition where assessed gait can be interpreted as a biometric trait. Thus, inertial sensor-based gait recognition has a great potential to play an important role in many security-related applications. Since inertial sensors are included in smart devices that are nowadays present at every step, inertial sensor-based gait recognition has become very attractive and emerging field of research that has provided many interesting discoveries recently. This paper provides a thorough and systematic review of current state-of-the-art in this field of research. Review procedure has revealed that the latest advanced inertial sensor-based gait recognition approaches are able to sufficiently recognise the users when relying on inertial data obtained during gait by single commercially available smart device in controlled circumstances, including fixed placement and small variations in gait. Furthermore, these approaches have also revealed considerable breakthrough by realistic use in uncontrolled circumstances, showing great potential for their further development and wide applicability. PMID:26340634
Xie, Youye; Tang, Gongguo; Hoff, William
Chessboards are commonly used to calibrate cameras, and many robust methods have been developed to recognize the unoccupied boards. However, when the chessboard is populated with chess pieces, such as during an actual game, the problem of recognizing the board is much harder. Challenges include occlusion caused by the chess pieces, the presence of outlier lines and low viewing angles of the chessboard. In this paper, we present a novel approach to address the above challenges and recognize the chessboard. The Canny edge detector and Hough transform are used to capture all possible lines in the scene. The k-means clustering and a k-nearest-neighbors inspired algorithm are applied to cluster and reject the outlier lines based on their Euclidean distances to the nearest neighbors in a scaled Hough transform space. Finally, based on prior knowledge of the chessboard structure, a geometric constraint is used to find the correspondences between image lines and the lines on the chessboard through the homography transformation. The proposed algorithm works for a wide range of the operating angles and achieves high accuracy in experiments.
Khan, Zafar Ali; Sohn, Won
The growing population of elderly people living alone increases the need for automatic healthcare monitoring systems for elderly care. Automatic vision sensor-based systems are increasingly used for human activity recognition (HAR) in recent years. This study presents an improved model, tested using actors, of a sensor-based HAR system to recognize daily life activities of elderly people at home and generate an alert in case of abnormal HAR. Datasets consisting of six abnormal activities (falling backward, falling forward, falling rightward, falling leftward, chest pain, and fainting) and four normal activities (walking, rushing, sitting down, and standing up) are generated from different view angles (90°, -90°, 45°, -45°). Feature extraction and dimensions reduction are performed by R-transform followed by generalized discriminant analysis (GDA) methods. R-transform extracts symmetric, scale, and translation-invariant features from the sequences of activities. GDA increases the discrimination between different classes of highly similar activities. Silhouette sequences are quantified by the Linde-Buzo-Gray algorithm and recognized by hidden conditional random fields. Experimental results provide an average recognition rate of 94.2% for abnormal activities and 92.7% for normal activities. The recognition rate for the highly similar activities from different view angles shows the flexibility and efficacy of the proposed abnormal HAR and alert generation system for elderly care.
Giardino, Marco; Magagna, Alessandra; Ferrero, Elena; Perrone, Gianluigi
Digital field mapping has certainly provided geoscientists with the opportunity to map and gather data in the field directly using digital tools and software rather than using paper maps, notebooks and analogue devices and then subsequently transferring the data to a digital format for subsequent analysis. But, the same opportunity has to be recognized for Geoscience education, as well as for stimulating and helping students in the recognition of landforms and interpretation of the geological and geomorphological components of a landscape. More, an early exposure to mapping during school and prior to university can optimise the ability to "read" and identify uncertainty in 3d models. During 2014, about 200 Secondary School students (aged 12-15) of the Piedmont region (NW Italy) participated in a research program involving the use of mobile devices (smartphone and tablet) in the field. Students, divided in groups, used the application Trimble Outdoors Navigators for tracking a geological trail in the Sangone Valley and for taking georeferenced pictures and notes. Back to school, students downloaded the digital data in a .kml file for the visualization on Google Earth. This allowed them: to compare the hand tracked trail on a paper map with the digital trail, and to discuss about the functioning and the precision of the tools; to overlap a digital/semitransparent version of the 2D paper map (a Regional Technical Map) used during the field trip on the 2.5D landscape of Google Earth, as to help them in the interpretation of conventional symbols such as contour lines; to perceive the landforms seen during the field trip as a part of a more complex Pleistocene glacial landscape; to understand the classical and innovative contributions from different geoscientific disciplines to the generation of a 3D structural geological model of the Rivoli-Avigliana Morainic Amphitheatre. In 2013 and 2014, some other pilot projects have been carried out in different areas of the
Cotret, Pascal; Chevobbe, Stéphane; Darouich, Mehdi
For several years, face recognition has been a hot topic in the image processing field: this technique is applied in several domains such as CCTV, electronic devices delocking and so on. In this context, this work studies the efficiency of a wavelet-based face recognition method in terms of subject position robustness and performance on various systems. The use of wavelet transform has a limited impact on the position robustness of PCA-based face recognition. This work shows, for a well-known database (Yale face database B*), that subject position in a 3D space can vary up to 10% of the original ROI size without decreasing recognition rates. Face recognition is performed on approximation coefficients of the image wavelet transform: results are still satisfying after 3 levels of decomposition. Furthermore, face database size can be divided by a factor 64 (22K with K = 3). In the context of ultra-embedded vision systems, memory footprint is one of the key points to be addressed; that is the reason why compression techniques such as wavelet transform are interesting. Furthermore, it leads to a low-complexity face detection stage compliant with limited computation resources available on such systems. The approach described in this work is tested on three platforms from a standard x86-based computer towards nanocomputers such as RaspberryPi and SECO boards. For K = 3 and a database with 40 faces, the execution mean time for one frame is 0.64 ms on a x86-based computer, 9 ms on a SECO board and 26 ms on a RaspberryPi (B model).
Qu, Hongquan; Yuan, Shijiao; Wang, Yanping; Yang, Dan
To improve the recognition performance of optical fiber prewarning system (OFPS), this study proposed a hierarchical recognition algorithm (HRA). Compared with traditional methods, which employ only a complex algorithm that includes multiple extracted features and complex classifiers to increase the recognition rate with a considerable decrease in recognition speed, HRA takes advantage of the continuity of intrusion events, thereby creating a staged recognition flow inspired by stress reaction. HRA is expected to achieve high-level recognition accuracy with less time consumption. First, this work analyzed the continuity of intrusion events and then presented the algorithm based on the mechanism of stress reaction. Finally, it verified the time consumption through theoretical analysis and experiments, and the recognition accuracy was obtained through experiments. Experiment results show that the processing speed of HRA is 3.3 times faster than that of a traditional complicated algorithm and has a similar recognition rate of 98%. The study is of great significance to fast intrusion event recognition in OFPS.
Tikkl, Domonkos; Szidarovszky, P. Ferenc; Kardkovacs, Zsolt T.; Magyar, Gábor
In WoW project our purpose is to create a complex search interface with the following features: search in the deep web content of contracted partners' databases, processing Hungarian natural language (NL) questions and transforming them to SQL queries for database access, image search supported by a visual thesaurus that describes in a structural form the visual content of images (also in Hungarian). This paper primarily focuses on a particular problem of question processing task: the entity recognition. Before going into details we give a short overview of the project's aims.
Full Text Available Marketing science seeks to prescribe better marketing strategies (advertising, product development, pricing, etc.. To do so we rely on models of consumer decisions grounded in empirical observations. Field experience suggests that recognition-based heuristics help consumers to choose which brands to consider and purchase in frequently-purchased categories, but other heuristics are more relevant in durable-goods categories. Screening with recognition is a rational screening rule when advertising is a signal of product quality, when observing other consumers makes it easy to learn decision rules, and when firms react to engineering-design constraints by offering brands such that a high-level on one product feature implies a low level on another product feature. Experience with applications and field experiments suggests four fruitful research topics: deciding how to decide (endogeneity, learning decision rules by self-reflection, risk reduction, and the difference between utility functions and decision rules. These challenges also pose methodological cautions.
Bouma, H.; Baan, J.; Burghouts, G.J.; Eendebak, P.T.; Huis, J.R. van; Dijk, J.; Rest, J.H.C. van
Proactive detection of incidents is required to decrease the cost of security incidents. This paper focusses on the automatic early detection of suspicious behavior of pickpockets with track-based features in a crowded shopping mall. Our method consists of several steps: pedestrian tracking, feature computation and pickpocket recognition. This is challenging because the environment is crowded, people move freely through areas which cannot be covered by a single camera, because the actual snat...
Islam, Kh Tohidul; Raj, Ram Gopal
Road sign recognition is a driver support function that can be used to notify and warn the driver by showing the restrictions that may be effective on the current stretch of road. Examples for such regulations are ‘traffic light ahead’ or ‘pedestrian crossing’ indications. The present investigation targets the recognition of Malaysian road and traffic signs in real-time. Real-time video is taken by a digital camera from a moving vehicle and real world road signs are then extracted using vision-only information. The system is based on two stages, one performs the detection and another one is for recognition. In the first stage, a hybrid color segmentation algorithm has been developed and tested. In the second stage, an introduced robust custom feature extraction method is used for the first time in a road sign recognition approach. Finally, a multilayer artificial neural network (ANN) has been created to recognize and interpret various road signs. It is robust because it has been tested on both standard and non-standard road signs with significant recognition accuracy. This proposed system achieved an average of 99.90% accuracy with 99.90% of sensitivity, 99.90% of specificity, 99.90% of f-measure, and 0.001 of false positive rate (FPR) with 0.3 s computational time. This low FPR can increase the system stability and dependability in real-time applications. PMID:28406471
Islam, Kh Tohidul; Raj, Ram Gopal
Road sign recognition is a driver support function that can be used to notify and warn the driver by showing the restrictions that may be effective on the current stretch of road. Examples for such regulations are 'traffic light ahead' or 'pedestrian crossing' indications. The present investigation targets the recognition of Malaysian road and traffic signs in real-time. Real-time video is taken by a digital camera from a moving vehicle and real world road signs are then extracted using vision-only information. The system is based on two stages, one performs the detection and another one is for recognition. In the first stage, a hybrid color segmentation algorithm has been developed and tested. In the second stage, an introduced robust custom feature extraction method is used for the first time in a road sign recognition approach. Finally, a multilayer artificial neural network (ANN) has been created to recognize and interpret various road signs. It is robust because it has been tested on both standard and non-standard road signs with significant recognition accuracy. This proposed system achieved an average of 99.90% accuracy with 99.90% of sensitivity, 99.90% of specificity, 99.90% of f-measure, and 0.001 of false positive rate (FPR) with 0.3 s computational time. This low FPR can increase the system stability and dependability in real-time applications.
Favorskaya, M.; Nosov, A.; Popov, A.
Generally, the dynamic hand gestures are captured in continuous video sequences, and a gesture recognition system ought to extract the robust features automatically. This task involves the highly challenging spatio-temporal variations of dynamic hand gestures. The proposed method is based on two-level manifold classifiers including the trajectory classifiers in any time instants and the posture classifiers of sub-gestures in selected time instants. The trajectory classifiers contain skin detector, normalized skeleton representation of one or two hands, and motion history representing by motion vectors normalized through predetermined directions (8 and 16 in our case). Each dynamic gesture is separated into a set of sub-gestures in order to predict a trajectory and remove those samples of gestures, which do not satisfy to current trajectory. The posture classifiers involve the normalized skeleton representation of palm and fingers and relative finger positions using fingertips. The min-max criterion is used for trajectory recognition, and the decision tree technique was applied for posture recognition of sub-gestures. For experiments, a dataset "Multi-modal Gesture Recognition Challenge 2013: Dataset and Results" including 393 dynamic hand-gestures was chosen. The proposed method yielded 84-91% recognition accuracy, in average, for restricted set of dynamic gestures.
Full Text Available Generally, the dynamic hand gestures are captured in continuous video sequences, and a gesture recognition system ought to extract the robust features automatically. This task involves the highly challenging spatio-temporal variations of dynamic hand gestures. The proposed method is based on two-level manifold classifiers including the trajectory classifiers in any time instants and the posture classifiers of sub-gestures in selected time instants. The trajectory classifiers contain skin detector, normalized skeleton representation of one or two hands, and motion history representing by motion vectors normalized through predetermined directions (8 and 16 in our case. Each dynamic gesture is separated into a set of sub-gestures in order to predict a trajectory and remove those samples of gestures, which do not satisfy to current trajectory. The posture classifiers involve the normalized skeleton representation of palm and fingers and relative finger positions using fingertips. The min-max criterion is used for trajectory recognition, and the decision tree technique was applied for posture recognition of sub-gestures. For experiments, a dataset “Multi-modal Gesture Recognition Challenge 2013: Dataset and Results” including 393 dynamic hand-gestures was chosen. The proposed method yielded 84–91% recognition accuracy, in average, for restricted set of dynamic gestures.
Full Text Available It is significant for the final goal of RoboCup to realize the recognition of generic balls for soccer robots. In this paper, a novel generic ball recognition algorithm based on omnidirectional vision is proposed by combining the modified Haar-like features and AdaBoost learning algorithm. The algorithm is divided into offline training and online recognition. During the phase of offline training, numerous sub-images are acquired from various panoramic images, including generic balls, and then the modified Haar-like features are extracted from them and used as the input of the AdaBoost learning algorithm to obtain a classifier. During the phase of online recognition, and according to the imaging characteristics of our omnidirectional vision system, rectangular windows are defined to search for the generic ball along the rotary and radial directions in the panoramic image, and the learned classifier is used to judge whether a ball is included in the window. After the ball has been recognized globally, ball tracking is realized by integrating a ball velocity estimation algorithm to reduce the computational cost. The experimental results show that good performance can be achieved using our algorithm, and that the generic ball can be recognized and tracked effectively.
Shin, Young Hoon; Seo, Jiwon
People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker's vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing.
Full Text Available A leukocyte recognition method for human peripheral blood smear based on island-clustering texture (ICT is proposed. By analyzing the features of the five typical classes of leukocyte images, a new ICT model is established. Firstly, some feature points are extracted in a gray leukocyte image by mean-shift clustering to be the centers of islands. Secondly, the growing region is employed to create regions of the islands in which the seeds are just these feature points. These islands distribution can describe a new texture. Finally, a distinguished parameter vector of these islands is created as the ICT features by combining the ICT features with the geometric features of the leukocyte. Then the five typical classes of leukocytes can be recognized successfully at the correct recognition rate of more than 92.3% with a total sample of 1310 leukocytes. Experimental results show the feasibility of the proposed method. Further analysis reveals that the method is robust and results can provide important information for disease diagnosis.
Full Text Available A feasible pavement crack detection system plays an important role in evaluating the road condition and providing the necessary road maintenance. In this paper, a back propagation neural network (BPNN is used to recognize pavement cracks from images. To improve the recognition accuracy of the BPNN, a complete framework of image processing is proposed including image preprocessing and crack information extraction. In this framework, the redundant image information is reduced as much as possible and two sets of feature parameters are constructed to classify the crack images. Then a BPNN is adopted to distinguish pavement images between linear and alligator cracks to acquire high recognition accuracy. Besides, the linear cracks can be further classified into transversal and longitudinal cracks according to the direction angle. Finally, the proposed method is evaluated on the data of 400 pavement images obtained by the Automatic Road Analyzer (ARAN in Northern China and the results show that the proposed method seems to be a powerful tool for pavement crack recognition. The rates of correct classification for alligator, transversal and longitudinal cracks are 97.5%, 100% and 88.0%, respectively. Compared to some previous studies, the method proposed in this paper is effective for all three kinds of cracks and the results are also acceptable for engineering application.
Full Text Available Automatic emotion recognition is one of the most challenging tasks. To detect emotion from nonstationary EEG signals, a sophisticated learning algorithm that can represent high-level abstraction is required. This study proposes the utilization of a deep learning network (DLN to discover unknown feature correlation between input signals that is crucial for the learning task. The DLN is implemented with a stacked autoencoder (SAE using hierarchical feature learning approach. Input features of the network are power spectral densities of 32-channel EEG signals from 32 subjects. To alleviate overfitting problem, principal component analysis (PCA is applied to extract the most important components of initial input features. Furthermore, covariate shift adaptation of the principal components is implemented to minimize the nonstationary effect of EEG signals. Experimental results show that the DLN is capable of classifying three different levels of valence and arousal with accuracy of 49.52% and 46.03%, respectively. Principal component based covariate shift adaptation enhances the respective classification accuracy by 5.55% and 6.53%. Moreover, DLN provides better performance compared to SVM and naive Bayes classifiers.
Full Text Available The cognitive overload not only affects the physical and mental diseases, but also affects the work efficiency and safety. Hence, the research of measuring cognitive load has been an important part of cognitive load theory. In this paper, we proposed a method to identify the state of cognitive load by using eye movement data in a noncontact manner. We designed a visual experiment to elicit human’s cognitive load as high and low state in two light intense environments and recorded the eye movement data in this whole process. Twelve salient features of the eye movement were selected by using statistic test. Algorithms for processing some features are proposed for increasing the recognition rate. Finally we used the support vector machine (SVM to classify high and low cognitive load. The experimental results show that the method can achieve 90.25% accuracy in light controlled condition.
Full Text Available This paper compares the recognition accuracy of a phoneme-based automatic speech recognition system with that of a grapheme-based system, using Afrikaans as case study. The first system is developed using a conventional pronunciation dictionary...
Xiang, Jie; Chen, Jianping; Lai, ZiLi; Yang, Wei
Yuan Yang terraces is one of the most famous terraces in China, and it was successfully listed in the world heritage list at the 37th world heritage convention. On the one hand, Yuan Yang terraces retain more soil and water, to reduce both hydrological connectivity and erosion, and to support irrigation. On the other hand, It has the important tourism value, bring the huge revenue to local residents. In order to protect and make use of Yuan Yang terraces better, This study analyzed the spatial distribution and spectral characteristics of terraces:(1) Through visual interpretation, the study recognized the terraces based on the spatial adjusted remote sensing image (2010 Geoeye-1 with resolution of 1m/pix), and extracted topographic feature (elevation, slope, aspect, etc.) based on the digital elevation model with resolution of 20m/pix. The terraces cover a total area of about 11.58Km2, accounted for 24.4% of the whole study area. The terraces appear at range from 1400m to 1800m in elevation, 10°to 20°in slope, northwest to northeast in aspect; (2) Using the method of weight of evidence, this study assessed the importance of different topographic feature. The results show that the sort of importance: elevation>slope>aspect; (3) The study counted the Normalized Difference Vegetation Index (NDVI) changes of terraces throughout the year, based on the landsat-5 image with resolution of 30m/pix. The results show that the changes of terraces' NDVI are bigger than other stuff (e.g. forest, road, house, etc.). Those work made a good preparations for establishing the dynamic remote sensing monitoring system of Yuan Yang terraces.
Xu, Haiyan; Xie, Yingjuan; Li, Min; Zhang, Zhuo; Zhang, Xuewu
Distributed fiber-optic vibration sensors receive extensive investigation and play a significant role in the sensor panorama. A fiber optic perimeter detection system based on all-fiber interferometric sensor is proposed, through the back-end analysis, processing and intelligent identification, which can distinguish effects of different intrusion activities. In this paper, an intrusion recognition based on the auditory selective attention mechanism is proposed. Firstly, considering the time-frequency of vibration, the spectrogram is calculated. Secondly, imitating the selective attention mechanism, the color, direction and brightness map of the spectrogram is computed. Based on these maps, the feature matrix is formed after normalization. The system could recognize the intrusion activities occurred along the perimeter sensors. Experiment results show that the proposed method for the perimeter is able to differentiate intrusion signals from ambient noises. What's more, the recognition rate of the system is improved while deduced the false alarm rate, the approach is proved by large practical experiment and project.
Full Text Available Enhancing speech recognition is the primary intention of this work. In this paper a novel speech recognition method based on vector quantization and improved particle swarm optimization (IPSO is suggested. The suggested methodology contains four stages, namely, (i denoising, (ii feature mining (iii, vector quantization, and (iv IPSO based hidden Markov model (HMM technique (IP-HMM. At first, the speech signals are denoised using median filter. Next, characteristics such as peak, pitch spectrum, Mel frequency Cepstral coefficients (MFCC, mean, standard deviation, and minimum and maximum of the signal are extorted from the denoised signal. Following that, to accomplish the training process, the extracted characteristics are given to genetic algorithm based codebook generation in vector quantization. The initial populations are created by selecting random code vectors from the training set for the codebooks for the genetic algorithm process and IP-HMM helps in doing the recognition. At this point the creativeness will be done in terms of one of the genetic operation crossovers. The proposed speech recognition technique offers 97.14% accuracy.
Carbonetto , Peter; De Freitas , Nando; Gustafson , Paul; Thompson , Natalie
International audience; We present a method for variable selection/weighting in an unsupervised learning context using Bayesian shrinkage. The basis for the model parameters and cluster assignments can be computed simultaneous using an efficient EM algorithm. Applying our Bayesian shrinkage model to a complex problem in object recognition (Duygulu, Barnard, de Freitas and Forsyth 2002), our experiments yied good results.
Kim, Jaebok; Truong, Khiet; Englebienne, Gwenn; Evers, Vanessa
In this paper, we propose to use deep 3-dimensional convolutional networks (3D CNNs) in order to address the challenge of modelling spectro-temporal dynamics for speech emotion recognition (SER). Compared to a hybrid of Convolutional Neural Network and Long-Short-Term-Memory (CNN-LSTM), our proposed