WorldWideScience

Sample records for svm-based protein residue

  1. Predicting Protein-Protein Interaction Sites with a Novel Membership Based Fuzzy SVM Classifier.

    Science.gov (United States)

    Sriwastava, Brijesh K; Basu, Subhadip; Maulik, Ujjwal

    2015-01-01

    Predicting residues that participate in protein-protein interactions (PPI) helps to identify which amino acids are located at the interface. In this paper, we show that the performance of the classical support vector machine (SVM) algorithm can be further improved with the use of a custom-designed fuzzy membership function, for the partner-specific PPI interface prediction problem. We evaluated the performance of both classical SVM and fuzzy SVM (F-SVM) on the PPI databases of three different model proteomes, Homo sapiens, Escherichia coli and Saccharomyces cerevisiae, and calculated the statistical significance of the improvement of the developed F-SVM over the classical SVM algorithm. We also compared our performance with the available state-of-the-art fuzzy methods in this domain and observed significant performance improvements. To predict interaction sites in protein complexes, the local composition of amino acids together with their physico-chemical characteristics is used, where the F-SVM based prediction method exploits the membership function for each pair of sequence fragments. The average F-SVM performance (area under the ROC curve) on the test samples in the 10-fold cross-validation experiment is 77.07, 78.39, and 74.91 percent for the aforementioned organisms, respectively. Performance on the independent test sets is 72.09, 73.24 and 82.74 percent, respectively. The software is available for free download from http://code.google.com/p/cmater-bioinfo.
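
    As an illustration of how a fuzzy membership can enter SVM training: in the commonly used fuzzy SVM formulation, each training sample i carries a membership s_i in (0, 1] that scales its slack penalty, giving the objective 0.5*||w||^2 + C * sum_i s_i * max(0, 1 - y_i*(w.x_i + b)). The sketch below only evaluates that objective for a linear model. The abstract does not specify the authors' membership function, so the function name, memberships, and toy data are hypothetical.

```python
from typing import Sequence


def fsvm_objective(
    w: Sequence[float],
    b: float,
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    memberships: Sequence[float],
    C: float = 1.0,
) -> float:
    """Primal objective of a linear fuzzy SVM (illustrative sketch).

    Labels in y are +1 / -1. Each membership s_i down-weights the hinge
    loss of sample i, so uncertain samples influence the boundary less.
    """
    # Margin term: 0.5 * ||w||^2
    margin_term = 0.5 * sum(wj * wj for wj in w)

    # Membership-weighted hinge losses.
    slack_term = 0.0
    for xi, yi, si in zip(X, y, memberships):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        hinge = max(0.0, 1.0 - yi * score)
        slack_term += si * hinge

    return margin_term + C * slack_term


# Hypothetical toy data: the third sample is treated as less reliable.
X = [[2.0, 0.0], [0.5, 0.0], [-0.5, 0.0]]
y = [1, 1, -1]
print(fsvm_objective([1.0, 0.0], 0.0, X, y, memberships=[1.0, 1.0, 0.5]))
```

    Setting every membership to 1 recovers the classical soft-margin SVM objective, which is one way to see F-SVM as a generalization of it.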

  2. Protein-protein interaction site prediction in Homo sapiens and E. coli using an interaction-affinity based membership function in fuzzy SVM.

    Science.gov (United States)

    Sriwastava, Brijesh Kumar; Basu, Subhadip; Maulik, Ujjwal

    2015-10-01

    Protein-protein interaction (PPI) site prediction helps to ascertain the interface residues that participate in interaction processes. Fuzzy support vector machine (F-SVM) is proposed as an effective method to solve this problem, and we have shown that the performance of the classical SVM can be enhanced with the help of an interaction-affinity based fuzzy membership function. The performance of both SVM and F-SVM on the PPI databases of the Homo sapiens and E. coli organisms is evaluated, and the statistical significance of the developed method over classical SVM and other fuzzy membership-based SVM methods available in the literature is estimated. Our membership function uses the residue-level interaction affinity scores for each pair of positive and negative sequence fragments. The average AUC scores in the 10-fold cross-validation experiments are 79.94% and 80.48% for the Homo sapiens and E. coli organisms, respectively. On the independent test datasets, the AUC scores are 76.59% and 80.17%, respectively, for the two organisms. In almost all cases, the developed F-SVM method improves on the performance obtained by the corresponding classical SVM and the other classifiers available in the literature.
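
    The AUC figures quoted above can be read as the probability that a randomly chosen positive (interface) sample receives a higher classifier score than a randomly chosen negative one. A minimal sketch of that rank-based definition follows; the function name and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive samples, 0 for negative samples.
    scores: classifier decision values (higher means more positive).
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

    The quadratic pairwise loop is chosen for clarity; for large test sets a sort-based computation gives the same value more efficiently.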

  3. SVM-PB-Pred: SVM based protein block prediction method using sequence profiles and secondary structures.

    Science.gov (United States)

    Suresh, V; Parthasarathy, S

    2014-01-01

    We developed a support vector machine based web server called SVM-PB-Pred, to predict the Protein Block for any given amino acid sequence. The input features of SVM-PB-Pred include (i) sequence profiles (PSSM) and (ii) actual secondary structures (SS) from the DSSP method or predicted secondary structures from the NPS@ and GOR4 methods. Three combined input feature sets, PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4), were used to train and test the SVM models. Similarly, four datasets, RS90, DB433, LI1264 and SP1577, were used to develop the SVM models. The four SVM models were tested using three different benchmarking tests, namely (i) self consistency, (ii) seven-fold cross validation and (iii) independent case test. The maximum prediction accuracy of ~70% was observed in the self consistency test for the SVM models of both the LI1264 and SP1577 datasets, where PSSM+SS(DSSP) input features were used for testing. The prediction accuracies were reduced to ~53% for PSSM+SS(NPS@) and ~43% for PSSM+SS(GOR4) in the independent case test, for the SVM models of the same two datasets. Using our method, it is possible to predict the protein block letters for any query protein sequence with ~53% accuracy, when the SP1577 dataset and predicted secondary structure from the NPS@ server are used. The SVM-PB-Pred server can be freely accessed through http://bioinfo.bdu.ac.in/~svmpbpred.

  4. Prediction of residue-residue contact matrix for protein-protein interaction with Fisher score features and deep learning.

    Science.gov (United States)

    Du, Tianchuan; Liao, Li; Wu, Cathy H; Sun, Bilin

    2016-11-01

    Protein-protein interactions play essential roles in many biological processes. Acquiring knowledge of the residue-residue contact information of two interacting proteins is not only helpful in annotating functions for proteins, but also critical for structure-based drug design. The prediction of the protein residue-residue contact matrix of the interfacial regions is challenging. In this work, we introduced deep learning techniques (specifically, stacked autoencoders) to build deep neural network models to tackle the residue-residue contact prediction problem. In tandem with interaction profile Hidden Markov Models, which were used first to extract Fisher score features from protein sequences, stacked autoencoders were deployed to extract and learn hidden abstract features. The deep learning model showed significant improvement over the traditional machine learning model, Support Vector Machines (SVM), with the overall accuracy increased by about 15 percentage points, from 65.40% to 80.82%. We showed that the stacked autoencoders could extract novel features out of the Fisher score features, which can be utilized by deep neural networks and other classifiers to enhance learning. It is further shown that deep neural networks have significant advantages over SVM in making use of the newly extracted features. Copyright © 2016. Published by Elsevier Inc.

  5. Classification of different kinds of pesticide residues on lettuce based on fluorescence spectra and WT-BCC-SVM algorithm

    Science.gov (United States)

    Zhou, Xin; Jun, Sun; Zhang, Bing; Jun, Wu

    2017-07-01

    In order to improve the reliability of the spectrum features extracted by wavelet transform, a method combining wavelet transform (WT) with the bacterial colony chemotaxis algorithm and support vector machine (BCC-SVM) algorithm (WT-BCC-SVM) was proposed in this paper. In addition, we aimed to identify different kinds of pesticide residues on lettuce leaves in a novel, rapid and non-destructive way by using fluorescence spectra technology. The fluorescence spectral data of 150 lettuce leaf samples carrying five different kinds of pesticide residues on the leaf surface were obtained using a Cary Eclipse fluorescence spectrometer. Standard normalized variable detrending (SNV detrending) and Savitzky-Golay coupled with standard normalized variable detrending (SG-SNV detrending) were each used to preprocess the raw spectra. Bacterial colony chemotaxis combined with support vector machine (BCC-SVM) and support vector machine (SVM) classification models were established based on full spectra (FS) and wavelet transform characteristics (WTC), respectively, where the WTC were selected by WT. The results showed that the accuracies on the training set, calibration set and prediction set of the best classification model (SG-SNV detrending-WT-BCC-SVM) were 100%, 98% and 93.33%, respectively. In addition, the results indicated that it is feasible to use WT-BCC-SVM to establish a diagnostic model for different kinds of pesticide residues on lettuce leaves.
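
    For readers unfamiliar with the SNV preprocessing step named above (SNV is usually expanded as "standard normal variate"), the basic transform centres each spectrum on its own mean and scales it by its own standard deviation. The sketch below shows only that step, under the assumption that the sample standard deviation (n - 1) is used; the detrending and Savitzky-Golay stages, and the authors' actual settings, are not reproduced here.

```python
import math
from typing import List, Sequence


def snv(spectrum: Sequence[float]) -> List[float]:
    """Standard normal variate transform of a single spectrum.

    Each intensity is centred on the spectrum's own mean and divided by
    the spectrum's own sample standard deviation (n - 1 denominator).
    """
    n = len(spectrum)
    if n < 2:
        raise ValueError("a spectrum needs at least two points")
    mean = sum(spectrum) / n
    variance = sum((v - mean) ** 2 for v in spectrum) / (n - 1)
    std = math.sqrt(variance)
    if std == 0.0:
        raise ValueError("constant spectrum cannot be scaled")
    return [(v - mean) / std for v in spectrum]


# Hypothetical three-point "spectrum" for illustration.
print(snv([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```

    Because the transform is applied per spectrum rather than per wavelength, it mainly reduces sample-to-sample scaling and offset differences before classification.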

  6. PVP-SVM: Sequence-Based Prediction of Phage Virion Proteins Using a Support Vector Machine.

    Science.gov (United States)

    Manavalan, Balachandran; Shin, Tae H; Lee, Gwang

    2018-01-01

    Accurately identifying bacteriophage virion proteins from uncharacterized sequences is important to understand interactions between the phage and its host bacteria in order to develop new antibacterial drugs. However, identification of such proteins using experimental techniques is expensive and often time consuming; hence, development of an efficient computational algorithm for the prediction of phage virion proteins (PVPs) prior to in vitro experimentation is needed. Here, we describe a support vector machine (SVM)-based PVP predictor, called PVP-SVM, which was trained with 136 optimal features. A feature selection protocol was employed to identify the optimal features from a large set that included amino acid composition, dipeptide composition, atomic composition, physicochemical properties, and chain-transition-distribution. PVP-SVM achieved an accuracy of 0.870 during leave-one-out cross-validation, which was 6% higher than control SVM predictors trained with all features, indicating the efficiency of the feature selection method. Furthermore, PVP-SVM displayed superior performance compared to the currently available method, PVPred, and two other machine-learning methods developed in this study when objectively evaluated with an independent dataset. For the convenience of the scientific community, a user-friendly and publicly accessible web server has been established at www.thegleelab.org/PVP-SVM/PVP-SVM.html.
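
    Two of the feature families listed above, amino acid composition and dipeptide composition, are simple to compute from a sequence. The following is a minimal sketch under the assumption that only the 20 standard residues are counted and that frequencies are normalized by the number of counted items; the paper's exact encoding and its 136 selected features are not given in the abstract.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order: AA, AC, AD, ...
DIPEPTIDES = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> List[float]:
    """Fraction of each of the 20 standard amino acids (20 features)."""
    seq = seq.upper()
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in seq:
        if residue in counts:  # skip non-standard symbols such as X
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> List[float]:
    """Fraction of each ordered pair of adjacent residues (400 features)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


# Hypothetical toy sequence.
features = amino_acid_composition("ACCA") + dipeptide_composition("ACCA")
print(len(features))  # 420
```

    Both vectors have a fixed length regardless of sequence length, which is what allows them to be concatenated and passed to a feature selection step and then to an SVM.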

  7. PVP-SVM: Sequence-Based Prediction of Phage Virion Proteins Using a Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Balachandran Manavalan

    2018-03-01

    Accurately identifying bacteriophage virion proteins from uncharacterized sequences is important to understand interactions between the phage and its host bacteria in order to develop new antibacterial drugs. However, identification of such proteins using experimental techniques is expensive and often time consuming; hence, development of an efficient computational algorithm for the prediction of phage virion proteins (PVPs) prior to in vitro experimentation is needed. Here, we describe a support vector machine (SVM)-based PVP predictor, called PVP-SVM, which was trained with 136 optimal features. A feature selection protocol was employed to identify the optimal features from a large set that included amino acid composition, dipeptide composition, atomic composition, physicochemical properties, and chain-transition-distribution. PVP-SVM achieved an accuracy of 0.870 during leave-one-out cross-validation, which was 6% higher than control SVM predictors trained with all features, indicating the efficiency of the feature selection method. Furthermore, PVP-SVM displayed superior performance compared to the currently available method, PVPred, and two other machine-learning methods developed in this study when objectively evaluated with an independent dataset. For the convenience of the scientific community, a user-friendly and publicly accessible web server has been established at www.thegleelab.org/PVP-SVM/PVP-SVM.html.

  8. COMSAT: Residue contact prediction of transmembrane proteins based on support vector machines and mixed integer linear programming.

    Science.gov (United States)

    Zhang, Huiling; Huang, Qingsheng; Bei, Zhendong; Wei, Yanjie; Floudas, Christodoulos A

    2016-03-01

    In this article, we present COMSAT, a hybrid framework for residue contact prediction of transmembrane (TM) proteins, integrating a support vector machine (SVM) method and a mixed integer linear programming (MILP) method. COMSAT consists of two modules: COMSAT_SVM which is trained mainly on position-specific scoring matrix features, and COMSAT_MILP which is an ab initio method based on optimization models. Contacts predicted by the SVM model are ranked by SVM confidence scores, and a threshold is trained to improve the reliability of the predicted contacts. For TM proteins with no contacts above the threshold, COMSAT_MILP is used. The proposed hybrid contact prediction scheme was tested on two independent TM protein sets based on the contact definition of 14 Å between Cα-Cα atoms. First, using a rigorous leave-one-protein-out cross validation on the training set of 90 TM proteins, an accuracy of 66.8%, a coverage of 12.3%, a specificity of 99.3% and a Matthews' correlation coefficient (MCC) of 0.184 were obtained for residue pairs that are at least six amino acids apart. Second, when tested on a test set of 87 TM proteins, the proposed method showed a prediction accuracy of 64.5%, a coverage of 5.3%, a specificity of 99.4% and a MCC of 0.106. COMSAT shows satisfactory results when compared with 12 other state-of-the-art predictors, and is more robust in terms of prediction accuracy as the length and complexity of TM protein increase. COMSAT is freely accessible at http://hpcc.siat.ac.cn/COMSAT/. © 2016 Wiley Periodicals, Inc.
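
    The contact definition used in the evaluation above (Cα-Cα distance of 14 Å, residue pairs at least six positions apart) can be turned into a reference contact list from coordinates. The sketch below is an assumption-laden illustration: the function name is hypothetical, the cutoff is treated as inclusive, and coordinates are assumed to be given in Å as one (x, y, z) tuple per residue.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_pairs(
    ca_coords: Sequence[Coord],
    cutoff: float = 14.0,
    min_separation: int = 6,
) -> List[Tuple[int, int]]:
    """Return residue index pairs (i, j), i < j, that are in contact.

    A pair is in contact when its C-alpha atoms lie within `cutoff`
    angstroms and the residues are at least `min_separation` positions
    apart in the sequence.
    """
    contacts = []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts.append((i, j))
    return contacts


# Hypothetical hairpin-like trace: residues 0-3 run along x, 4-7 run back.
trace = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 3.8, 0.0), (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]
print(contact_pairs(trace))  # [(0, 6), (0, 7), (1, 7)]
```

    Predicted contacts ranked by SVM confidence can then be compared against such a reference list to obtain measures like those reported in the abstract.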

  9. TargetCrys: protein crystallization prediction by fusing multi-view features with two-layered SVM.

    Science.gov (United States)

    Hu, Jun; Han, Ke; Li, Yang; Yang, Jing-Yu; Shen, Hong-Bin; Yu, Dong-Jun

    2016-11-01

    The accurate prediction of whether a protein will crystallize plays a crucial role in improving the success rate of protein crystallization projects. A common critical problem in the development of machine-learning-based protein crystallization predictors is how to effectively utilize protein features extracted from different views. In this study, we aimed to improve the efficiency of fusing multi-view protein features by proposing a new two-layered SVM (2L-SVM) which switches the feature-level fusion problem to a decision-level fusion problem: the SVMs in the 1st layer of the 2L-SVM are trained on each of the multi-view feature sets; then, the outputs of the 1st layer SVMs, which are the "intermediate" decisions made based on the respective feature sets, are further ensembled by a 2nd layer SVM. Based on the proposed 2L-SVM, we implemented a sequence-based protein crystallization predictor called TargetCrys. Experimental results on several benchmark datasets demonstrated the efficacy of the proposed 2L-SVM for fusing multi-view features. We also compared TargetCrys with existing sequence-based protein crystallization predictors and demonstrated that the proposed TargetCrys outperformed most of the existing predictors and is competitive with the state-of-the-art predictors. The TargetCrys webserver and datasets used in this study are freely available for academic use at: http://csbio.njust.edu.cn/bioinf/TargetCrys .

  10. Identification of NAD interacting residues in proteins

    Directory of Open Access Journals (Sweden)

    Raghava Gajendra PS

    2010-03-01

    Background: Small molecular cofactors or ligands play a crucial role in the proper functioning of cells. Accurate annotation of their target proteins and binding sites is required for the complete understanding of reaction mechanisms. Nicotinamide adenine dinucleotide (NAD+ or NAD) is one of the most commonly used organic cofactors in living cells, which plays a critical role in cellular metabolism, storage and regulatory processes. In the past, several NAD binding proteins (NADBP) have been reported in the literature, which are responsible for a wide range of activities in the cell. Attempts have been made to derive a rule for the binding of NAD+ to its target proteins. However, so far an efficient model could not be derived due to the time-consuming process of structure determination and the limitations of similarity based approaches. Thus a sequence and non-similarity based method is needed to characterize the NAD binding sites to help in the annotation. In this study attempts have been made to predict NAD binding proteins and their interacting residues (NIRs) from amino acid sequence using bioinformatics tools. Results: We extracted 1556 protein chains from 555 NAD binding proteins whose structures are available in the Protein Data Bank. Then we removed all redundant protein chains and finally obtained 195 non-redundant NAD binding protein chains, where no two chains have more than 40% sequence identity. In this study all models were developed and evaluated using a five-fold cross validation technique on the above dataset of 195 NAD binding proteins. While certain types of residues are preferred (e.g. Gly, Tyr, Thr, His) in NAD interaction, residues like Ala, Glu, Leu, Lys are not preferred. A support vector machine (SVM) based method has been developed using various window lengths of amino acid sequence for predicting NAD interacting residues and obtained a maximum Matthews correlation coefficient (MCC) of 0.47 with accuracy 74.13% at window length 17
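
    Window-based residue prediction, as described above, typically turns each residue into one training pattern made of the residue and its sequence neighbours. The sketch below extracts such windows; the padding symbol "X" for positions beyond the termini and the function name are assumptions, since the abstract gives only the window length.

```python
from typing import List


def residue_windows(seq: str, window: int = 17, pad: str = "X") -> List[str]:
    """Return one window per residue, centred on that residue.

    Positions that fall outside the sequence are filled with `pad`, so
    every residue (including those near the termini) yields a pattern of
    exactly `window` characters.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window length must be a positive odd number")
    half = window // 2
    padded = pad * half + seq + pad * half
    return [padded[i:i + window] for i in range(len(seq))]


# Hypothetical short sequence with a small window for readability.
print(residue_windows("ACDEF", window=3))
# ['XAC', 'ACD', 'CDE', 'DEF', 'EFX']
```

    Each window would then be encoded numerically (for example as binary or profile features) and labelled by whether its central residue interacts with the ligand.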

  11. SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.

    Science.gov (United States)

    Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Stafford, William Noble; Leslie, Christina

    2007-05-22

    Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at http://svm-fold.c2b2.columbia.edu. Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. In large-scale benchmark results based on the SCOP database, our code weighting approach
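
    To give a concrete sense of a "histogram of inexact matching k-mer frequencies", the sketch below builds such a histogram using a plain mismatch neighbourhood: every k-mer in the sequence adds one count to each k-mer within a given Hamming distance of it. This is a deliberately simpler stand-in, not the profile kernel itself, which defines its neighbourhoods from sequence profiles; k, the mismatch limit, and all names here are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations, product
from typing import Set

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def neighbours(kmer: str, max_mismatches: int) -> Set[str]:
    """All k-mers within `max_mismatches` substitutions of `kmer`."""
    result = {kmer}
    for m in range(1, max_mismatches + 1):
        for positions in combinations(range(len(kmer)), m):
            for letters in product(ALPHABET, repeat=m):
                chars = list(kmer)
                for pos, letter in zip(positions, letters):
                    chars[pos] = letter
                result.add("".join(chars))
    return result


def mismatch_histogram(seq: str, k: int = 3, max_mismatches: int = 1) -> Counter:
    """Sparse feature vector: counts of inexactly matched k-mers."""
    hist: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(c in ALPHABET for c in kmer):  # skip windows with unknown residues
            hist.update(neighbours(kmer, max_mismatches))
    return hist


def kernel(a: Counter, b: Counter) -> int:
    """Inner product of two sparse histograms (a string kernel value)."""
    return sum(count * b[feature] for feature, count in a.items() if feature in b)


# Hypothetical toy sequences.
h1 = mismatch_histogram("ACDACE")
h2 = mismatch_histogram("ACDACD")
print(h1["ACD"], kernel(h1, h2))
```

    The kernel value between two sequences is the inner product of their histograms, which an SVM can use directly without ever materializing the full 20^k-dimensional feature space.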

  12. Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.

    Science.gov (United States)

    Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook

    2014-11-01

    As many structures of protein-DNA complexes have become known in recent years, several computational methods have been developed to predict DNA-binding sites in proteins. However, the inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure of 66.3% and an MCC of 0.324. The other SVM model, which uses both DNA and protein sequences, achieved an accuracy of 69.6%, an F-measure of 69.6%, and an MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and an MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data. To the best of
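
    The three measures reported in this record (accuracy, F-measure and MCC) all derive from the same binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and unrelated to the paper's data.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, F-measure and Matthews correlation coefficient.

    tp/fp/tn/fn are the counts of true positives, false positives,
    true negatives and false negatives. Undefined ratios are reported
    as 0.0 rather than raising an error.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "f_measure": f_measure, "mcc": mcc}


# Hypothetical confusion matrix for 20 nucleotides.
print(binary_metrics(tp=6, fp=2, tn=8, fn=4))
```

    MCC ranges from -1 to 1 and stays informative when binding and non-binding classes are imbalanced, which is why it is commonly reported alongside accuracy in binding-site prediction.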

  13. A novel one-class SVM based negative data sampling method for reconstructing proteome-wide HTLV-human protein interaction networks.

    Science.gov (United States)

    Mei, Suyu; Zhu, Hao

    2015-01-26

    Protein-protein interaction (PPI) prediction is generally treated as a problem of binary classification wherein negative data sampling is still an open problem to be addressed. The commonly used random sampling is prone to yield less representative negative data with considerable false negatives. Meanwhile rational constraints are seldom exerted on model selection to reduce the risk of false positive predictions for most of the existing computational methods. In this work, we propose a novel negative data sampling method based on one-class SVM (support vector machine, SVM) to predict proteome-wide protein interactions between HTLV retrovirus and Homo sapiens, wherein one-class SVM is used to choose reliable and representative negative data, and two-class SVM is used to yield proteome-wide outcomes as predictive feedback for rational model selection. Computational results suggest that one-class SVM is more suited to be used as negative data sampling method than two-class PPI predictor, and the predictive feedback constrained model selection helps to yield a rational predictive model that reduces the risk of false positive predictions. Some predictions have been validated by the recent literature. Lastly, gene ontology based clustering of the predicted PPI networks is conducted to provide valuable cues for the pathogenesis of HTLV retrovirus.

  14. MIEC-SVM: automated pipeline for protein peptide/ligand interaction prediction.

    Science.gov (United States)

    Li, Nan; Ainsworth, Richard I; Wu, Meixin; Ding, Bo; Wang, Wei

    2016-03-15

    MIEC-SVM is a structure-based method for predicting protein recognition specificity. Here, we present an automated MIEC-SVM pipeline providing an integrated and user-friendly workflow for construction and application of the MIEC-SVM models. This pipeline can handle standard amino acids and those with post-translational modifications (PTMs) or small molecules. Moreover, multi-threading and support for Sun Grid Engine (SGE) are implemented to significantly boost the computational efficiency. The program is available at http://wanglab.ucsd.edu/MIEC-SVM. Contact: wei-wang@ucsd.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  15. Neural network and SVM classifiers accurately predict lipid binding proteins, irrespective of sequence homology.

    Science.gov (United States)

    Bakhtiarizadeh, Mohammad Reza; Moradi-Shahrbabak, Mohammad; Ebrahimi, Mansour; Ebrahimie, Esmaeil

    2014-09-07

    Due to the central roles of lipid binding proteins (LBPs) in many biological processes, sequence based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function, which results in low accuracy of sequence homology based methods. Therefore, there is a need for developing alternative functional prediction methods irrespective of sequence similarity. To distinguish LBPs from non-LBPs, the performances of support vector machine (SVM) and neural network were compared in this study. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that SVM outperforms neural network. SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in distinguishing LBPs from non-LBPs and 92.06% (CV) and 92.90% (IE) on average for classification of different LBP classes. Increasing the number and the range of extracted protein features as well as optimization of the SVM parameters significantly increased the efficiency of LBP class prediction in comparison to the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool in detection of LBP classes. The proposed approach has the potential to integrate and improve the common sequence alignment based methods. Copyright © 2014 Elsevier Ltd. All rights reserved.

  16. Prediction of protein-protein interactions between viruses and human by an SVM model

    Directory of Open Access Journals (Sweden)

    Cui Guangyu

    2012-05-01

    Background: Several computational methods have been developed to predict protein-protein interactions from amino acid sequences, but most of those methods are intended for the interactions within a species rather than for interactions across different species. Methods for predicting interactions between homogeneous proteins are not appropriate for finding those between heterogeneous proteins since they do not distinguish the interactions between proteins of the same species from those of different species. Results: We developed a new method for representing a protein sequence of variable length in a frequency vector of fixed length, which encodes the relative frequency of three consecutive amino acids of a sequence. We built a support vector machine (SVM) model to predict human proteins that interact with virus proteins. In two types of viruses, human papillomaviruses (HPV) and hepatitis C virus (HCV), our SVM model achieved an average accuracy above 80%, which is higher than that of another SVM model with a different representation scheme. Using the SVM model and Gene Ontology (GO) annotations of proteins, we predicted new interactions between virus proteins and human proteins. Conclusions: Encoding the relative frequency of amino acid triplets of a protein sequence is a simple yet powerful representation method for predicting protein-protein interactions across different species. The representation method has several advantages: (1) it enables a prediction model to achieve a better performance than other representations, (2) it generates feature vectors of fixed length regardless of the sequence length, and (3) the same representation is applicable to different types of proteins.
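
    The fixed-length representation described above can be sketched as follows. This is an illustration under stated assumptions: it uses the plain 20-letter amino acid alphabet (giving 20^3 = 8000 dimensions) and normalizes by the number of counted triplets, whereas the abstract does not state the alphabet, any residue grouping, or the exact normalization the authors used.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Map each of the 8000 possible triplets to a fixed vector position.
TRIPLET_INDEX = {
    "".join(t): i for i, t in enumerate(product(AMINO_ACIDS, repeat=3))
}


def triplet_frequency_vector(seq: str) -> List[float]:
    """Relative frequency of each run of three consecutive amino acids.

    The output length is always 8000, independent of sequence length.
    Triplets containing non-standard symbols are ignored.
    """
    vec = [0.0] * len(TRIPLET_INDEX)
    seq = seq.upper()
    counted = 0
    for i in range(len(seq) - 2):
        idx = TRIPLET_INDEX.get(seq[i:i + 3])
        if idx is not None:
            vec[idx] += 1.0
            counted += 1
    if counted:
        vec = [v / counted for v in vec]
    return vec


# Hypothetical sequences of different lengths map to same-size vectors.
short_vec = triplet_frequency_vector("ACDAC")
long_vec = triplet_frequency_vector("MKTAYIAKQRQISFVKSHFSRQ")
print(len(short_vec), len(long_vec))  # 8000 8000
```

    A virus-human protein pair could then be represented, for example, by concatenating the two proteins' vectors before passing them to the SVM; that pairing scheme is likewise an assumption, not a detail taken from the abstract.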

  17. FaaPred: a SVM-based prediction method for fungal adhesins and adhesin-like proteins.

    Directory of Open Access Journals (Sweden)

    Jayashree Ramana

    Adhesion constitutes one of the initial stages of infection in microbial diseases and is mediated by adhesins. Hence, identification and comprehensive knowledge of adhesins and adhesin-like proteins is essential to understand adhesin mediated pathogenesis and how to exploit its therapeutic potential. However, the knowledge about fungal adhesins is rudimentary compared to that of bacterial adhesins. In addition to host cell attachment and mating, the fungal adhesins play a significant role in homotypic and xenotypic aggregation, foraging and biofilm formation. Experimental identification of fungal adhesins is labor- as well as time-intensive. In this work, we present a Support Vector Machine (SVM) based method for the prediction of fungal adhesins and adhesin-like proteins. The SVM models were trained with different compositional features, namely, amino acid, dipeptide, multiplet fractions, charge and hydrophobic compositions, as well as PSI-BLAST derived PSSM matrices. The best classifiers are based on compositional properties as well as PSSM and yield an overall accuracy of 86%. The prediction method based on the best classifiers is freely accessible as a world wide web based server at http://bioinfo.icgeb.res.in/faap. This work will aid rapid and rational identification of fungal adhesins, expedite the pace of experimental characterization of novel fungal adhesins and enhance our knowledge about the role of adhesins in fungal infections.

  18. Prediction of vitamin interacting residues in a vitamin binding protein using evolutionary information

    Directory of Open Access Journals (Sweden)

    Panwar Bharat

    2013-02-01

    Background: The vitamins are important cofactors in various enzymatic reactions. In the past, many inhibitors have been designed against vitamin binding pockets in order to inhibit vitamin-protein interactions. Thus, it is important to identify vitamin interacting residues in a protein. It is possible to detect vitamin-binding pockets on a protein if its tertiary structure is known. Unfortunately, tertiary structures are available for only a limited number of proteins. Therefore, it is important to develop in-silico models for predicting vitamin interacting residues in a protein from its primary structure. Results: In this study, first we compared protein-interacting residues of vitamins with other ligands using Two Sample Logo (TSL). It was observed that ATP, GTP, NAD, FAD and mannose preferred {G,R,K,S,H}, {G,K,T,S,D,N}, {T,G,Y}, {G,Y,W} and {Y,D,W,N,E} residues respectively, whereas vitamins preferred {Y,F,S,W,T,G,H} residues for the interaction with proteins. Furthermore, compositional information of preferred and non-preferred residues along with pattern specificity was also observed within different vitamin classes. Vitamins A, B and B6 preferred {F,I,W,Y,L,V}, {S,Y,G,T,H,W,N,E} and {S,T,G,H,Y,N} interacting residues respectively. This suggested that protein-binding patterns of vitamins are different from those of other ligands, and motivated us to develop separate predictors for vitamins and their sub-classes. Four different prediction modules, (i) vitamin interacting residues (VIRs), (ii) vitamin-A interacting residues (VAIRs), (iii) vitamin-B interacting residues (VBIRs) and (iv) pyridoxal-5-phosphate (vitamin B6) interacting residues (PLPIRs), have been developed. We applied various classifiers, including SVM, BayesNet, NaiveBayes, ComplementNaiveBayes, NaiveBayesMultinomial, RandomForest and IBk, as machine learning techniques, using binary and Position-Specific Scoring Matrix (PSSM) features of protein sequences. Finally, we selected best performing SVM modules and

  19. Prediction of vitamin interacting residues in a vitamin binding protein using evolutionary information.

    Science.gov (United States)

    Panwar, Bharat; Gupta, Sudheer; Raghava, Gajendra P S

    2013-02-07

    The vitamins are important cofactors in various enzymatic reactions. In the past, many inhibitors have been designed against vitamin binding pockets in order to inhibit vitamin-protein interactions. Thus, it is important to identify vitamin interacting residues in a protein. It is possible to detect vitamin-binding pockets on a protein if its tertiary structure is known. Unfortunately, tertiary structures are available for only a limited number of proteins. Therefore, it is important to develop in-silico models for predicting vitamin interacting residues in a protein from its primary structure. In this study, first we compared protein-interacting residues of vitamins with other ligands using Two Sample Logo (TSL). It was observed that ATP, GTP, NAD, FAD and mannose preferred {G,R,K,S,H}, {G,K,T,S,D,N}, {T,G,Y}, {G,Y,W} and {Y,D,W,N,E} residues respectively, whereas vitamins preferred {Y,F,S,W,T,G,H} residues for the interaction with proteins. Furthermore, compositional information of preferred and non-preferred residues along with pattern specificity was also observed within different vitamin classes. Vitamins A, B and B6 preferred {F,I,W,Y,L,V}, {S,Y,G,T,H,W,N,E} and {S,T,G,H,Y,N} interacting residues respectively. This suggested that protein-binding patterns of vitamins are different from those of other ligands, and motivated us to develop separate predictors for vitamins and their sub-classes. Four different prediction modules, (i) vitamin interacting residues (VIRs), (ii) vitamin-A interacting residues (VAIRs), (iii) vitamin-B interacting residues (VBIRs) and (iv) pyridoxal-5-phosphate (vitamin B6) interacting residues (PLPIRs), have been developed. We applied various classifiers, including SVM, BayesNet, NaiveBayes, ComplementNaiveBayes, NaiveBayesMultinomial, RandomForest and IBk, as machine learning techniques, using binary and Position-Specific Scoring Matrix (PSSM) features of protein sequences. Finally, we selected best performing SVM modules and obtained highest MCC of 0.53, 0.48, 0.61, 0

  20. BLAST-based structural annotation of protein residues using Protein Data Bank.

    Science.gov (United States)

    Singh, Harinder; Raghava, Gajendra P S

    2016-01-25

    In the era of next-generation sequencing, where thousands of genomes have already been sequenced, the size of protein databases is growing at an exponential rate. Structural annotation of these proteins is one of the biggest challenges for the computational biologist. Although it is easy to perform a BLAST search against the Protein Data Bank (PDB), it is difficult for a biologist to annotate protein residues from a BLAST search. A web server, StarPDB, has been developed for structural annotation of a protein based on its similarity with known protein structures. It uses standard BLAST software to perform a similarity search of a query protein against protein structures in PDB. This server integrates a wide range of modules for assigning different types of annotation, including Secondary-structure, Accessible surface area, Tight-turns, DNA-RNA and Ligand modules. The Secondary-structure module allows users to assign regular secondary structure states to each residue in a protein. The Accessible surface area module predicts the exposed or buried residues in a protein. The Tight-turns module is designed to predict tight turns such as beta-turns in a protein. The DNA-RNA module was developed for predicting DNA and RNA interacting residues in a protein. Similarly, the Ligand module of the server allows one to predict ligand, metal and nucleotide interacting residues in a protein. In summary, this manuscript presents a web server for comprehensive annotation of a protein based on similarity search. It integrates a number of visualization tools that help users to understand the structure and function of protein residues. This web server is freely available to the scientific community at http://crdd.osdd.net/raghava/starpdb .

  1. Identification of mannose interacting residues using local composition.

    Directory of Open Access Journals (Sweden)

    Sandhya Agarwal

    BACKGROUND: Mannose binding proteins (MBPs) play a vital role in several biological functions such as defense mechanisms. These proteins bind to mannose on the surface of a wide range of pathogens and help in eliminating these pathogens from our body. Thus, it is important to identify mannose interacting residues (MIRs) in order to understand mechanism of recognition of pathogens by MBPs. RESULTS: This paper describes modules developed for predicting MIRs in a protein. Support vector machine (SVM) based models have been developed on 120 mannose binding protein chains, where no two chains have more than 25% sequence similarity. SVM models were developed on two types of datasets: (1) main dataset consists of 1029 mannose interacting and 1029 non-interacting residues, (2) realistic dataset consists of 1029 mannose interacting and 10320 non-interacting residues. In this study, firstly, we developed standard modules using binary and PSSM profile of patterns and got maximum MCC around 0.32. Secondly, we developed SVM modules using composition profile of patterns and achieved maximum MCC around 0.74 with accuracy 86.64% on main dataset. Thirdly, we developed a model on a realistic dataset and achieved maximum MCC of 0.62 with accuracy 93.08%. Based on this study, a standalone program and web server have been developed for predicting mannose interacting residues in proteins (http://www.imtech.res.in/raghava/premier/). CONCLUSIONS: Compositional analysis of mannose interacting and non-interacting residues shows that certain types of residues are preferred in mannose interaction. It was also observed that residues around mannose interacting residues have a preference for certain types of residues. Composition of patterns/peptide/segment has been used for predicting MIRs and achieved reasonable high accuracy. It is possible that this novel strategy may be effective to predict other types of interacting residues. 
This study will be useful in annotating the function
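
    The abstract above reports both accuracy and the Matthews correlation coefficient (MCC) on a balanced and an imbalanced ("realistic") dataset. The following is a minimal sketch of why MCC is the more informative figure when classes are imbalanced. Only the class sizes (1029 and 10320) come from the abstract; the all-negative predictor and the helper names `mcc` and `accuracy` are illustrative assumptions, not part of the published method.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all residues that were labelled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Class sizes of the realistic dataset quoted in the abstract.
n_pos, n_neg = 1029, 10320

# Hypothetical trivial predictor: every residue is called non-interacting.
tp, fn = 0, n_pos
tn, fp = n_neg, 0

print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.2f}")
```

    On these class sizes the trivial predictor already reaches an accuracy of about 91% while its MCC is 0, so accuracy figures on the realistic dataset are best read alongside the reported MCC.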

  2. SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity.

    Science.gov (United States)

    Li, Ying Hong; Xu, Jing Yu; Tao, Lin; Li, Xiao Feng; Li, Shuang; Zeng, Xian; Chen, Shang Ying; Zhang, Peng; Qin, Chu; Zhang, Cheng; Chen, Zhe; Zhu, Feng; Chen, Yu Zong

    2016-01-01

    Knowledge of protein function is important for biological, medical and therapeutic studies, but many proteins are still unknown in function. There is a need for more improved functional prediction methods. Our SVM-Prot web-server employed a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complemented those similarity-based and other methods in predicting diverse classes of proteins including the distantly-related proteins and homologous proteins of different functions. Since its publication in 2003, we made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors protein representation, (3) improved predictive performances due to the use of more enriched training datasets and more variety of protein descriptors, (4) newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that were similar in sequence to a query protein, and (5) newly added batch submission option for supporting the classification of multiple proteins. Moreover, 2 more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added for facilitating collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi.

  3. Generalized SMO algorithm for SVM-based multitask learning.

    Science.gov (United States)

    Cai, Feng; Cherkassky, Vladimir

    2012-06-01

    Exploiting additional information to improve traditional inductive learning is an active research area in machine learning. In many supervised-learning applications, training data can be naturally separated into several groups, and incorporating this group information into learning may improve generalization. Recently, Vapnik proposed a general approach to formalizing such problems, known as "learning with structured data" and its support vector machine (SVM) based optimization formulation called SVM+. Liang and Cherkassky showed the connection between SVM+ and multitask learning (MTL) approaches in machine learning, and proposed an SVM-based formulation for MTL called SVM+MTL for classification. Training the SVM+MTL classifier requires the solution of a large quadratic programming optimization problem which scales as O(n^3) with sample size n. So there is a need to develop computationally efficient algorithms for implementing SVM+MTL. This brief generalizes Platt's sequential minimal optimization (SMO) algorithm to the SVM+MTL setting. Empirical results show that, for typical SVM+MTL problems, the proposed generalized SMO achieves over 100 times speed-up, in comparison with general-purpose optimization routines.

  4. SVM and SVM Ensembles in Breast Cancer Prediction.

    Science.gov (United States)

    Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong

    2017-01-01

    Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.

  5. SVM and SVM Ensembles in Breast Cancer Prediction.

    Directory of Open Access Journals (Sweden)

    Min-Wei Huang

    Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.

  6. Uniform design based SVM model selection for face recognition

    Science.gov (United States)

    Li, Weihong; Liu, Lijuan; Gong, Weiguo

    2010-02-01

    Support vector machine (SVM) has been proved to be a powerful tool for face recognition. The generalization capacity of SVM depends on the model with optimal hyperparameters. The computational cost of SVM model selection makes it difficult to apply in face recognition. To overcome this shortcoming, we use uniform design (space-filling designs and uniformly scattering theory) to search for optimal SVM hyperparameters. We then propose a face recognition scheme based on an SVM with an optimal model, which is obtained by replacing the grid and gradient-based methods with uniform design. The experimental results on Yale and PIE face databases show that the proposed method significantly improves the efficiency of SVM model selection.

  7. gkmSVM: an R package for gapped-kmer SVM.

    Science.gov (United States)

    Ghandi, Mahmoud; Mohammad-Noori, Morteza; Ghareghani, Narges; Lee, Dongwon; Garraway, Levi; Beer, Michael A

    2016-07-15

    We present a new R package for training gapped-kmer SVM classifiers for DNA and protein sequences. We describe an improved algorithm for kernel matrix calculation that speeds run time by about 2 to 5-fold over our original gkmSVM algorithm. This package supports several sequence kernels, including: gkmSVM, kmer-SVM, mismatch kernel and wildcard kernel. gkmSVM package is freely available through the Comprehensive R Archive Network (CRAN), for Linux, Mac OS and Windows platforms. The C++ implementation is available at www.beerlab.org/gkmsvm. Contact: mghandi@gmail.com or mbeer@jhu.edu. Supplementary data are available at Bioinformatics online.

  8. Protein structure based prediction of catalytic residues.

    Science.gov (United States)

    Fajardo, J Eduardo; Fiser, Andras

    2013-02-22

    Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function, when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues on the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. We found that several of the graph based measures utilize the same underlying feature of protein structures, which can be simply and more effectively captured with the distance to GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile sequence conservation remains by far the most influential feature in identifying functional residues. We also found that due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
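
    The sensitivity, precision and enrichment figures quoted in this abstract follow directly from the four counts it gives. The short calculation below reproduces them; the function name `enrichment_summary` is an illustrative assumption, and enrichment is taken here to mean precision divided by the background fraction of catalytic residues, which is one common reading of "fold enrichment".

```python
def enrichment_summary(total, predicted, positives, hits):
    """Return (sensitivity, precision, fold enrichment) from raw counts.

    total     -- all residues in the test set
    predicted -- residues returned by the method
    positives -- annotated catalytic residues in the test set
    hits      -- annotated catalytic residues among the returned ones
    """
    sensitivity = hits / positives
    precision = hits / predicted
    background = positives / total  # precision expected from a random pick
    return sensitivity, precision, precision / background


# Counts quoted in the abstract.
sens, prec, fold = enrichment_summary(total=9262, predicted=411,
                                      positives=111, hits=70)
print(f"sensitivity = {sens:.1%}")
print(f"precision   = {prec:.1%}")
print(f"enrichment  = {fold:.1f}-fold")
```

    Under that definition the counts give a sensitivity near 63%, a precision near 17% and an enrichment slightly above 14-fold, consistent with the rounded values in the abstract.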

  9. Density-based penalty parameter optimization on C-SVM.

    Science.gov (United States)

    Liu, Yun; Lian, Jie; Bartolacci, Michael R; Zeng, Qing-An

    2014-01-01

    The support vector machine (SVM) is one of the most widely used approaches for data classification and regression. SVM achieves the largest distance between the positive and negative support vectors, which neglects the remote instances away from the SVM interface. In order to avoid a position change of the SVM interface as the result of an error system outlier, C-SVM was implemented to decrease the influences of the system's outliers. Traditional C-SVM holds a uniform parameter C for both positive and negative instances; however, according to the different number proportions and the data distribution, positive and negative instances should be set with different weights for the penalty parameter of the error terms. Therefore, in this paper, we propose density-based penalty parameter optimization of C-SVM. The experimental results indicated that our proposed algorithm has outstanding performance with respect to both precision and recall.
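
    The idea of giving positive and negative instances different penalty weights can be sketched in a few lines. The example below is not the density-based scheme of the paper, whose details are not given in the abstract; it substitutes the simpler and widely used inverse-class-frequency heuristic, and the names `class_penalties` and `weighted_hinge_loss` are hypothetical.

```python
from collections import Counter


def class_penalties(labels, base_c=1.0):
    """Per-class penalty C_k = base_c * n / (n_classes * n_k).

    Assumption: inverse class frequency, NOT the paper's density-based rule.
    Rare classes receive a larger penalty for margin violations.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: base_c * n / (k * cnt) for cls, cnt in counts.items()}


def weighted_hinge_loss(scores, labels, penalties):
    """Sum of class-weighted hinge losses for labels in {-1, +1}."""
    return sum(penalties[y] * max(0.0, 1.0 - y * s)
               for s, y in zip(scores, labels))


# Toy data: 10 positive and 90 negative instances.
labels = [1] * 10 + [-1] * 90
penalties = class_penalties(labels)
print(penalties)

# One positive inside the margin, one negative classified with a wide margin.
print(weighted_hinge_loss([0.5, -2.0], [1, -1], penalties))
```

    With a uniform C the 90 negatives would dominate the error term; the per-class weights make a margin violation on a rare positive cost proportionally more.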

  10. CAB-Align: A Flexible Protein Structure Alignment Method Based on the Residue-Residue Contact Area.

    Directory of Open Access Journals (Sweden)

    Genki Terashi

    Proteins are flexible, and this flexibility has an essential functional role. Flexibility can be observed in loop regions, rearrangements between secondary structure elements, and conformational changes between entire domains. However, most protein structure alignment methods treat protein structures as rigid bodies. Thus, these methods fail to identify the equivalences of residue pairs in regions with flexibility. In this study, we considered that the evolutionary relationship between proteins corresponds directly to the residue-residue physical contacts rather than the three-dimensional (3D) coordinates of proteins. Thus, we developed a new protein structure alignment method, contact area-based alignment (CAB-align), which uses the residue-residue contact area to identify regions of similarity. The main purpose of CAB-align is to identify homologous relationships at the residue level between related protein structures. The CAB-align procedure comprises two main steps: First, a rigid-body alignment method based on local and global 3D structure superposition is employed to generate a sufficient number of initial alignments. Then, iterative dynamic programming is executed to find the optimal alignment. We evaluated the performance and advantages of CAB-align based on four main points: (1) agreement with the gold standard alignment, (2) alignment quality based on an evolutionary relationship without 3D coordinate superposition, (3) consistency of the multiple alignments, and (4) classification agreement with the gold standard classification. Comparisons of CAB-align with other state-of-the-art protein structure alignment methods (TM-align, FATCAT, and DaliLite) using our benchmark dataset showed that CAB-align performed robustly in obtaining high-quality alignments and generating consistent multiple alignments with high coverage and accuracy rates, and it performed extremely well when discriminating between homologous and nonhomologous pairs of proteins

  11. CAB-Align: A Flexible Protein Structure Alignment Method Based on the Residue-Residue Contact Area.

    Science.gov (United States)

    Terashi, Genki; Takeda-Shitaka, Mayuko

    2015-01-01

    Proteins are flexible, and this flexibility has an essential functional role. Flexibility can be observed in loop regions, rearrangements between secondary structure elements, and conformational changes between entire domains. However, most protein structure alignment methods treat protein structures as rigid bodies. Thus, these methods fail to identify the equivalences of residue pairs in regions with flexibility. In this study, we considered that the evolutionary relationship between proteins corresponds directly to the residue-residue physical contacts rather than the three-dimensional (3D) coordinates of proteins. Thus, we developed a new protein structure alignment method, contact area-based alignment (CAB-align), which uses the residue-residue contact area to identify regions of similarity. The main purpose of CAB-align is to identify homologous relationships at the residue level between related protein structures. The CAB-align procedure comprises two main steps: First, a rigid-body alignment method based on local and global 3D structure superposition is employed to generate a sufficient number of initial alignments. Then, iterative dynamic programming is executed to find the optimal alignment. We evaluated the performance and advantages of CAB-align based on four main points: (1) agreement with the gold standard alignment, (2) alignment quality based on an evolutionary relationship without 3D coordinate superposition, (3) consistency of the multiple alignments, and (4) classification agreement with the gold standard classification. Comparisons of CAB-align with other state-of-the-art protein structure alignment methods (TM-align, FATCAT, and DaliLite) using our benchmark dataset showed that CAB-align performed robustly in obtaining high-quality alignments and generating consistent multiple alignments with high coverage and accuracy rates, and it performed extremely well when discriminating between homologous and nonhomologous pairs of proteins in both

  12. The generalization ability of online SVM classification based on Markov sampling.

    Science.gov (United States)

    Xu, Jie; Yan Tang, Yuan; Zou, Bin; Xu, Zongben; Li, Luoqing; Lu, Yang

    2015-03-01

    In this paper, we consider online support vector machine (SVM) classification learning algorithms with uniformly ergodic Markov chain (u.e.M.c.) samples. We establish the bound on the misclassification error of an online SVM classification algorithm with u.e.M.c. samples based on reproducing kernel Hilbert spaces and obtain a satisfactory convergence rate. We also introduce a novel online SVM classification algorithm based on Markov sampling, and present the numerical studies on the learning ability of online SVM classification based on Markov sampling for benchmark repository. The numerical studies show that the learning performance of the online SVM classification algorithm based on Markov sampling is better than that of classical online SVM classification based on random sampling as the size of training samples is larger.

  13. Efficient identification of critical residues based only on protein structure by network analysis.

    Directory of Open Access Journals (Sweden)

    Michael P Cusack

    2007-05-01

    Despite the increasing number of published protein structures, and the fact that each protein's function relies on its three-dimensional structure, there is limited access to automatic programs used for the identification of critical residues from the protein structure, compared with those based on protein sequence. Here we present a new algorithm based on network analysis applied exclusively on protein structures to identify critical residues. Our results show that this method identifies critical residues for protein function with high reliability and improves on automatic sequence-based approaches and previous network-based approaches. The reliability of the method depends on the conformational diversity screened for the protein of interest. We have designed a web site to give access to this software at http://bis.ifc.unam.mx/jamming/. In summary, a new method is presented that relates critical residues for protein function with the most traversed residues in networks derived from protein structures. A unique feature of the method is the inclusion of the conformational diversity of proteins in the prediction, thus reproducing a basic feature of the structure/function relationship of proteins.

  14. Functional assignment to JEV proteins using SVM.

    Science.gov (United States)

    Sahoo, Ganesh Chandra; Dikhit, Manas Ranjan; Das, Pradeep

    2008-01-01

    Identification of different protein functions facilitates a mechanistic understanding of Japanese encephalitis virus (JEV) infection and opens novel means for drug development. Support vector machines (SVM), useful for predicting the functional class of distantly related proteins, are employed to ascribe a possible functional class to Japanese encephalitis virus proteins. Our study using SVMProt and available JE virus sequences suggests that structural and nonstructural proteins of the JEV genome possibly belong to diverse functional classes that are expected to occur in the life cycle of JE virus. Protein functions common to both structural and non-structural proteins are iron-binding, metal-binding, lipid-binding, copper-binding, transmembrane, outer membrane, channels/Pores - Pore-forming toxins (proteins and peptides) group of proteins. Non-structural proteins perform functions like actin binding, zinc-binding, calcium-binding, hydrolases, Carbon-Oxygen Lyases, P-type ATPase, proteins belonging to major facilitator family (MFS), secreting main terminal branch (MTB) family, phosphotransfer-driven group translocators and ATP-binding cassette (ABC) family group of proteins. Structural proteins, besides belonging to the same structural group of proteins (capsid, structural, envelope), also perform functions like nuclear receptor, antibiotic resistance, RNA-binding, DNA-binding, magnesium-binding, isomerase (intra-molecular), oxidoreductase and participate in type II (general) secretory pathway (IISP).

  15. SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier.

    Science.gov (United States)

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W M; Li, R K; Jiang, Bo-Ru

    2014-01-01

    Recently, support vector machine (SVM) has excellent performance on classification and prediction and is widely used on disease diagnosis or medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for Dermatology and Zoo databases. Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; and the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, Taguchi method was jointly combined with SVM classifier in order to optimize parameters C and γ to increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can be more than 95% after SVM-RFE feature selection and Taguchi parameter optimization for Dermatology and Zoo databases.

  16. Signal peptide discrimination and cleavage site identification using SVM and NN.

    Science.gov (United States)

    Kazemian, H B; Yusuf, S A; White, K

    2014-02-01

    About 15% of all proteins in a genome contain a signal peptide (SP) sequence, at the N-terminus, that targets the protein to intracellular secretory pathways. Once the protein is targeted correctly in the cell, the SP is cleaved, releasing the mature protein. Accurate prediction of the presence of these short amino-acid SP chains is crucial for modelling the topology of membrane proteins, since SP sequences can be confused with transmembrane domains due to similar composition of hydrophobic amino acids. This paper presents a cascaded Support Vector Machine (SVM)-Neural Network (NN) classification methodology for SP discrimination and cleavage site identification. The proposed method utilises a dual phase classification approach using SVM as a primary classifier to discriminate SP sequences from Non-SP. The methodology further employs NNs to predict the most suitable cleavage site candidates. In phase one, an SVM classification utilises hydrophobic propensities as a primary feature vector extraction using symmetric sliding window amino-acid sequence analysis for discrimination of SP and Non-SP. In phase two, an NN classification uses asymmetric sliding window sequence analysis for prediction of cleavage site identification. The proposed SVM-NN method was tested using Uni-Prot non-redundant datasets of eukaryotic and prokaryotic proteins with SP and Non-SP N-termini. Computer simulation results demonstrate an overall accuracy of 0.90 for SP and Non-SP discrimination based on Matthews Correlation Coefficient (MCC) tests using SVM. For SP cleavage site prediction, the overall accuracy is 91.5% based on cross-validation tests using the novel SVM-NN model.
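
    The symmetric and asymmetric sliding windows mentioned in this abstract can be illustrated with a small feature extractor. This is a sketch under stated assumptions: the abstract does not say which hydrophobicity scale or window sizes were used, so the Kyte-Doolittle hydropathy scale, the zero padding, the function name `window_features` and the example sequence are all illustrative choices.

```python
# Kyte-Doolittle hydropathy index (assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(seq, center, left, right, pad=0.0):
    """Hydropathy values for residues center-left .. center+right.

    left == right gives a symmetric window; left != right an asymmetric one.
    Positions outside the sequence, and unknown residue letters, map to `pad`.
    """
    feats = []
    for i in range(center - left, center + right + 1):
        if 0 <= i < len(seq):
            feats.append(KYTE_DOOLITTLE.get(seq[i], pad))
        else:
            feats.append(pad)
    return feats


seq = "MKALLVLG"  # hypothetical N-terminal fragment

# Symmetric window (one residue each side) around the first residue.
print(window_features(seq, center=0, left=1, right=1))

# Asymmetric window: two residues upstream, none downstream of position 2.
print(window_features(seq, center=2, left=2, right=0))
```

    Each window yields a fixed-length numeric vector, which is the form of input an SVM or a neural network expects regardless of the protein's length.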

  17. Throughput Maximization Using an SVM for Multi-Class Hypothesis-Based Spectrum Sensing in Cognitive Radio

    Directory of Open Access Journals (Sweden)

    Sana Ullah Jan

    2018-03-01

    A framework of spectrum sensing with a multi-class hypothesis is proposed to maximize the achievable throughput in cognitive radio networks. The energy range of a sensing signal under the hypothesis that the primary user is absent (in a conventional two-class hypothesis) is further divided into quantized regions, whereas the hypothesis that the primary user is present is conserved. The non-radio frequency energy harvesting-equipped secondary user transmits, when the primary user is absent, with transmission power based on the hypothesis result (the energy level of the sensed signal) and the residual energy in the battery: the lower the energy of the received signal, the higher the transmission power, and vice versa. Conversely, the lower is the residual energy in the node, the lower is the transmission power. This technique increases the throughput of a secondary link by providing a higher number of transmission events, compared to the conventional two-class hypothesis. Furthermore, transmission with low power for higher energy levels in the sensed signal reduces the probability of interference with primary users if, for instance, detection was missed. The familiar machine learning algorithm known as a support vector machine (SVM) is used in a one-versus-rest approach to classify the input signal into predefined classes. The input signal to the SVM is composed of three statistical features extracted from the sensed signal and a number ranging from 0 to 100 representing the percentage of residual energy in the node's battery. To increase the generalization of the classifier, k-fold cross-validation is utilized in the training phase. The experimental results show that an SVM with the given features performs satisfactorily for all kernels, but an SVM with a polynomial kernel outperforms linear and radial-basis function kernels in terms of accuracy. Furthermore, the proposed multi-class hypothesis achieves higher throughput compared to the

  18. CompareSVM: supervised, Support Vector Machine (SVM) inference of gene regularity networks.

    Science.gov (United States)

    Gillani, Zeeshan; Akash, Muhammad Sajid Hamid; Rahaman, M D Matiur; Chen, Ming

    2014-11-30

    Predication of gene regularity network (GRN) from expression data is a challenging task. There are many methods that have been developed to address this challenge ranging from supervised to unsupervised methods. Most promising methods are based on support vector machine (SVM). There is a need for comprehensive analysis on prediction accuracy of supervised method SVM using different kernels on different biological experimental conditions and network size. We developed a tool (CompareSVM) based on SVM to compare different kernel methods for inference of GRN. Using CompareSVM, we investigated and evaluated different SVM kernel methods on simulated datasets of microarray of different sizes in detail. The results obtained from CompareSVM showed that accuracy of inference method depends upon the nature of experimental condition and size of the network. For network with nodes (SVM Gaussian kernel outperform on knockout, knockdown, and multifactorial datasets compared to all the other inference methods. For network with large number of nodes (~500), choice of inference method depend upon nature of experimental condition. CompareSVM is available at http://bis.zju.edu.cn/CompareSVM/ .

  19. SVM-dependent pairwise HMM: an application to protein pairwise alignments.

    Science.gov (United States)

    Orlando, Gabriele; Raimondi, Daniele; Khan, Taushif; Lenaerts, Tom; Vranken, Wim F

    2017-12-15

    Methods able to provide reliable protein alignments are crucial for many bioinformatics applications. In the last years many different algorithms have been developed and various kinds of information, from sequence conservation to secondary structure, have been used to improve the alignment performances. This is especially relevant for proteins with highly divergent sequences. However, recent works suggest that different features may have different importance in diverse protein classes and it would be an advantage to have more customizable approaches, capable to deal with different alignment definitions. Here we present Rigapollo, a highly flexible pairwise alignment method based on a pairwise HMM-SVM that can use any type of information to build alignments. Rigapollo lets the user decide the optimal features to align their protein class of interest. It outperforms current state of the art methods on two well-known benchmark datasets when aligning highly divergent sequences. A Python implementation of the algorithm is available at http://ibsquare.be/rigapollo. Contact: wim.vranken@vub.be. Supplementary data are available at Bioinformatics online.

  20. Protein Sub-Nuclear Localization Prediction Using SVM and Pfam Domain Information

    Science.gov (United States)

    Kumar, Ravindra; Jain, Sohni; Kumari, Bandana; Kumar, Manish

    2014-01-01

    The nucleus is the largest and a highly organized organelle of eukaryotic cells. Within the nucleus exist a number of pseudo-compartments, which are not separated by any membrane, yet each of them contains only a specific set of proteins. Understanding protein sub-nuclear localization can hence be an important step towards understanding biological functions of the nucleus. Here we describe SubNucPred, a method we developed for predicting the sub-nuclear localization of proteins. This method predicts protein localization for 10 different sub-nuclear locations sequentially by combining the presence or absence of unique Pfam domains with an amino acid composition based SVM model. The prediction accuracy during leave-one-out cross-validation for centromeric proteins was 85.05%, for chromosomal proteins 76.85%, for nuclear speckle proteins 81.27%, for nucleolar proteins 81.79%, for nuclear envelope proteins 79.37%, for nuclear matrix proteins 77.78%, for nucleoplasm proteins 76.98%, for nuclear pore complex proteins 88.89%, for PML body proteins 75.40% and for telomeric proteins it was 83.33%. Comparison with other reported methods showed that SubNucPred performs better than existing methods. A web-server for predicting protein sub-nuclear localization named SubNucPred has been established at http://14.139.227.92/mkumar/subnucpred/. A standalone version of SubNucPred can also be downloaded from the web-server. PMID:24897370
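
    The "amino acid composition" input mentioned above is commonly a 20-dimensional vector of residue fractions. The sketch below computes that standard vector; it is not taken from SubNucPred, whose exact encoding the abstract does not give, and the function name and example sequence are hypothetical.

```python
from collections import Counter

# The 20 standard residues in one-letter code, in a fixed feature order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Characters outside the 20 standard residues (for example 'X') are
    ignored in both the counts and the denominator.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical 8-residue peptide used only to exercise the function.
features = amino_acid_composition("MKKLAAAG")
```

    A vector like `features` would be one row of the training matrix handed to an SVM; the Pfam-domain presence/absence step described in the abstract is a separate component not shown here.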

  1. Rigid Residue Scan Simulations Systematically Reveal Residue Entropic Roles in Protein Allostery.

    Directory of Open Access Journals (Sweden)

    Robert Kalescky

    2016-04-01

    Intra-protein information is transmitted over distances via allosteric processes. This ubiquitous protein process allows for protein function changes due to ligand binding events. Understanding protein allostery is essential to understanding protein functions. In this study, allostery in the second PDZ domain (PDZ2) in the human PTP1E protein is examined as a model system to advance a recently developed rigid residue scan method combined with configurational entropy calculation and principal component analysis. The contributions from individual residues to whole-protein dynamics and allostery were systematically assessed via rigid body simulations of both unbound and ligand-bound states of the protein. The entropic contributions of individual residues to whole-protein dynamics were evaluated based on covariance-based correlation analysis of all simulations. The changes of overall protein entropy when individual residues are held rigid support that the rigidity/flexibility equilibrium in protein structure is governed by Le Châtelier's principle of chemical equilibrium. Key residues of PDZ2 allostery were identified with good agreement with NMR studies of the same protein bound to the same peptide. On the other hand, the change of entropic contribution from each residue upon perturbation revealed intrinsic differences among all the residues. The quasi-harmonic and principal component analyses of simulations without rigid residue perturbation showed a coherent allosteric mode from unbound and bound states, respectively. The projection of simulations with rigid residue perturbation onto coherent allosteric modes demonstrated the intrinsic shifting of ensemble distributions supporting the population-shift theory of protein allostery. Overall, the study presented here provides a robust and systematic approach to estimate the contribution of individual residue internal motion to overall protein dynamics and allostery.

  2. A SVM bases AI design for interactive gaming

    OpenAIRE

    Jiang, Yang; Jiang, Jianmin; Palmer, Ian

    2008-01-01

    Interactive gaming requires automatic processing on large volume of random data produced by players on spot, such as shooting, football kicking, boxing etc. In this paper, we describe an artificial intelligence approach in processing such random data for interactive gaming by using a one-class support vector machine (OC-SVM). In comparison with existing techniques, our OC-SVM based interactive gaming design has the features of: (i): high speed processing, providing instant response to the pla...

  3. Protein-RNA interface residue prediction using machine learning: an assessment of the state of the art.

    Science.gov (United States)

    Walia, Rasna R; Caragea, Cornelia; Lewis, Benjamin A; Towfic, Fadi; Terribilini, Michael; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant

    2012-05-10

    RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction. We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based

  4. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations.

    Science.gov (United States)

    Zhang, Yi; Ren, Jinchang; Jiang, Jianmin

    2015-01-01

    Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.

  5. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations

    Directory of Open Access Journals (Sweden)

    Yi Zhang

    2015-01-01

    Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.

  6. Accurate Multisteps Traffic Flow Prediction Based on SVM

    Directory of Open Access Journals (Sweden)

    Zhang Mingheng

    2013-01-01

    Accurate traffic flow prediction is a prerequisite for realizing intelligent traffic control and guidance, and it is also an objective requirement for intelligent traffic management. Due to the strongly nonlinear, stochastic, time-varying characteristics of urban transport systems, artificial intelligence methods such as the support vector machine (SVM) are now receiving more and more attention in this research field. Compared with the traditional single-step prediction method, multistep prediction can predict traffic state trends over a certain period in the future. From the perspective of dynamic decision making, this is far more important than the current traffic condition. Thus, in this paper, an accurate multistep traffic flow prediction model based on SVM was proposed. The input vectors were composed of actual traffic volumes, and four different types of input vectors were compared to verify their prediction performance. Finally, the model was verified with actual data in the empirical analysis phase, and the test results showed that the proposed SVM model had a good ability for traffic flow prediction and that the SVM-HPT model outperformed the other three models.

  7. Identification of eggs from different production systems based on hyperspectra and CS-SVM.

    Science.gov (United States)

    Sun, J; Cong, S L; Mao, H P; Zhou, X; Wu, X H; Zhang, X D

    2017-06-01

    1. To identify the origin of table eggs more accurately, a method based on hyperspectral imaging technology was studied. 2. The hyperspectral data of 200 samples of intensive and extensive eggs were collected. Standard normalised variables combined with a Savitzky-Golay were used to eliminate noise, then stepwise regression (SWR) was used for feature selection. Grid search algorithm (GS), genetic search algorithm (GA), particle swarm optimisation algorithm (PSO) and cuckoo search algorithm (CS) were applied by support vector machine (SVM) methods to establish an SVM identification model with the optimal parameters. The full spectrum data and the data after feature selection were the input of the model, while egg category was the output. 3. The SWR-CS-SVM model performed better than the other models, including SWR-GS-SVM, SWR-GA-SVM, SWR-PSO-SVM and others based on full spectral data. The training and test classification accuracy of the SWR-CS-SVM model were respectively 99.3% and 96%. 4. SWR-CS-SVM proved effective for identifying egg varieties and could also be useful for the non-destructive identification of other types of egg.
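
    The noise-elimination step above can be illustrated with the standard normal variate (SNV) transform, which centres and scales each spectrum individually. Reading the abstract's "standard normalised variables" as SNV is an assumption, the Savitzky-Golay smoothing step is not shown, and the function name and sample values are hypothetical.

```python
import statistics


def standard_normal_variate(spectrum):
    """Centre one spectrum on its own mean and scale by its own sample SD."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("spectrum is constant; SNV is undefined")
    return [(value - mean) / sd for value in spectrum]


# Hypothetical three-band reflectance spectrum.
corrected = standard_normal_variate([1.0, 2.0, 3.0])
```

    Because the correction uses only the spectrum's own statistics, it can be applied to each egg sample independently before feature selection and SVM training.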

  8. A method of neighbor classes based SVM classification for optical printed Chinese character recognition.

    Science.gov (United States)

    Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng

    2013-01-01

    In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed for the recognition. Among the classifiers, support vector machine (SVM) might be the best classifier. However, SVM is a classifier for two classes. When it is used for multi-classes in OPCCR, its computation is time-consuming. Thus, we propose a neighbor classes based SVM (NC-SVM) to reduce the computation consumption of SVM. Experiments of NC-SVM classification for OPCCR have been done. The results of the experiments have shown that the NC-SVM we proposed can effectively reduce the computation time in OPCCR.

  9. A support vector machine (SVM) based voltage stability classifier

    Energy Technology Data Exchange (ETDEWEB)

    Dosano, R.D.; Song, H. [Kunsan National Univ., Kunsan, Jeonbuk (Korea, Republic of); Lee, B. [Korea Univ., Seoul (Korea, Republic of)

    2007-07-01

    Power system stability has become even more complex and critical with the advent of deregulated energy markets and the growing desire to completely employ existing transmission and infrastructure. The economic pressure on electricity markets forces the operation of power systems and components to their limit of capacity and performance. System conditions can be more exposed to instability due to greater uncertainty in day to day system operations and increase in the number of potential components for system disturbances potentially resulting in voltage stability. This paper proposed a support vector machine (SVM) based power system voltage stability classifier using local measurements of voltage and active power of load. It described the procedure for fast classification of long-term voltage stability using the SVM algorithm. The application of the SVM based voltage stability classifier was presented with reference to the choice of input parameters; input data preconditioning; moving window for feature vector; determination of learning samples; and other considerations in SVM applications. The paper presented a case study with numerical examples of an 11-bus test system. The test results for the feasibility study demonstrated that the classifier could offer an excellent performance in classification with time-series measurements in terms of long-term voltage stability. 9 refs., 14 figs.

  10. Predicting the Types of Ion Channel-Targeted Conotoxins Based on AVC-SVM Model.

    Science.gov (United States)

    Xianfang, Wang; Junmei, Wang; Xiaolei, Wang; Yue, Zhang

    2017-01-01

    The conotoxin proteins are disulfide-rich small peptides. Predicting the types of ion channel-targeted conotoxins has great value in the treatment of chronic diseases, epilepsy, and cardiovascular diseases. To solve the problem of information redundancy existing when using current methods, a new model is presented to predict the types of ion channel-targeted conotoxins based on AVC (Analysis of Variance and Correlation) and SVM (Support Vector Machine). First, the F value is used to measure the significance level of the feature for the result, and the attribute with smaller F value is filtered by rough selection. Secondly, redundancy degree is calculated by Pearson Correlation Coefficient. And the threshold is set to filter attributes with weak independence to get the result of the refinement. Finally, SVM is used to predict the types of ion channel-targeted conotoxins. The experimental results show the proposed AVC-SVM model reaches an overall accuracy of 91.98%, an average accuracy of 92.17%, and the total number of parameters of 68. The proposed model provides highly useful information for further experimental research. The prediction model will be accessed free of charge at our web server.
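
    The two filtering stages described above (a rough selection by F value, then a redundancy filter by Pearson correlation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' AVC implementation: the one-way ANOVA F statistic, the greedy keep-if-uncorrelated rule, the thresholds `f_min` and `r_max`, and the toy data are all chosen for the example.

```python
import math


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across the classes."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                  for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2
                 for g in groups.values() for v in g)
    if within == 0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def avc_select(columns, labels, f_min, r_max):
    """Return indices of features that pass both filters.

    Stage 1 drops features whose F value is below `f_min`.
    Stage 2 walks the survivors from highest to lowest F and keeps a
    feature only if |r| with every already-kept feature is <= `r_max`.
    """
    scored = [(anova_f(col, labels), j) for j, col in enumerate(columns)]
    ranked = sorted((item for item in scored if item[0] >= f_min), reverse=True)
    kept = []
    for _, j in ranked:
        if all(abs(pearson(columns[j], columns[i])) <= r_max for i in kept):
            kept.append(j)
    return kept


# Toy data: six samples, two classes, three feature columns.
labels = [0, 0, 0, 1, 1, 1]
columns = [
    [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # informative
    [2.0, 2.6, 1.7, 6.0, 6.3, 5.6],  # informative but nearly a copy of column 0
    [5.0, 4.0, 6.0, 5.0, 4.0, 6.0],  # same mean in both classes, F = 0
]
selected = avc_select(columns, labels, f_min=1.0, r_max=0.9)
```

    On this toy data only column 0 survives: column 2 fails the F-value filter and column 1 is discarded as redundant. The surviving columns would then be passed to the SVM.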

  11. Feature selection based on SVM significance maps for classification of dementia

    NARCIS (Netherlands)

    E.E. Bron (Esther); M. Smits (Marion); J.C. van Swieten (John); W.J. Niessen (Wiro); S. Klein (Stefan)

    2014-01-01

    Support vector machine significance maps (SVM p-maps) previously showed clusters of significantly different voxels in dementia-related brain regions. We propose a novel feature selection method for classification of dementia based on these p-maps. In our approach, the SVM p-maps are

  12. Research on Classification of Chinese Text Data Based on SVM

    Science.gov (United States)

    Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao

    2017-09-01

    Data mining has important application value in today’s industry and academia. Text classification is a very important technology in data mining. At present, there are many mature algorithms for text classification. KNN, NB, AB, SVM, decision tree and other classification methods all show good classification performance. The support vector machine (SVM) is a good classifier in machine learning research. This paper studies the classification performance of the SVM method on Chinese text data, uses the support vector machine to classify Chinese text, and aims to combine academic research with practical application.

  13. SVM Based Descriptor Selection and Classification of Neurodegenerative Disease Drugs for Pharmacological Modeling.

    Science.gov (United States)

    Shahid, Mohammad; Shahzad Cheema, Muhammad; Klenner, Alexander; Younesi, Erfan; Hofmann-Apitius, Martin

    2013-03-01

    Systems pharmacological modeling of drug mode of action for the next generation of multitarget drugs may open new routes for drug design and discovery. Computational methods are widely used in this context amongst which support vector machines (SVM) have proven successful in addressing the challenge of classifying drugs with similar features. We have applied a variety of such SVM-based approaches, namely SVM-based recursive feature elimination (SVM-RFE). We use the approach to predict the pharmacological properties of drugs widely used against complex neurodegenerative disorders (NDD) and to build an in-silico computational model for the binary classification of NDD drugs from other drugs. Application of an SVM-RFE model to a set of drugs successfully classified NDD drugs from non-NDD drugs and resulted in overall accuracy of ∼80 % with 10 fold cross validation using 40 top ranked molecular descriptors selected out of total 314 descriptors. Moreover, SVM-RFE method outperformed linear discriminant analysis (LDA) based feature selection and classification. The model reduced the multidimensional descriptors space of drugs dramatically and predicted NDD drugs with high accuracy, while avoiding over fitting. Based on these results, NDD-specific focused libraries of drug-like compounds can be designed and existing NDD-specific drugs can be characterized by a well-characterized set of molecular descriptors. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. Predication of Crane Condition Parameters Based on SVM and AR

    International Nuclear Information System (INIS)

    Xu Xiuzhong; Hu Xiong; Zhou Congxiao

    2011-01-01

    Through statistical analysis of the vibration signals of a motor on the hoisting mechanism of a container crane in a port, feature vectors of the vibration are obtained. Through data preprocessing and training, models of the condition parameters based on support vector machine (SVM) are established. The testing data of the condition monitoring parameters can be predicted by the trained models. During training, the penalty parameter and kernel function of the model are optimized by cross validation. In order to analyze the accuracy of the SVM model, an autoregressive model is also used to predict the trend of vibration. The research showed that the results predicted by the SVM model are better than those obtained by autoregressive (AR) modeling.

  15. Lamb Wave Damage Quantification Using GA-Based LS-SVM

    Directory of Open Access Journals (Sweden)

    Fuqiang Sun

    2017-06-01

    Lamb waves have been reported to be an efficient tool for non-destructive evaluations (NDE) for various application scenarios. However, accurate and reliable damage quantification using the Lamb wave method is still a practical challenge, due to the complex underlying mechanism of Lamb wave propagation and damage detection. This paper presents a Lamb wave damage quantification method using a least square support vector machine (LS-SVM) and a genetic algorithm (GA). Three damage sensitive features, namely, normalized amplitude, phase change, and correlation coefficient, were proposed to describe changes of Lamb wave characteristics caused by damage. In view of commonly used data-driven methods, the GA-based LS-SVM model using the proposed three damage sensitive features was implemented to evaluate the crack size. The GA method was adopted to optimize the model parameters. The results of GA-based LS-SVM were validated using coupon test data and lap joint component test data with naturally developed fatigue cracks. Cases of different loading and manufacturer were also included to further verify the robustness of the proposed method for crack quantification.

  16. Lamb Wave Damage Quantification Using GA-Based LS-SVM.

    Science.gov (United States)

    Sun, Fuqiang; Wang, Ning; He, Jingjing; Guan, Xuefei; Yang, Jinsong

    2017-06-12

    Lamb waves have been reported to be an efficient tool for non-destructive evaluations (NDE) for various application scenarios. However, accurate and reliable damage quantification using the Lamb wave method is still a practical challenge, due to the complex underlying mechanism of Lamb wave propagation and damage detection. This paper presents a Lamb wave damage quantification method using a least square support vector machine (LS-SVM) and a genetic algorithm (GA). Three damage sensitive features, namely, normalized amplitude, phase change, and correlation coefficient, were proposed to describe changes of Lamb wave characteristics caused by damage. In view of commonly used data-driven methods, the GA-based LS-SVM model using the proposed three damage sensitive features was implemented to evaluate the crack size. The GA method was adopted to optimize the model parameters. The results of GA-based LS-SVM were validated using coupon test data and lap joint component test data with naturally developed fatigue cracks. Cases of different loading and manufacturer were also included to further verify the robustness of the proposed method for crack quantification.

  17. Laos Organization Name Using Cascaded Model Based on SVM and CRF

    Directory of Open Access Journals (Sweden)

    Duan Shaopeng

    2017-01-01

    According to the characteristics of Laos organization names, this paper proposes a two-layer model based on conditional random field (CRF) and support vector machine (SVM) for Laos organization name recognition. The first layer uses CRF to recognize simple organization names, and the result is used to support the decision of the second layer. Based on the driving method, the second layer uses SVM and CRF to recognize complicated organization names. Finally, the results of the two layers are combined, and a subsequent treatment corrects recognition results of low confidence. The results show that this approach based on SVM and CRF is efficient in recognizing organization names in an open test on real linguistic data: the recall rate reaches 80.83% and the precision rate reaches 82.75%.

  18. PSO-SVM-Based Online Locomotion Mode Identification for Rehabilitation Robotic Exoskeletons.

    Science.gov (United States)

    Long, Yi; Du, Zhi-Jiang; Wang, Wei-Dong; Zhao, Guang-Yu; Xu, Guo-Qiang; He, Long; Mao, Xi-Wang; Dong, Wei

    2016-09-02

    Locomotion mode identification is essential for the control of robotic rehabilitation exoskeletons. This paper proposes an online support vector machine (SVM) optimized by particle swarm optimization (PSO) to identify different locomotion modes to realize a smooth and automatic locomotion transition. A PSO algorithm is used to obtain the optimal parameters of SVM for a better overall performance. Signals measured by the foot pressure sensors integrated in the insoles of wearable shoes and the MEMS-based attitude and heading reference systems (AHRS) attached on the shoes and shanks of leg segments are fused together as the input information of SVM. Based on the chosen window whose size is 200 ms (with sampling frequency of 40 Hz), a three-layer wavelet packet analysis (WPA) is used for feature extraction, after which kernel principal component analysis (kPCA) is utilized to reduce the dimension of the feature set and thus the computation cost of the SVM. Since the signals come from two different types of sensors, normalization is conducted to scale the input into the interval [0, 1]. Five-fold cross validation is adopted to train the classifier, which helps prevent the classifier from over-fitting. Based on the SVM model obtained offline in MATLAB, an online SVM algorithm is constructed for locomotion mode identification. Experiments are performed for different locomotion modes and experimental results show the effectiveness of the proposed algorithm with an accuracy of 96.00% ± 2.45%. To improve its accuracy, a majority vote algorithm (MVA) is used for post-processing, with which the identification accuracy is better than 98.35% ± 1.65%. The proposed algorithm can be extended and employed in the field of robotic rehabilitation and assistance.
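
    Two of the simpler steps in the pipeline above, scaling inputs into [0, 1] and majority-vote post-processing, can be sketched in a few lines. This is an illustration only: the abstract does not specify how the majority vote is windowed, so the sliding window over recent predictions, the window length of 3 and the label names are assumptions.

```python
from collections import Counter, deque

# A 200 ms window sampled at 40 Hz holds 8 samples (figures from the abstract).
SAMPLES_PER_WINDOW = 200 * 40 // 1000


def min_max_scale(column):
    """Scale one feature column into [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


def majority_vote(predictions, window=3):
    """Replace each raw prediction by the most common label among the
    last `window` raw predictions (fewer at the start of the stream)."""
    recent = deque(maxlen=window)
    smoothed = []
    for label in predictions:
        recent.append(label)
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


scaled = min_max_scale([2.0, 4.0, 6.0])

# Hypothetical classifier output with one isolated misclassification.
raw = ["walk", "walk", "stairs", "walk", "walk", "sit", "sit", "sit"]
smoothed = majority_vote(raw, window=3)
```

    The isolated "stairs" label is voted out, at the cost of a one-step delay before the genuine change to "sit" is reported; a longer window smooths more but delays transitions further.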

  19. PSO-SVM-Based Online Locomotion Mode Identification for Rehabilitation Robotic Exoskeletons

    Directory of Open Access Journals (Sweden)

    Yi Long

    2016-09-01

    Locomotion mode identification is essential for the control of robotic rehabilitation exoskeletons. This paper proposes an online support vector machine (SVM) optimized by particle swarm optimization (PSO) to identify different locomotion modes to realize a smooth and automatic locomotion transition. A PSO algorithm is used to obtain the optimal parameters of SVM for a better overall performance. Signals measured by the foot pressure sensors integrated in the insoles of wearable shoes and the MEMS-based attitude and heading reference systems (AHRS) attached on the shoes and shanks of leg segments are fused together as the input information of SVM. Based on the chosen window whose size is 200 ms (with sampling frequency of 40 Hz), a three-layer wavelet packet analysis (WPA) is used for feature extraction, after which kernel principal component analysis (kPCA) is utilized to reduce the dimension of the feature set and thus the computation cost of the SVM. Since the signals come from two different types of sensors, normalization is conducted to scale the input into the interval [0, 1]. Five-fold cross validation is adopted to train the classifier, which helps prevent the classifier from over-fitting. Based on the SVM model obtained offline in MATLAB, an online SVM algorithm is constructed for locomotion mode identification. Experiments are performed for different locomotion modes and experimental results show the effectiveness of the proposed algorithm with an accuracy of 96.00% ± 2.45%. To improve its accuracy, a majority vote algorithm (MVA) is used for post-processing, with which the identification accuracy is better than 98.35% ± 1.65%. The proposed algorithm can be extended and employed in the field of robotic rehabilitation and assistance.

  20. SVM Classifiers: The Objects Identification on the Base of Their Hyperspectral Features

    Directory of Open Access Journals (Sweden)

    Demidova Liliya

    2017-01-01

    The problem of identifying objects on the basis of their hyperspectral features is considered. SVM classifiers based on a modified PSO algorithm, adapted to the specifics of this identification problem, are proposed. The results of object identification on the basis of hyperspectral features using the SVM classifiers are presented.

  1. A Sensor Dynamic Measurement Error Prediction Model Based on NAPSO-SVM.

    Science.gov (United States)

    Jiang, Minlan; Jiang, Lan; Jiang, Dingde; Li, Fei; Song, Houbing

    2018-01-15

    Dynamic measurement error correction is an effective way to improve sensor precision. Dynamic measurement error prediction is an important part of error correction, and support vector machine (SVM) is often used for predicting the dynamic measurement errors of sensors. Traditionally, the SVM parameters were always set manually, which cannot ensure the model's performance. In this paper, an SVM method based on an improved particle swarm optimization (NAPSO) is proposed to predict the dynamic measurement errors of sensors. Natural selection and simulated annealing are added to the PSO to improve its ability to avoid local optima. To verify the performance of NAPSO-SVM, three types of algorithms are selected to optimize the SVM's parameters: the particle swarm optimization algorithm (PSO), the improved PSO optimization algorithm (NAPSO), and the glowworm swarm optimization (GSO). The dynamic measurement error data of two sensors are applied as the test data. The root mean squared error and mean absolute percentage error are employed to evaluate the prediction models' performances. The experimental results show that among the three tested algorithms the NAPSO-SVM method has better prediction precision and smaller prediction errors, and it is an effective method for predicting the dynamic measurement errors of sensors.
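
    The two evaluation measures named above have standard definitions, sketched below. The function names and the sample values are hypothetical and are not data from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so that case is rejected.
    """
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined for zero actual values")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical measured errors and model predictions.
actual = [2.0, 4.0, 5.0]
predicted = [1.0, 5.0, 5.0]
rmse_value = rmse(actual, predicted)
mape_value = mape(actual, predicted)
```

    RMSE is expressed in the units of the measured quantity and penalizes large deviations more heavily, while MAPE is scale-free, which makes it convenient when comparing models across two different sensors.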

  2. [Measurement of soil organic matter and available K based on SPA-LS-SVM].

    Science.gov (United States)

    Zhang, Hai-Liang; Liu, Xue-Mei; He, Yong

    2014-05-01

    Visible and short wave near infrared spectroscopy (Vis/SW-NIRS) was investigated in the present study for measurement of soil organic matter (OM) and available potassium (K). Four types of pretreatments including smoothing, SNV, MSC and SG smoothing+first derivative were adopted to eliminate system noise and external disturbances. Then partial least squares regression (PLSR) and least squares-support vector machine (LS-SVM) models were implemented as calibration models. The LS-SVM model was built using characteristic wavelengths selected by the successive projections algorithm (SPA). The performance of the LS-SVM models was compared with that of the PLSR models. The results indicated that LS-SVM models using SPA-selected characteristic wavelengths as inputs outperformed PLSR models. For the optimal SPA-LS-SVM models, the correlation coefficient (r) and RMSEP were 0.8602 and 2.98 for OM and 0.7305 and 15.78 for K, respectively. The results indicated that Vis/SW-NIRS (325-1075 nm) combined with LS-SVM based on SPA could be utilized as a precise method for the determination of soil properties.

  3. A feature-based approach to modeling protein-protein interaction hot spots.

    Science.gov (United States)

    Cho, Kyu-il; Kim, Dongsup; Lee, Doheon

    2009-05-01

    Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, composed of different levels of information including structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation displays a rather complicated tendency depending on the types of protein complexes, indicating that this feature is not good for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top 3, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to pi-related interactions, especially pi . . . pi interactions.

  4. [Hyperspectral remote sensing image classification based on SVM optimized by clonal selection].

    Science.gov (United States)

    Liu, Qing-Jie; Jing, Lin-Hai; Wang, Meng-Fei; Lin, Qi-Zhong

    2013-03-01

    Model selection for the support vector machine (SVM), which involves choosing the kernel and margin parameter values, is usually time-consuming and greatly affects both the training efficiency of the SVM model and the final classification accuracy of an SVM hyperspectral remote sensing image classifier. First, based on combinatorial optimization theory and the cross-validation method, an artificial immune clonal selection algorithm is introduced to select the SVM kernel parameter a and margin parameter C (CSSVM), in order to improve the training efficiency of the SVM model. Then, an experiment classifying AVIRIS data from the Indian Pines site in the USA was performed to test the novel CSSVM, with a traditional SVM classifier using the general grid-search cross-validation method (GSSVM) for comparison. Finally, evaluation indexes including SVM model training time, overall classification accuracy (OA) and Kappa index were analyzed quantitatively for both CSSVM and GSSVM. The OA of CSSVM on the test samples and on the whole image is 85.1% and 81.58%, respectively, both within 0.08% of GSSVM; the Kappa indexes reach 0.8213 and 0.7728, both within 0.001 of GSSVM; and the ratio of CSSVM to GSSVM model training time is between 1/6 and 1/10. Therefore, CSSVM is a fast and accurate algorithm for hyperspectral image classification and is superior to GSSVM.
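
    The overall accuracy (OA) and Kappa index used for evaluation here can both be derived from a confusion matrix. The sketch below uses the standard Cohen's kappa definition; the function names and the toy matrix are assumptions for illustration and are not taken from the paper.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def cohen_kappa(cm):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = overall_accuracy(cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[i][j] for i in range(n_classes)) for j in range(n_classes)]
    # Chance agreement expected from the row and column marginals.
    p_expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    return (p_observed - p_expected) / (1.0 - p_expected)

if __name__ == "__main__":
    # Rows: reference classes, columns: predicted classes (toy values).
    confusion = [
        [50, 3, 2],
        [4, 40, 6],
        [1, 5, 39],
    ]
    print("OA =", overall_accuracy(confusion))
    print("Kappa =", cohen_kappa(confusion))
```

    Kappa is lower than OA whenever some agreement would be expected by chance, which is consistent with the pairs of values reported in the abstract.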

  5. Human Walking Pattern Recognition Based on KPCA and SVM with Ground Reflex Pressure Signal

    Directory of Open Access Journals (Sweden)

    Zhaoqin Peng

    2013-01-01

    Algorithms based on the ground reflex pressure (GRF) signal obtained from a pair of sensing shoes for human walking pattern recognition were investigated. The dimensionality reduction algorithms based on principal component analysis (PCA) and kernel principal component analysis (KPCA) for walking pattern data compression were studied in order to obtain higher recognition speed. Classifiers based on support vector machine (SVM), SVM-PCA, and SVM-KPCA were designed, and the classification performances of these three kinds of algorithms were compared using data collected from a person who was wearing the sensing shoes. Experimental results showed that the algorithm fusing SVM and KPCA had better recognition performance than the other two methods. Experimental outcomes also confirmed that the sensing shoes developed in this paper can be employed for automatically recognizing human walking patterns in unconstrained environments, which demonstrates their potential application in the control of exoskeleton robots.

  6. Absolute cosine-based SVM-RFE feature selection method for prostate histopathological grading.

    Science.gov (United States)

    Sahran, Shahnorbanun; Albashish, Dheeb; Abdullah, Azizi; Shukor, Nordashima Abd; Hayati Md Pauzi, Suria

    2018-04-18

    Feature selection (FS) methods are widely used in grading and diagnosing prostate histopathological images. In this context, FS is based on the texture features obtained from the lumen, nuclei, cytoplasm and stroma, all of which are important tissue components. However, it is difficult to represent the high-dimensional textures of these tissue components. To solve this problem, we propose a new FS method that enables the selection of features with minimal redundancy in the tissue components. We categorise tissue images based on the texture of individual tissue components via the construction of a single classifier and also construct an ensemble learning model by merging the values obtained by each classifier. Another issue that arises is overfitting due to the high-dimensional texture of individual tissue components. We propose a new FS method, SVM-RFE(AC), that integrates a Support Vector Machine-Recursive Feature Elimination (SVM-RFE) embedded procedure with an absolute cosine (AC) filter method to prevent redundancy in the selected features of the SVM-RFE and an unoptimised classifier in the AC. We conducted experiments on H&E histopathological prostate and colon cancer images with respect to three prostate classifications, namely benign vs. grade 3, benign vs. grade 4 and grade 3 vs. grade 4. The colon benchmark dataset requires a distinction between grades 1 and 2, which are the most difficult cases to distinguish in the colon domain. The results obtained by both the single and ensemble classification models (which use the product rule as their merging method) confirm that the proposed SVM-RFE(AC) is superior to the other SVM and SVM-RFE-based methods. We developed an FS method based on SVM-RFE and AC and successfully showed that its use enabled the identification of the most crucial texture feature of each tissue component. Thus, it makes possible the distinction between multiple Gleason grades (e.g. grade 3 vs. grade 4) and its performance is far superior to

  7. Diagnosis of Elevator Faults with LS-SVM Based on Optimization by K-CV

    Directory of Open Access Journals (Sweden)

    Zhou Wan

    2015-01-01

    Several common elevator malfunctions were diagnosed with a least squares support vector machine (LS-SVM). After acquiring vibration signals of various elevator functions, their energy characteristics and time-domain indicators were extracted by theoretically analyzing the optimal wavelet packet, in order to construct a malfunction feature vector used as the input of the LS-SVM to identify the causes of the malfunctions. Meanwhile, the LS-SVM parameters were optimized by K-fold cross-validation (K-CV). After diagnosing a deviated elevator guide rail, deviated shape of guide shoe, abnormal running of tractor, erroneous rope groove of traction sheave, deviated guide wheel, and tension of wire rope, the results suggested that the LS-SVM based on K-CV optimization was an effective method for diagnosing elevator malfunctions.
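
    Parameter selection by K-fold cross-validation (K-CV), as mentioned in this abstract, can be sketched generically. The code below is a minimal illustration, not the authors' procedure: `select_by_k_cv`, the `fit_and_score` callback and the toy threshold rule are hypothetical stand-ins for training and scoring an LS-SVM with a candidate parameter.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size.

    No shuffling is done here; real data would usually be shuffled first.
    """
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def select_by_k_cv(candidates, n_samples, k, fit_and_score):
    """Return (best_param, best_mean_score) over the candidate parameters.

    fit_and_score(param, train_idx, val_idx) is a user-supplied callback
    (hypothetical here) that trains on train_idx and scores on val_idx.
    """
    folds = k_fold_indices(n_samples, k)
    best_param, best_score = None, float("-inf")
    for param in candidates:
        scores = []
        for i, val_idx in enumerate(folds):
            train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
            scores.append(fit_and_score(param, train_idx, val_idx))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_param, best_score = param, mean_score
    return best_param, best_score

# Toy stand-in for a classifier: predict class 1 when x > threshold.
xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
ys = [0, 1, 0, 1, 0, 1]

def threshold_accuracy(threshold, train_idx, val_idx):
    # This toy rule needs no fitting, so train_idx is unused.
    hits = sum(1 for i in val_idx if (xs[i] > threshold) == bool(ys[i]))
    return hits / len(val_idx)

if __name__ == "__main__":
    print(select_by_k_cv([0.0, 0.5, 1.0], len(xs), 3, threshold_accuracy))
```

    In practice the callback would fit the model on the training folds for each candidate (for an LS-SVM, typically a regularization parameter and a kernel width) and return the validation accuracy.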

  8. Applications of PCA and SVM-PSO Based Real-Time Face Recognition System

    Directory of Open Access Journals (Sweden)

    Ming-Yuan Shieh

    2014-01-01

    This paper incorporates principal component analysis (PCA) with support vector machine-particle swarm optimization (SVM-PSO) for developing real-time face recognition systems. The integrated scheme aims to adopt the SVM-PSO method to improve the validity of PCA-based image recognition systems on dynamically visual perception. Face recognition for most human-robot interaction applications is accomplished by PCA-based methods because of their dimensionality reduction. However, PCA-based systems are only suitable for processing faces with the same facial expressions and/or under the same view directions. Since the facial feature selection process can be considered as a problem of global combinatorial optimization in machine learning, the SVM-PSO is usually used as an optimal classifier of the system. In this paper, the PSO is used to implement feature selection, and the SVMs serve as fitness functions of the PSO for classification problems. Experimental results demonstrate that the proposed method simplifies features effectively and obtains higher classification accuracy.

  9. Settlement Prediction of Road Soft Foundation Using a Support Vector Machine (SVM) Based on Measured Data

    Directory of Open Access Journals (Sweden)

    Yu Huiling

    2016-01-01

    The support vector machine (SVM) is a relatively new artificial intelligence technique which is increasingly being applied to geotechnical problems and is yielding encouraging results. SVM is a machine learning method based on statistical learning theory. A case study based on a road foundation engineering project shows that the forecast results are in good agreement with the measured data. The SVM model is also compared with a BP artificial neural network model and the traditional hyperbola method. The prediction results indicate that the SVM model has better prediction ability than the BP neural network model and the hyperbola method. Therefore, settlement prediction based on the SVM model can reflect the actual settlement process more correctly. The results indicate that the method is effective and feasible, and that the nonlinear mapping relation between foundation settlement and its influence factors can be expressed well. It provides a new method to predict foundation settlement.

  10. Intelligent Recognition of Lung Nodule Combining Rule-based and C-SVM Classifiers

    Directory of Open Access Journals (Sweden)

    Bin Li

    2012-02-01

    Computer-aided detection (CAD) systems for lung nodules play an important role in the diagnosis of lung cancer. In this paper, an improved intelligent recognition method for lung nodules in HRCT, combining rule-based and cost-sensitive support vector machine (C-SVM) classifiers, is proposed for detecting both solid nodules and ground-glass opacity (GGO) nodules (part-solid and nonsolid). This method consists of several steps. Firstly, segmentation of regions of interest (ROIs), including pulmonary parenchyma and lung nodule candidates, is a difficult task. On the one hand, the presence of noise lowers the visibility of low-contrast objects. On the other hand, different types of nodules, including small nodules, nodules connected to vasculature or other structures, and part-solid or nonsolid nodules, are complex, noisy, weak-edged or have boundaries that are difficult to define. To overcome the obvious boundary leakage and slow evolution speed in the segmentation of weak edges, an overall segmentation method is proposed: the lung parenchyma is extracted by a threshold and morphological segmentation method; image denoising and enhancement are realized by a nonlinear anisotropic diffusion filtering (NADF) method; and candidate pulmonary nodules are segmented by the improved C-V level set method, in which the segmentation result of an EM-based fuzzy threshold method is used as the initial contour of the active contour model and a constrained energy term is added to the PDE of the level set function. Then, lung nodules are classified using intelligent classifiers combining rules and C-SVM. Rule-based classification is first used to remove easily dismissible non-nodule objects, then C-SVM classification is used to further classify nodule candidates and reduce the number of false positive (FP) objects. In order to increase the efficiency of SVM, an improved training method is used to train SVM, which uses the grid search method to search the optimal

  11. Intelligent Recognition of Lung Nodule Combining Rule-based and C-SVM Classifiers

    Directory of Open Access Journals (Sweden)

    Bin Li

    2011-10-01

    Computer-aided detection (CAD) systems for lung nodules play an important role in the diagnosis of lung cancer. In this paper, an improved intelligent recognition method for lung nodules in HRCT, combining rule-based and cost-sensitive support vector machine (C-SVM) classifiers, is proposed for detecting both solid nodules and ground-glass opacity (GGO) nodules (part-solid and nonsolid). This method consists of several steps. Firstly, segmentation of regions of interest (ROIs), including pulmonary parenchyma and lung nodule candidates, is a difficult task. On the one hand, the presence of noise lowers the visibility of low-contrast objects. On the other hand, different types of nodules, including small nodules, nodules connected to vasculature or other structures, and part-solid or nonsolid nodules, are complex, noisy, weak-edged or have boundaries that are difficult to define. To overcome the obvious boundary leakage and slow evolution speed in the segmentation of weak edges, an overall segmentation method is proposed: the lung parenchyma is extracted by a threshold and morphological segmentation method; image denoising and enhancement are realized by a nonlinear anisotropic diffusion filtering (NADF) method; and candidate pulmonary nodules are segmented by the improved C-V level set method, in which the segmentation result of an EM-based fuzzy threshold method is used as the initial contour of the active contour model and a constrained energy term is added to the PDE of the level set function. Then, lung nodules are classified using intelligent classifiers combining rules and C-SVM. Rule-based classification is first used to remove easily dismissible non-nodule objects, then C-SVM classification is used to further classify nodule candidates and reduce the number of false positive (FP) objects. In order to increase the efficiency of SVM, an improved training method is used to train SVM, which uses the grid search method to search the optimal parameters

  12. HemeBIND: a novel method for heme binding residue prediction by combining structural and sequence information

    Directory of Open Access Journals (Sweden)

    Hu Jianjun

    2011-05-01

    Background: Accurate prediction of binding residues involved in the interactions between proteins and small ligands is one of the major challenges in structural bioinformatics. Heme is an essential and commonly used ligand that plays critical roles in electron transfer, catalysis, signal transduction and gene expression. Although much effort has been devoted to the development of various generic algorithms for ligand binding site prediction over the last decade, no algorithm has been specifically designed to complement experimental techniques for identification of heme binding residues. Consequently, there is an urgent need to develop a computational method for recognizing these important residues. Results: Here we introduce an efficient algorithm, HemeBIND, for predicting heme binding residues by integrating structural and sequence information. We systematically investigated the characteristics of binding interfaces based on a non-redundant dataset of heme-protein complexes. It was found that several sequence and structural attributes such as evolutionary conservation, solvent accessibility, depth and protrusion clearly illustrate the differences between heme binding and non-binding residues. These features can then be used separately or combined to build structure-based classifiers using support vector machine (SVM). The results showed that the information contained in these features is largely complementary and their combination achieved the best performance. To further improve the performance, a post-processing procedure was developed to reduce the number of false positives. In addition, we built a sequence-based classifier based on SVM and sequence profile as an alternative when only sequence information can be used. Finally, we employed a voting method to combine the outputs of structure-based and sequence-based classifiers, which demonstrated remarkably better performance than the individual classifier alone

  13. Isoelectric Point, Electric Charge, and Nomenclature of the Acid-Base Residues of Proteins

    Science.gov (United States)

    Maldonado, Andres A.; Ribeiro, Joao M.; Sillero, Antonio

    2010-01-01

    The main object of this work is to present the pedagogical usefulness of the theoretical methods, developed in this laboratory, for the determination of the isoelectric point (pI) and the net electric charge of proteins together with some comments on the naming of the acid-base residues of proteins. (Contains 8 figures and 4 tables.)

  14. Water dynamics clue to key residues in protein folding

    International Nuclear Information System (INIS)

    Gao, Meng; Zhu, Huaiqiu; Yao, Xin-Qiu; She, Zhen-Su

    2010-01-01

    A computational method independent of experimental protein structure information is proposed to recognize key residues in protein folding, from the study of hydration water dynamics. Based on all-atom molecular dynamics simulation, two key residues are recognized with distinct water dynamical behavior in a folding process of the Trp-cage protein. The identified key residues are shown to play an essential role in both 3D structure and hydrophobic-induced collapse. With observations on hydration water dynamics around key residues, a dynamical pathway of folding can be interpreted.

  15. Quantification of Drive-Response Relationships Between Residues During Protein Folding.

    Science.gov (United States)

    Qi, Yifei; Im, Wonpil

    2013-08-13

    Mutual correlation and cooperativity are commonly used to describe residue-residue interactions in protein folding/function. However, these metrics do not provide any information on the causality relationships between residues. Such drive-response relationships are poorly studied in protein folding/function and difficult to measure experimentally due to technical limitations. In this study, using the information theory transfer entropy (TE) that provides a direct measurement of causality between two times series, we have quantified the drive-response relationships between residues in the folding/unfolding processes of four small proteins generated by molecular dynamics simulations. Instead of using a time-averaged single TE value, the time-dependent TE is measured with the Q-scores based on residue-residue contacts and with the statistical significance analysis along the folding/unfolding processes. The TE analysis is able to identify the driving and responding residues that are different from the highly correlated residues revealed by the mutual information analysis. In general, the driving residues have more regular secondary structures, are more buried, and show greater effects on the protein stability as well as folding and unfolding rates. In addition, the dominant driving and responding residues from the TE analysis on the whole trajectory agree with those on a single folding event, demonstrating that the drive-response relationships are preserved in the non-equilibrium process. Our study provides detailed insights into the protein folding process and has potential applications in protein engineering and interpretation of time-dependent residue-based experimental observables for protein function.
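
    Transfer entropy, the causality measure named in this abstract, can be illustrated for two discrete time series. The sketch below is a plain plug-in estimator with history length 1, a simplifying assumption: it is not the time-dependent analysis on Q-scores that the study describes, and the function name and toy series are illustrative.

```python
import math
from collections import Counter

def transfer_entropy(source, target):
    """Plug-in transfer entropy from source to target, in bits.

    TE = sum p(y_next, y, x) * log2( p(y_next | y, x) / p(y_next | y) ),
    where y is the target state and x the source state at the same time.
    """
    n = len(target) - 1  # number of observed transitions
    triples = Counter(
        (target[t + 1], target[t], source[t]) for t in range(n)
    )
    pair_state = Counter()  # counts of (y, x)
    pair_next = Counter()   # counts of (y_next, y)
    single = Counter()      # counts of y
    for (y_next, y, x), count in triples.items():
        pair_state[(y, x)] += count
        pair_next[(y_next, y)] += count
        single[y] += count
    te = 0.0
    for (y_next, y, x), count in triples.items():
        p_joint = count / n
        p_given_both = count / pair_state[(y, x)]
        p_given_self = pair_next[(y_next, y)] / single[y]
        te += p_joint * math.log2(p_given_both / p_given_self)
    return te

if __name__ == "__main__":
    # Toy binary series: the "response" copies the "driver" one step later.
    driver = [0, 1, 1, 0, 1, 0, 0, 1, 1]
    response = [0] + driver[:-1]
    print("TE driver -> response:", transfer_entropy(driver, response))
    print("TE response -> driver:", transfer_entropy(response, driver))
```

    The asymmetry between the two directions is what distinguishes transfer entropy from symmetric measures such as mutual information. On short series like this toy example the plug-in estimate is biased, so the reverse direction is not exactly zero.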

  16. Effective Sequential Classifier Training for SVM-Based Multitemporal Remote Sensing Image Classification

    Science.gov (United States)

    Guo, Yiqing; Jia, Xiuping; Paull, David

    2018-06-01

    The explosive availability of remote sensing images has challenged supervised classification algorithms such as Support Vector Machines (SVM), as training samples tend to be highly limited due to the expensive and laborious task of ground truthing. The temporal correlation and spectral similarity between multitemporal images have opened up an opportunity to alleviate this problem. In this study, a SVM-based Sequential Classifier Training (SCT-SVM) approach is proposed for multitemporal remote sensing image classification. The approach leverages the classifiers of previous images to reduce the required number of training samples for the classifier training of an incoming image. For each incoming image, a rough classifier is firstly predicted based on the temporal trend of a set of previous classifiers. The predicted classifier is then fine-tuned into a more accurate position with current training samples. This approach can be applied progressively to sequential image data, with only a small number of training samples being required from each image. Experiments were conducted with Sentinel-2A multitemporal data over an agricultural area in Australia. Results showed that the proposed SCT-SVM achieved better classification accuracies compared with two state-of-the-art model transfer algorithms. When training data are insufficient, the overall classification accuracy of the incoming image was improved from 76.18% to 94.02% with the proposed SCT-SVM, compared with those obtained without the assistance from previous images. These results demonstrate that the leverage of a priori information from previous images can provide advantageous assistance for later images in multitemporal image classification.

  17. The 2nu-SVM: A Cost-Sensitive Extension of the nu-SVM

    National Research Council Canada - National Science Library

    Davenport, Mark A

    2005-01-01

    .... In this report we review cost-sensitive extensions of standard support vector machines (SVMs). In particular, we describe cost-sensitive extensions of the C-SVM and the nu-SVM, which we denote the 2C-SVM and 2nu-SVM respectively...

  18. DSP Based Direct Torque Control of Permanent Magnet Synchronous Motor (PMSM) using Space Vector Modulation (DTC-SVM)

    DEFF Research Database (Denmark)

    Swierczynski, Dariusz; Kazmierkowski, Marian P.; Blaabjerg, Frede

    2002-01-01

    DSP Based Direct Torque Control of Permanent Magnet Synchronous Motor (PMSM) using Space Vector Modulation (DTC-SVM)

  19. Protein-Ligand Empirical Interaction Components for Virtual Screening.

    Science.gov (United States)

    Yan, Yuna; Wang, Weijun; Sun, Zhaoxi; Zhang, John Z H; Ji, Changge

    2017-08-28

    A major shortcoming of empirical scoring functions is that they often fail to predict binding affinity properly. Removing false positives of docking results is one of the most challenging works in structure-based virtual screening. Postdocking filters, making use of all kinds of experimental structure and activity information, may help in solving the issue. We describe a new method based on detailed protein-ligand interaction decomposition and machine learning. Protein-ligand empirical interaction components (PLEIC) are used as descriptors for support vector machine learning to develop a classification model (PLEIC-SVM) to discriminate false positives from true positives. Experimentally derived activity information is used for model training. An extensive benchmark study on 36 diverse data sets from the DUD-E database has been performed to evaluate the performance of the new method. The results show that the new method performs much better than standard empirical scoring functions in structure-based virtual screening. The trained PLEIC-SVM model is able to capture important interaction patterns between ligand and protein residues for one specific target, which is helpful in discarding false positives in postdocking filtering.

  20. Learning using privileged information: SVM+ and weighted SVM.

    Science.gov (United States)

    Lapin, Maksim; Hein, Matthias; Schiele, Bernt

    2014-05-01

    Prior knowledge can be used to improve predictive performance of learning algorithms or reduce the amount of data required for training. The same goal is pursued within the learning using privileged information paradigm which was recently introduced by Vapnik et al. and is aimed at utilizing additional information available only at training time-a framework implemented by SVM+. We relate the privileged information to importance weighting and show that the prior knowledge expressible with privileged features can also be encoded by weights associated with every training example. We show that a weighted SVM can always replicate an SVM+ solution, while the converse is not true and we construct a counterexample highlighting the limitations of SVM+. Finally, we touch on the problem of choosing weights for weighted SVMs when privileged features are not available. Copyright © 2014 Elsevier Ltd. All rights reserved.
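
    The weighted SVM discussed here attaches a weight to every training example. A common way to write this is to scale each example's hinge loss by its weight. The sketch below evaluates that objective for a linear model; it follows the usual textbook form rather than the exact formulation in the paper, and the function name and toy data are assumptions.

```python
def weighted_svm_objective(w, b, X, y, weights, C=1.0):
    """Primal objective of a linear SVM with per-example weights.

    0.5 * ||w||^2 + C * sum_i weights[i] * max(0, 1 - y[i] * (w . X[i] + b))
    """
    regularizer = 0.5 * sum(wj * wj for wj in w)
    weighted_loss = 0.0
    for xi, yi, ci in zip(X, y, weights):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        weighted_loss += ci * max(0.0, 1.0 - margin)
    return regularizer + C * weighted_loss

if __name__ == "__main__":
    # Toy data: labels are +1 or -1, the second example violates the margin.
    X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
    y = [1, 1, -1]
    w, b = [1.0, 0.0], 0.0
    print("uniform weights:", weighted_svm_objective(w, b, X, y, [1.0, 1.0, 1.0]))
    print("upweighted example:", weighted_svm_objective(w, b, X, y, [1.0, 2.0, 1.0]))
```

    Setting all weights to 1 recovers the standard soft-margin objective, so the weights act purely as a per-example rescaling of the penalty for margin violations.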

  1. Fault diagnosis method based on FFT-RPCA-SVM for Cascaded-Multilevel Inverter.

    Science.gov (United States)

    Wang, Tianzhen; Qi, Jie; Xu, Hao; Wang, Yide; Liu, Lei; Gao, Diju

    2016-01-01

    Thanks to reduced switch stress, high quality of load wave, easy packaging and good extensibility, the cascaded H-bridge multilevel inverter is widely used in wind power system. To guarantee stable operation of system, a new fault diagnosis method, based on Fast Fourier Transform (FFT), Relative Principle Component Analysis (RPCA) and Support Vector Machine (SVM), is proposed for H-bridge multilevel inverter. To avoid the influence of load variation on fault diagnosis, the output voltages of the inverter is chosen as the fault characteristic signals. To shorten the time of diagnosis and improve the diagnostic accuracy, the main features of the fault characteristic signals are extracted by FFT. To further reduce the training time of SVM, the feature vector is reduced based on RPCA that can get a lower dimensional feature space. The fault classifier is constructed via SVM. An experimental prototype of the inverter is built to test the proposed method. Compared to other fault diagnosis methods, the experimental results demonstrate the high accuracy and efficiency of the proposed method. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
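
    The first stage of the pipeline described here turns an output-voltage waveform into frequency-domain features. The sketch below uses a naive O(n^2) discrete Fourier transform as a stand-in for the FFT, since both produce the same spectrum; the function name, the normalization and the toy signal are assumptions, and the RPCA and SVM stages are not shown.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Normalized magnitude spectrum for bins 0 .. n // 2 of a real signal."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(coeff) / n)
    return magnitudes

if __name__ == "__main__":
    # Toy waveform: a fundamental at bin 2 plus a weaker harmonic at bin 6.
    n = 64
    wave = [
        math.sin(2 * math.pi * 2 * t / n) + 0.3 * math.sin(2 * math.pi * 6 * t / n)
        for t in range(n)
    ]
    spectrum = dft_magnitudes(wave)
    strongest = sorted(range(len(spectrum)), key=lambda k: spectrum[k], reverse=True)[:2]
    print("dominant bins:", strongest)
```

    A vector of such magnitudes would then be reduced in dimension and passed to the classifier; for real waveforms an FFT routine would replace the direct sum.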

  2. Prediction of N-Methyl-D-Aspartate Receptor GluN1-Ligand Binding Affinity by a Novel SVM-Pose/SVM-Score Combinatorial Ensemble Docking Scheme.

    Science.gov (United States)

    Leong, Max K; Syu, Ren-Guei; Ding, Yi-Lung; Weng, Ching-Feng

    2017-01-06

    The glycine-binding site of the N-methyl-D-aspartate receptor (NMDAR) subunit GluN1 is a potential pharmacological target for neurodegenerative disorders. A novel combinatorial ensemble docking scheme using ligand and protein conformation ensembles and customized support vector machine (SVM)-based models to select the docked pose and to predict the docking score was generated for predicting the NMDAR GluN1-ligand binding affinity. The predicted root mean square deviation (RMSD) values in pose by SVM-Pose models were found to be in good agreement with the observed values (n = 30, r² = 0.928-0.988,  = 0.894-0.954, RMSE = 0.002-0.412, s = 0.001-0.214), and the predicted pKi values by SVM-Score were found to be in good agreement with the observed values for the training samples (n = 24, r² = 0.967,  = 0.899, RMSE = 0.295, s = 0.170) and test samples (n = 13, q² = 0.894, RMSE = 0.437, s = 0.202). When subjected to various statistical validations, the developed SVM-Pose and SVM-Score models consistently met the most stringent criteria. A mock test asserted the predictivity of this novel docking scheme. Collectively, this accurate novel combinatorial ensemble docking scheme can be used to predict the NMDAR GluN1-ligand binding affinity for facilitating drug discovery.

  3. In Silico Prediction of Gamma-Aminobutyric Acid Type-A Receptors Using Novel Machine-Learning-Based SVM and GBDT Approaches

    Directory of Open Access Journals (Sweden)

    Zhijun Liao

    2016-01-01

    Gamma-aminobutyric acid type-A receptors (GABAARs) belong to the multisubunit membrane-spanning ligand-gated ion channels (LGICs), which act as the principal mediators of rapid inhibitory synaptic transmission in the human brain. Therefore, predicting the category of GABAARs from the protein amino acid sequence alone would be very helpful for the recognition and research of novel receptors. Based on the proteins’ physicochemical properties, amino acid composition and position, a GABAAR classifier was first constructed using a 188-dimensional (188D) algorithm at 90% cd-hit identity and compared with pseudo-amino acid composition (PseAAC) and ProtrWeb web-based algorithms for human GABAAR proteins. Then, four classifiers including gradient boosting decision tree (GBDT), random forest (RF), a library for support vector machine (libSVM), and k-nearest neighbor (k-NN) were compared on the dataset at cd-hit 40% low identity. This work obtained the highest correctly classified rate at 96.8% and the highest specificity at 99.29%. However, the values of sensitivity, accuracy, and Matthew’s correlation coefficient were a little lower than those of PseAAC and ProtrWeb; GBDT and libSVM performed slightly better than RF and k-NN on the second dataset. In conclusion, a GABAAR classifier was successfully constructed using only the protein sequence information.

  4. VLSI Design of SVM-Based Seizure Detection System With On-Chip Learning Capability.

    Science.gov (United States)

    Feng, Lichen; Li, Zunchao; Wang, Yuanfa

    2018-02-01

    Portable automatic seizure detection system is very convenient for epilepsy patients to carry. In order to make the system on-chip trainable with high efficiency and attain high detection accuracy, this paper presents a very large scale integration (VLSI) design based on the nonlinear support vector machine (SVM). The proposed design mainly consists of a feature extraction (FE) module and an SVM module. The FE module performs the three-level Daubechies discrete wavelet transform to fit the physiological bands of the electroencephalogram (EEG) signal and extracts the time-frequency domain features reflecting the nonstationary signal properties. The SVM module integrates the modified sequential minimal optimization algorithm with the table-driven-based Gaussian kernel to enable efficient on-chip learning. The presented design is verified on an Altera Cyclone II field-programmable gate array and tested using the two publicly available EEG datasets. Experiment results show that the designed VLSI system improves the detection accuracy and training efficiency.

  5. [Identification of varieties of cashmere by Vis/NIR spectroscopy technology based on PCA-SVM].

    Science.gov (United States)

    Wu, Gui-Fang; He, Yong

    2009-06-01

    One mixed algorithm was presented to discriminate cashmere varieties with principal component analysis (PCA) and support vector machine (SVM). Cashmere fiber has such characteristics as threadlike, softness, glossiness and high tensile strength. The quality characters and economic value of each breed of cashmere are very different. In order to safeguard the consumer's rights and guarantee the quality of cashmere product, quickly, efficiently and correctly identifying cashmere has significant meaning to the production and transaction of cashmere material. The present research adopts Vis/NIRS spectroscopy diffuse techniques to collect the spectral data of cashmere. The near infrared fingerprint of cashmere was acquired by principal component analysis (PCA), and support vector machine (SVM) methods were used to further identify the cashmere material. The result of PCA indicated that the score map made by the scores of PC1, PC2 and PC3 was used, and 10 principal components (PCs) were selected as the input of support vector machine (SVM) based on the reliabilities of PCs of 99.99%. One hundred cashmere samples were used for calibration and the remaining 75 cashmere samples were used for validation. A one-against-all multi-class SVM model was built, the capabilities of SVM with different kernel function were comparatively analyzed, and the result showed that SVM possessing with the Gaussian kernel function has the best identification capabilities with the accuracy of 100%. This research indicated that the data mining method of PCA-SVM has a good identification effect, and can work as a new method for rapid identification of cashmere material varieties.

  6. Computational prediction of protein hot spot residues.

    Science.gov (United States)

    Morrow, John Kenneth; Zhang, Shuxing

    2012-01-01

    Most biological processes involve multiple proteins interacting with each other. It has been recently discovered that certain residues in these protein-protein interactions, which are called hot spots, contribute more significantly to binding affinity than others. Hot spot residues have unique and diverse energetic properties that make them challenging yet important targets in the modulation of protein-protein complexes. Design of therapeutic agents that interact with hot spot residues has proven to be a valid methodology in disrupting unwanted protein-protein interactions. Using biological methods to determine which residues are hot spots can be costly and time consuming. Recent advances in computational approaches to predict hot spots have incorporated a myriad of features, and have shown increasing predictive successes. Here we review the state of knowledge around protein-protein interactions, hot spots, and give an overview of multiple in silico prediction techniques of hot spot residues.

  7. Combined Forecasting Method of Landslide Deformation Based on MEEMD, Approximate Entropy, and WLS-SVM

    Directory of Open Access Journals (Sweden)

    Shaofeng Xie

    2017-01-01

    Given the chaotic characteristics of the time series of landslides, a new method based on modified ensemble empirical mode decomposition (MEEMD), approximate entropy and the weighted least square support vector machine (WLS-SVM) was proposed. The method mainly started from the chaotic sequence of time-frequency analysis and improved the model performance as follows: first a deformation time series was decomposed into a series of subsequences with significantly different complexity using MEEMD. Then the approximate entropy method was used to generate a new subsequence for the combination of subsequences with similar complexity, which could effectively concentrate the component feature information and reduce the computational scale. Finally the WLS-SVM prediction model was established for each new subsequence. At the same time, phase space reconstruction theory and the grid search method were used to select the input dimension and the optimal parameters of the model, and then the superposition of each predicted value was the final forecasting result. Taking the landslide deformation data of Danba as an example, the experiments were carried out and compared with wavelet neural network, support vector machine, least square support vector machine and various combination schemes. The experimental results show that the algorithm has high prediction accuracy. It can ensure a better prediction effect even in landslide deformation periods of rapid fluctuation, and it can also better control the residual value and effectively reduce the error interval.

  8. NPPD: A Protein-Protein Docking Scoring Function Based on Dyadic Differences in Networks of Hydrophobic and Hydrophilic Amino Acid Residues

    Directory of Open Access Journals (Sweden)

    Edward S. C. Shih

    2015-03-01

    Protein-protein docking (PPD) predictions usually rely on the use of a scoring function to rank docking models generated by exhaustive sampling. To rank good models higher than bad ones, a large number of scoring functions have been developed and evaluated, but the methods used for the computation of PPD predictions remain largely unsatisfactory. Here, we report a network-based PPD scoring function, the NPPD, in which the network consists of two types of network nodes, one for hydrophobic and the other for hydrophilic amino acid residues, and the nodes are connected when the residues they represent are within a certain contact distance. We showed that network parameters that compute dyadic interactions and those that compute heterophilic interactions of the amino acid networks thus constructed allowed NPPD to perform well in a benchmark evaluation of 115 PPD scoring functions, most of which, unlike NPPD, are based on some sort of protein-protein interaction energy. We also showed that NPPD was highly complementary to these energy-based scoring functions, suggesting that the combined use of conventional scoring functions and NPPD might significantly improve the accuracy of current PPD predictions.
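
    The network construction described in this record can be sketched in a few lines. This is only an illustration under stated assumptions: the residue coordinates, the 5.0 angstrom cutoff, and the simple homophilic/heterophilic edge counts are hypothetical stand-ins, not the network parameters or scoring function used by NPPD.

```python
import math
from itertools import combinations

# Hypothetical residue records: (residue_id, type, (x, y, z)), where type is
# "H" (hydrophobic) or "P" (hydrophilic). The coordinates are made up.
residues = [
    ("A1", "H", (0.0, 0.0, 0.0)),
    ("A2", "H", (3.0, 0.0, 0.0)),
    ("B1", "P", (0.0, 3.5, 0.0)),
    ("B2", "P", (10.0, 10.0, 10.0)),
]

CONTACT_CUTOFF = 5.0  # assumed contact distance in angstroms, not from the paper


def build_contact_edges(residues, cutoff):
    """Connect every pair of residues whose distance is within the cutoff."""
    edges = []
    for (id_a, type_a, xyz_a), (id_b, type_b, xyz_b) in combinations(residues, 2):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            edges.append((id_a, id_b, type_a, type_b))
    return edges


def count_edge_types(edges):
    """Count same-type (homophilic) and mixed-type (heterophilic) edges."""
    homophilic = sum(1 for _, _, type_a, type_b in edges if type_a == type_b)
    heterophilic = len(edges) - homophilic
    return homophilic, heterophilic


edges = build_contact_edges(residues, CONTACT_CUTOFF)
homophilic, heterophilic = count_edge_types(edges)
print(f"edges={len(edges)} homophilic={homophilic} heterophilic={heterophilic}")
```

    A real scoring function would derive richer network parameters from such a graph; the edge counts here only show how the two node types and the distance rule define the network.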

  9. Power quality events recognition using a SVM-based method

    Energy Technology Data Exchange (ETDEWEB)

    Cerqueira, Augusto Santiago; Ferreira, Danton Diego; Ribeiro, Moises Vidal; Duque, Carlos Augusto [Department of Electrical Circuits, Federal University of Juiz de Fora, Campus Universitario, 36036 900, Juiz de Fora MG (Brazil)]

    2008-09-15

    In this paper, a novel SVM-based method for power quality event classification is proposed. A simple approach for feature extraction is introduced, based on the subtraction of the fundamental component from the acquired voltage signal. The resulting signal is presented to a support vector machine for event classification. Results from simulation are presented and compared with two other methods, the OTFR and the LCEC. The proposed method showed improved performance at a reasonable computational cost. (author)

  10. Prediction of hot spot residues at protein-protein interfaces by combining machine learning and energy-based methods

    Directory of Open Access Journals (Sweden)

    Pontil Massimiliano

    2009-10-01

    Background: Alanine scanning mutagenesis is a powerful experimental methodology for investigating the structural and energetic characteristics of protein complexes. Individual amino acids are systematically mutated to alanine and changes in free energy of binding (ΔΔG) measured. Several experiments have shown that protein-protein interactions are critically dependent on just a few residues ("hot spots") at the interface. Hot spots make a dominant contribution to the free energy of binding and if mutated they can disrupt the interaction. As mutagenesis studies require significant experimental efforts, there is a need for accurate and reliable computational methods. Such methods would also add to our understanding of the determinants of affinity and specificity in protein-protein recognition. Results: We present a novel computational strategy to identify hot spot residues, given the structure of a complex. We consider the basic energetic terms that contribute to hot spot interactions, i.e. van der Waals potentials, solvation energy, hydrogen bonds and Coulomb electrostatics. We treat them as input features and use machine learning algorithms such as Support Vector Machines and Gaussian Processes to optimally combine and integrate them, based on a set of training examples of alanine mutations. We show that our approach is effective in predicting hot spots and it compares favourably to other available methods. In particular we find the best performances using Transductive Support Vector Machines, a semi-supervised learning scheme. When hot spots are defined as those residues for which ΔΔG ≥ 2 kcal/mol, our method achieves a precision and a recall of 56% and 65%, respectively. Conclusion: We have developed a hybrid scheme in which energy terms are used as input features of machine learning models. This strategy combines the strengths of machine learning and energy-based methods. Although so far these two types of approaches have mainly been
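
    The hot spot definition and the two reported metrics can be made concrete with a short sketch. Only the 2 kcal/mol threshold comes from the abstract; the ΔΔG values and predictions below are made-up illustration data, so the resulting precision and recall are not the paper's 56% and 65%.

```python
HOT_SPOT_THRESHOLD = 2.0  # kcal/mol, the definition quoted in the abstract


def is_hot_spot(ddg):
    """A residue counts as a hot spot when its measured ddG meets the threshold."""
    return ddg >= HOT_SPOT_THRESHOLD


def precision_recall(true_labels, predicted_labels):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical alanine-scanning measurements (kcal/mol) and model predictions.
measured_ddg = [2.5, 0.4, 3.1, 1.9, 2.0, 0.1]
predicted = [True, True, True, False, False, False]

true_labels = [is_hot_spot(ddg) for ddg in measured_ddg]
precision, recall = precision_recall(true_labels, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```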

  11. A tool for calculating binding-site residues on proteins from PDB structures

    Directory of Open Access Journals (Sweden)

    Hu Jing

    2009-08-01

    Background: In the research on protein functional sites, researchers often need to identify binding-site residues on a protein. A commonly used strategy is to find a complex structure from the Protein Data Bank (PDB) that consists of the protein of interest and its interacting partner(s) and calculate binding-site residues based on the complex structure. However, since a protein may participate in multiple interactions, the binding-site residues calculated based on one complex structure usually do not reveal all binding sites on a protein. Thus, this requires researchers to find all PDB complexes that contain the protein of interest and combine the binding-site information gleaned from them. This process is very time-consuming. In particular, combining binding-site information obtained from different PDB structures requires tedious work to align protein sequences. The process becomes overwhelmingly difficult when researchers have a large set of proteins to analyze, which is usually the case in practice. Results: In this study, we have developed a tool for calculating binding-site residues on proteins, TCBRP (http://yanbioinformatics.cs.usu.edu:8080/ppbindingsubmit). For an input protein, TCBRP can quickly find all binding-site residues on the protein by automatically combining the information obtained from all PDB structures that consist of the protein of interest. Additionally, TCBRP presents the binding-site residues in different categories according to the interaction type. TCBRP also allows researchers to set the definition of binding-site residues. Conclusion: The developed tool is very useful for the research on protein binding site analysis and prediction.
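
    The combining step described in this record amounts to a per-category union of residue sets. The sketch below is not TCBRP's implementation: the complex names, interaction-type labels and residue numbers are hypothetical, and it assumes the residues have already been mapped onto one common numbering, which is the sequence-alignment work the abstract calls tedious.

```python
from collections import defaultdict

# Hypothetical per-complex results: (interaction type, binding-site residue numbers).
per_complex_sites = {
    "complex_1": ("protein-protein", {12, 15, 48}),
    "complex_2": ("protein-protein", {15, 49, 50}),
    "complex_3": ("protein-DNA", {101, 102}),
}


def combine_binding_sites(per_complex_sites):
    """Union the binding-site residues of all complexes, grouped by interaction type."""
    combined = defaultdict(set)
    for interaction_type, residue_numbers in per_complex_sites.values():
        combined[interaction_type] |= residue_numbers
    return dict(combined)


combined = combine_binding_sites(per_complex_sites)
for interaction_type, residue_numbers in sorted(combined.items()):
    print(interaction_type, sorted(residue_numbers))
```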

  12. SVM-based glioma grading. Optimization by feature reduction analysis

    International Nuclear Information System (INIS)

    Zoellner, Frank G.; Schad, Lothar R.; Emblem, Kyrre E.; Harvard Medical School, Boston, MA; Oslo Univ. Hospital

    2012-01-01

    We investigated the predictive power of feature reduction analysis approaches in support vector machine (SVM)-based classification of glioma grade. In 101 untreated glioma patients, three analytic approaches were evaluated to derive an optimal reduction in features: (i) Pearson's correlation coefficients (PCC), (ii) principal component analysis (PCA) and (iii) independent component analysis (ICA). Tumor grading was performed using a previously reported SVM approach including whole-tumor cerebral blood volume (CBV) histograms and patient age. Best classification accuracy was found using PCA at 85% (sensitivity = 89%, specificity = 84%) when reducing the feature vector from 101 (100-bins rCBV histogram + age) to 3 principal components. In comparison, classification accuracy by PCC was 82% (89%, 77%, 2 dimensions) and 79% by ICA (87%, 75%, 9 dimensions). For improved speed (up to 30%) and simplicity, feature reduction by all three methods provided similar classification accuracy to literature values (~87%) while reducing the number of features by up to 98%. (orig.)
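
    The "up to 98%" figure can be checked from the dimensions quoted in the abstract (101 original features; 2, 3 and 9 retained dimensions). The short calculation below uses only those numbers; the function name is illustrative.

```python
ORIGINAL_FEATURES = 101  # 100 histogram bins plus patient age

# Retained dimensions per method, as quoted in the abstract.
retained = {"PCC": 2, "PCA": 3, "ICA": 9}


def reduction_percent(original, kept):
    """Percentage of features removed when keeping `kept` out of `original`."""
    return 100.0 * (original - kept) / original


reductions = {m: reduction_percent(ORIGINAL_FEATURES, k) for m, k in retained.items()}
for method, percent in reductions.items():
    print(f"{method}: {retained[method]} dimensions, {percent:.1f}% fewer features")
```

    The largest reduction (PCC, 101 to 2 features) is about 98%, which matches the upper bound stated in the abstract.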

  13. SVM-based glioma grading. Optimization by feature reduction analysis

    Energy Technology Data Exchange (ETDEWEB)

    Zoellner, Frank G.; Schad, Lothar R. [University Medical Center Mannheim, Heidelberg Univ., Mannheim (Germany). Computer Assisted Clinical Medicine; Emblem, Kyrre E. [Massachusetts General Hospital, Charlestown, A.A. Martinos Center for Biomedical Imaging, Boston MA (United States). Dept. of Radiology; Harvard Medical School, Boston, MA (United States); Oslo Univ. Hospital (Norway). The Intervention Center

    2012-11-01

    We investigated the predictive power of feature reduction analysis approaches in support vector machine (SVM)-based classification of glioma grade. In 101 untreated glioma patients, three analytic approaches were evaluated to derive an optimal reduction in features: (i) Pearson's correlation coefficients (PCC), (ii) principal component analysis (PCA) and (iii) independent component analysis (ICA). Tumor grading was performed using a previously reported SVM approach including whole-tumor cerebral blood volume (CBV) histograms and patient age. Best classification accuracy was found using PCA at 85% (sensitivity = 89%, specificity = 84%) when reducing the feature vector from 101 (100-bins rCBV histogram + age) to 3 principal components. In comparison, classification accuracy by PCC was 82% (89%, 77%, 2 dimensions) and 79% by ICA (87%, 75%, 9 dimensions). For improved speed (up to 30%) and simplicity, feature reduction by all three methods provided similar classification accuracy to literature values (~87%) while reducing the number of features by up to 98%. (orig.)

  14. GI-SVM: A sensitive method for predicting genomic islands based on unannotated sequence of a single genome.

    Science.gov (United States)

    Lu, Bingxin; Leong, Hon Wai

    2016-02-01

    Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaptation and enable antibiotic resistance. Many methods have been proposed to predict GIs, but most of them rely on either annotations or comparisons with other closely related genomes. Hence these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods to detect GIs based solely on the sequence of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on a one-class support vector machine (SVM), utilizing composition bias in terms of k-mer content. From our evaluations on three real genomes, GI-SVM can achieve higher recall compared with current methods, without much loss of precision. In addition, GI-SVM allows flexible parameter tuning to get optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GIs in newly sequenced genomes.
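
    The k-mer composition features mentioned in this record can be sketched as follows. This is a hedged illustration, not GI-SVM itself: the window size, step, value of k and the toy sequence are assumptions, and the one-class SVM that would consume these vectors is left out.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k):
    """Normalized k-mer frequency vector over the ACGT alphabet, in lexicographic order."""
    kmers = ["".join(letters) for letters in product("ACGT", repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[kmer] for kmer in kmers)
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


def window_features(genome, window, step, k):
    """One k-mer frequency vector per sliding window along the genome."""
    starts = range(0, len(genome) - window + 1, step)
    return [kmer_frequencies(genome[start:start + window], k) for start in starts]


# Toy sequence and parameters, chosen only to keep the example readable.
genome = "ACGTACGTGGGGCCCC"
features = window_features(genome, window=8, step=4, k=2)
print(len(features), "windows,", len(features[0]), "features each")
```

    Windows whose composition deviates from the genome-wide background are the kind of signal a one-class model could flag as candidate islands.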

  15. Abnormal Gait Behavior Detection for Elderly Based on Enhanced Wigner-Ville Analysis and Cloud Incremental SVM Learning

    Directory of Open Access Journals (Sweden)

    Jian Luo

    2016-01-01

    A cloud based health care system is proposed in this paper for the elderly by providing abnormal gait behavior detection, classification, online diagnosis, and remote aid service. Intelligent mobile terminals with embedded triaxial acceleration sensors are used to capture the movement and ambulation information of the elderly. The collected signals are first enhanced by a Kalman filter. The magnitude of signal vector features is then extracted and decomposed into a linear combination of enhanced Gabor atoms. The Wigner-Ville analysis method is introduced and the problem is studied by joint time-frequency analysis. To address the lack of large-scale abnormal behavior data in the training process, a cloud based incremental SVM (CI-SVM) learning method is proposed. The original abnormal behavior data are first used to get the initial SVM classifier. The larger abnormal behavior data of the elderly collected by mobile devices are then gathered in the cloud platform to conduct incremental training and get the new SVM classifier. With the CI-SVM learning method, the knowledge of the SVM classifier can be accumulated through dynamic incremental learning. Experimental results demonstrate that the proposed method is feasible and can be applied to aged care, emergency aid, and related fields.

  16. Optimal structural design of the midship of a VLCC based on the strategy integrating SVM and GA

    Science.gov (United States)

    Sun, Li; Wang, Deyu

    2012-03-01

    In this paper a hybrid process of modeling and optimization, which integrates a support vector machine (SVM) and genetic algorithm (GA), was introduced to reduce the high time cost in structural optimization of ships. SVM, which is rooted in statistical learning theory and an approximate implementation of the method of structural risk minimization, can provide a good generalization performance in metamodeling the input-output relationship of real problems and consequently cuts down on high time cost in the analysis of real problems, such as FEM analysis. The GA, as a powerful optimization technique, possesses remarkable advantages for the problems that can hardly be optimized with common gradient-based optimization methods, which makes it suitable for optimizing models built by SVM. Based on the SVM-GA strategy, optimization of structural scantlings in the midship of a very large crude carrier (VLCC) ship was carried out according to the direct strength assessment method in common structural rules (CSR), which eventually demonstrates the high efficiency of SVM-GA in optimizing the ship structural scantlings under heavy computational complexity. The time cost of this optimization with SVM-GA has been sharply reduced, many more loops have been processed within a small amount of time and the design has been improved remarkably.

  17. Classification of cardiovascular tissues using LBP based descriptors and a cascade SVM.

    Science.gov (United States)

    Mazo, Claudia; Alegre, Enrique; Trujillo, Maria

    2017-08-01

    Histological images have characteristics, such as texture, shape, colour and spatial structure, that permit the differentiation of each fundamental tissue and organ. Texture is one of the most discriminative features. The automatic classification of tissues and organs based on histology images is an open problem, due to the lack of automatic solutions when treating tissues without pathologies. In this paper, we demonstrate that it is possible to automatically classify cardiovascular tissues using texture information and Support Vector Machines (SVM). Additionally, we realised that it is feasible to recognise several cardiovascular organs following the same process. The texture of histological images was described using Local Binary Patterns (LBP), LBP Rotation Invariant (LBPri), Haralick features and different concatenations between them, representing in this way its content. Using an SVM with a linear kernel, we selected the most appropriate descriptor, which, for this problem, was a concatenation of LBP and LBPri. Due to the small number of images available, we could not follow an approach based on deep learning, but we selected the classifier that yielded the highest performance by comparing SVM with Random Forest and Linear Discriminant Analysis. Once SVM was selected as the classifier with the higher area under the curve, which represents both higher recall and precision, we tuned it by evaluating different kernels, finding that a linear SVM allowed us to accurately separate four classes of tissues: (i) cardiac muscle of the heart, (ii) smooth muscle of the muscular artery, (iii) loose connective tissue, and (iv) smooth muscle of the large vein and the elastic artery. The experimental validation was conducted using 3000 blocks of 100 × 100 pixels, with 600 blocks per class, and the classification was assessed using a 10-fold cross-validation. Using LBP as the descriptor, concatenated with LBPri and an SVM with a linear kernel, the main four classes of tissues were
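
    To illustrate the two descriptors named in this record, the sketch below computes a basic 8-neighbour LBP code for a 3 × 3 patch and its rotation-invariant form. The neighbourhood size, the bit ordering and the toy patch are assumptions; the abstract does not state the LBP parameters the authors used.

```python
def lbp_code(patch):
    """8-bit LBP code of a 3x3 patch: bit i is set when neighbour i >= centre."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(offsets):
        if patch[row][col] >= center:
            code |= 1 << bit
    return code


def rotation_invariant(code, bits=8):
    """Smallest value among all circular bit rotations of the code."""
    mask = (1 << bits) - 1
    return min(((code >> s) | (code << (bits - s))) & mask for s in range(bits))


# Toy grey-level patch: only the top-middle and top-right neighbours exceed the centre.
patch = [
    [1, 9, 9],
    [1, 5, 1],
    [1, 1, 1],
]
code = lbp_code(patch)
print(code, rotation_invariant(code))
```

    A texture descriptor would then be the histogram of such codes over all pixels of an image block, and concatenating the LBP and LBPri histograms gives a combined feature vector of the kind the record describes.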

  18. A Roller Bearing Fault Diagnosis Method Based on LCD Energy Entropy and ACROA-SVM

    Directory of Open Access Journals (Sweden)

    HungLinh Ao

    2014-01-01

    This study investigates a novel method for roller bearing fault diagnosis based on local characteristic-scale decomposition (LCD) energy entropy, together with a support vector machine designed using an Artificial Chemical Reaction Optimisation Algorithm, referred to as an ACROA-SVM. First, the original acceleration vibration signals are decomposed into intrinsic scale components (ISCs). Second, the concept of LCD energy entropy is introduced. Third, the energy features extracted from a number of ISCs that contain the most dominant fault information serve as input vectors for the support vector machine classifier. Finally, the ACROA-SVM classifier is proposed to recognize the faulty roller bearing pattern. The analysis of roller bearing signals with inner-race and outer-race faults shows that the diagnostic approach based on the ACROA-SVM and using LCD to extract the energy levels of the various frequency bands as features can identify roller bearing fault patterns accurately and effectively. The proposed method is superior to approaches based on Empirical Mode Decomposition method and requires less time.

  19. Cross Validation Through Two-Dimensional Solution Surface for Cost-Sensitive SVM.

    Science.gov (United States)

    Gu, Bin; Sheng, Victor S; Tay, Keng Yeow; Romano, Walter; Li, Shuo

    2017-06-01

    Model selection plays an important role in cost-sensitive SVM (CS-SVM). It has been proven that the global minimum cross validation (CV) error can be efficiently computed based on the solution path for one parameter learning problems. However, it is a challenge to obtain the global minimum CV error for CS-SVM based on one-dimensional solution path and traditional grid search, because CS-SVM has two regularization parameters. In this paper, we propose a solution and error surfaces based CV approach (CV-SES). More specifically, we first compute a two-dimensional solution surface for CS-SVM based on a bi-parameter space partition algorithm, which can fit solutions of CS-SVM for all values of both regularization parameters. Then, we compute a two-dimensional validation error surface for each CV fold, which can fit validation errors of CS-SVM for all values of both regularization parameters. Finally, we obtain the CV error surface by superposing K validation error surfaces, which can find the global minimum CV error of CS-SVM. Experiments are conducted on seven datasets for cost sensitive learning and on four datasets for imbalanced learning. Experimental results show not only that our proposed CV-SES has a better generalization ability than CS-SVM with various hybrids between grid search and solution path methods, and than the recently proposed cost-sensitive hinge loss SVM with three-dimensional grid search, but also that CV-SES uses less running time.
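
    The final step, superposing K validation error surfaces and locating the global minimum, can be illustrated on a discretized grid. This is a simplification under stated assumptions: CV-SES works with exact surfaces over all parameter values, whereas the sketch uses a small hypothetical grid of error counts for K = 3 folds.

```python
def superpose(fold_surfaces):
    """Element-wise sum of K validation error surfaces defined on the same grid."""
    rows, cols = len(fold_surfaces[0]), len(fold_surfaces[0][0])
    return [
        [sum(surface[i][j] for surface in fold_surfaces) for j in range(cols)]
        for i in range(rows)
    ]


def global_minimum(surface):
    """Return (error, row, col) of the smallest entry of the surface."""
    return min(
        (surface[i][j], i, j)
        for i in range(len(surface))
        for j in range(len(surface[0]))
    )


# Hypothetical validation error counts: rows index one regularization parameter,
# columns index the other; one grid per CV fold.
fold_surfaces = [
    [[5, 3, 4], [4, 2, 6]],
    [[6, 2, 5], [3, 3, 5]],
    [[4, 4, 3], [5, 1, 4]],
]

cv_surface = superpose(fold_surfaces)
best_error, best_row, best_col = global_minimum(cv_surface)
print(cv_surface, best_error, (best_row, best_col))
```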

  20. Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics.

    Science.gov (United States)

    Lin, Xiaohui; Li, Chao; Zhang, Yanhui; Su, Benzhe; Fan, Meng; Wei, Hai

    2017-12-26

    Feature selection is an important topic in bioinformatics. Defining informative features from complex high dimensional biological data is critical in disease study, drug development, etc. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has shown its power in many applications. It ranks the features according to the recursive feature deletion sequence based on SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the feature rank of SVM-RFE. Meanwhile, to measure the feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporally screens out the samples lying in a heavy overlapping area in each iteration. The experiments on the eight public biological datasets show that the discriminative ability of the feature subset could be measured more accurately by combining the classification accuracy rate with the average overlapping degree of the samples compared with using the classification accuracy rate alone, and shielding the samples in the overlapping area made the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to define potential biomarkers from big biological data.

  1. Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics

    Directory of Open Access Journals (Sweden)

    Xiaohui Lin

    2017-12-01

    Feature selection is an important topic in bioinformatics. Defining informative features from complex high dimensional biological data is critical in disease study, drug development, etc. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has shown its power in many applications. It ranks the features according to the recursive feature deletion sequence based on SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the feature rank of SVM-RFE. Meanwhile, to measure the feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporally screens out the samples lying in a heavy overlapping area in each iteration. The experiments on the eight public biological datasets show that the discriminative ability of the feature subset could be measured more accurately by combining the classification accuracy rate with the average overlapping degree of the samples compared with using the classification accuracy rate alone, and shielding the samples in the overlapping area made the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to define potential biomarkers from big biological data.

  2. A protein relational database and protein family knowledge bases to facilitate structure-based design analyses.

    Science.gov (United States)

    Mobilio, Dominick; Walker, Gary; Brooijmans, Natasja; Nilakantan, Ramaswamy; Denny, R Aldrin; Dejoannis, Jason; Feyfant, Eric; Kowticwar, Rupesh K; Mankala, Jyoti; Palli, Satish; Punyamantula, Sairam; Tatipally, Maneesh; John, Reji K; Humblet, Christine

    2010-08-01

    The Protein Data Bank is the most comprehensive source of experimental macromolecular structures. It can, however, be difficult at times to locate relevant structures with the Protein Data Bank search interface. This is particularly true when searching for complexes containing specific interactions between protein and ligand atoms. Moreover, searching within a family of proteins can be tedious. For example, one cannot search for some conserved residue as residue numbers vary across structures. We describe herein three databases, Protein Relational Database, Kinase Knowledge Base, and Matrix Metalloproteinase Knowledge Base, containing protein structures from the Protein Data Bank. In Protein Relational Database, atom-atom distances between protein and ligand have been precalculated allowing for millisecond retrieval based on atom identity and distance constraints. Ring centroids, centroid-centroid and centroid-atom distances and angles have also been included permitting queries for pi-stacking interactions and other structural motifs involving rings. Other geometric features can be searched through the inclusion of residue pair and triplet distances. In Kinase Knowledge Base and Matrix Metalloproteinase Knowledge Base, the catalytic domains have been aligned into common residue numbering schemes. Thus, by searching across Protein Relational Database and Kinase Knowledge Base, one can easily retrieve structures wherein, for example, a ligand of interest is making contact with the gatekeeper residue.
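
    The idea of precalculating protein-ligand atom distances so that contact queries become simple lookups can be sketched with an in-memory SQLite table. The schema, atom names and coordinates below are hypothetical and are not taken from the Protein Relational Database described in this record.

```python
import math
import sqlite3

# Hypothetical atoms: (residue, atom name, (x, y, z)); coordinates are made up.
protein_atoms = [
    ("SER195", "OG", (1.0, 0.0, 0.0)),
    ("HIS57", "NE2", (0.0, 3.0, 0.0)),
    ("ASP102", "OD1", (8.0, 0.0, 0.0)),
]
ligand_atoms = [
    ("LIG", "N1", (0.0, 0.0, 0.0)),
    ("LIG", "O2", (6.0, 0.0, 0.0)),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact ("
    "protein_residue TEXT, protein_atom TEXT, ligand_atom TEXT, distance REAL)"
)
# Index on the columns used in the query so lookups avoid a full table scan.
conn.execute("CREATE INDEX idx_contact ON contact (protein_atom, distance)")

# Precalculate every protein-ligand atom distance once, at load time.
for residue, atom, xyz in protein_atoms:
    for _, ligand_atom, ligand_xyz in ligand_atoms:
        conn.execute(
            "INSERT INTO contact VALUES (?, ?, ?, ?)",
            (residue, atom, ligand_atom, math.dist(xyz, ligand_xyz)),
        )

# Query by atom identity and a distance constraint.
rows = conn.execute(
    "SELECT protein_residue, ligand_atom, distance FROM contact "
    "WHERE protein_atom = ? AND distance <= ? ORDER BY distance",
    ("OG", 3.5),
).fetchall()
print(rows)
```

    Because the distances are stored rather than recomputed, the query cost depends on the index lookup and not on the geometry calculation.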

  3. Computational design, construction, and characterization of a set of specificity determining residues in protein-protein interactions.

    Science.gov (United States)

    Nagao, Chioko; Izako, Nozomi; Soga, Shinji; Khan, Samia Haseeb; Kawabata, Shigeki; Shirai, Hiroki; Mizuguchi, Kenji

    2012-10-01

    Proteins interact with different partners to perform different functions and it is important to elucidate the determinants of partner specificity in protein complex formation. Although methods for detecting specificity determining positions have been developed previously, direct experimental evidence for these amino acid residues is scarce, and the lack of information has prevented further computational studies. In this article, we constructed a dataset that is likely to exhibit specificity in protein complex formation, based on available crystal structures and several intuitive ideas about interaction profiles and functional subclasses. We then defined a "structure-based specificity determining position (sbSDP)" as a set of equivalent residues in a protein family showing a large variation in their interaction energy with different partners. We investigated sequence and structural features of sbSDPs and demonstrated that their amino acid propensities significantly differed from those of other interacting residues and that the importance of many of these residues for determining specificity had been verified experimentally. Copyright © 2012 Wiley Periodicals, Inc.

  4. A feature-based approach to modeling protein–protein interaction hot spots

    Science.gov (United States)

    Cho, Kyu-il; Kim, Dongsup; Lee, Doheon

    2009-01-01

    Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, composed of different levels of information including structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation displays a rather complicated tendency depending on the types of protein complexes, indicating that this feature is not good for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top 3, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to π–related interactions, especially π · · · π interactions. PMID:19273533

  5. Classification of Multiple Chinese Liquors by Means of a QCM-based E-Nose and MDS-SVM Classifier.

    Science.gov (United States)

    Li, Qiang; Gu, Yu; Jia, Jing

    2017-01-30

    Chinese liquors are internationally well-known fermentative alcoholic beverages. They have unique flavors attributable to the use of various bacteria and fungi, raw materials, and production processes. Developing a novel, rapid, and reliable method to identify multiple Chinese liquors is of positive significance. This paper presents a pattern recognition system for classifying ten brands of Chinese liquors based on multidimensional scaling (MDS) and support vector machine (SVM) algorithms in a quartz crystal microbalance (QCM)-based electronic nose (e-nose) we designed. We evaluated the comprehensive performance of the MDS-SVM classifier that predicted all ten brands of Chinese liquors individually. The prediction accuracy (98.3%) showed superior performance of the MDS-SVM classifier over the back-propagation artificial neural network (BP-ANN) classifier (93.3%) and moving average-linear discriminant analysis (MA-LDA) classifier (87.6%). The MDS-SVM classifier has reasonable reliability, good fitting and prediction (generalization) performance in classification of the Chinese liquors. Taking both application of the e-nose and validation of the MDS-SVM classifier into account, we have thus created a useful method for the classification of multiple Chinese liquors.

  6. Classification of Multiple Chinese Liquors by Means of a QCM-based E-Nose and MDS-SVM Classifier

    Directory of Open Access Journals (Sweden)

    Qiang Li

    2017-01-01

    Chinese liquors are internationally well-known fermentative alcoholic beverages. They have unique flavors attributable to the use of various bacteria and fungi, raw materials, and production processes. Developing a novel, rapid, and reliable method to identify multiple Chinese liquors is of positive significance. This paper presents a pattern recognition system for classifying ten brands of Chinese liquors based on multidimensional scaling (MDS) and support vector machine (SVM) algorithms in a quartz crystal microbalance (QCM)-based electronic nose (e-nose) we designed. We evaluated the comprehensive performance of the MDS-SVM classifier that predicted all ten brands of Chinese liquors individually. The prediction accuracy (98.3%) showed superior performance of the MDS-SVM classifier over the back-propagation artificial neural network (BP-ANN) classifier (93.3%) and moving average-linear discriminant analysis (MA-LDA) classifier (87.6%). The MDS-SVM classifier has reasonable reliability, good fitting and prediction (generalization) performance in classification of the Chinese liquors. Taking both application of the e-nose and validation of the MDS-SVM classifier into account, we have thus created a useful method for the classification of multiple Chinese liquors.

  7. Universum Learning for Multiclass SVM

    OpenAIRE

    Dhar, Sauptik; Ramakrishnan, Naveen; Cherkassky, Vladimir; Shah, Mohak

    2016-01-01

    We introduce Universum learning for multiclass problems and propose a novel formulation for multiclass universum SVM (MU-SVM). We also propose a span bound for MU-SVM that can be used for model selection thereby avoiding resampling. Empirical results demonstrate the effectiveness of MU-SVM and the proposed bound.

  8. Adaptive SVM for Data Stream Classification

    Directory of Open Access Journals (Sweden)

    Isah A. Lawal

    2017-07-01

    In this paper, we address the problem of learning an adaptive classifier for the classification of continuous streams of data. We present a solution based on incremental extensions of the Support Vector Machine (SVM) learning paradigm that updates an existing SVM whenever new training data are acquired. To ensure that the SVM effectiveness is guaranteed while exploiting the newly gathered data, we introduce an on-line model selection approach in the incremental learning process. We evaluated the proposed method on real world applications including on-line spam email filtering and human action classification from videos. Experimental results show the effectiveness and the potential of the proposed approach.

  9. New KF-PP-SVM classification method for EEG in brain-computer interfaces.

    Science.gov (United States)

    Yang, Banghua; Han, Zhijun; Zan, Peng; Wang, Qian

    2014-01-01

    Classification methods are a crucial direction in the current study of brain-computer interfaces (BCIs). To improve the classification accuracy for electroencephalogram (EEG) signals, a novel KF-PP-SVM (kernel Fisher, posterior probability, and support vector machine) classification method is developed. Its detailed process entails the use of common spatial patterns to obtain features, based on which the within-class scatter is calculated. Then the scatter is added into the kernel function of a radial basis function to construct a new kernel function. This new kernel is integrated into the SVM to obtain a new classification model. Finally, the output of the SVM is calculated based on posterior probability and the final recognition result is obtained. To evaluate the effectiveness of the proposed KF-PP-SVM method, EEG data collected in the laboratory are processed with four different classification schemes (KF-PP-SVM, KF-SVM, PP-SVM, and SVM). The results showed that the overall average improvements arising from the use of the KF-PP-SVM scheme as opposed to the KF-SVM, PP-SVM, and SVM schemes are 2.49%, 5.83%, and 6.49%, respectively.

  10. Protein-protein docking with dynamic residue protonation states.

    Directory of Open Access Journals (Sweden)

    Krishna Praneeth Kilambi

    2014-12-01

    Protein-protein interactions depend on a host of environmental factors. Local pH conditions influence the interactions through the protonation states of the ionizable residues that can change upon binding. In this work, we present a pH-sensitive docking approach, pHDock, that can sample side-chain protonation states of five ionizable residues (Asp, Glu, His, Tyr, Lys) on-the-fly during the docking simulation. pHDock produces successful local docking funnels in approximately half (79/161) of the protein complexes, including 19 cases where standard RosettaDock fails. pHDock also performs better than the two control cases comprising docking at pH 7.0 or using fixed, predetermined protonation states. On average, the top-ranked pHDock structures have lower interface RMSDs and recover more native interface residue-residue contacts and hydrogen bonds compared to RosettaDock. Addition of backbone flexibility using a computationally-generated conformational ensemble further improves native contact and hydrogen bond recovery in the top-ranked structures. Although pHDock is designed to improve docking, it also successfully predicts a large pH-dependent binding affinity change in the Fc-FcRn complex, suggesting that it can be exploited to improve affinity predictions. The approaches in the study contribute to the goal of structural simulations of whole-cell protein-protein interactions including all the environmental factors, and they can be further expanded for pH-sensitive protein design.

  11. Scoring protein interaction decoys using exposed residues (SPIDER): a novel multibody interaction scoring function based on frequent geometric patterns of interfacial residues.

    Science.gov (United States)

    Khashan, Raed; Zheng, Weifan; Tropsha, Alexander

    2012-08-01

    Accurate prediction of the structure of protein-protein complexes in computational docking experiments remains a formidable challenge. It has been recognized that identifying native or native-like poses among multiple decoys is the major bottleneck of the current scoring functions used in docking. We have developed a novel multibody pose-scoring function that has no theoretical limit on the number of residues contributing to the individual interaction terms. We use a coarse-grain representation of a protein-protein complex where each residue is represented by its side chain centroid. We apply a computational geometry approach called Almost-Delaunay tessellation that transforms protein-protein complexes into a residue contact network, or an undirectional graph where vertex-residues are nodes connected by edges. This treatment forms a family of interfacial graphs representing a dataset of protein-protein complexes. We then employ frequent subgraph mining approach to identify common interfacial residue patterns that appear in at least a subset of native protein-protein interfaces. The geometrical parameters and frequency of occurrence of each "native" pattern in the training set are used to develop the new SPIDER scoring function. SPIDER was validated using standard "ZDOCK" benchmark dataset that was not used in the development of SPIDER. We demonstrate that SPIDER scoring function ranks native and native-like poses above geometrical decoys and that it exceeds in performance a popular ZRANK scoring function. SPIDER was ranked among the top scoring functions in a recent round of CAPRI (Critical Assessment of PRedicted Interactions) blind test of protein-protein docking methods. Copyright © 2012 Wiley Periodicals, Inc.
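The residue contact network described in this abstract can be illustrated with a small sketch. Note the assumptions: a plain distance cutoff between side-chain centroids stands in for the Almost-Delaunay tessellation used by the authors, and the function names, the 8.0 Å cutoff, and the toy coordinates are all hypothetical.

```python
from itertools import combinations
from math import dist

def contact_graph(centroids, cutoff=8.0):
    """Build an undirected graph {residue_id: set(neighbor_ids)}.

    Two residues are connected when their side-chain centroids lie
    within `cutoff` (an assumed threshold, not taken from the paper).
    """
    graph = {rid: set() for rid in centroids}
    for (a, pa), (b, pb) in combinations(centroids.items(), 2):
        if dist(pa, pb) <= cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def interface_edges(graph, chain_of):
    """Return the edges whose two endpoints belong to different chains."""
    return sorted(
        (a, b)
        for a in graph
        for b in graph[a]
        if a < b and chain_of[a] != chain_of[b]
    )

# Toy centroids (hypothetical coordinates), keyed as "chain:residue number".
centroids = {
    "A:10": (0.0, 0.0, 0.0),
    "A:11": (3.8, 0.0, 0.0),
    "B:5": (0.0, 6.0, 0.0),
    "B:6": (20.0, 20.0, 20.0),
}
chain_of = {rid: rid.split(":")[0] for rid in centroids}

graph = contact_graph(centroids)
edges = interface_edges(graph, chain_of)
print(edges)
```

The interfacial edges are the part of the graph that a frequent-subgraph miner would then search for recurring residue patterns; that mining step is not shown here.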

  12. A Hybrid Vehicle Detection Method Based on Viola-Jones and HOG + SVM from UAV Images

    Science.gov (United States)

    Xu, Yongzheng; Yu, Guizhen; Wang, Yunpeng; Wu, Xinkai; Ma, Yalong

    2016-01-01

    A new hybrid vehicle detection scheme which integrates the Viola-Jones (V-J) and linear SVM classifier with HOG feature (HOG + SVM) methods is proposed for vehicle detection from low-altitude unmanned aerial vehicle (UAV) images. As both V-J and HOG + SVM are sensitive to on-road vehicles’ in-plane rotation, the proposed scheme first adopts a roadway orientation adjustment method, which rotates each UAV image to align the roads with the horizontal direction so the original V-J or HOG + SVM method can be directly applied to achieve fast detection and high accuracy. To address the issue of descending detection speed for V-J and HOG + SVM, the proposed scheme further develops an adaptive switching strategy which sophistically integrates V-J and HOG + SVM methods based on their different descending trends of detection speed to improve detection efficiency. A comprehensive evaluation shows that the switching strategy, combined with the road orientation adjustment method, can significantly improve the efficiency and effectiveness of vehicle detection from UAV images. The results also show that the proposed vehicle detection method is competitive with other existing vehicle detection methods. Furthermore, since the proposed vehicle detection method can be performed on videos captured from moving UAV platforms without the need for image registration or an additional road database, it has great potential for field applications. Future research will focus on expanding the current method to detect other transportation modes such as buses, trucks, motors, bicycles, and pedestrians. PMID:27548179

  13. A Hybrid Vehicle Detection Method Based on Viola-Jones and HOG + SVM from UAV Images.

    Science.gov (United States)

    Xu, Yongzheng; Yu, Guizhen; Wang, Yunpeng; Wu, Xinkai; Ma, Yalong

    2016-08-19

    A new hybrid vehicle detection scheme which integrates the Viola-Jones (V-J) and linear SVM classifier with HOG feature (HOG + SVM) methods is proposed for vehicle detection from low-altitude unmanned aerial vehicle (UAV) images. As both V-J and HOG + SVM are sensitive to on-road vehicles' in-plane rotation, the proposed scheme first adopts a roadway orientation adjustment method, which rotates each UAV image to align the roads with the horizontal direction so the original V-J or HOG + SVM method can be directly applied to achieve fast detection and high accuracy. To address the issue of descending detection speed for V-J and HOG + SVM, the proposed scheme further develops an adaptive switching strategy which sophistically integrates V-J and HOG + SVM methods based on their different descending trends of detection speed to improve detection efficiency. A comprehensive evaluation shows that the switching strategy, combined with the road orientation adjustment method, can significantly improve the efficiency and effectiveness of vehicle detection from UAV images. The results also show that the proposed vehicle detection method is competitive with other existing vehicle detection methods. Furthermore, since the proposed vehicle detection method can be performed on videos captured from moving UAV platforms without the need for image registration or an additional road database, it has great potential for field applications. Future research will focus on expanding the current method to detect other transportation modes such as buses, trucks, motors, bicycles, and pedestrians.

  14. SVM classification model in depression recognition based on mutation PSO parameter optimization

    Directory of Open Access Journals (Sweden)

    Zhang Ming

    2017-01-01

    At present, the clinical diagnosis of depression relies mainly on structured interviews by psychiatrists; the lack of objective diagnostic methods leads to a higher rate of misdiagnosis. In this paper, a method of depression recognition based on SVM and a mutation particle swarm optimization algorithm is proposed. To address the problem that the particle swarm optimization (PSO) algorithm is easily trapped in local optima, we propose a feedback mutation PSO algorithm (FBPSO) to balance local search and global exploration ability, so that the parameters of the classification model are optimal. We compared the depression classification accuracy of different PSO mutation algorithms and found that the support vector machine (SVM) classifier based on the feedback mutation PSO algorithm achieved the highest classification accuracy. Our study provides an important reference for establishing auxiliary diagnostic tools for depression recognition in clinical diagnosis.

  15. Prediction of the strength of concrete radiation shielding based on LS-SVM

    International Nuclear Information System (INIS)

    Juncai, Xu; Qingwen, Ren; Zhenzhong, Shen

    2015-01-01

    Highlights: • LS-SVM was introduced for prediction of the strength of RSC. • A model for prediction of the strength of RSC was implemented. • The grid search algorithm was used to optimize the parameters of the LS-SVM. • The performance of LS-SVM in predicting the strength of RSC was evaluated. - Abstract: Radiation-shielding concrete (RSC) and conventional concrete differ in strength because of their distinct constituents. Predicting the strength of RSC with different constituents plays a vital role in radiation shielding (RS) engineering design. In this study, a model to predict the strength of RSC is established using a least squares-support vector machine (LS-SVM) through grid search algorithm. The algorithm is used to optimize the parameters of the LS-SVM on the basis of traditional prediction methods for conventional concrete. The predicted results of the LS-SVM model are compared with the experimental data. The results of the prediction are stable and consistent with the experimental results. In addition, the studied parameters exhibit significant effects on the simulation results. Therefore, the proposed method can be applied in predicting the strength of RSC, and the predicted results can be adopted as an important reference for RS engineering design

  16. In-Vivo Imaging of Cell Migration Using Contrast Enhanced MRI and SVM Based Post-Processing.

    Science.gov (United States)

    Weis, Christian; Hess, Andreas; Budinsky, Lubos; Fabry, Ben

    2015-01-01

    The migration of cells within a living organism can be observed with magnetic resonance imaging (MRI) in combination with iron oxide nanoparticles as an intracellular contrast agent. This method, however, suffers from low sensitivity and specificity. Here, we developed a quantitative non-invasive in-vivo cell localization method using contrast enhanced multiparametric MRI and support vector machines (SVM) based post-processing. Imaging phantoms consisting of agarose with compartments containing different concentrations of cancer cells labeled with iron oxide nanoparticles were used to train and evaluate the SVM for cell localization. From the magnitude and phase data acquired with a series of T2*-weighted gradient-echo scans at different echo-times, we extracted features that are characteristic for the presence of superparamagnetic nanoparticles, in particular hyper- and hypointensities, relaxation rates, short-range phase perturbations, and perturbation dynamics. High detection quality was achieved by SVM analysis of the multiparametric feature-space. The in-vivo applicability was validated in animal studies. The SVM detected the presence of iron oxide nanoparticles in the imaging phantoms with high specificity and sensitivity with a detection limit of 30 labeled cells per mm³, corresponding to 19 μM of iron oxide. As proof-of-concept, we applied the method to follow the migration of labeled cancer cells injected in rats. The combination of iron oxide labeled cells, multiparametric MRI and an SVM based post processing provides high spatial resolution, specificity, and sensitivity, and is therefore suitable for non-invasive in-vivo cell detection and cell migration studies over prolonged time periods.

  17. Fault diagnosis of nuclear-powered equipment based on HMM and SVM

    International Nuclear Information System (INIS)

    Yue Xia; Zhang Chunliang; Zhu Houyao; Quan Yanming

    2012-01-01

    For the complexity and the small fault samples of nuclear-powered equipment, a hybrid HMM/SVM method was introduced in fault diagnosis. The hybrid method has two steps: first, HMM is utilized for primary diagnosis, in which the range of possible failure is reduced and the state trends can be observed; then faults can be recognized taking the advantage of the generalization ability of SVM. Experiments on the main pump failure simulator show that the HMM/SVM system has a high recognition rate and can be used in the fault diagnosis of nuclear-powered equipment. (authors)

  18. New Milk Protein-Derived Peptides with Potential Antimicrobial Activity: An Approach Based on Bioinformatic Studies

    Directory of Open Access Journals (Sweden)

    Bartłomiej Dziuba

    2014-08-01

    New peptides with potential antimicrobial activity, encrypted in milk protein sequences, were searched for with the use of bioinformatic tools. The major milk proteins were hydrolyzed in silico by 28 enzymes. The obtained peptides were characterized by the following parameters: molecular weight, isoelectric point, composition and number of amino acid residues, net charge at pH 7.0, aliphatic index, instability index, Boman index, and GRAVY index, and compared with those calculated for 416 known antimicrobial peptides including 59 antimicrobial peptides (AMPs) from milk proteins listed in the BIOPEP database. A simple analysis of physico-chemical properties and the values of biological activity indicators were insufficient to select potentially antimicrobial peptides released in silico from milk proteins by proteolytic enzymes. The final selection was made based on the results of multidimensional statistical analysis such as support vector machines (SVM), random forest (RF), artificial neural networks (ANN) and discriminant analysis (DA) available in the Collection of Anti-Microbial Peptides (CAMP) database. Eleven new peptides with potential antimicrobial activity were selected from all peptides released during in silico proteolysis of milk proteins.
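Two of the descriptors listed in this abstract, the GRAVY index and the aliphatic index, are simple enough to sketch. The code below assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and Ikai's formula for the aliphatic index; the function names and the example peptide are hypothetical, and this is not the tool chain the authors used.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)

def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percent of each residue in the sequence."""
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )

peptide = "KLAVIK"  # hypothetical peptide, for illustration only
print(round(gravy(peptide), 3), round(aliphatic_index(peptide), 1))
```

Descriptors like these would form the feature vector passed to a classifier such as an SVM; the remaining descriptors in the abstract (isoelectric point, net charge, instability and Boman indices) need additional parameter tables and are left out of this sketch.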

  19. A Power Transformers Fault Diagnosis Model Based on Three DGA Ratios and PSO Optimization SVM

    Science.gov (United States)

    Ma, Hongzhe; Zhang, Wei; Wu, Rongrong; Yang, Chunyan

    2018-03-01

    To address the shortcomings of existing transformer fault diagnosis methods in dissolved gas-in-oil analysis (DGA) feature selection and parameter optimization, a transformer fault diagnosis model based on three DGA ratios and a support vector machine (SVM) optimized by particle swarm optimization (PSO) is proposed. The support vector machine is extended to a nonlinear, multi-class SVM; particle swarm optimization is used to optimize the multi-class SVM model; and transformer fault diagnosis is carried out in combination with cross-validation. The fault diagnosis results show that the average accuracy of the proposed method is better than that of the standard support vector machine and the genetic algorithm support vector machine, and that the proposed method can effectively improve the accuracy of transformer fault diagnosis.

  20. SVM and SVM Ensembles in Breast Cancer Prediction

    OpenAIRE

    Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong

    2017-01-01

    Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction per...

  1. Learning SVM in Kreĭn Spaces.

    Science.gov (United States)

    Loosli, Gaelle; Canu, Stephane; Ong, Cheng Soon

    2016-06-01

    This paper presents a theoretical foundation for an SVM solver in Kreĭn spaces. Up to now, all methods have been based either on matrix correction, on non-convex minimization, or on feature-space embedding. Here we justify and evaluate a solution that uses the original (indefinite) similarity measure, in the original Kreĭn space. This solution is the result of a stabilization procedure. We establish the correspondence between the stabilization problem (which has to be solved) and a classical SVM based on minimization (which is easy to solve). We provide simple equations to go from one to the other (in both directions). This link between stabilization and minimization problems is the key to obtaining a solution in the original Kreĭn space. Using KSVM, one can solve SVM with usually troublesome kernels (large negative eigenvalues or large numbers of negative eigenvalues). We present experiments showing that our algorithm KSVM outperforms all previously proposed approaches to deal with indefinite matrices in SVM-like kernel methods.

  2. Hybrid PSO–SVM-based method for forecasting of the remaining useful life for aircraft engines and evaluation of its reliability

    International Nuclear Information System (INIS)

    García Nieto, P.J.; García-Gonzalo, E.; Sánchez Lasheras, F.; Cos Juez, F.J. de

    2015-01-01

    The present paper describes a hybrid PSO–SVM-based model for the prediction of the remaining useful life of aircraft engines. The proposed hybrid model combines support vector machines (SVMs), which have been successfully adopted for regression problems, with the particle swarm optimization (PSO) technique. This optimization technique involves kernel parameter setting in the SVM training procedure, which significantly influences the regression accuracy. However, its use in reliability applications has not yet been widely explored. Bearing this in mind, remaining useful life values have been successfully predicted here by using the hybrid PSO–SVM-based model from the remaining measured parameters (input variables) for aircraft engines. A coefficient of determination equal to 0.9034 was obtained when this hybrid PSO–RBF–SVM-based model was applied to experimental data. The agreement of this model with experimental data confirmed its good performance. One of the main advantages of this predictive model is that it does not require information about the previous operation states of the engine. Finally, the main conclusions of this study are presented. - Highlights: • A hybrid PSO–SVM-based model is built as a predictive model of the RUL values for aircraft engines. • The remaining physical–chemical variables in this process are studied in depth. • The obtained regression accuracy of our method is about 95%. • The results show that PSO–SVM-based model can assist in the diagnosis of the RUL values with accuracy
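The coefficient of determination reported in this abstract is the usual goodness-of-fit measure for regression models. The sketch below computes it in the common form R² = 1 − SS_res / SS_tot; the function name and the sample values are hypothetical and are not data from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(observed, predicted)
print(round(score, 4))
```

A value close to 1 means the predictions explain most of the variance in the observed values; a model that always predicts the mean scores 0.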

  3. Research on gesture recognition of augmented reality maintenance guiding system based on improved SVM

    Science.gov (United States)

    Zhao, Shouwei; Zhang, Yong; Zhou, Bin; Ma, Dongxi

    2014-09-01

    Interaction is one of the key techniques of an augmented reality (AR) maintenance guiding system. Because of the complexity of the maintenance guiding system's image background and the high dimensionality of gesture characteristics, the whole process of gesture recognition is divided into three stages: gesture segmentation, gesture characteristic feature modeling, and trick recognition. In the segmentation stage, to reduce the misrecognition of skin-like regions, a segmentation algorithm combining background mode and skin color is adopted to exclude some skin-like regions. In the gesture characteristic feature modeling stage, many characteristic features of image attributes are analyzed and acquired, such as structure characteristics, Hu invariant moment features, and Fourier descriptors. In the trick recognition stage, a classifier based on Support Vector Machine (SVM) is introduced into the augmented reality maintenance guiding process. SVM is a novel learning method based on statistical learning theory; it possesses a theoretical foundation and excellent learning ability, addresses many issues in the machine learning area, and has special advantages in dealing with small samples and non-linear, high-dimensional pattern recognition. The gesture recognition of the augmented reality maintenance guiding system is realized by SVM after the granulation of all the characteristic features. The experimental results of the simulation of number gesture recognition and its application in the augmented reality maintenance guiding system show that the real-time performance and robustness of gesture recognition of the AR maintenance guiding system can be greatly enhanced by the improved SVM.

  4. Predicting enhancer activity and variant impact using gkm-SVM.

    Science.gov (United States)

    Beer, Michael A

    2017-09-01

    We participated in the Critical Assessment of Genome Interpretation eQTL challenge to further test computational models of regulatory variant impact and their association with human disease. Our prediction model is based on a discriminative gapped-kmer SVM (gkm-SVM) trained on genome-wide chromatin accessibility data in the cell type of interest. The comparisons with massively parallel reporter assays (MPRA) in lymphoblasts show that gkm-SVM is among the most accurate prediction models even though all other models used the MPRA data for model training, and gkm-SVM did not. In addition, we compare gkm-SVM with other MPRA datasets and show that gkm-SVM is a reliable predictor of expression and that deltaSVM is a reliable predictor of variant impact in K562 cells and mouse retina. We further show that DHS (DNase-I hypersensitive sites) and ATAC-seq (assay for transposase-accessible chromatin using sequencing) data are equally predictive substrates for training gkm-SVM, and that DHS regions flanked by H3K27Ac and H3K4me1 marks are more predictive than DHS regions alone. © 2017 Wiley Periodicals, Inc.
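The "gapped k-mer" features behind gkm-SVM can be illustrated with a short counting sketch. It assumes the usual reading of the term: each window of length `l` contributes every pattern that keeps `k` informative positions and masks the rest. The function name, the `-` mask character, and the parameter values are hypothetical; reverse complements, the efficient kernel computation, and SVM training are all omitted.

```python
from collections import Counter
from itertools import combinations

def gapped_kmers(seq, l=4, k=3):
    """Count gapped k-mers: for each length-l window, emit every pattern
    that keeps k positions and replaces the other l - k with '-'."""
    counts = Counter()
    for start in range(len(seq) - l + 1):
        word = seq[start:start + l]
        for keep in combinations(range(l), k):
            kept = set(keep)
            pattern = "".join(
                base if i in kept else "-" for i, base in enumerate(word)
            )
            counts[pattern] += 1
    return counts

features = gapped_kmers("ACGT", l=3, k=2)
print(sorted(features.items()))
```

In a deltaSVM-style analysis, a variant's score would be derived from how the weighted feature counts differ between the reference and alternate sequences; the weights come from a trained model and are not part of this sketch.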

  5. Constructing and Validating High-Performance MIEC-SVM Models in Virtual Screening for Kinases: A Better Way for Actives Discovery.

    Science.gov (United States)

    Sun, Huiyong; Pan, Peichen; Tian, Sheng; Xu, Lei; Kong, Xiaotian; Li, Youyong; Dan Li; Hou, Tingjun

    2016-04-22

    The MIEC-SVM approach, which combines molecular interaction energy components (MIEC) derived from free energy decomposition and support vector machine (SVM), has been found effective in capturing the energetic patterns of protein-peptide recognition. However, the performance of this approach in identifying small molecule inhibitors of drug targets has not been well assessed and validated by experiments. Therefore, by combining different model construction protocols, the issues related to developing the best MIEC-SVM models were first examined on three kinase targets (ABL, ALK, and BRAF). For the investigated targets, the optimized MIEC-SVM models performed much better than the models based on the default SVM parameters and Autodock for the tested datasets. Then, the proposed strategy was utilized to screen the Specs database for discovering potential inhibitors of the ALK kinase. The experimental results showed that the optimized MIEC-SVM model identified 7 actives with IC50 < 10 μM from 50 purchased compounds (a hit rate of 14%, with 4 at the nM level) and performed much better than Autodock (3 actives with IC50 < 10 μM from 50 purchased compounds, a hit rate of 6%, with 2 at the nM level), suggesting that the proposed strategy is a powerful tool in structure-based virtual screening.

  6. A pairwise residue contact area-based mean force potential for discrimination of native protein structure

    Directory of Open Access Journals (Sweden)

    Pezeshk Hamid

    2010-01-01

    Background: An energy function that can distinguish a correct protein fold from incorrect ones is very important for protein structure prediction and protein folding. Knowledge-based mean force potentials are certainly the most popular type of interaction function for protein threading. They are derived from statistical analyses of interacting groups in experimentally determined protein structures. These potentials are developed at the atom or the amino acid level. Based on orientation dependent contact area, a new type of knowledge-based mean force potential has been developed. Results: We developed a new approach to calculate a knowledge-based potential of mean-force, using pairwise residue contact area. To test the performance of our approach, we applied it to several decoy sets to measure its ability to discriminate native structure from decoys. This potential was able to distinguish native structures from the decoys in most cases. Further, the calculated Z-scores were quite high for all protein datasets. Conclusions: This knowledge-based potential of mean force can be used in protein structure prediction, fold recognition, comparative modelling and molecular recognition. The program is available at http://www.bioinf.cs.ipm.ac.ir/softwares/surfield

  7. sw-SVM: sensor weighting support vector machines for EEG-based brain-computer interfaces.

    Science.gov (United States)

    Jrad, N; Congedo, M; Phlypo, R; Rousseau, S; Flamary, R; Yger, F; Rakotomamonjy, A

    2011-10-01

    In many machine learning applications, like brain-computer interfaces (BCI), high-dimensional sensor array data are available. Sensor measurements are often highly correlated and signal-to-noise ratio is not homogeneously spread across sensors. Thus, collected data are highly variable and discrimination tasks are challenging. In this work, we focus on sensor weighting as an efficient tool to improve the classification procedure. We present an approach integrating sensor weighting in the classification framework. Sensor weights are considered as hyper-parameters to be learned by a support vector machine (SVM). The resulting sensor weighting SVM (sw-SVM) is designed to satisfy a margin criterion, that is, the generalization error. Experimental studies on two data sets are presented, a P300 data set and an error-related potential (ErrP) data set. For the P300 data set (BCI competition III), for which a large number of trials is available, the sw-SVM proves to perform equivalently with respect to the ensemble SVM strategy that won the competition. For the ErrP data set, for which a small number of trials are available, the sw-SVM shows superior performances as compared to three state-of-the art approaches. Results suggest that the sw-SVM promises to be useful in event-related potentials classification, even with a small number of training trials.

  8. F-SVM: Combination of Feature Transformation and SVM Learning via Convex Relaxation

    OpenAIRE

    Wu, Xiaohe; Zuo, Wangmeng; Zhu, Yuanyuan; Lin, Liang

    2015-01-01

    The generalization error bound of support vector machine (SVM) depends on the ratio of radius and margin, while standard SVM only considers the maximization of the margin but ignores the minimization of the radius. Several approaches have been proposed to integrate radius and margin for joint learning of feature transformation and SVM classifier. However, most of them either require the form of the transformation matrix to be diagonal, or are non-convex and computationally expensive. In this ...

  9. Literature mining of protein-residue associations with graph rules learned through distant supervision

    Directory of Open Access Journals (Sweden)

    Ravikumar KE

    2012-10-01

    Background: We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. The methods presented in this work will enable improved protein functional site extraction from articles, ultimately supporting protein function prediction. Our method made use of linguistic patterns for identifying the amino acid residue mentions in text. Further, we applied an automated graph-based method to learn syntactic patterns corresponding to protein-residue pairs mentioned in the text. We finally present an approach to automated construction of relevant training and test data using the distant supervision model. Results: The performance of the method was assessed by extracting protein-residue relations from a new automatically generated test set of sentences containing high confidence examples found using distant supervision. It achieved an F-measure of 0.84 on an automatically created silver corpus and 0.79 on a manually annotated gold data set for this task, outperforming previous methods. Conclusions: The primary contributions of this work are to (1) demonstrate the effectiveness of distant supervision for automatic creation of training data for protein-residue relation extraction, substantially reducing the effort and time involved in manual annotation of a data set and (2) show that the graph-based relation extraction approach we used generalizes well to the problem of protein-residue association extraction. This work paves the way towards effective extraction of protein functional residues from the literature.

  10. Literature mining of protein-residue associations with graph rules learned through distant supervision.

    Science.gov (United States)

    Ravikumar, Ke; Liu, Haibin; Cohn, Judith D; Wall, Michael E; Verspoor, Karin

    2012-10-05

    We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. The methods presented in this work will enable improved protein functional site extraction from articles, ultimately supporting protein function prediction. Our method made use of linguistic patterns for identifying the amino acid residue mentions in text. Further, we applied an automated graph-based method to learn syntactic patterns corresponding to protein-residue pairs mentioned in the text. We finally present an approach to automated construction of relevant training and test data using the distant supervision model. The performance of the method was assessed by extracting protein-residue relations from a new automatically generated test set of sentences containing high confidence examples found using distant supervision. It achieved an F-measure of 0.84 on an automatically created silver corpus and 0.79 on a manually annotated gold data set for this task, outperforming previous methods. The primary contributions of this work are to (1) demonstrate the effectiveness of distant supervision for automatic creation of training data for protein-residue relation extraction, substantially reducing the effort and time involved in manual annotation of a data set and (2) show that the graph-based relation extraction approach we used generalizes well to the problem of protein-residue association extraction. This work paves the way towards effective extraction of protein functional residues from the literature.

  11. Cerebral 18F-FDG PET in macrophagic myofasciitis: An individual SVM-based approach.

    Science.gov (United States)

    Blanc-Durand, Paul; Van Der Gucht, Axel; Guedj, Eric; Abulizi, Mukedaisi; Aoun-Sebaiti, Mehdi; Lerman, Lionel; Verger, Antoine; Authier, François-Jérôme; Itti, Emmanuel

    2017-01-01

    Macrophagic myofasciitis (MMF) is an emerging condition with highly specific myopathological alterations. A peculiar spatial pattern of cerebral glucose hypometabolism involving the occipito-temporal cortex and cerebellum has been reported in patients with MMF; however, the full pattern is not systematically present in routine interpretation of scans, and its severity varies with the cognitive profile of patients. The aim was to generate and evaluate a support vector machine (SVM) procedure to classify patients between healthy or MMF 18F-FDG brain profiles. 18F-FDG PET brain images of 119 patients with MMF and 64 healthy subjects were retrospectively analyzed. The whole population was divided into two groups: a training set (100 MMF, 44 healthy subjects) and a testing set (19 MMF, 20 healthy subjects). Dimensionality reduction was performed using a t-map from statistical parametric mapping (SPM), and an SVM with a linear kernel was trained on the training set. To evaluate the performance of the SVM classifier, values of sensitivity (Se), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV) and accuracy (Acc) were calculated. The SPM12 analysis on the training set exhibited the already reported hypometabolism pattern involving occipito-temporal and fronto-parietal cortices, limbic system and cerebellum. The SVM procedure, based on the t-test mask generated from the training set, correctly classified MMF patients of the testing set with the following Se, Sp, PPV, NPV and Acc: 89%, 85%, 85%, 89%, and 87%. We developed an original and individual approach including an SVM to classify patients between healthy or MMF metabolic brain profiles using 18F-FDG-PET. Machine learning algorithms are promising for computer-aided diagnosis but will need further validation in prospective cohorts.
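
    The five reported figures all follow from one 2x2 confusion matrix. The sketch below is only an illustration: the counts (17 true positives, 2 false negatives, 17 true negatives, 3 false positives) are an assumption, chosen because they are consistent with the stated testing set of 19 MMF and 20 healthy subjects and with the reported percentages. The abstract itself does not give them.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, PPV, NPV and accuracy as fractions."""
    return {
        "Se": tp / (tp + fn),    # sensitivity: detected patients / all patients
        "Sp": tn / (tn + fp),    # specificity: detected healthy / all healthy
        "PPV": tp / (tp + fp),   # positive predictive value
        "NPV": tn / (tn + fn),   # negative predictive value
        "Acc": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts inferred from the reported percentages, not taken from the paper.
metrics = diagnostic_metrics(tp=17, fn=2, tn=17, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

    With these assumed counts the rounded values match the abstract (89%, 85%, 85%, 89%, 87%).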

  12. Identification of key residues for protein conformational transition using elastic network model.

    Science.gov (United States)

    Su, Ji Guo; Xu, Xian Jin; Li, Chun Hua; Chen, Wei Zu; Wang, Cun Xin

    2011-11-07

    Proteins usually undergo conformational transitions between structurally disparate states to fulfill their functions. The large-scale allosteric conformational transitions are believed to involve some key residues that mediate the conformational movements between different regions of the protein. In the present work, a thermodynamic method based on the elastic network model is proposed to predict the key residues involved in protein conformational transitions. In our method, the key functional sites are identified as the residues whose perturbations largely influence the free energy difference between the protein states before and after transition. Two proteins, the nucleotide binding domain of the heat shock protein 70 and human/rat DNA polymerase β, are used as case studies to identify the critical residues responsible for their open-closed conformational transitions. The results show that the functionally important residues are mainly located in the following regions for these two proteins: (1) the bridging point at the interface between the subdomains that control the opening and closure of the binding cleft; (2) the hinge region between different subdomains, which mediates the cooperative motions between the corresponding subdomains; and (3) the substrate binding sites. The similarity in the positions of the key residues for these two proteins may indicate a common mechanism in their conformational transitions.

  13. SVM models for analysing the headstreams of mine water inrush

    Energy Technology Data Exchange (ETDEWEB)

    Yan Zhi-gang; Du Pei-jun; Guo Da-zhi [China University of Science and Technology, Xuzhou (China). School of Environmental Science and Spatial Informatics

    2007-08-15

    The support vector machine (SVM) model was introduced to analyse the headstream of water inrush in a coal mine. The SVM model, based on a hydrogeochemical method, was constructed for recognising two kinds of headstreams and the H-SVMs model was constructed for recognising multi-headstreams. The SVM method was applied to analyse the conditions of two mixed headstreams and the value of the SVM decision function was investigated as a means of denoting the hydrogeochemical abnormality. The experimental results show that the SVM is based on a strict mathematical theory, has a simple structure and a good overall performance. Moreover, the parameter W in the decision function can describe the weights of discrimination indices of the headstream of water inrush. The value of the decision function can denote hydrogeochemical abnormality, which is significant in the prevention of water inrush in a coal mine. 9 refs., 1 fig., 7 tabs.

  14. Improving protein fold recognition by extracting fold-specific features from predicted residue-residue contacts.

    Science.gov (United States)

    Zhu, Jianwei; Zhang, Haicang; Li, Shuai Cheng; Wang, Chao; Kong, Lupeng; Sun, Shiwei; Zheng, Wei-Mou; Bu, Dongbo

    2017-12-01

    Accurate recognition of protein fold types is a key step for template-based prediction of protein structures. The existing approaches to fold recognition mainly exploit the features derived from alignments of query protein against templates. These approaches have been shown to be successful for fold recognition at family level, but usually fail at superfamily/fold levels. To overcome this limitation, one of the key points is to explore more structurally informative features of proteins. Although residue-residue contacts carry abundant structural information, how to thoroughly exploit this information for fold recognition still remains a challenge. In this study, we present an approach (called DeepFR) to improve fold recognition at superfamily/fold levels. The basic idea of our approach is to extract fold-specific features from predicted residue-residue contacts of proteins using the deep convolutional neural network (DCNN) technique. Based on these fold-specific features, we calculated the similarity between the query protein and templates, and then assigned the query protein the fold type of the most similar template. DCNN has shown excellent performance in image feature extraction and image recognition; the rationale underlying the application of DCNN for fold recognition is that contact likelihood maps are essentially analogous to images, as they both display compositional hierarchy. Experimental results on the LINDAHL dataset suggest that even using the extracted fold-specific features alone, our approach achieved a success rate comparable to the state-of-the-art approaches. When further combining these features with traditional alignment-related features, the success rate of our approach increased to 92.3%, 82.5% and 78.8% at family, superfamily and fold levels, respectively, which is about 18% higher than the state-of-the-art approach at fold level, 6% higher at superfamily level and 1% higher at family level. An independent assessment on SCOP_TEST dataset showed consistent ...

  15. On the relationship between residue structural environment and sequence conservation in proteins.

    Science.gov (United States)

    Liu, Jen-Wei; Lin, Jau-Ji; Cheng, Chih-Wen; Lin, Yu-Feng; Hwang, Jenn-Kang; Huang, Tsun-Tsao

    2017-09-01

    Residues that are crucial to protein function or structure are usually evolutionarily conserved. To identify the important residues in a protein, sequence conservation is estimated, and current methods rely upon the unbiased collection of homologous sequences. Surprisingly, our previous studies have shown that sequence conservation is closely correlated with the weighted contact number (WCN), a measure of packing density for a residue's structural environment, calculated based only on the Cα positions of a protein structure. Moreover, studies have shown that sequence conservation is correlated with environment-related structural properties calculated based on different protein substructures, such as a protein's all atoms, backbone atoms, side-chain atoms, or side-chain centroid. To determine whether the Cα atomic positions are adequate to show the relationship between residue environment and sequence conservation, here we compared Cα atoms with other substructures in their contributions to sequence conservation. Our results show that Cα positions are substantially equivalent to the other substructures in calculations of various measures of residue environment. As a result, the overlapping contributions between Cα atoms and the other substructures are high, yielding similar structure-conservation relationships. Taking the WCN as an example, the average overlapping contribution to sequence conservation is 87% between Cα and all-atom substructures. These results indicate that the Cα atoms of a protein structure alone could reflect sequence conservation at the residue level. © 2017 Wiley Periodicals, Inc.
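
    To make the weighted contact number concrete, here is a minimal sketch. It assumes the commonly used definition, in which the WCN of residue i is the sum over all other residues j of 1/r_ij², with r_ij the Cα-Cα distance; the abstract does not spell out the formula. The coordinates are made up for illustration.

```python
import math


def weighted_contact_number(coords):
    """Assumed definition: WCN_i = sum over j != i of 1 / r_ij**2.

    coords is a list of (x, y, z) tuples, one per residue (e.g. C-alpha atoms).
    Two residues at identical coordinates would cause a division by zero.
    """
    wcn = []
    for i, a in enumerate(coords):
        total = 0.0
        for j, b in enumerate(coords):
            if i != j:
                total += 1.0 / math.dist(a, b) ** 2
        wcn.append(total)
    return wcn


# Toy C-alpha trace: four points on a straight line, 3.8 angstroms apart (made-up data).
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
wcn = weighted_contact_number(ca)
print(wcn)
```

    In this toy chain the two interior residues get a higher WCN than the two ends, which is the sense in which the quantity measures packing density.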

  16. A Mass Spectrometric Analysis Method Based on PPCA and SVM for Early Detection of Ovarian Cancer.

    Science.gov (United States)

    Wu, Jiang; Ji, Yanju; Zhao, Ling; Ji, Mengying; Ye, Zhuang; Li, Suyi

    2016-01-01

    Background. Surface-enhanced laser desorption-ionization-time of flight mass spectrometry (SELDI-TOF-MS) technology plays an important role in the early diagnosis of ovarian cancer. However, the raw MS data is high-dimensional and redundant. Therefore, it is necessary to study rapid and accurate detection methods for the massive MS data. Methods. The clinical data set used in the experiments for early cancer detection consisted of 216 SELDI-TOF-MS samples. An MS analysis method based on probabilistic principal components analysis (PPCA) and support vector machine (SVM) was proposed and applied to early classification of ovarian cancer in the data set. Additionally, using the same data set, we also established a traditional PCA-SVM model. Finally, we compared the two models in detection accuracy, specificity, and sensitivity. Results. Over 10 independent training and testing experiments used to evaluate the ovarian cancer detection models, the average prediction accuracy, sensitivity, and specificity of the PCA-SVM model were 83.34%, 82.70%, and 83.88%, respectively. In contrast, those of the PPCA-SVM model were 90.80%, 92.98%, and 88.97%, respectively. Conclusions. The PPCA-SVM model had better detection performance. Combined with the SELDI-TOF-MS technology, the model shows promise for early clinical detection and diagnosis of ovarian cancer.

  17. A method of distributed avionics data processing based on SVM classifier

    Science.gov (United States)

    Guo, Hangyu; Wang, Jinyan; Kang, Minyang; Xu, Guojing

    2018-03-01

    In a system-combat environment, to address the management and analysis of massive heterogeneous data in multi-platform avionics systems, this paper proposes a management solution called the avionics "resource cloud", based on big data technology, and designs a decision-aid classifier based on the SVM algorithm. We designed an experiment with STK simulation; the results show that this method has high accuracy and broad application prospects.

  18. InterMap3D: predicting and visualizing co-evolving protein residues

    DEFF Research Database (Denmark)

    Oliveira, Rodrigo Gouveia; Roque, francisco jose sousa simôes almeida; Wernersson, Rasmus

    2009-01-01

    InterMap3D predicts co-evolving protein residues and plots them on the 3D protein structure. Starting with a single protein sequence, InterMap3D automatically finds a set of homologous sequences, generates an alignment and fetches the most similar 3D structure from the Protein Data Bank (PDB......). It can also accept a user-generated alignment. Based on the alignment, co-evolving residues are then predicted using three different methods: Row and Column Weighing of Mutual Information, Mutual Information/Entropy and Dependency. Finally, InterMap3D generates high-quality images of the protein...
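
    As an illustration of the mutual-information family of co-evolution scores named above, the sketch below computes plain mutual information between two alignment columns and a version normalised by the joint entropy. It is a generic textbook formulation under stated assumptions, not InterMap3D's implementation; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy in bits of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(col_a, col_b):
    """MI(A;B) = H(A) + H(B) - H(A,B), in bits."""
    return entropy(col_a) + entropy(col_b) - entropy(list(zip(col_a, col_b)))


def mi_over_entropy(col_a, col_b):
    """Mutual information divided by the joint entropy (0 if both columns are constant)."""
    joint = entropy(list(zip(col_a, col_b)))
    return mutual_information(col_a, col_b) / joint if joint > 0 else 0.0


# Made-up toy alignment: four sequences, four columns.
alignment = ["AKLD", "AKLD", "GRLE", "GRLE"]
columns = list(zip(*alignment))

print(mutual_information(columns[0], columns[1]))  # columns that vary together
print(mutual_information(columns[0], columns[2]))  # column 2 is fully conserved
```

    Columns 0 and 1 change in lockstep and score 1 bit, while the fully conserved column 2 carries no covariation signal and scores 0.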

  19. Residue preference mapping of ligand fragments in the Protein Data Bank.

    Science.gov (United States)

    Wang, Lirong; Xie, Zhaojun; Wipf, Peter; Xie, Xiang-Qun

    2011-04-25

    The interaction between small molecules and proteins is one of the major concerns for structure-based drug design because the principles of protein-ligand interactions and molecular recognition are not thoroughly understood. Fortunately, the analysis of protein-ligand complexes in the Protein Data Bank (PDB) enables unprecedented possibilities for new insights. Herein, we applied molecule-fragmentation algorithms to split the ligands extracted from PDB crystal structures into small fragments. Subsequently, we developed a ligand fragment and residue preference mapping (LigFrag-RPM) algorithm to map the profiles of the interactions between these fragments and the 20 proteinogenic amino acid residues. A total of 4032 fragments were generated from 71 798 PDB ligands by a ring cleavage (RC) algorithm. Among these ligand fragments, 315 unique fragments were characterized with the corresponding fragment-residue interaction profiles by counting residues close to these fragments. The interaction profiles revealed that these fragments have specific preferences for certain types of residues. The applications of these interaction profiles were also explored and evaluated in case studies, showing great potential for the study of protein-ligand interactions and drug design. Our studies demonstrated that the fragment-residue interaction profiles generated from the PDB ligand fragments can be used to detect whether these fragments are in their favorable or unfavorable environments. The LigFrag-RPM algorithm developed here also has the potential to guide lead chemistry modifications as well as binding-residue prediction.

  20. A Statistical Parameter Analysis and SVM Based Fault Diagnosis Strategy for Dynamically Tuned Gyroscopes

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Gyro fault diagnosis plays a critical role in achieving higher reliability and precision in inertial navigation systems. A new fault diagnosis strategy based on statistical parameter analysis (SPA) and a support vector machine (SVM) classification model was proposed for dynamically tuned gyroscopes (DTG). The SPA, a time domain analysis approach, was introduced to compute a set of statistical parameters of the vibration signal as the state features of the DTG. With these features, the SVM model, a novel learning machine based on statistical learning theory (SLT), was constructed and trained to identify the working state of the DTG. The experimental results verify that the proposed diagnostic strategy can simply and effectively extract the state features of the DTG, that it outperforms the radial-basis function (RBF) neural network based diagnostic method, and that it can more reliably and accurately diagnose the working state of the DTG.

  1. SVM-Based Spectral Analysis for Heart Rate from Multi-Channel WPPG Sensor Signals.

    Science.gov (United States)

    Xiong, Jiping; Cai, Lisang; Wang, Fei; He, Xiaowei

    2017-03-03

    Although wrist-type photoplethysmographic (hereafter referred to as WPPG) sensor signals can measure heart rate quite conveniently, the subjects' hand movements can cause strong motion artifacts, which heavily contaminate WPPG signals. Hence, it is challenging to accurately estimate heart rate from WPPG signals during intense physical activities. The WPPG method has attracted more attention thanks to the popularity of wrist-worn wearable devices. In this paper, a mixed approach called Mix-SVM is proposed, which uses multi-channel WPPG sensor signals and simultaneous acceleration signals to measure heart rate. Firstly, we combine principal component analysis and an adaptive filter to remove a part of the motion artifacts. Due to the strong correlation between motion artifacts and acceleration signals, the further denoising problem is regarded as a sparse signal reconstruction problem. Then, we use a spectrum subtraction method to eliminate motion artifacts effectively. Finally, the spectral peak corresponding to heart rate is sought by an SVM-based spectral analysis method. On the public PPG database of the 2015 IEEE Signal Processing Cup, the average absolute error was 1.01 beats per minute, and the Pearson correlation was 0.9972. These results also confirm that the proposed Mix-SVM approach has potential for multi-channel WPPG-based heart rate estimation in the presence of intense physical exercise.

  2. An SVM Based Approach for the Analysis Of Mammography Images

    Science.gov (United States)

    Gan, X.; Kapsokalivas, L.; Skaliotis, A.; Steinhöfel, K.; Tangaro, S.

    2007-09-01

    Mammography is among the most popular imaging techniques used in the diagnosis of breast cancer. Nevertheless distinguishing between healthy and ill images is hard even for an experienced radiologist, because a single image usually includes several regions of interest (ROIs). The hardness of this classification problem along with the substantial amount of data, gathered from patients' medical history, motivates the use of a machine learning approach as part of a CAD (Computer Aided Detection) tool, aiming to assist radiologists in the characterization of mammography images. Specifically, our approach involves: i) the ROI extraction, ii) the Feature Vector extraction, iii) the Support Vector Machine (SVM) classification of ROIs and iv) the characterization of the whole image. We evaluate the performance of our approach in terms of the SVM's training and testing error and in terms of ROI specificity—sensitivity. The results show a relation between the number of features used and the SVM's performance.

  3. An SVM Based Approach for the Analysis Of Mammography Images

    International Nuclear Information System (INIS)

    Gan, X.; Kapsokalivas, L.; Skaliotis, A.; Steinhoefel, K.; Tangaro, S.

    2007-01-01

    Mammography is among the most popular imaging techniques used in the diagnosis of breast cancer. Nevertheless distinguishing between healthy and ill images is hard even for an experienced radiologist, because a single image usually includes several regions of interest (ROIs). The hardness of this classification problem along with the substantial amount of data, gathered from patients' medical history, motivates the use of a machine learning approach as part of a CAD (Computer Aided Detection) tool, aiming to assist radiologists in the characterization of mammography images. Specifically, our approach involves: i) the ROI extraction, ii) the Feature Vector extraction, iii) the Support Vector Machine (SVM) classification of ROIs and iv) the characterization of the whole image. We evaluate the performance of our approach in terms of the SVM's training and testing error and in terms of ROI specificity - sensitivity. The results show a relation between the number of features used and the SVM's performance

  4. Automatic epileptic seizure detection in EEGs using MF-DFA, SVM based on cloud computing.

    Science.gov (United States)

    Zhang, Zhongnan; Wen, Tingxi; Huang, Wei; Wang, Meihong; Li, Chunfeng

    2017-01-01

    Epilepsy is a chronic disease with transient brain dysfunction that results from the sudden abnormal discharge of neurons in the brain. Since electroencephalogram (EEG) is a harmless and noninvasive detection method, it plays an important role in the detection of neurological diseases. However, the process of analyzing EEG to detect neurological diseases is often difficult because the brain electrical signals are random, non-stationary and nonlinear. To overcome this difficulty, this study aims to develop a new computer-aided scheme for automatic epileptic seizure detection in EEGs based on multi-fractal detrended fluctuation analysis (MF-DFA) and support vector machine (SVM). In the first stage, the new scheme extracts features from EEG by MF-DFA. Then, the scheme applies a genetic algorithm (GA) to calculate the parameters used in SVM and classifies the training data according to the selected features using SVM. Finally, the trained SVM classifier is used to detect neurological diseases. The algorithm uses MLlib from the Spark library and runs on a cloud platform. On a public dataset, the results show that the new feature extraction method and scheme can detect signals with fewer features, and the classification accuracy reached up to 99%. MF-DFA is a promising approach to extract features for analyzing EEG because of its simple algorithmic procedure and fewer parameters. The features obtained by MF-DFA can represent samples as well as the traditional wavelet transform and Lyapunov exponents. GA can always find useful parameters for SVM given enough execution time. The results illustrate that the classification model can achieve comparable accuracy, which means that it is effective in epileptic seizure detection.

  5. The Ising model for prediction of disordered residues from protein sequence alone

    International Nuclear Information System (INIS)

    Lobanov, Michail Yu; Galzitskaya, Oxana V

    2011-01-01

    Intrinsically disordered regions serve as molecular recognition elements, which play an important role in the control of many cellular processes and signaling pathways. It is useful to be able to predict positions of disordered residues and disordered regions in protein chains using protein sequence alone. A new method (IsUnstruct) based on the Ising model for prediction of disordered residues from protein sequence alone has been developed. According to this model, each residue can be in one of two states: ordered or disordered. The model is an approximation of the Ising model in which the interaction term between neighbors has been replaced by a penalty for changing between states (the energy of border). IsUnstruct has been compared with other available methods and found to perform well. The method correctly finds 77% of disordered residues as well as 87% of ordered residues in the CASP8 database, and 72% of disordered residues as well as 85% of ordered residues in the DisProt database.
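
    The two-state chain with a border penalty lends itself to dynamic programming. The sketch below finds the single lowest-energy labelling of a chain. This is a simplification chosen for illustration: the abstract does not say how IsUnstruct evaluates the model, and the per-residue energies and the penalty value here are made-up numbers, not parameters of the method.

```python
def min_energy_states(disorder_energy, border_penalty):
    """Return the lowest-energy string of 'O' (ordered) and 'D' (disordered).

    Energy = sum of disorder_energy[i] over residues labelled 'D'
             + border_penalty for every change of state along the chain.
    Ordered residues contribute zero energy (an assumption of this sketch).
    """
    labels = "OD"
    # cost[s] is the best energy of the chain so far, ending in state s.
    cost = [0.0, disorder_energy[0]]
    back = []  # back[i][s] = best previous state when residue i + 1 is in state s
    for e in disorder_energy[1:]:
        local = (0.0, e)
        new_cost, pointers = [], []
        for s in (0, 1):
            stay = cost[s]
            switch = cost[1 - s] + border_penalty
            if stay <= switch:
                new_cost.append(stay + local[s])
                pointers.append(s)
            else:
                new_cost.append(switch + local[s])
                pointers.append(1 - s)
        cost = new_cost
        back.append(pointers)
    # Trace the best path backwards from the cheaper final state.
    s = 0 if cost[0] <= cost[1] else 1
    path = [s]
    for pointers in reversed(back):
        s = pointers[s]
        path.append(s)
    return "".join(labels[s] for s in reversed(path))


# Hypothetical per-residue energies: negative values favour the disordered state.
energies = [1.0, 1.0, -2.0, -2.0, -2.0, 1.0, -0.5, 1.5]
print(min_energy_states(energies, border_penalty=1.5))
print(min_energy_states(energies, border_penalty=0.0))
```

    With the penalty switched on, the isolated weakly disordered residue near the end is absorbed into the ordered region, because opening a new disordered segment costs two borders. With a zero penalty each residue is labelled independently.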

  6. Optimised Selection of Stroke Biomarker Based on Svm and Information Theory

    Directory of Open Access Journals (Sweden)

    Wang Xiang

    2017-01-01

    With the development of molecular biology and gene-engineering technology, gene diagnosis has become an emerging approach in modern life sciences. Biological markers, a hot topic in the molecular and gene fields, have important value in early diagnosis, malignant tumor staging, treatment and therapeutic efficacy evaluation. So far, researchers have not found an effective way to predict and distinguish different types of stroke. In this paper, we aim to optimize stroke biomarker selection and identify an effective stroke detection index based on SVM (support vector machine) and information theory. We select biomarkers through mutual information analysis and principal component analysis, and then use SVM to verify our model. Using the testing data of patients provided by Xuanwu Hospital, we explore the significant markers of stroke through data analysis. Our model can predict stroke well. We then discuss the effect of each biomarker on the incidence of stroke.

  7. Linear SVM-Based Android Malware Detection for Reliable IoT Services

    Directory of Open Access Journals (Sweden)

    Hyo-Sik Ham

    2014-01-01

    Currently, many Internet of Things (IoT) services are monitored and controlled through smartphone applications. By combining IoT with smartphones, many convenient IoT services have been provided to users. However, there are adverse underlying effects in such services, including invasion of privacy and information leakage. In most cases, mobile devices have become cluttered with important personal user information as various services and contents are provided through them. Accordingly, attackers are expanding the scope of their attacks beyond the existing PC and Internet environment into mobile devices. In this paper, we apply a linear support vector machine (SVM) to detect Android malware and compare the malware detection performance of SVM with that of other machine learning classifiers. Through experimental validation, we show that the SVM outperforms other machine learning classifiers.

  8. A structural SVM approach for reference parsing.

    Science.gov (United States)

    Zhang, Xiaoli; Zou, Jie; Le, Daniel X; Thoma, George R

    2011-06-09

    Automated extraction of bibliographic data, such as article titles, author names, abstracts, and references, is essential to the affordable creation of large citation databases. References, typically appearing at the end of journal articles, can also provide valuable information for extracting other bibliographic data. Therefore, parsing individual references to extract author, title, journal, year, etc. is sometimes a necessary preprocessing step in building citation-indexing systems. The regular structure in references enables us to consider reference parsing a sequence learning problem and to study structural Support Vector Machine (structural SVM), a newly developed structured learning algorithm, on parsing references. In this study, we implemented structural SVM and used two types of contextual features to compare structural SVM with conventional SVM. Both methods achieve above 98% token classification accuracy and above 95% overall chunk-level accuracy for reference parsing. We also compared SVM and structural SVM to Conditional Random Field (CRF). The experimental results show that structural SVM and CRF achieve similar accuracies at token and chunk levels. When only basic observation features are used for each token, structural SVM achieves higher performance compared to SVM since it utilizes the contextual label features. However, when the contextual observation features from neighboring tokens are combined, SVM performance improves greatly, and is close to that of structural SVM after adding the second order contextual observation features. The comparison of these two methods with CRF using the same set of binary features shows that both structural SVM and CRF perform better than SVM, indicating their stronger sequence learning ability in reference parsing.

  9. Prediction of the residual strength of clay using functional networks

    Directory of Open Access Journals (Sweden)

    S.Z. Khan

    2016-01-01

    Landslides are common natural hazards occurring in most parts of the world and have considerable adverse economic effects. Residual shear strength of clay is one of the most important factors in the determination of stability of slopes or landslides. This effect is more pronounced in sensitive clays, which show large changes in shear strength from peak to residual states. This study analyses the prediction of the residual strength of clay based on a new prediction model, functional networks (FN), using data available in the literature. The performance of FN was compared with support vector machine (SVM) and artificial neural network (ANN) based on statistical parameters such as correlation coefficient (R), Nash-Sutcliffe coefficient of efficiency (E), absolute average error (AAE), maximum average error (MAE) and root mean square error (RMSE). Based on the R and E parameters, FN is found to be a better prediction tool than ANN for the given data. However, the R and E values for FN are lower than those for SVM. A prediction equation is presented that can be used by practicing geotechnical engineers. A sensitivity analysis is carried out to ascertain the importance of various inputs in the prediction of the output.
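
    Three of the comparison statistics named above have standard textbook definitions, sketched below: Pearson's correlation coefficient R, the Nash-Sutcliffe efficiency E = 1 - Σ(obs - pred)² / Σ(obs - mean(obs))², and the root mean square error. AAE and MAE are left out because the abstract does not define them. The observed and predicted values are made-up numbers, not data from the study.

```python
import math


def regression_scores(observed, predicted):
    """Return Pearson R, Nash-Sutcliffe efficiency E and RMSE for paired values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    ss_pred = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    return {
        "R": cov / math.sqrt(ss_tot * ss_pred),
        "E": 1.0 - ss_res / ss_tot,   # 1 is a perfect fit; 0 is no better than the mean
        "RMSE": math.sqrt(ss_res / n),
    }


# Made-up residual-strength values for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
scores = regression_scores(observed, predicted)
print(scores)
```

    R only measures linear association, so a model with a systematic bias can still score a high R; E and RMSE penalise that bias, which is one reason such studies tend to report them together.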

  10. An Efficient Normalized Rank Based SVM for Room Level Indoor WiFi Localization with Diverse Devices

    Directory of Open Access Journals (Sweden)

    Yasmine Rezgui

    2017-01-01

    This paper proposes an efficient and effective WiFi fingerprinting-based indoor localization algorithm, which uses the Received Signal Strength Indicator (RSSI) of WiFi signals. In practical harsh indoor environments, RSSI variation and hardware variance can significantly degrade the performance of fingerprinting-based localization methods. To address the problem of hardware variance and signal fluctuation in WiFi fingerprinting-based localization, we propose a novel normalized rank based Support Vector Machine classifier (NR-SVM). Moving from RSSI value based analysis to normalized rank transformation based analysis, the principal features are prioritized and the dimensionalities of signature vectors are taken into account. The proposed method has been tested using sixteen different devices in a shopping mall with 88 shops. The experimental results demonstrate its robustness, with no less than 98.75% correct estimation in 93.75% of the tested cases and a 100% correct rate in 56.25% of cases. In the experiments, the new method shows better performance than the KNN, Naïve Bayes, Random Forest, and Neural Network algorithms. Furthermore, we have compared the proposed approach with three popular calibration-free transformation based methods, including the difference method (DIFF), Signal Strength Difference (SSD), and the Hyperbolic Location Fingerprinting (HLF) based SVM. The results show that the NR-SVM outperforms these popular methods.
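
    The reason a rank-based representation can absorb hardware variance is that ranks are unchanged by any monotonically increasing distortion of the readings. The sketch below shows one plausible form of such a transformation, the rank of each access point's RSSI divided by the vector length. This is an assumption for illustration: the abstract does not give the exact definition used by NR-SVM, and the readings and the gain/offset model are made up.

```python
def normalized_rank(rssi):
    """Replace each RSSI reading by its rank (1 = weakest) divided by the vector length.

    Ties are broken by access-point order, which is a simplification.
    """
    order = sorted(range(len(rssi)), key=lambda i: rssi[i])
    ranks = [0.0] * len(rssi)
    for position, index in enumerate(order, start=1):
        ranks[index] = position / len(rssi)
    return ranks


# Made-up readings in dBm from four access points at one location.
device_a = [-48.0, -71.0, -63.0, -90.0]
# Hypothetical second device with a different gain and offset.
device_b = [0.9 * value - 6.0 for value in device_a]

print(normalized_rank(device_a))
print(normalized_rank(device_b))
```

    Both devices yield the same signature vector even though their raw RSSI values differ, so a classifier trained on one device's fingerprints can be applied to the other's.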

  11. Grouped fuzzy SVM with EM-based partition of sample space for clustered microcalcification detection.

    Science.gov (United States)

    Wang, Huiya; Feng, Jun; Wang, Hongyu

    2017-07-20

    Detection of clustered microcalcification (MC) from mammograms plays an essential role in computer-aided diagnosis for early stage breast cancer. To tackle problems associated with the diversity of data structures of MC lesions and the variability of normal breast tissues, multi-pattern sample space learning is required. In this paper, a novel grouped fuzzy Support Vector Machine (SVM) algorithm with sample space partition based on Expectation-Maximization (EM) (called G-FSVM) is proposed for clustered MC detection. The diversified pattern of training data is partitioned into several groups based on the EM algorithm. Then a series of fuzzy SVMs are integrated for classification with each group of samples from the MC lesions and normal breast tissues. From the DDSM database, a total of 1,064 suspicious regions are selected from 239 mammograms, and the measurements of Accuracy, True Positive Rate (TPR), False Positive Rate (FPR) and EVL = TPR* 1-FPR are 0.82, 0.78, 0.14 and 0.72, respectively. The proposed method incorporates the merits of fuzzy SVM and multi-pattern sample space learning, decomposing the MC detection problem into a series of simple two-class classifications. Experimental results from synthetic data and the DDSM database demonstrate that our integrated classification framework reduces the false positive rate significantly while maintaining the true positive rate.

  12. Evaluation of Effectiveness of Wavelet Based Denoising Schemes Using ANN and SVM for Bearing Condition Classification

    Directory of Open Access Journals (Sweden)

    Vijay G. S.

    2012-01-01

    Full Text Available The wavelet based denoising has proven its ability to denoise the bearing vibration signals by improving the signal-to-noise ratio (SNR) and reducing the root-mean-square error (RMSE). In this paper seven wavelet based denoising schemes have been evaluated based on the performance of the Artificial Neural Network (ANN) and the Support Vector Machine (SVM), for the bearing condition classification. The work consists of two parts. In the first part, a synthetic signal simulating the defective bearing vibration signal with Gaussian noise was subjected to these denoising schemes, and the best scheme based on the SNR and the RMSE was identified. In the second part, the vibration signals collected from a customized Rolling Element Bearing (REB) test rig for four bearing conditions were subjected to these denoising schemes. Several time and frequency domain features were extracted from the denoised signals, out of which a few sensitive features were selected using the Fisher’s Criterion (FC). Extracted features were used to train and test the ANN and the SVM. The best denoising scheme identified, based on the classification performances of the ANN and the SVM, was found to be the same as the one obtained using the synthetic signal.

  13. Elucidation of Metallic Plume and Spatter Characteristics Based on SVM During High-Power Disk Laser Welding

    International Nuclear Information System (INIS)

    Gao Xiangdong; Liu Guiqian

    2015-01-01

    During deep penetration laser welding, there exist plume (weak plasma) and spatters, which are the results of weld material ejection due to strong laser heating. The characteristics of plume and spatters are related to welding stability and quality. Characteristics of metallic plume and spatters were investigated during high-power disk laser bead-on-plate welding of Type 304 austenitic stainless steel plates at a continuous wave laser power of 10 kW. An ultraviolet and visible sensitive high-speed camera was used to capture the metallic plume and spatter images. Plume area, laser beam path through the plume, swing angle, distance between laser beam focus and plume image centroid, abscissa of plume centroid and spatter numbers are defined as eigenvalues, and the weld bead width was used as a characteristic parameter that reflected welding stability. Welding status was distinguished by SVM (support vector machine) after data normalization and characteristic analysis. Also, PCA (principal components analysis) feature extraction was used to reduce the dimensions of feature space, and PSO (particle swarm optimization) was used to optimize the parameters of SVM. Finally a classification model based on SVM was established to estimate the weld bead width and welding stability. Experimental results show that the established algorithm based on SVM could effectively distinguish the variation of weld bead width, thus providing an experimental example of monitoring high-power disk laser welding quality. (plasma technology)

  14. Fault detection of Tennessee Eastman process based on topological features and SVM

    Science.gov (United States)

    Zhao, Huiyang; Hu, Yanzhu; Ai, Xinbo; Hu, Yu; Meng, Zhen

    2018-03-01

    Fault detection in industrial processes is a popular research topic. Although the distributed control system (DCS) has been introduced to monitor the state of industrial processes, it still cannot satisfy all the requirements for fault detection of all industrial systems. In this paper, we propose a novel method based on topological features and support vector machine (SVM) for fault detection of industrial processes. The proposed method takes global information of measured variables into account through a complex network model and predicts by SVM whether or not a system has generated faults. The proposed method can be divided into four steps: network construction, network analysis, model training and model testing. Finally, we apply the model to the Tennessee Eastman process (TEP). The results show that this method works well and can be a useful supplement for fault detection of industrial processes.

  15. Construction of Pancreatic Cancer Classifier Based on SVM Optimized by Improved FOA

    Science.gov (United States)

    Ma, Xiaoqi

    2015-01-01

    A novel method is proposed to establish the pancreatic cancer classifier. Firstly, the concept of quantum and fruit fly optimal algorithm (FOA) are introduced, respectively. Then FOA is improved by quantum coding and quantum operation, and a new smell concentration determination function is defined. Finally, the improved FOA is used to optimize the parameters of support vector machine (SVM) and the classifier is established by optimized SVM. In order to verify the effectiveness of the proposed method, SVM and other classification methods have been chosen as the comparing methods. The experimental results show that the proposed method can improve the classifier performance and cost less time. PMID:26543867

  16. [Rapid determination of COD in aquaculture water based on LS-SVM with ultraviolet/visible spectroscopy].

    Science.gov (United States)

    Liu, Xue-Mei; Zhang, Hai-Liang

    2014-10-01

    Ultraviolet/visible (UV/Vis) spectroscopy was studied for the rapid determination of chemical oxygen demand (COD), which was an indicator to measure the concentration of organic matter in aquaculture water. In order to reduce the influence of the absolute noises of the spectra, the extracted 135 absorbance spectra were preprocessed by Savitzky-Golay smoothing (SG), EMD, and wavelet transform (WT) methods. The preprocessed spectra were then used to select latent variables (LVs) by partial least squares (PLS) methods. PLS was used to build models with the full spectra, and back-propagation neural network (BPNN) and least square support vector machine (LS-SVM) were applied to build models with the selected LVs. The overall results showed that BPNN and LS-SVM models performed better than PLS models, and the LS-SVM models with LVs based on WT preprocessed spectra obtained the best results with the determination coefficient (r2) and RMSE being 0.83 and 14.78 mg·L(-1) for the calibration set, and 0.82 and 14.82 mg·L(-1) for the prediction set, respectively. The method showed the best performance in the LS-SVM model. The results indicated that UV/Vis spectroscopy with LVs obtained by the PLS method, combined with LS-SVM calibration, could be applied to the rapid and accurate determination of COD in aquaculture water. Moreover, this study laid the foundation for further implementation of online analysis of aquaculture water and rapid determination of other water quality parameters.

  17. Comparing SVM and ANN based Machine Learning Methods for Species Identification of Food Contaminating Beetles.

    Science.gov (United States)

    Bisgin, Halil; Bera, Tanmay; Ding, Hongjian; Semey, Howard G; Wu, Leihong; Liu, Zhichao; Barnes, Amy E; Langley, Darryl A; Pava-Ripoll, Monica; Vyas, Himansu J; Tong, Weida; Xu, Joshua

    2018-04-25

    Insect pests, such as pantry beetles, are often associated with food contaminations and public health risks. Machine learning has the potential to provide a more accurate and efficient solution in detecting their presence in food products, which is currently done manually. In our previous research, we demonstrated such feasibility where Artificial Neural Network (ANN) based pattern recognition techniques could be implemented for species identification in the context of food safety. In this study, we present a Support Vector Machine (SVM) model which improved the average accuracy up to 85%. Contrary to this, the ANN method yielded ~80% accuracy after extensive parameter optimization. Both methods showed excellent genus level identification, but SVM showed slightly better accuracy  for most species. Highly accurate species level identification remains a challenge, especially in distinguishing between species from the same genus which may require improvements in both imaging and machine learning techniques. In summary, our work does illustrate a new SVM based technique and provides a good comparison with the ANN model in our context. We believe such insights will pave better way forward for the application of machine learning towards species identification and food safety.

  18. Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis.

    Science.gov (United States)

    Du, Yushen; Wu, Nicholas C; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting; Sun, Ren

    2016-11-01

    Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. To fully comprehend the diverse functions of a protein, it is essential to understand the functionality of individual residues. Current methods are highly dependent on evolutionary sequence conservation, which is

  19. DBAC: A simple prediction method for protein binding hot spots based on burial levels and deeply buried atomic contacts

    Science.gov (United States)

    2011-01-01

    Background A protein binding hot spot is a cluster of residues in the interface that are energetically important for the binding of the protein with its interaction partner. Identifying protein binding hot spots can give useful information to protein engineering and drug design, and can also deepen our understanding of protein-protein interaction. These residues are usually buried inside the interface with very low solvent accessible surface area (SASA). Thus SASA is widely used as an outstanding feature in hot spot prediction by many computational methods. However, SASA is not capable of distinguishing slightly buried residues, of which most are non hot spots, and deeply buried ones that are usually inside a hot spot. Results We propose a new descriptor called “burial level” for characterizing residues, atoms and atomic contacts. Specifically, burial level captures the depth the residues are buried. We identify different kinds of deeply buried atomic contacts (DBAC) at different burial levels that are directly broken in alanine substitution. We use their numbers as input for SVM to classify between hot spot or non hot spot residues. We achieve F measure of 0.6237 under the leave-one-out cross-validation on a data set containing 258 mutations. This performance is better than other computational methods. Conclusions Our results show that hot spot residues tend to be deeply buried in the interface, not just having a low SASA value. This indicates that a high burial level is not only a necessary but also a more sufficient condition than a low SASA for a residue to be a hot spot residue. We find that those deeply buried atoms become increasingly more important when their burial levels rise up. This work also confirms the contribution of deeply buried interfacial atomic contacts to the energy of protein binding hot spot. PMID:21689480

  20. SVM-Based Spectral Analysis for Heart Rate from Multi-Channel WPPG Sensor Signals

    Directory of Open Access Journals (Sweden)

    Jiping Xiong

    2017-03-01

    Full Text Available Although wrist-type photoplethysmographic (hereafter referred to as WPPG) sensor signals can measure heart rate quite conveniently, the subjects’ hand movements can cause strong motion artifacts, which heavily contaminate WPPG signals. Hence, it is challenging to accurately estimate heart rate from WPPG signals during intense physical activities. The WPPG method has attracted more attention thanks to the popularity of wrist-worn wearable devices. In this paper, a mixed approach called Mix-SVM is proposed; it uses multi-channel WPPG sensor signals and simultaneous acceleration signals to measure heart rate. Firstly, we combine principal component analysis and an adaptive filter to remove a part of the motion artifacts. Due to the strong correlation between motion artifacts and acceleration signals, the further denoising problem is regarded as a sparse signal reconstruction problem. Then, we use a spectrum subtraction method to eliminate motion artifacts effectively. Finally, the spectral peak corresponding to heart rate is sought by an SVM-based spectral analysis method. On the public PPG database of the 2015 IEEE Signal Processing Cup, the average absolute error was 1.01 beats per minute, and the Pearson correlation was 0.9972. These results also confirm that the proposed Mix-SVM approach has potential for multi-channel WPPG-based heart rate estimation in the presence of intense physical exercise.

  1. SVM classifier on chip for melanoma detection.

    Science.gov (United States)

    Afifi, Shereen; GholamHosseini, Hamid; Sinha, Roopak

    2017-07-01

    Support Vector Machine (SVM) is a common classifier used for efficient classification with high accuracy. SVM shows high accuracy for classifying melanoma (skin cancer) clinical images within computer-aided diagnosis systems used by skin cancer specialists to detect melanoma early and save lives. We aim to develop a medical low-cost handheld device that runs a real-time embedded SVM-based diagnosis system for use in primary care for early detection of melanoma. In this paper, an optimized SVM classifier is implemented onto a recent FPGA platform using the latest design methodology to be embedded into the proposed device for realizing online efficient melanoma detection on a single system on chip/device. The hardware implementation results demonstrate a high classification accuracy of 97.9% and a significant acceleration factor of 26 from equivalent software implementation on an embedded processor, with 34% of resources utilization and 2 watts for power consumption. Consequently, the implemented system meets crucial embedded systems constraints of high performance and low cost, resources utilization and power consumption, while achieving high classification accuracy.

  2. Image Interpolation Scheme based on SVM and Improved PSO

    Science.gov (United States)

    Jia, X. F.; Zhao, B. T.; Liu, X. X.; Song, H. P.

    2018-01-01

    In order to obtain visually pleasing images, a support vector machine (SVM) based interpolation scheme is proposed, in which improved particle swarm optimization is applied to optimize the support vector machine parameters. Training samples are constructed from the pixels around the pixel to be interpolated. Then the support vector machine with optimal parameters is trained using the training samples. After the training, we obtain the interpolation model, which can be employed to estimate the unknown pixel. Experimental results show that the interpolated images achieve an improved PSNR compared with traditional interpolation methods, which agrees with the subjective quality.
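
    The peak signal-to-noise ratio (PSNR) used to compare interpolation results can be sketched as follows. This is the standard definition for images with a known maximum pixel value; the function name `psnr`, the flattened pixel lists and the 8-bit default of 255 are assumptions for illustration.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(estimate):
        raise ValueError("images must contain the same number of pixels")
    # Mean squared error over all pixels
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

    A higher PSNR means the interpolated image is closer to the reference in the mean-squared sense, which does not always track perceived quality.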

  3. pDHS-SVM: A prediction method for plant DNase I hypersensitive sites based on support vector machine.

    Science.gov (United States)

    Zhang, Shanxin; Zhou, Zhiping; Chen, Xinmeng; Hu, Yong; Yang, Lindong

    2017-08-07

    DNase I hypersensitive sites (DHSs) are accessible chromatin regions hypersensitive to cleavages by DNase I endonucleases. DHSs are indicative of cis-regulatory DNA elements (CREs), all of which play important roles in global gene expression regulation. It is helpful for discovering CREs by recognition of DHSs in genome. To accelerate the investigation, it is an important complement to develop cost-effective computational methods to identify DHSs. However, there is a lack of tools used for identifying DHSs in plant genome. Here we presented pDHS-SVM, a computational predictor to identify plant DHSs. To integrate the global sequence-order information and local DNA properties, reverse complement kmer and dinucleotide-based auto covariance of DNA sequences were applied to construct the feature space. In this work, fifteen physical-chemical properties of dinucleotides were used and Support Vector Machine (SVM) was employed. To further improve the performance of the predictor and extract an optimized subset of nucleotide physical-chemical properties positive for the DHSs, a heuristic nucleotide physical-chemical property selection algorithm was introduced. With the optimized subset of properties, experimental results of Arabidopsis thaliana and rice (Oryza sativa) showed that pDHS-SVM could achieve accuracies up to 87.00%, and 85.79%, respectively. The results indicated the effectiveness of proposed method for predicting DHSs. Furthermore, pDHS-SVM could provide a helpful complement for predicting CREs in plant genome. Our implementation of the novel proposed method pDHS-SVM is freely available as source code, at https://github.com/shanxinzhang/pDHS-SVM.
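
    The reverse complement k-mer feature mentioned in this abstract can be illustrated with a minimal sketch. It shows the usual idea of counting a k-mer and its reverse complement as one feature; it is not taken from the pDHS-SVM source code, and the function names are hypothetical. Input is assumed to be uppercase A, C, G, T only.

```python
from collections import Counter

# Translation table for complementing uppercase DNA bases
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def revc_kmer_counts(seq, k):
    """Count k-mers, merging each k-mer with its reverse complement."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Use the lexicographically smaller of the pair as the shared key
        canonical = min(kmer, reverse_complement(kmer))
        counts[canonical] += 1
    return counts
```

    Merging the two strands roughly halves the number of features compared with plain k-mer counting and makes the representation independent of which strand was sequenced.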

  4. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    Directory of Open Access Journals (Sweden)

    Dobbs Drena

    2011-06-01

    Full Text Available Abstract Background Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of

  5. A Method for Aileron Actuator Fault Diagnosis Based on PCA and PGC-SVM

    Directory of Open Access Journals (Sweden)

    Wei-Li Qin

    2016-01-01

    Full Text Available Aileron actuators are pivotal components for aircraft flight control system. Thus, the fault diagnosis of aileron actuators is vital in the enhancement of the reliability and fault tolerant capability. This paper presents an aileron actuator fault diagnosis approach combining principal component analysis (PCA), grid search (GS), 10-fold cross validation (CV), and one-versus-one support vector machine (SVM). This method is referred to as PGC-SVM and utilizes the direct drive valve input, force motor current, and displacement feedback signal to realize fault detection and location. First, several common faults of aileron actuators, which include force motor coil break, sensor coil break, cylinder leakage, and amplifier gain reduction, are extracted from the fault quadrantal diagram; the corresponding fault mechanisms are analyzed. Second, the data feature extraction is performed with dimension reduction using PCA. Finally, the GS and CV algorithms are employed to train a one-versus-one SVM for fault classification, thus obtaining the optimal model parameters and assuring the generalization of the trained SVM, respectively. To verify the effectiveness of the proposed approach, four types of faults are introduced into the simulation model established by AMESim and Simulink. The results demonstrate its desirable diagnostic performance which outperforms that of the traditional SVM by comparison.

  6. Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data

    Directory of Open Access Journals (Sweden)

    Harris Lyndsay N

    2006-04-01

    Full Text Available Abstract Background Like microarray-based investigations, high-throughput proteomics techniques require machine learning algorithms to identify biomarkers that are informative for biological classification problems. Feature selection and classification algorithms need to be robust to noise and outliers in the data. Results We developed a recursive support vector machine (R-SVM) algorithm to select important genes/biomarkers for the classification of noisy data. We compared its performance to a similar, state-of-the-art method (SVM recursive feature elimination or SVM-RFE), paying special attention to the ability of recovering the true informative genes/biomarkers and the robustness to outliers in the data. Simulation experiments show that a 5%~20% improvement over SVM-RFE can be achieved with regard to these properties. The SVM-based methods are also compared with a conventional univariate method and their respective strengths and weaknesses are discussed. R-SVM was applied to two sets of SELDI-TOF-MS proteomics data, one from a human breast cancer study and the other from a study on rat liver cirrhosis. Important biomarkers found by the algorithm were validated by follow-up biological experiments. Conclusion The proposed R-SVM method is suitable for analyzing noisy high-throughput proteomics and microarray data and it outperforms SVM-RFE in the robustness to noise and in the ability to recover informative features. The multivariate SVM-based method outperforms the univariate method in the classification performance, but univariate methods can reveal more of the differentially expressed features especially when there are correlations between the features.

  7. lncRScan-SVM: A Tool for Predicting Long Non-Coding RNAs Using Support Vector Machine.

    Science.gov (United States)

    Sun, Lei; Liu, Hui; Zhang, Lin; Meng, Jia

    2015-01-01

    Functional long non-coding RNAs (lncRNAs) have been bringing novel insight into biological study, however it is still not trivial to accurately distinguish the lncRNA transcripts (LNCTs) from the protein coding ones (PCTs). As various information and data about lncRNAs are preserved by previous studies, it is appealing to develop novel methods to identify the lncRNAs more accurately. Our method lncRScan-SVM aims at classifying PCTs and LNCTs using support vector machine (SVM). The gold-standard datasets for lncRScan-SVM model training, lncRNA prediction and method comparison were constructed according to the GENCODE gene annotations of human and mouse respectively. By integrating features derived from gene structure, transcript sequence, potential codon sequence and conservation, lncRScan-SVM outperforms other approaches, which is evaluated by several criteria such as sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under curve (AUC). In addition, several known human lncRNA datasets were assessed using lncRScan-SVM. LncRScan-SVM is an efficient tool for predicting the lncRNAs, and it is quite useful for current lncRNA study.
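
    The evaluation criteria listed in this abstract (sensitivity, specificity, accuracy and the Matthews correlation coefficient) follow directly from a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and any counts passed to it are hypothetical and not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true positives recovered
    specificity = tn / (tn + fp)  # fraction of true negatives recovered
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }
```

    MCC is often reported alongside accuracy because it stays informative when the two classes are of very different size.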

  8. Modulation transfer function (MTF) measurement method based on support vector machine (SVM)

    Science.gov (United States)

    Zhang, Zheng; Chen, Yueting; Feng, Huajun; Xu, Zhihai; Li, Qi

    2016-03-01

    An imaging system's spatial quality can be expressed by the system's modulation transfer function (MTF) as a function of spatial frequency in terms of the linear response theory. Methods have been proposed to assess the MTF of an imaging system using point, slit or edge techniques. The edge method is widely used because of its low requirements on targets. However, the traditional edge methods are limited by the edge angle. Besides, image noise will impair the measurement accuracy, making the measurement result unstable. In this paper, a novel measurement method based on the support vector machine (SVM) is proposed. Image patches with different edge angles and MTF levels are generated as the training set. Parameters related to MTF and image structure are extracted from the edge images. Trained with image parameters and the corresponding MTF, the SVM classifier can assess the MTF of any edge image. The result shows that the proposed method has excellent measurement accuracy and stability.

  9. A computational tool to predict the evolutionarily conserved protein-protein interaction hot-spot residues from the structure of the unbound protein.

    Science.gov (United States)

    Agrawal, Neeraj J; Helk, Bernhard; Trout, Bernhardt L

    2014-01-21

    Identifying hot-spot residues - residues that are critical to protein-protein binding - can help to elucidate a protein's function and assist in designing therapeutic molecules to target those residues. We present a novel computational tool, termed spatial-interaction-map (SIM), to predict the hot-spot residues of an evolutionarily conserved protein-protein interaction from the structure of an unbound protein alone. SIM can predict the protein hot-spot residues with an accuracy of 36-57%. Thus, the SIM tool can be used to predict the yet unknown hot-spot residues for many proteins for which the structure of the protein-protein complexes are not available, thereby providing a clue to their functions and an opportunity to design therapeutic molecules to target these proteins.

  10. Efficient HIK SVM learning for image classification.

    Science.gov (United States)

    Wu, Jianxin

    2012-10-01

    Histograms are used in almost every aspect of image processing and computer vision, from visual descriptors to image representations. Histogram intersection kernel (HIK) and support vector machine (SVM) classifiers are shown to be very effective in dealing with histograms. This paper presents contributions concerning HIK SVM for image classification. First, we propose intersection coordinate descent (ICD), a deterministic and scalable HIK SVM solver. ICD is much faster than, and has similar accuracies to, general purpose SVM solvers and other fast HIK SVM training methods. We also extend ICD to the efficient training of a broader family of kernels. Second, we show an important empirical observation that ICD is not sensitive to the C parameter in SVM, and we provide some theoretical analyses to explain this observation. ICD achieves high accuracies in many problems, using its default parameters. This is an attractive property for practitioners, because many image processing tasks are too large to choose SVM parameters using cross-validation.
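
    The histogram intersection kernel (HIK) at the centre of this abstract has a simple closed form: the sum of bin-wise minima of two histograms. The sketch below shows only that kernel value; it does not implement the ICD solver described in the paper, and the function name `histogram_intersection` is hypothetical.

```python
def histogram_intersection(h1, h2):
    """Histogram intersection kernel: sum of bin-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))
```

    For non-negative histograms, the kernel of a histogram with itself equals its total mass, and the value for any other histogram with the same mass can only be smaller or equal.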

  11. Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis

    Directory of Open Access Journals (Sweden)

    Yushen Du

    2016-11-01

    Full Text Available Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available.

  12. DECK: Distance and environment-dependent, coarse-grained, knowledge-based potentials for protein-protein docking

    Directory of Open Access Journals (Sweden)

    Vakser Ilya A

    2011-07-01

    Background: Computational approaches to protein-protein docking typically include scoring aimed at improving the rank of the near-native structure relative to the false-positive matches. Knowledge-based potentials improve modeling of protein complexes by taking advantage of the rapidly increasing amount of experimentally derived information on protein-protein association. An essential element of knowledge-based potentials is defining the reference state for an optimal description of the residue-residue (or atom-atom) pairs in the non-interaction state. Results: The study presents a new Distance- and Environment-dependent, Coarse-grained, Knowledge-based (DECK) potential for scoring of protein-protein docking predictions. Training sets of protein-protein matches were generated based on bound and unbound forms of proteins taken from the DOCKGROUND resource. Each residue was represented by a pseudo-atom in the geometric center of the side chain. To capture the long-range and the multi-body interactions, residues in different secondary structure elements at protein-protein interfaces were considered as different residue types. Five reference states for the potentials were defined and tested. The optimal reference state was selected and the cutoff effect on the distance-dependent potentials investigated. The potentials were validated on the docking decoy sets, showing better performance than the existing potentials used in scoring of protein-protein docking results. Conclusions: A novel residue-based statistical potential for protein-protein docking was developed and validated on docking decoy sets. The results show that the scoring function DECK can successfully identify near-native protein-protein matches and thus is useful in protein docking. In addition to the practical application of the potentials, the study provides insights into the relative utility of the reference states, the scope of the distance dependence, and the coarse-graining of

  13. Modeling the milling tool wear by using an evolutionary SVM-based model from milling runs experimental data

    Science.gov (United States)

    Nieto, Paulino José García; García-Gonzalo, Esperanza; Vilán, José Antonio Vilán; Robleda, Abraham Segade

    2015-12-01

    The main aim of this research work is to build a new practical hybrid regression model to predict the milling tool wear in a regular cut as well as entry cut and exit cut of a milling tool. The model was based on Particle Swarm Optimization (PSO) in combination with support vector machines (SVMs). This optimization mechanism involved kernel parameter setting in the SVM training procedure, which significantly influences the regression accuracy. Bearing this in mind, a PSO-SVM-based model, which is based on the statistical learning theory, was successfully used here to predict the milling tool flank wear (output variable) as a function of the following input variables: the time duration of experiment, depth of cut, feed, type of material, etc. To accomplish the objective of this study, the experimental dataset represents experiments from runs on a milling machine under various operating conditions. In this way, data sampled by three different types of sensors (acoustic emission sensor, vibration sensor and current sensor) were acquired at several positions. A second aim is to determine the factors with the greatest bearing on the milling tool flank wear with a view to proposing milling machine's improvements. Firstly, this hybrid PSO-SVM-based regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the flank wear (output variable) and input variables (time, depth of cut, feed, etc.). Indeed, regression with optimal hyperparameters was performed and a determination coefficient of 0.95 was obtained. The agreement of this model with experimental data confirmed its good performance. Secondly, the main advantages of this PSO-SVM-based model are its capacity to produce a simple, easy-to-interpret model, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, the main conclusions of this study are exposed.
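
    The abstract above describes PSO searching over SVM kernel parameters. The following is only a minimal sketch of a generic particle swarm loop, not the authors' implementation: `toy_validation_error` is a hypothetical stand-in for a cross-validated SVM regression error, and the inertia and acceleration constants are assumed textbook-style values.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iters=100, seed=0,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5):
    """Minimize `objective` over a box given by `bounds` with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # personal best positions
    best_val = [objective(p) for p in pos]       # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]   # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (g_pos[d] - pos[i][d]))
                # Keep the particle inside the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

def toy_validation_error(params):
    """Hypothetical error surface with its minimum at log10(C)=1, log10(gamma)=-2."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

best, err = pso_minimize(toy_validation_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
```

    In a real setting the objective would train an SVM for each candidate parameter pair and return a validation error, which is far more expensive than this toy surface.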

  14. Application of SVM classifier in thermographic image classification for early detection of breast cancer

    Science.gov (United States)

    Oleszkiewicz, Witold; Cichosz, Paweł; Jagodziński, Dariusz; Matysiewicz, Mateusz; Neumann, Łukasz; Nowak, Robert M.; Okuniewski, Rafał

    2016-09-01

    This article presents the application of machine learning algorithms for early detection of breast cancer on the basis of thermographic images. A supervised learning model, the support vector machine (SVM), was implemented, together with the Sequential Minimal Optimization (SMO) algorithm for training the SVM classifier. The SVM classifier was included in a client-server application that makes it possible to create a training set of examinations and to apply classifiers (including SVM) for the diagnosis and early detection of breast cancer. The sensitivity and specificity of the SVM classifier were calculated based on the thermographic images from studies. Furthermore, a heuristic method for tuning the SVM's parameters was proposed.

  15. Determination of structural fluctuations of proteins from structure-based calculations of residual dipolar couplings

    International Nuclear Information System (INIS)

    Montalvao, Rinaldo W.; De Simone, Alfonso; Vendruscolo, Michele

    2012-01-01

    Residual dipolar couplings (RDCs) have the potential of providing detailed information about the conformational fluctuations of proteins. It is very challenging, however, to extract such information because of the complex relationship between RDCs and protein structures. A promising approach to decode this relationship involves structure-based calculations of the alignment tensors of protein conformations. By implementing this strategy to generate structural restraints in molecular dynamics simulations we show that it is possible to extract effectively the information provided by RDCs about the conformational fluctuations in the native states of proteins. The approach that we present can be used in a wide range of alignment media, including Pf1, charged bicelles and gels. The accuracy of the method is demonstrated by the analysis of the Q factors for RDCs not used as restraints in the calculations, which are significantly lower than those corresponding to existing high-resolution structures and structural ensembles, hence showing that we capture effectively the contributions to RDCs from conformational fluctuations.
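
    The abstract does not state how the Q factor is computed. One commonly used definition, given here as an assumption rather than as the authors' exact formula, compares observed and back-calculated couplings over the N RDCs left out of the restraints:

```latex
Q = \sqrt{\frac{\sum_{i=1}^{N} \left( D_i^{\mathrm{obs}} - D_i^{\mathrm{calc}} \right)^2}
               {\sum_{i=1}^{N} \left( D_i^{\mathrm{obs}} \right)^2}}
```

    Under this definition a lower Q means better agreement between the calculated and measured couplings, which matches the way the abstract uses the term.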

  16. Accurate Fluid Level Measurement in Dynamic Environment Using Ultrasonic Sensor and ν-SVM

    Directory of Open Access Journals (Sweden)

    Jenny TERZIC

    2009-10-01

    A fluid level measurement system based on a single Ultrasonic Sensor and Support Vector Machines (SVM) based signal processing and classification system has been developed to determine the fluid level in automotive fuel tanks. The novel approach based on the ν-SVM classification method uses the Radial Basis Function (RBF) to compensate for the measurement error induced by the sloshing effects in the tank caused by vehicle motion. A broad investigation on selected pre-processing filters, namely, Moving Mean, Moving Median, and Wavelet filter, has also been presented. Field drive trials were performed under normal driving conditions at various fuel volumes ranging from 5 L to 50 L to acquire sample data from the ultrasonic sensor for the training of SVM model. Further drive trials were conducted to obtain data to verify the SVM results. A comparison of the accuracy of the predicted fluid level obtained using SVM and the pre-processing filters is provided. It is demonstrated that the ν-SVM model using the RBF kernel function and the Moving Median filter has produced the most accurate outcome compared with the other signal filtration methods in terms of fluid level measurement.
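
    To illustrate the Moving Median pre-processing step named in the abstract, here is a minimal sketch of a centered moving median. The window length and the sample readings are hypothetical; the paper's actual filter settings are not given in the abstract.

```python
from statistics import median

def moving_median(samples, window=3):
    """Return the centered moving median; the window shrinks at the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    filtered = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        filtered.append(median(samples[lo:hi]))
    return filtered

# Hypothetical level readings in litres with two slosh-induced spikes.
readings = [20.0, 20.1, 35.0, 19.9, 20.0, 5.0, 20.2, 20.1]
smoothed = moving_median(readings, window=3)
```

    A median is less sensitive to isolated spikes than a moving mean, which is one plausible reason it suits slosh-disturbed readings.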

  17. SPECTRAL RECONSTRUCTION BASED ON SVM FOR CROSS CALIBRATION

    Directory of Open Access Journals (Sweden)

    H. Gao

    2017-05-01

    The Chinese HY-1C/1D satellites will use a visible-near infrared (VNIR) hyperspectral sensor with 5 nm/10 nm resolution, together with the solar calibrator, to cross-calibrate with other sensors. Because the hyperspectral radiance data consist of the average radiance in the sensor’s passbands and carry a spectral smoothing effect, a transform from the hyperspectral radiance data to the 1-nm-resolution apparent spectral radiance by spectral reconstruction needs to be implemented. To solve the problem of noise accumulation and deterioration after several iterations of the iterative algorithm, a novel regression method based on SVM is proposed, which can closely approximate arbitrarily complex non-linear relationships and provides better generalization capability through learning. From a system point of view, the relationship between the apparent radiance and the equivalent radiance is a nonlinear mapping introduced by the spectral response function (SRF); SVM transforms the low-dimensional non-linear problem into a high-dimensional linear problem through a kernel function and obtains the global optimal solution by virtue of its quadratic form. The experiment is performed using 6S-simulated spectra that account for the SRF and SNR of the hyperspectral sensor, measured reflectance spectra of water bodies, and different atmospheric conditions. The comparison shows that, firstly, the proposed method reconstructs more accurately, especially for the high-frequency signal; secondly, as the spectral resolution of the hyperspectral sensor decreases, the proposed method performs better than the iterative method; finally, the root mean square relative error (RMSRE), which is used to evaluate the difference between the reconstructed spectrum and the real spectrum over the whole spectral range, decreases by at least one time with the proposed method.

  18. LMethyR-SVM: Predict Human Enhancers Using Low Methylated Regions based on Weighted Support Vector Machines.

    Science.gov (United States)

    Xu, Jingting; Hu, Hong; Dai, Yang

    The identification of enhancers is a challenging task. Various types of epigenetic information including histone modification have been utilized in the construction of enhancer prediction models based on a diverse panel of machine learning schemes. However, DNA methylation profiles generated from the whole genome bisulfite sequencing (WGBS) have not been fully explored for their potential in enhancer prediction despite the fact that low methylated regions (LMRs) have been implied to be distal active regulatory regions. In this work, we propose a prediction framework, LMethyR-SVM, using LMRs identified from cell-type-specific WGBS DNA methylation profiles and a weighted support vector machine learning framework. In LMethyR-SVM, the set of cell-type-specific LMRs is further divided into three sets: reliable positive, like positive and likely negative, according to their resemblance to a small set of experimentally validated enhancers in the VISTA database based on an estimated non-parametric density distribution. Then, the prediction model is obtained by solving a weighted support vector machine. We demonstrate the performance of LMethyR-SVM by using the WGBS DNA methylation profiles derived from the human embryonic stem cell type (H1) and the fetal lung fibroblast cell type (IMR90). The predicted enhancers are highly conserved with a reasonable validation rate based on a set of commonly used positive markers including transcription factors, p300 binding and DNase-I hypersensitive sites. In addition, we show evidence that the large fraction of the LMethyR-SVM predicted enhancers are not predicted by ChromHMM in H1 cell type and they are more enriched for the FANTOM5 enhancers. Our work suggests that low methylated regions detected from the WGBS data are useful as complementary resources to histone modification marks in developing models for the prediction of cell-type-specific enhancers.

  19. Area Determination of Diabetic Foot Ulcer Images Using a Cascaded Two-Stage SVM-Based Classification.

    Science.gov (United States)

    Wang, Lei; Pedersen, Peder C; Agu, Emmanuel; Strong, Diane M; Tulu, Bengisu

    2017-09-01

    The standard chronic wound assessment method based on visual examination is potentially inaccurate and also represents a significant clinical workload. Hence, computer-based systems providing quantitative wound assessment may be valuable for accurately monitoring wound healing status, with the wound area the best suited for automated analysis. Here, we present a novel approach, using support vector machines (SVM) to determine the wound boundaries on foot ulcer images captured with an image capture box, which provides controlled lighting and range. After superpixel segmentation, a cascaded two-stage classifier operates as follows: in the first stage, a set of k binary SVM classifiers are trained and applied to different subsets of the entire training images dataset, and incorrectly classified instances are collected. In the second stage, another binary SVM classifier is trained on the incorrectly classified set. We extracted various color and texture descriptors from superpixels that are used as input for each stage in the classifier training. Specifically, color and bag-of-word representations of local dense scale invariant feature transformation features are descriptors for ruling out irrelevant regions, and color and wavelet-based features are descriptors for distinguishing healthy tissue from wound regions. Finally, the detected wound boundary is refined by applying the conditional random field method. We have implemented the wound classification on a Nexus 5 smartphone platform, except for training which was done offline. Results are compared with other classifiers and show that our approach provides high global performance rates (average sensitivity = 73.3%, specificity = 94.6%) and is sufficiently efficient for a smartphone-based image analysis.

  20. Multi-view L2-SVM and its multi-view core vector machine.

    Science.gov (United States)

    Huang, Chengquan; Chung, Fu-lai; Wang, Shitong

    2016-03-01

    In this paper, a novel L2-SVM based classifier Multi-view L2-SVM is proposed to address multi-view classification tasks. The proposed Multi-view L2-SVM classifier does not have any bias in its objective function and hence has the flexibility like μ-SVC in the sense that the number of the yielded support vectors can be controlled by a pre-specified parameter. The proposed Multi-view L2-SVM classifier can make full use of the coherence and the difference of different views through imposing the consensus among multiple views to improve the overall classification performance. Besides, based on the generalized core vector machine GCVM, the proposed Multi-view L2-SVM classifier is extended into its GCVM version MvCVM which can realize its fast training on large scale multi-view datasets, with its asymptotic linear time complexity with the sample size and its space complexity independent of the sample size. Our experimental results demonstrated the effectiveness of the proposed Multi-view L2-SVM classifier for small scale multi-view datasets and the proposed MvCVM classifier for large scale multi-view datasets. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. Short Term Prediction of Freeway Exiting Volume Based on SVM and KNN

    Directory of Open Access Journals (Sweden)

    Xiang Wang

    2015-09-01

    The model results indicate that the proposed algorithm is feasible and accurate. The Mean Absolute Percentage Error is under 10%. When comparing with the results of single KNN or SVM method, the results show that the combination of KNN and SVM can improve the reliability of the prediction significantly. The proposed method can be implemented in the on-line application of exiting volume prediction, which is able to consider different vehicle types.
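
    The Mean Absolute Percentage Error quoted above can be computed as in the following minimal sketch. The traffic counts are hypothetical illustration values, not data from the paper.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Hypothetical exiting volumes (vehicles per interval) and their predictions.
observed_volume = [100, 200, 400]
predicted_volume = [110, 190, 380]
error_percent = mape(observed_volume, predicted_volume)
```

    Here the relative errors are 10%, 5% and 5%, so the result is about 6.67%, which would fall under the 10% level mentioned in the abstract.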

  2. An SVM-based solution for fault detection in wind turbines.

    Science.gov (United States)

    Santos, Pedro; Villa, Luisa F; Reñones, Aníbal; Bustillo, Andres; Maudes, Jesús

    2015-03-09

    Research into fault diagnosis in machines with a wide range of variable loads and speeds, such as wind turbines, is of great industrial interest. Analysis of the power signals emitted by wind turbines for the diagnosis of mechanical faults in their mechanical transmission chain is insufficient. A successful diagnosis requires the inclusion of accelerometers to evaluate vibrations. This work presents a multi-sensory system for fault diagnosis in wind turbines, combined with a data-mining solution for the classification of the operational state of the turbine. The selected sensors are accelerometers, in which vibration signals are processed using angular resampling techniques and electrical, torque and speed measurements. Support vector machines (SVMs) are selected for the classification task, including two traditional and two promising new kernels. This multi-sensory system has been validated on a test-bed that simulates the real conditions of wind turbines with two fault typologies: misalignment and imbalance. Comparison of SVM performance with the results of artificial neural networks (ANNs) shows that linear kernel SVM outperforms other kernels and ANNs in terms of accuracy, training and tuning times. The suitability and superior performance of linear SVM is also experimentally analyzed, to conclude that this data acquisition technique generates linearly separable datasets.

  3. An SVM-Based Solution for Fault Detection in Wind Turbines

    Directory of Open Access Journals (Sweden)

    Pedro Santos

    2015-03-01

    Research into fault diagnosis in machines with a wide range of variable loads and speeds, such as wind turbines, is of great industrial interest. Analysis of the power signals emitted by wind turbines for the diagnosis of mechanical faults in their mechanical transmission chain is insufficient. A successful diagnosis requires the inclusion of accelerometers to evaluate vibrations. This work presents a multi-sensory system for fault diagnosis in wind turbines, combined with a data-mining solution for the classification of the operational state of the turbine. The selected sensors are accelerometers, in which vibration signals are processed using angular resampling techniques and electrical, torque and speed measurements. Support vector machines (SVMs) are selected for the classification task, including two traditional and two promising new kernels. This multi-sensory system has been validated on a test-bed that simulates the real conditions of wind turbines with two fault typologies: misalignment and imbalance. Comparison of SVM performance with the results of artificial neural networks (ANNs) shows that linear kernel SVM outperforms other kernels and ANNs in terms of accuracy, training and tuning times. The suitability and superior performance of linear SVM is also experimentally analyzed, to conclude that this data acquisition technique generates linearly separable datasets.

  4. Prediction of interface residue based on the features of residue interaction network.

    Science.gov (United States)

    Jiao, Xiong; Ranganathan, Shoba

    2017-11-07

    Protein-protein interaction plays a crucial role in cellular biological processes. Interface prediction can improve our understanding of the molecular mechanisms of the related processes and functions. In this work, we propose a classification method to recognize interface residues based on the features of a weighted residue interaction network. The random forest algorithm is used for the prediction, and 16 network parameters and the B-factor serve as the elements of the input feature vector. Compared with other similar work, the method is feasible and effective. The relative importance of these features is also analyzed to identify the key features for the prediction. Some biological meaning of the important features is explained. The results of this work can be used for related work on structure-function relationship analysis via a residue interaction network model. Copyright © 2017 Elsevier Ltd. All rights reserved.

  5. Prediction of interactions between viral and host proteins using supervised machine learning methods.

    Directory of Open Access Journals (Sweden)

    Ranjan Kumar Barman

    BACKGROUND: Viral-host protein-protein interaction plays a vital role in pathogenesis, since it defines viral infection of the host and regulation of the host proteins. Identification of key viral-host protein-protein interactions (PPIs) has great implication for therapeutics. METHODS: In this study, a systematic attempt has been made to predict viral-host PPIs by integrating different features, including domain-domain association, network topology and sequence information using viral-host PPIs from VirusMINT. The three well-known supervised machine learning methods, such as SVM, Naïve Bayes and Random Forest, which are commonly used in the prediction of PPIs, were employed to evaluate the performance measure based on five-fold cross validation techniques. RESULTS: Out of 44 descriptors, best features were found to be domain-domain association and methionine, serine and valine amino acid composition of viral proteins. In this study, SVM-based method achieved better sensitivity of 67% over Naïve Bayes (37.49%) and Random Forest (55.66%). However the specificity of Naïve Bayes was the highest (99.52%) as compared with SVM (74%) and Random Forest (89.08%). Overall, the SVM and Random Forest achieved accuracy of 71% and 72.41%, respectively. The proposed SVM-based method was evaluated on blind dataset and attained a sensitivity of 64%, specificity of 83%, and accuracy of 74%. In addition, unknown potential targets of hepatitis B virus-human and hepatitis E virus-human PPIs have been predicted through proposed SVM model and validated by gene ontology enrichment analysis. Our proposed model shows that hepatitis B virus "C protein" binds to membrane docking protein, while "X protein" and "P protein" interacts with cell-killing and metabolic process proteins, respectively. CONCLUSION: The proposed method can predict large scale interspecies viral-human PPIs. The nature and function of unknown viral proteins (HBV and HEV), interacting partners of host

  6. [Non-destructive detection research for hollow heart of potato based on semi-transmission hyperspectral imaging and SVM].

    Science.gov (United States)

    Huang, Tao; Li, Xiao-yu; Xu, Meng-ling; Jin, Rui; Ku, Jing; Xu, Sen-miao; Wu, Zhen-zhong

    2015-01-01

    The quality of potatoes is directly related to their edible value and industrial value. Hollow heart of potato, a physiological disease that occurs inside the tuber, is difficult to detect. This paper puts forward a non-destructive detection method that uses semi-transmission hyperspectral imaging with a support vector machine (SVM) to detect hollow heart of potato. Compared to reflection and transmission hyperspectral images, a semi-transmission hyperspectral image can give a clearer image that contains the internal quality information of agricultural products. In this study, 224 potato samples (149 normal samples and 75 hollow samples) were selected as the research object, a semi-transmission hyperspectral image acquisition system was constructed to acquire the hyperspectral images (390-1040 nm) of the potato samples, and then the average spectrum of the region of interest was extracted for spectral characteristics analysis. Normalization was used to preprocess the original spectrum, and a prediction model was developed based on SVM using all wave bands; the recognition accuracy on the test set was only 87.5%. In order to simplify the model, the competitive adaptive reweighted sampling algorithm (CARS) and the successive projection algorithm (SPA) were utilized to select important variables from all 520 spectral variables, and 8 variables were selected (454, 601, 639, 664, 748, 827, 874 and 936 nm). A test-set recognition accuracy of 94.64% was obtained by using the 8 variables to develop the SVM model. Parameter optimization algorithms, including the artificial fish swarm algorithm (AFSA), genetic algorithm (GA) and grid search algorithm, were used to optimize the SVM model parameters: penalty parameter c and kernel parameter g. After comparative analysis, AFSA, a new bionic optimization algorithm based on the foraging behavior of fish swarms, was shown to give the optimal model parameters (c=10.6591, g=0.3497), and the recognition accuracy of 10% were obtained for the AFSA-SVM

  7. Exploring the potential of 3D Zernike descriptors and SVM for protein-protein interface prediction.

    Science.gov (United States)

    Daberdaku, Sebastian; Ferrari, Carlo

    2018-02-06

    The correct determination of protein-protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary a lot depending on the interaction type and function. The selection of an optimal set of features characterising the protein interface and the development of an effective method to represent and capture the complex protein recognition patterns are of paramount importance for this task. In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as a classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein-Protein Docking Benchmark 5.0 and compared to other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI). The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their usage can be easily extended to other sets of amino acid properties. 
The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction

  8. [Study on application of SVM in prediction of coronary heart disease].

    Science.gov (United States)

    Zhu, Yue; Wu, Jianghua; Fang, Ying

    2013-12-01

    Based on data for blood pressure, plasma lipid, Glu and UA from physical tests, a Support Vector Machine (SVM) was applied to distinguish patients with coronary heart disease (CHD) from non-CHD individuals in a south China population, as a guide for further prevention and treatment of the disease. Firstly, the SVM classifier was built using the radial basis kernel function, linear kernel function and polynomial kernel function, respectively. Secondly, the SVM penalty factor C and kernel parameter sigma were optimized by particle swarm optimization (PSO) and then employed to diagnose and predict CHD. In comparison with an artificial neural network with the back propagation (BP) model, linear discriminant analysis, the logistic regression method and a non-optimized SVM, the overall results of our calculation demonstrated that the classification performance of the optimized RBF-SVM model could be superior to the other classifier algorithms, with higher accuracy rate, sensitivity and specificity, which were 94.51%, 92.31% and 96.67%, respectively. It is therefore concluded that SVM could be used as a valid method for assisting the diagnosis of CHD.
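
    Accuracy, sensitivity and specificity, as reported in the abstract above, all follow from the four confusion-matrix counts. The following is a minimal sketch assuming labels coded as 1 (disease) and 0 (no disease); the label lists are hypothetical and not taken from the study.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)   # fraction of actual positives detected
    specificity = tn / (tn + fp)   # fraction of actual negatives rejected
    return accuracy, sensitivity, specificity

# Hypothetical ground truth and classifier output for ten individuals.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
guess = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
acc, sens, spec = binary_metrics(truth, guess)
```

    With these lists there are 3 true positives, 1 false negative, 5 true negatives and 1 false positive, giving accuracy 0.8, sensitivity 0.75 and specificity 5/6.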

  9. Quality-Oriented Classification of Aircraft Material Based on SVM

    Directory of Open Access Journals (Sweden)

    Hongxia Cai

    2014-01-01

    Existing material classification schemes are intended to improve inventory management. However, different materials have different quality-related attributes, especially in the aircraft industry. In order to reduce cost without sacrificing quality, we propose a quality-oriented material classification system that considers the material quality character, quality cost, and quality influence. The Analytic Hierarchy Process helps with feature selection and classification decisions. We use the improved Kraljic Portfolio Matrix to establish the three-dimensional classification model. The aircraft materials can be divided into eight types, including general type, key type, risk type, and leveraged type. To improve the classification accuracy for the various materials, the Support Vector Machine algorithm is introduced. Finally, we compare the SVM and a BP neural network in the application. The results show that the SVM algorithm is more efficient and accurate and that the quality-oriented material classification is valuable.

  10. Face Verification using MLP and SVM

    OpenAIRE

    Cardinaux, Fabien; Marcel, Sébastien

    2002-01-01

    The performance of machine learning algorithms, such as MLP or more recently SVM, has steadily improved over the past few years. In this paper, we compare two successful discriminant machine learning algorithms applied to the problem of face verification: MLP and SVM. These two algorithms are tested on a benchmark database, namely XM2VTS. Results show that an MLP is better than an SVM on this particular task.

  11. LMD Based Features for the Automatic Seizure Detection of EEG Signals Using SVM.

    Science.gov (United States)

    Zhang, Tao; Chen, Wanzhong

    2017-08-01

    Detecting seizure activity automatically from electroencephalogram (EEG) signals is of great importance for the treatment of epileptic seizures. To this end, a newly developed time-frequency analytical algorithm, namely local mean decomposition (LMD), is employed in the present study. LMD is able to decompose an arbitrary signal into a series of product functions (PFs). First, the raw EEG signal is decomposed into several PFs, and then the temporal statistical and non-linear features of the first five PFs are calculated. The features of each PF are fed into five classifiers, including back propagation neural network (BPNN), K-nearest neighbor (KNN), linear discriminant analysis (LDA), un-optimized support vector machine (SVM) and SVM optimized by genetic algorithm (GA-SVM), for five classification cases, respectively. Confluent features of all PFs and raw EEG are further passed into the high-performance GA-SVM for the same classification tasks. Experimental results on the international public Bonn epilepsy EEG dataset show that the average classification accuracy of the presented approach is equal to or higher than 98.10% in all five cases, which indicates the effectiveness of the proposed approach for automated seizure detection.

  12. LaSVM-based big data learning system for dynamic prediction of air pollution in Tehran.

    Science.gov (United States)

    Ghaemi, Z; Alimohammadi, A; Farnaghi, M

    2018-04-20

    Due to the critical impacts of air pollution, prediction and monitoring of air quality in urban areas are important tasks. However, because of the dynamic nature and high spatio-temporal variability, prediction of air pollutant concentrations is a complex spatio-temporal problem. The distribution of pollutant concentration is influenced by various factors such as historical pollution data and weather conditions. Conventional methods such as the support vector machine (SVM) or artificial neural networks (ANN) show some deficiencies when huge amounts of streaming data have to be analyzed for urban air pollution prediction. In order to overcome the limitations of the conventional methods and improve the performance of urban air pollution prediction in Tehran, a spatio-temporal system is designed using a LaSVM-based online algorithm. Pollutant concentration and meteorological data along with geographical parameters are continually fed to the developed online forecasting system. Performance of the system is evaluated by comparing the prediction results of the Air Quality Index (AQI) with those of a traditional SVM algorithm. Results show an outstanding increase of speed by the online algorithm while preserving the accuracy of the SVM classifier. Comparison of the hourly predictions for the next 24 h with the measured pollution data from Tehran pollution monitoring stations shows an overall accuracy of 0.71, root mean square error of 0.54 and coefficient of determination of 0.81. These results are indicators of the practical usefulness of the online algorithm for real-time spatial and temporal prediction of urban air quality.
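
    The two regression metrics quoted in this abstract can be illustrated with a minimal sketch. The definitions below are the common textbook ones (RMSE, and R² as 1 - SS_res/SS_tot); the abstract does not say which exact formulas the authors used, and the sample values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical hourly index values, for illustration only (not data from the paper).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(observed, predicted), r_squared(observed, predicted))
```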

  13. A Fault Diagnosis Approach for Gears Based on IMF AR Model and SVM

    Directory of Open Access Journals (Sweden)

    Yu Yang

    2008-05-01

    An accurate autoregressive (AR) model can reflect the characteristics of a dynamic system, so the fault features of a gear vibration signal can be extracted without constructing a mathematical model or studying the fault mechanism of the gear vibration system, both of which are required by time-frequency analysis methods. However, the AR model can only be applied to stationary signals, while gear fault vibration signals usually present nonstationary characteristics. Therefore, empirical mode decomposition (EMD), which can decompose the vibration signal into a finite number of intrinsic mode functions (IMFs), is introduced into feature extraction of gear vibration signals as a preprocessor before AR models are generated. In addition, to address the difficulty of obtaining sufficient fault samples in practice, the support vector machine (SVM) is introduced into gear fault pattern recognition. In the proposed method, vibration signals are first decomposed into a finite number of intrinsic mode functions, then the AR model of each IMF component is established; finally, the corresponding autoregressive parameters and the variance of the remnant are regarded as the fault characteristic vectors and used as input parameters of the SVM classifier to classify the working condition of gears. The experimental analysis results show that the proposed approach, in which the IMF AR model and SVM are combined, can identify the working condition of gears with a success rate of 100% even with a small number of samples.
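
    The feature-extraction step (AR parameters plus residual variance for each IMF) can be sketched as follows. This is an illustrative sketch only: it assumes the IMFs have already been obtained by EMD (not shown), and it estimates the AR model with the Yule-Walker equations solved by the Levinson-Durbin recursion, which is one common estimator and not necessarily the one used by the authors. The function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag] (no mean removal)."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]

def ar_features(x, order):
    """Return (AR coefficients, residual variance) for x_t ~ sum_k a_k * x_{t-k}.

    Solves the Yule-Walker equations with the Levinson-Durbin recursion.
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)   # a[1..order] hold the coefficients
    err = r[0]                # prediction error variance of the order-0 model
    for m in range(1, order + 1):
        reflection = (r[m] - sum(a[k] * r[m - k] for k in range(1, m))) / err
        updated = a[:]
        updated[m] = reflection
        for k in range(1, m):
            updated[k] = a[k] - reflection * a[m - k]
        a = updated
        err *= 1.0 - reflection ** 2
    return a[1:], err

# Toy signal standing in for one IMF: a pure AR(1) decay with coefficient 0.5.
imf = [0.5 ** t for t in range(50)]
coeffs, residual_var = ar_features(imf, order=2)
feature_vector = coeffs + [residual_var]   # what would be passed to the SVM
print(feature_vector)
```

    In a full pipeline, the feature vectors of all IMFs of one vibration record would be concatenated before classification.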

  14. A SVM-based quantitative fMRI method for resting-state functional network detection.

    Science.gov (United States)

    Song, Xiaomu; Chen, Nan-kuei

    2014-09-01

    Resting-state functional magnetic resonance imaging (fMRI) aims to measure baseline neuronal connectivity independent of specific functional tasks and to capture changes in the connectivity due to neurological diseases. Most existing network detection methods rely on a fixed threshold to identify functionally connected voxels under the resting state. Due to fMRI non-stationarity, the threshold cannot adapt to variation of data characteristics across sessions and subjects, and generates unreliable mapping results. In this study, a new method is presented for resting-state fMRI data analysis. Specifically, the resting-state network mapping is formulated as an outlier detection process that is implemented using one-class support vector machine (SVM). The results are refined by using a spatial-feature domain prototype selection method and two-class SVM reclassification. The final decision on each voxel is made by comparing its probabilities of functionally connected and unconnected instead of a threshold. Multiple features for resting-state analysis were extracted and examined using an SVM-based feature selection method, and the most representative features were identified. The proposed method was evaluated using synthetic and experimental fMRI data. A comparison study was also performed with independent component analysis (ICA) and correlation analysis. The experimental results show that the proposed method can provide comparable or better network detection performance than ICA and correlation analysis. The method is potentially applicable to various resting-state quantitative fMRI studies. Copyright © 2014 Elsevier Inc. All rights reserved.

  15. SVM Classifier - a comprehensive java interface for support vector machine classification of microarray data.

    Science.gov (United States)

    Pirooznia, Mehdi; Deng, Youping

    2006-12-12

    Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the Java interface using standard Swing libraries. We used sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1-BRCA2 samples with the RBF kernel of SVM. We have developed a Java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at http://mfgn.usm.edu/ebl/svm/.

  16. Template-based protein-protein docking exploiting pairwise interfacial residue restraints

    NARCIS (Netherlands)

    Xue, Li C; Garcia Lopes Maia Rodrigues, João; Dobbs, Drena; Honavar, Vasant; Bonvin, Alexandre M J J

    2016-01-01

    Although many advanced and sophisticated ab initio approaches for modeling protein-protein complexes have been proposed in past decades, template-based modeling (TBM) remains the most accurate and widely used approach, given a reliable template is available. However, there are many different ways to

  17. A Study on SVM Based on the Weighted Elitist Teaching-Learning-Based Optimization and Application in the Fault Diagnosis of Chemical Process

    Directory of Open Access Journals (Sweden)

    Cao Junxiang

    2015-01-01

    Teaching-Learning-Based Optimization (TLBO) is a new swarm intelligence optimization algorithm that simulates the class learning process. To address the low optimizing efficiency and poor stability of the traditional TLBO, this paper proposes an improved TLBO algorithm, mainly by introducing the elite concept into TLBO and adopting different inertia weight decreasing strategies for elite and ordinary individuals in the teacher stage and the student stage. The validity of the improved TLBO is verified by optimizing several typical test functions, and the SVM optimized by the weighted elitist TLBO is used in the diagnosis and classification of common failure data of the TE chemical process. Compared with the SVM combined with other traditional optimizing methods, the SVM optimized by the weighted elitist TLBO shows a certain improvement in the accuracy of fault diagnosis and classification.

  18. Chemical modifications of therapeutic proteins induced by residual ethylene oxide.

    Science.gov (United States)

    Chen, Louise; Sloey, Christopher; Zhang, Zhongqi; Bondarenko, Pavel V; Kim, Hyojin; Ren, Da; Kanapuram, Sekhar

    2015-02-01

    Ethylene oxide (EtO) is widely used in sterilization of drug product primary containers and medical devices. The impact of residual EtO on protein therapeutics is of significant interest in the biopharmaceutical industry. The potential for EtO to modify individual amino acids in proteins has been previously reported. However, specific identification of EtO adducts in proteins and the effect of residual EtO on the stability of therapeutic proteins has not been reported to date. This paper describes studies of residual EtO with two therapeutic proteins, a PEGylated form of the recombinant human granulocyte colony-stimulating factor (Peg-GCSF) and recombinant human erythropoietin (EPO) formulated with human serum albumin (HSA). Peg-GCSF was filled in an EtO sterilized delivery device and incubated at accelerated stress conditions. Glu-C peptide mapping and LC-MS analyses revealed residual EtO reacted with Peg-GCSF and resulted in EtO modifications at two methionine residues (Met-127 and Met-138). In addition, tryptic peptide mapping and LC-MS analyses revealed residual EtO in plastic vials reacted with HSA in EPO formulation at Met-328 and Cys-34. This paper details the work conducted to understand the effects of residual EtO on the chemical stability of protein therapeutics. © 2014 Wiley Periodicals, Inc. and the American Pharmacists Association.

  19. Quantitative analysis of glycated albumin in serum based on ATR-FTIR spectrum combined with SiPLS and SVM.

    Science.gov (United States)

    Li, Yuanpeng; Li, Fucui; Yang, Xinhao; Guo, Liu; Huang, Furong; Chen, Zhenqiang; Chen, Xingdan; Zheng, Shifu

    2018-08-05

    A rapid quantitative analysis model for determining the glycated albumin (GA) content based on attenuated total reflectance (ATR)-Fourier transform infrared spectroscopy (FTIR) combined with linear SiPLS and nonlinear SVM has been developed. Firstly, the real GA content in human serum was determined by the GA enzymatic method; meanwhile, the ATR-FTIR spectra of serum samples from a health examination population were obtained. The spectral data of the whole mid-infrared region (4000-600 cm⁻¹) and GA's characteristic region (1800-800 cm⁻¹) were used as the research object of quantitative analysis. Secondly, several preprocessing steps, including first derivative, second derivative, variable standardization and spectral normalization, were performed. Lastly, quantitative analysis regression models were established by using SiPLS and SVM respectively. The SiPLS modeling results are as follows: root mean square error of cross validation (RMSECV_T) = 0.523 g/L, calibration coefficient (R_C) = 0.937, root mean square error of prediction (RMSEP_T) = 0.787 g/L, and prediction coefficient (R_P) = 0.938. The SVM modeling results are as follows: RMSECV_T = 0.0048 g/L, R_C = 0.998, RMSEP_T = 0.442 g/L, and R_P = 0.916. The results indicated that the model performance was improved significantly after preprocessing and optimization of characteristic regions, while the modeling performance of nonlinear SVM was considerably better than that of linear SiPLS. Hence, the quantitative analysis model for GA in human serum based on ATR-FTIR combined with SiPLS and SVM is effective. It does not need sample preprocessing and is characterized by simple operations and high time efficiency, providing a rapid and accurate method for GA content determination. Copyright © 2018 Elsevier B.V. All rights reserved.

  20. Soft-sensing Modeling Based on MLS-SVM Inversion for L-lysine Fermentation Processes

    Directory of Open Access Journals (Sweden)

    Bo Wang

    2015-06-01

    A modeling approach based on multiple output variables least squares support vector machine (MLS-SVM) inversion is presented by a combination of inverse system and support vector machine theory. Firstly, a dynamic system model is developed based on the material balance relation of a fed-batch fermentation process, with which it is analyzed whether an inverse system exists or not, and into which characteristic information of a fermentation process is introduced to set up an extended inversion model. Secondly, an initial extended inversion model is developed off-line by the use of the fitting capacity of MLS-SVM; on-line correction is made by the use of a differential evolution (DE) algorithm on the basis of deviation information. Finally, a combined pseudo-linear system is formed by means of a serial connection of a corrected extended inversion model behind the L-lysine fermentation processes; thereby crucial biochemical parameters of a fermentation process could be predicted on-line. The simulation experiment shows that this soft-sensing modeling method features very high prediction precision and can predict crucial biochemical parameters of the L-lysine fermentation process very well.

  1. DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues.

    Science.gov (United States)

    Ma, Xin; Guo, Jing; Sun, Xiao

    2016-01-01

    DNA-binding proteins are fundamentally important in cellular processes. Several computational-based methods have been developed to improve the prediction of DNA-binding proteins in previous years. However, insufficient work has been done on the prediction of DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using the random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features, which reflect information about the conservation of physicochemical properties of the amino acids, and the binding propensity of DNA-binding residues and non-binding propensities of non-binding residues. The comparisons with each feature demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during the model construction. The results showed that the DNABP model could achieve 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. High prediction accuracy and performance comparisons with previous research suggested that DNABP could be a useful approach to identify DNA-binding proteins from sequence information. The DNABP web server system is freely available at http://www.cbi.seu.edu.cn/DNABP/.

  2. Anion induced conformational preference of CαNN motif residues in functional proteins.

    Science.gov (United States)

    Patra, Piya; Ghosh, Mahua; Banerjee, Raja; Chakrabarti, Jaydeb

    2017-12-01

    Among different ligand binding motifs, the anion binding CαNN motif, consisting of peptide backbone atoms of three consecutive residues, is observed to be important for recognition of free anions, like sulphate or biphosphate, and participates in different key functions. Here we study the interaction of sulphate and biphosphate with the CαNN motif present in different proteins. Instead of the total protein, a peptide fragment has been studied, keeping the CαNN motif flanked by other residues. We use classical force field based molecular dynamics simulations to understand the stability of this motif. Our data indicate fluctuations in conformational preferences of the motif residues in the absence of the anion. The anion gives stability to one of these conformations. However, the anion induced conformational preferences are highly sequence dependent and specific to the type of anion. In particular, the polar residues are more favourable compared to the other residues for recognising the anion. © 2017 Wiley Periodicals, Inc.

  3. Determination of the carmine content based on spectrum fluorescence spectral and PSO-SVM

    Science.gov (United States)

    Wang, Shu-tao; Peng, Tao; Cheng, Qi; Wang, Gui-chuan; Kong, De-ming; Wang, Yu-tian

    2018-03-01

    Carmine is a food pigment widely used in food and beverage additives. Excessive consumption of synthetic pigments can seriously harm the body. Food generally contains a variety of colorants. Simulating the coexistence of several food pigments, we combined fluorescence spectroscopy with the PSO-SVM algorithm to establish a method for determining the carmine content in a mixed solution. The PSO-SVM predictions gave an average carmine recovery rate of 100.84%, a root mean square error of prediction (RMSEP) of 1.03e-04, and a correlation coefficient of 0.999 between the model output and the real value. Compared with the prediction results of back propagation (BP), the correlation coefficient of PSO-SVM was 2.7% higher, the average recovery rate differed by 0.6%, and the root mean square error was nearly one order of magnitude lower. According to the analysis results, combining the fluorescence spectrum technique with PSO-SVM can effectively avoid the interference caused by other pigments and accurately determine the content of carmine in a mixed solution, with better results than BP.

  4. Mild hypothermic culture conditions affect residual host cell protein composition post-Protein A chromatography.

    Science.gov (United States)

    Goey, Cher Hui; Bell, David; Kontoravdi, Cleo

    2018-04-01

    Host cell proteins (HCPs) are endogenous impurities, and their proteolytic and binding properties can compromise the integrity, and, hence, the stability and efficacy of recombinant therapeutic proteins such as monoclonal antibodies (mAbs). Nonetheless, purification of mAbs currently presents a challenge because they often co-elute with certain HCP species during the capture step of protein A affinity chromatography. A Quality-by-Design (QbD) strategy to overcome this challenge involves identifying residual HCPs and tracing their source to the harvested cell culture fluid (HCCF) and the corresponding cell culture operating parameters. Then, problematic HCPs in HCCF may be reduced by cell engineering or culture process optimization. Here, we present experimental results linking cell culture temperature and post-protein A residual HCP profile. We had previously reported that Chinese hamster ovary cell cultures conducted at standard physiological temperature and with a shift to mild hypothermia on day 5 produced HCCF of comparable product titer and HCP concentration, but with considerably different HCP composition. In this study, we show that differences in HCP variety at harvest cascaded to downstream purification where different residual HCPs were present in the two sets of samples post-protein A purification. To detect low-abundant residual HCPs, we designed a looping liquid chromatography-mass spectrometry method with continuous expansion of a preferred, exclude, and targeted peptide list. Mild hypothermic cultures produced 20% more residual HCP species, especially cell membrane proteins, distinct from the control. Critically, we identified that half of the potentially immunogenic residual HCP species were different between the two sets of samples.

  5. Quantitative analysis of residual protein contamination of podiatry instruments reprocessed through local and central decontamination units.

    Science.gov (United States)

    Smith, Gordon Wg; Goldie, Frank; Long, Steven; Lappin, David F; Ramage, Gordon; Smith, Andrew J

    2011-01-10

    The cleaning stage of the instrument decontamination process has come under increased scrutiny due to the increasing complexity of surgical instruments and the adverse effects of residual protein contamination on surgical instruments. Instruments used in the podiatry field have a complex surface topography and are exposed to a wide range of biological contamination. Currently, podiatry instruments are reprocessed locally within surgeries while national strategies are favouring a move toward reprocessing in central facilities. The aim of this study was to determine the efficacy of local and central reprocessing on podiatry instruments by measuring residual protein contamination of instruments reprocessed by both methods. The residual protein of 189 instruments reprocessed centrally and 189 instruments reprocessed locally was determined using a fluorescent assay based on the reaction of proteins with o-phthaldialdehyde/sodium 2-mercaptoethanesulfonate. Residual protein was detected on 72% (n = 136) of instruments reprocessed centrally and 90% (n = 170) of instruments reprocessed locally. Significantly less protein (p podiatry instruments when protein contamination is considered, though no significant difference was found in residual protein between local decontamination unit and central decontamination unit processes for Blacks files. Further research is needed to undertake qualitative identification of protein contamination to identify any cross contamination risks and a standard for acceptable residual protein contamination applicable to different instruments and specialities should be considered as a matter of urgency.

  6. Optimization of Support Vector Machine (SVM) for Object Classification

    Science.gov (United States)

    Scholten, Matthew; Dhingra, Neil; Lu, Thomas T.; Chao, Tien-Hsin

    2012-01-01

    The Support Vector Machine (SVM) is a powerful algorithm, useful in classifying data into species. The SVMs implemented in this research were used as classifiers for the final stage in a Multistage Automatic Target Recognition (ATR) system. A single kernel SVM known as SVMlight, and a modified version known as an SVM with K-Means Clustering, were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SVM as a method for classification. From trial to trial, SVM produces consistent results.

  7. A hybrid particle swarm optimization-SVM classification for automatic cardiac auscultation

    Directory of Open Access Journals (Sweden)

    Prasertsak Charoen

    2017-04-01

    Cardiac auscultation is a method by which a doctor listens to heart sounds, using a stethoscope, to examine the condition of the heart. Automatic cardiac auscultation with machine learning is a promising technique to classify heart conditions without the need for doctors or expertise. In this paper, we develop a classification model based on support vector machine (SVM) and particle swarm optimization (PSO) for an automatic cardiac auscultation system. The model consists of two parts: a heart sound signal processing part and a proposed PSO for weighted SVM (WSVM) classifier part. In this method, the PSO takes into account the degree of importance of each feature extracted from wavelet packet (WP) decomposition. Then, by using principal component analysis (PCA), the features can be selected. The PSO technique is used to assign diverse weights to different features for the WSVM classifier. Experimental results show that both continuous and binary PSO-WSVM models achieve better classification accuracy on the heart sound samples, by reducing system false negatives (FNs), compared to traditional SVM and genetic algorithm (GA) based SVM.

  8. Solution Path for Pin-SVM Classifiers With Positive and Negative τ Values.

    Science.gov (United States)

    Huang, Xiaolin; Shi, Lei; Suykens, Johan A K

    2017-07-01

    Applying the pinball loss in a support vector machine (SVM) classifier results in pin-SVM. The pinball loss is characterized by a parameter τ. Its value is related to the quantile level and different τ values are suitable for different problems. In this paper, we establish an algorithm to find the entire solution path for pin-SVM with different τ values. This algorithm is based on the fact that the optimal solution to pin-SVM is continuous and piecewise linear with respect to τ. We also show that the nonnegativity constraint on τ is not necessary, i.e., τ can be extended to negative values. First, in some applications, a negative τ leads to better accuracy. Second, τ = -1 corresponds to a simple solution that links SVM and the classical kernel rule. The solution for τ = -1 can be obtained directly and then be used as a starting point of the solution path. The proposed method efficiently traverses τ values through the solution path, and then achieves good performance by a suitable τ. In particular, τ = 0 corresponds to C-SVM, meaning that the traversal algorithm can output a result at least as good as C-SVM with respect to validation error.
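
    For readers unfamiliar with the pinball loss, the following minimal sketch shows its usual form in the pin-SVM literature, L_τ(u) = u for u ≥ 0 and -τu for u < 0, with u = 1 - y·f(x). The abstract itself does not state the formula, so this definition and the function name are assumptions made for illustration. The sketch shows why τ = 0 recovers the hinge loss of C-SVM and why τ = -1 makes the loss linear in u.

```python
def pinball_loss(y, score, tau):
    """Pinball loss on the margin residual u = 1 - y * score.

    Assumed form: L_tau(u) = u if u >= 0 else -tau * u.
    """
    u = 1.0 - y * score
    return u if u >= 0 else -tau * u

# A point on the wrong side of the margin (u > 0) is penalised the same for any tau.
print(pinball_loss(y=1, score=0.5, tau=0.3))
# A point well inside its class (u < 0) is also penalised when tau > 0 ...
print(pinball_loss(y=1, score=3.0, tau=0.5))
# ... but not when tau = 0, which is the ordinary hinge loss of C-SVM.
print(pinball_loss(y=1, score=3.0, tau=0.0))
# With tau = -1 the loss equals u everywhere, i.e. it is linear.
print(pinball_loss(y=1, score=3.0, tau=-1.0))
```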

  9. GenSVM: a generalized multiclass support vector machine

    NARCIS (Netherlands)

    G.J.J. van den Burg (Gertjan); P.J.F. Groenen (Patrick)

    2016-01-01

    Traditional extensions of the binary support vector machine (SVM) to multiclass problems are either heuristics or require solving a large dual optimization problem. Here, a generalized multiclass SVM is proposed called GenSVM. In this method classification boundaries for a K-class

  10. Tuning to optimize SVM approach for assisting ovarian cancer diagnosis with photoacoustic imaging.

    Science.gov (United States)

    Wang, Rui; Li, Rui; Lei, Yanyan; Zhu, Quing

    2015-01-01

    Support vector machine (SVM) is one of the most effective classification methods for cancer detection. The efficiency and quality of an SVM classifier depend strongly on several important features and a set of proper parameters. Here, a series of classification analyses, with one set of photoacoustic data from ovarian tissues ex vivo and a widely used breast cancer dataset, the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, revealed how the accuracy of an SVM classification varies with the number of features used and the parameters selected. A pattern recognition system is proposed by means of SVM-Recursive Feature Elimination (RFE) with the Radial Basis Function (RBF) kernel. To improve the effectiveness and robustness of the system, an optimized tuning ensemble algorithm called SVM-RFE(C) with a correlation filter was implemented to quantify feature and parameter information based on cross validation. The proposed algorithm is first shown to outperform SVM-RFE on WDBC. Then the best accuracy of 94.643% and sensitivity of 94.595% were achieved when using SVM-RFE(C) to test 57 new PAT data from 19 patients. The experimental results show that the classifier constructed with the SVM-RFE(C) algorithm is able to learn additional information from new data and has significant potential in ovarian cancer diagnosis.

  11. COMPARISON OF SVM AND FUZZY CLASSIFIER FOR AN INDIAN SCRIPT

    Directory of Open Access Journals (Sweden)

    M. J. Baheti

    2012-01-01

    With the advent of the technological era, conversion of scanned documents (handwritten or printed) into machine editable format has attracted many researchers. This paper deals with the problem of recognition of Gujarati handwritten numerals. Gujarati numeral recognition requires performing some specific steps as a part of preprocessing. For preprocessing, digitization, segmentation, normalization and thinning are done, assuming that the image has almost no noise. Further, an affine invariant moments based model is used for feature extraction, and finally Support Vector Machine (SVM) and Fuzzy classifiers are used for numeral classification. The comparison of SVM and Fuzzy classifier is made and it can be seen that SVM procured better results as compared to the Fuzzy classifier.

  12. An improved chaotic fruit fly optimization based on a mutation strategy for simultaneous feature selection and parameter optimization for SVM and its applications.

    Science.gov (United States)

    Ye, Fei; Lou, Xin Yuan; Sun, Lin Fu

    2017-01-01

    This paper proposes a new support vector machine (SVM) optimization scheme based on an improved chaotic fruit fly optimization algorithm (FOA) with a mutation strategy to simultaneously perform parameter tuning for the SVM and feature selection. In the improved FOA, the chaotic particle initializes the fruit fly swarm location and replaces the expression of distance for the fruit fly to find the food source. In addition, the proposed mutation strategy uses two distinct generative mechanisms for new food sources at the osphresis phase, allowing the algorithm to search for the optimal solution both in the whole solution space and within the local solution space containing the fruit fly swarm location. In an evaluation based on a group of ten benchmark problems, the proposed algorithm's performance is compared with that of other well-known algorithms, and the results support the superiority of the proposed algorithm. Moreover, this algorithm is successfully applied to an SVM to perform both parameter tuning for the SVM and feature selection to solve real-world classification problems. This method is called chaotic fruit fly optimization algorithm (CIFOA)-SVM and has been shown to be a more robust and effective optimization method than other well-known methods, particularly in terms of solving the medical diagnosis problem and the credit card problem.

  13. AllerHunter: a SVM-pairwise system for assessment of allergenicity and allergic cross-reactivity in proteins.

    Directory of Open Access Journals (Sweden)

    Hon Cheng Muh

    Allergy is a major health problem in industrialized countries. The number of transgenic food crops is growing rapidly, creating the need for allergenicity assessment before they are introduced into the human food chain. While existing bioinformatic methods have achieved good accuracies for highly conserved sequences, the discrimination of allergens and non-allergens from allergen-like non-allergen sequences remains difficult. We describe AllerHunter, a web-based computational system for the assessment of potential allergenicity and allergic cross-reactivity in proteins. It combines an iterative pairwise sequence similarity encoding scheme with SVM as the discriminating engine. The pairwise vectorization framework allows the system to model essential features in allergens that are involved in cross-reactivity, but not limited to distinct sets of physicochemical properties. The system was rigorously trained and tested using 1,356 known allergen and 13,449 putative non-allergen sequences. Extensive testing was performed for validation of the prediction models. The system is effective for distinguishing allergens and non-allergens from allergen-like non-allergen sequences. Testing results showed that AllerHunter, with a sensitivity of 83.4% and specificity of 96.4% (accuracy = 95.3%, area under the receiver operating characteristic curve AROC = 0.928+/-0.004 and Matthews correlation coefficient MCC = 0.738), performs significantly better than a number of existing methods using an independent dataset of 1443 protein sequences. AllerHunter is available at http://tiger.dbs.nus.edu.sg/AllerHunter.
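
    The classification metrics quoted above (sensitivity, specificity, accuracy and MCC) all derive from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not the AllerHunter test results, whose confusion matrix is not given in the abstract.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc

# Hypothetical counts, for illustration only.
sens, spec, acc, mcc = binary_metrics(tp=50, fn=10, tn=80, fp=20)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f} MCC={mcc:.3f}")
```

    Because non-allergens heavily outnumber allergens in such datasets, accuracy alone can look high even for a weak classifier, which is one reason MCC is commonly reported alongside it.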

  14. Introducing instrumental variables in the LS-SVM based identification framework

    NARCIS (Netherlands)

    Laurain, V.; Zheng, W-X.; Toth, R.

    2011-01-01

    Least-Squares Support Vector Machines (LS-SVM) represent a promising approach to identify nonlinear systems via nonparametric estimation of the nonlinearities in a computationally and stochastically attractive way. All the methods dedicated to the solution of this problem rely on the minimization of

  15. Coevolving residues of (beta/alpha)(8)-barrel proteins play roles in stabilizing active site architecture and coordinating protein dynamics.

    Science.gov (United States)

    Shen, Hongbo; Xu, Feng; Hu, Hairong; Wang, Feifei; Wu, Qi; Huang, Qiang; Wang, Honghai

    2008-12-01

    Indole-3-glycerol phosphate synthase (IGPS) is a representative of (beta/alpha)(8)-barrel proteins, the most common enzyme fold in nature. To better understand how the constituent amino acids work together to define the structure and to facilitate the function, we investigated the evolutionary and dynamical coupling of IGPS residues by combining statistical coupling analysis (SCA) and molecular dynamics (MD) simulations. The coevolving residues identified by the SCA were found to form a network which encloses the active site completely. The MD simulations showed that these coevolving residues are involved in the correlated and anti-correlated motions. The correlated residues are within van der Waals contact and appear to maintain the active site architecture; the anti-correlated residues are mainly distributed on opposite sides of the catalytic cavity and coordinate the motions likely required for the substrate entry and product release. Our findings might have broad implications for proteins with the highly conserved (beta/alpha)(8)-barrel in assessing the roles of amino acids that are moderately conserved and not directly involved in the active site of the (beta/alpha)(8)-barrel. The results of this study could also provide useful information for further exploring the specific residue motions for the catalysis and protein design based on the (beta/alpha)(8)-barrel scaffold.

  16. Nonlinear Time Series Prediction Using LS-SVM with Chaotic Mutation Evolutionary Programming for Parameter Optimization

    International Nuclear Information System (INIS)

    Xu Ruirui; Chen Tianlun; Gao Chengfeng

    2006-01-01

    Nonlinear time series prediction is studied by using an improved least squares support vector machine (LS-SVM) regression based on chaotic mutation evolutionary programming (CMEP) approach for parameter optimization. We analyze how the prediction error varies with different parameters (σ, γ) in LS-SVM. In order to select appropriate parameters for the prediction model, we employ CMEP algorithm. Finally, Nasdaq stock data are predicted by using this LS-SVM regression based on CMEP, and satisfactory results are obtained.
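
    To make the role of the two parameters (σ, γ) concrete, here is a minimal sketch of LS-SVM regression in its usual textbook form, where training reduces to one linear system in the bias b and the dual weights alpha. It assumes a Gaussian kernel of width σ and regularization γ, uses a toy sine series rather than the Nasdaq data, and does not implement the CMEP parameter search; all function names are hypothetical.

```python
import math

def rbf(x, z, sigma):
    """Gaussian (RBF) kernel for scalar inputs."""
    return math.exp(-((x - z) ** 2) / (2.0 * sigma ** 2))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                     # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def lssvm_fit(xs, ys, sigma, gamma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(xs)
    a = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(xs[i], xs[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma                    # ridge term on the diagonal
        a.append(row)
    sol = solve(a, [0.0] + list(ys))
    return sol[0], sol[1:]

def lssvm_predict(x, xs, b, alpha, sigma):
    """Evaluate f(x) = b + sum_i alpha_i k(x, x_i)."""
    return b + sum(a_i * rbf(x, x_i, sigma) for a_i, x_i in zip(alpha, xs))

# Toy data (an assumption for illustration): samples of sin(x).
xs = [i * 0.3 for i in range(21)]
ys = [math.sin(x) for x in xs]
b, alpha = lssvm_fit(xs, ys, sigma=1.0, gamma=1000.0)
pred = lssvm_predict(1.0, xs, b, alpha, sigma=1.0)
print(f"prediction at x=1.0: {pred:.4f}, true value: {math.sin(1.0):.4f}")
```

    A parameter search such as the one the abstract describes would wrap `lssvm_fit` in a loop that scores each candidate (σ, γ) on held-out data.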

  17. Quantitative analysis of residual protein contamination of podiatry instruments reprocessed through local and central decontamination units

    Directory of Open Access Journals (Sweden)

    Ramage Gordon

    2011-01-01

    Background: The cleaning stage of the instrument decontamination process has come under increased scrutiny due to the increasing complexity of surgical instruments and the adverse effects of residual protein contamination on surgical instruments. Instruments used in the podiatry field have a complex surface topography and are exposed to a wide range of biological contamination. Currently, podiatry instruments are reprocessed locally within surgeries, while national strategies favour a move toward reprocessing in central facilities. The aim of this study was to determine the efficacy of local and central reprocessing on podiatry instruments by measuring residual protein contamination of instruments reprocessed by both methods. Methods: The residual protein of 189 instruments reprocessed centrally and 189 instruments reprocessed locally was determined using a fluorescent assay based on the reaction of proteins with o-phthaldialdehyde/sodium 2-mercaptoethanesulfonate. Results: Residual protein was detected on 72% (n = 136) of instruments reprocessed centrally and 90% (n = 170) of instruments reprocessed locally. Significantly less protein (p Conclusions: Overall, the results show the superiority of central reprocessing for complex podiatry instruments when protein contamination is considered, though no significant difference was found in residual protein between local decontamination unit and central decontamination unit processes for Blacks files. Further research is needed to undertake qualitative identification of protein contamination to identify any cross contamination risks, and a standard for acceptable residual protein contamination applicable to different instruments and specialities should be considered as a matter of urgency.

  18. SVM Classifier – a comprehensive java interface for support vector machine classification of microarray data

    Science.gov (United States)

    Pirooznia, Mehdi; Deng, Youping

    2006-01-01

    Motivation: Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. Results: The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the Java interface using standard Swing libraries. We used sample data from a breast cancer study to test classification accuracy. We achieved 100% accuracy in classification among the BRCA1–BRCA2 samples with the RBF kernel of SVM. Conclusion: We have developed a Java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at . PMID:17217518

  19. Geographical traceability of wild Boletus edulis based on data fusion of FT-MIR and ICP-AES coupled with data mining methods (SVM)

    Science.gov (United States)

    Li, Yun; Zhang, Ji; Li, Tao; Liu, Honggao; Li, Jieqing; Wang, Yuanzhong

    2017-04-01

    In this work, the data fusion strategy of Fourier transform mid infrared (FT-MIR) spectroscopy and inductively coupled plasma-atomic emission spectrometry (ICP-AES) was used in combination with Support Vector Machine (SVM) to determine the geographic origin of Boletus edulis collected from nine regions of Yunnan Province in China. Firstly, competitive adaptive reweighted sampling (CARS) was used for selecting an optimal combination of key wavenumbers of second derivative FT-MIR spectra, and thirteen elements were sorted with variable importance in projection (VIP) scores. Secondly, thirteen subsets of multi-elements with the best VIP score were generated and each subset was used to fuse with FT-MIR. Finally, the classification models were established by SVM, and the combination of parameters C and γ (gamma) of the SVM models was calculated by grid search (GS) and genetic algorithm (GA). The results showed that both GS-SVM and GA-SVM models achieved good performance based on the #9 subset, and the prediction accuracies of the two models in the calibration and validation sets were 81.40% and 90.91%, respectively. In conclusion, the results indicate that the data fusion strategy of FT-MIR and ICP-AES coupled with the SVM algorithm can be used as a reliable tool for accurate identification of B. edulis, and it can provide a useful approach to the quality control of edible mushrooms.

  20. Entropy Transfer between Residue Pairs and Allostery in Proteins: Quantifying Allosteric Communication in Ubiquitin.

    Directory of Open Access Journals (Sweden)

    Aysima Hacisuleyman

    2017-01-01

    It has recently been proposed by Gunasekaran et al. that allostery may be an intrinsic property of all proteins. Here, we develop a computational method that can determine and quantify allosteric activity in any given protein. Based on Schreiber's transfer entropy formulation, our approach leads to an information transfer landscape for the protein that shows the presence of entropy sinks and sources and explains how pairs of residues communicate with each other using entropy transfer. The model can identify the residues that drive the fluctuations of others. We apply the model to Ubiquitin, whose allosteric activity has not been emphasized until recently, and show that there are indeed systematic pathways of entropy and information transfer between residues that correlate well with the activities of the protein. We use 600 nanosecond molecular dynamics trajectories for Ubiquitin and its complex with human polymerase iota and evaluate entropy transfer between all pairs of residues of Ubiquitin and quantify the binding susceptibility changes upon complex formation. We explain the complex formation propensities of Ubiquitin in terms of entropy transfer. Important residues taking part in allosteric communication in Ubiquitin predicted by our approach are in agreement with results of NMR relaxation dispersion experiments. Finally, we show that time delayed correlation of fluctuations of two interacting residues possesses an intrinsic causality that tells which residue controls the interaction and which one is controlled. Our work shows that time delayed correlations, entropy transfer and causality are the required new concepts for explaining allosteric communication in proteins.

  1. Entropy Transfer between Residue Pairs and Allostery in Proteins: Quantifying Allosteric Communication in Ubiquitin.

    Science.gov (United States)

    Hacisuleyman, Aysima; Erman, Burak

    2017-01-01

    It has recently been proposed by Gunasekaran et al. that allostery may be an intrinsic property of all proteins. Here, we develop a computational method that can determine and quantify allosteric activity in any given protein. Based on Schreiber's transfer entropy formulation, our approach leads to an information transfer landscape for the protein that shows the presence of entropy sinks and sources and explains how pairs of residues communicate with each other using entropy transfer. The model can identify the residues that drive the fluctuations of others. We apply the model to Ubiquitin, whose allosteric activity has not been emphasized until recently, and show that there are indeed systematic pathways of entropy and information transfer between residues that correlate well with the activities of the protein. We use 600 nanosecond molecular dynamics trajectories for Ubiquitin and its complex with human polymerase iota and evaluate entropy transfer between all pairs of residues of Ubiquitin and quantify the binding susceptibility changes upon complex formation. We explain the complex formation propensities of Ubiquitin in terms of entropy transfer. Important residues taking part in allosteric communication in Ubiquitin predicted by our approach are in agreement with results of NMR relaxation dispersion experiments. Finally, we show that time delayed correlation of fluctuations of two interacting residues possesses an intrinsic causality that tells which residue controls the interaction and which one is controlled. Our work shows that time delayed correlations, entropy transfer and causality are the required new concepts for explaining allosteric communication in proteins.
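
    The directional character of transfer entropy can be illustrated on discrete symbol sequences. The sketch below is a plug-in estimate of Schreiber's transfer entropy with a history length of one step; it is not the authors' residue-fluctuation pipeline, and the binary toy series, the function name and the `lag` parameter are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def transfer_entropy(source, target, lag=1):
    """Plug-in transfer entropy (bits) from `source` to `target`, history length 1."""
    triples = Counter()                      # (target_{t+lag}, target_t, source_t)
    for t in range(len(target) - lag):
        triples[(target[t + lag], target[t], source[t])] += 1
    total = sum(triples.values())
    now_pairs = Counter()                    # (target_t, source_t)
    self_pairs = Counter()                   # (target_{t+lag}, target_t)
    singles = Counter()                      # target_t
    for (x_next, x_now, y_now), c in triples.items():
        now_pairs[(x_now, y_now)] += c
        self_pairs[(x_next, x_now)] += c
        singles[x_now] += c
    te = 0.0
    for (x_next, x_now, y_now), c in triples.items():
        p_full = c / now_pairs[(x_now, y_now)]             # p(x_next | x_now, y_now)
        p_self = self_pairs[(x_next, x_now)] / singles[x_now]  # p(x_next | x_now)
        te += (c / total) * math.log2(p_full / p_self)
    return te

# Toy series: x copies y with a one-step delay, so information flows from y to x.
rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(2000)]
x = [0] + y[:-1]
te_y_to_x = transfer_entropy(y, x)
te_x_to_y = transfer_entropy(x, y)
print(f"T(y->x) = {te_y_to_x:.3f} bits, T(x->y) = {te_x_to_y:.3f} bits")
```

    The asymmetry between the two directions is what lets a driving series be distinguished from a driven one, which ordinary (symmetric) correlation cannot do.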

  2. Relationship between hot spot residues and ligand binding hot spots in protein-protein interfaces.

    Science.gov (United States)

    Zerbe, Brandon S; Hall, David R; Vajda, Sandor; Whitty, Adrian; Kozakov, Dima

    2012-08-27

    In the context of protein-protein interactions, the term "hot spot" refers to a residue or cluster of residues that makes a major contribution to the binding free energy, as determined by alanine scanning mutagenesis. In contrast, in pharmaceutical research, a hot spot is a site on a target protein that has high propensity for ligand binding and hence is potentially important for drug discovery. Here we examine the relationship between these two hot spot concepts by comparing alanine scanning data for a set of 15 proteins with results from mapping the protein surfaces for sites that can bind fragment-sized small molecules. We find the two types of hot spots are largely complementary; the residues protruding into hot spot regions identified by computational mapping or experimental fragment screening are almost always themselves hot spot residues as defined by alanine scanning experiments. Conversely, a residue that is found by alanine scanning to contribute little to binding rarely interacts with hot spot regions on the partner protein identified by fragment mapping. In spite of the strong correlation between the two hot spot concepts, they fundamentally differ, however. In particular, while identification of a hot spot by alanine scanning establishes the potential to generate substantial interaction energy with a binding partner, there are additional topological requirements to be a hot spot for small molecule binding. Hence, only a minority of hot spots identified by alanine scanning represent sites that are potentially useful for small inhibitor binding, and it is this subset that is identified by experimental or computational fragment screening.

  3. A SVM-based method for sentiment analysis in Persian language

    Science.gov (United States)

    Hajmohammadi, Mohammad Sadegh; Ibrahim, Roliana

    2013-03-01

    Persian is the official language of Iran, Tajikistan and Afghanistan. Local online users often express their opinions and experiences on the web in written Persian. Although the information in those reviews is valuable to potential consumers and sellers, the huge number of web reviews makes it difficult to give an unbiased evaluation of a product. In this paper, the standard machine learning techniques SVM and naive Bayes are applied to online Persian movie reviews to automatically classify user reviews as positive or negative, and the performance of the two classifiers in this language is compared. The effects of feature representations on classification performance are discussed. We find that accuracy is influenced by the interaction between the classification models and the feature options. The SVM classifier achieves accuracy as good as or better than that of naive Bayes on Persian movie reviews. Unigrams prove to be better features than bigrams and trigrams in capturing Persian sentiment orientation.
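
    The unigram, bigram and trigram features compared in this abstract are counts of contiguous token sequences. A minimal sketch follows, assuming whitespace tokenization and an English placeholder sentence (the paper's Persian preprocessing is not described here); the function name is hypothetical.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count contiguous n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Placeholder review text, used only to show the feature shapes.
tokens = "the film was not good".split()
unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)
print(len(unigrams), len(bigrams), len(trigrams))
```

    Higher-order n-grams capture context such as negation ("not good") but are far sparser, which is one plausible reason unigrams can still perform better on a small corpus.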

  4. An improved chaotic fruit fly optimization based on a mutation strategy for simultaneous feature selection and parameter optimization for SVM and its applications

    Science.gov (United States)

    Lou, Xin Yuan; Sun, Lin Fu

    2017-01-01

    This paper proposes a new support vector machine (SVM) optimization scheme based on an improved chaotic fruit fly optimization algorithm (FOA) with a mutation strategy to simultaneously perform parameter tuning for the SVM and feature selection. In the improved FOA, the chaotic particle initializes the fruit fly swarm location and replaces the expression of distance for the fruit fly to find the food source. In addition, the proposed mutation strategy uses two distinct generative mechanisms for new food sources at the osphresis phase, allowing the algorithm to search for the optimal solution both in the whole solution space and within the local solution space containing the fruit fly swarm location. In an evaluation based on a group of ten benchmark problems, the proposed algorithm’s performance is compared with that of other well-known algorithms, and the results support the superiority of the proposed algorithm. Moreover, this algorithm is successfully applied in an SVM to perform both parameter tuning and feature selection to solve real-world classification problems. This method is called chaotic fruit fly optimization algorithm (CIFOA)-SVM and has been shown to be a more robust and effective optimization method than other well-known methods, particularly in terms of solving the medical diagnosis problem and the credit card problem. PMID:28369096

  5. Carbon Nanotubes Facilitate Oxidation of Cysteine Residues of Proteins.

    Science.gov (United States)

    Hirano, Atsushi; Kameda, Tomoshi; Wada, Momoyo; Tanaka, Takeshi; Kataura, Hiromichi

    2017-10-19

    The adsorption of proteins onto nanoparticles such as carbon nanotubes (CNTs) governs the early stages of nanoparticle uptake into biological systems. Previous studies regarding these adsorption processes have primarily focused on the physical interactions between proteins and nanoparticles. In this study, using reduced lysozyme and intact human serum albumin in aqueous solutions, we demonstrated that CNTs interact chemically with proteins. The CNTs induce the oxidation of cysteine residues of the proteins, which is accounted for by charge transfer from the sulfhydryl groups of the cysteine residues to the CNTs. The redox reaction simultaneously suppresses the intermolecular association of proteins via disulfide bonds. These results suggest that CNTs can affect the folding and oxidation degree of proteins in biological systems such as blood and cytosol.

  6. [Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].

    Science.gov (United States)

    Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao

    2016-03-01

    Leaf area index (LAI) is a dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively, and can provide a reference for monitoring tree growth and estimating yield. Red Fuji apple trees in the full fruit-bearing period were the research objects. The canopy spectral reflectance and LAI values of ninety apple trees were measured with the ASD Fieldspec3 spectrometer and the LAI-2200 in thirty orchards over two consecutive years in the Qixia research area of Shandong Province. The optimal vegetation indices were selected by correlation analysis of the original spectral reflectance and vegetation indices. Models for predicting LAI were built with the multivariate regression methods of support vector machine (SVM) and random forest (RF). The new vegetation indices GNDVI527, ND-VI676, RVI682, FD-NVI656 and GRVI517 and the two main existing vegetation indices, NDVI670 and NDVI705, are in accordance with LAI. In the RF regression model, the calibration set coefficient of determination C-R2 of 0.920 and the validation set coefficient of determination V-R2 of 0.889 are higher than those of the SVM regression model by 0.045 and 0.033, respectively. The calibration set root mean square error C-RMSE of 0.249 and the validation set root mean square error V-RMSE of 0.236 are lower than those of the SVM regression model by 0.054 and 0.058, respectively. The relative analysis errors of the calibration set (C-RPD) and validation set (V-RPD) reached 3.363 and 2.520, which are higher than those of the SVM regression model by 0.598 and 0.262, respectively. The slopes of the trend lines of measured versus predicted values for the calibration and validation sets (C-S and V-S) are close to 1. The estimation result of the RF regression model is better than that of the SVM. The RF regression model can be used to estimate the LAI of Red Fuji apple trees in the full fruit period.
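
    The three model-quality figures compared above can be computed from paired measured and predicted values. This sketch assumes R2 is the coefficient of determination 1 - SS_res/SS_tot and that RPD is the ratio of the sample standard deviation of the measured values to the RMSE, a common definition that the abstract does not spell out. The LAI numbers are made up for illustration.

```python
import math

def regression_metrics(measured, predicted):
    """Return (R2, RMSE, RPD) for paired measured and predicted values."""
    n = len(measured)
    mean = sum(measured) / n
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    r2 = 1.0 - ss_res / ss_tot               # coefficient of determination
    rmse = math.sqrt(ss_res / n)             # root mean square error
    sd = math.sqrt(ss_tot / (n - 1))         # sample standard deviation
    rpd = sd / rmse                          # assumed definition: SD / RMSE
    return r2, rmse, rpd

# Hypothetical LAI values, not data from the study.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, rmse, rpd = regression_metrics(measured, predicted)
print(f"R2={r2:.3f} RMSE={rmse:.3f} RPD={rpd:.2f}")
```

    Under this definition a higher RPD and a lower RMSE both indicate a better model, which matches the direction of the RF versus SVM comparison reported above.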

  7. HMMBinder: DNA-Binding Protein Prediction Using HMM Profile Based Features.

    Science.gov (United States)

    Zaman, Rianon; Chowdhury, Shahana Yasmin; Rashid, Mahmood A; Sharma, Alok; Dehzangi, Abdollah; Shatabda, Swakkhar

    2017-01-01

    DNA-binding proteins often play an important role in various processes within the cell. Over the last decade, a wide range of classification algorithms and feature extraction techniques have been used to solve this problem. In this paper, we propose a novel DNA-binding protein prediction method called HMMBinder. HMMBinder uses monogram and bigram features extracted from the HMM profiles of the protein sequences. To the best of our knowledge, this is the first application of HMM profile based features for the DNA-binding protein prediction problem. We applied Support Vector Machines (SVM) as a classification technique in HMMBinder. Our method was tested on standard benchmark datasets. We experimentally show that our method outperforms the state-of-the-art methods found in the literature.
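
    One common way to turn a per-residue profile (an L x 20 matrix of amino-acid scores) into fixed-length monogram and bigram features is sketched below. The exact normalization used by HMMBinder is not given in the abstract, so the formulas here (column means for monograms, summed products of adjacent rows for bigrams) are an assumption, and the tiny two-column profile is purely illustrative.

```python
def monogram(profile):
    """Column-wise mean of the profile: one feature per amino-acid column."""
    length = len(profile)
    width = len(profile[0])
    return [sum(row[j] for row in profile) / length for j in range(width)]

def bigram(profile):
    """Sum over adjacent positions of P[i][a] * P[i+1][b]: width * width features."""
    width = len(profile[0])
    feats = [0.0] * (width * width)
    for cur, nxt in zip(profile, profile[1:]):
        for a in range(width):
            for b in range(width):
                feats[a * width + b] += cur[a] * nxt[b]
    return feats

# Toy 3-position, 2-column profile; a real profile would have 20 columns.
profile = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
mono = monogram(profile)
bi = bigram(profile)
print(mono, bi)
```

    With 20 columns this yields 20 + 400 = 420 features regardless of sequence length, which is what makes such encodings convenient inputs for an SVM.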

  8. HMMBinder: DNA-Binding Protein Prediction Using HMM Profile Based Features

    Directory of Open Access Journals (Sweden)

    Rianon Zaman

    2017-01-01

    DNA-binding proteins often play an important role in various processes within the cell. Over the last decade, a wide range of classification algorithms and feature extraction techniques have been used to solve this problem. In this paper, we propose a novel DNA-binding protein prediction method called HMMBinder. HMMBinder uses monogram and bigram features extracted from the HMM profiles of the protein sequences. To the best of our knowledge, this is the first application of HMM profile based features for the DNA-binding protein prediction problem. We applied Support Vector Machines (SVM) as a classification technique in HMMBinder. Our method was tested on standard benchmark datasets. We experimentally show that our method outperforms the state-of-the-art methods found in the literature.

  9. Simultaneous data pre-processing and SVM classification model selection based on a parallel genetic algorithm applied to spectroscopic data of olive oils.

    Science.gov (United States)

    Devos, Olivier; Downey, Gerard; Duponchel, Ludovic

    2014-04-01

    Classification is an important task in chemometrics. For several years now, support vector machines (SVMs) have proven to be powerful for infrared spectral data classification. However such methods require optimisation of parameters in order to control the risk of overfitting and the complexity of the boundary. Furthermore, it is established that the prediction ability of classification models can be improved using pre-processing in order to remove unwanted variance in the spectra. In this paper we propose a new methodology based on genetic algorithm (GA) for the simultaneous optimisation of SVM parameters and pre-processing (GENOPT-SVM). The method has been tested for the discrimination of the geographical origin of Italian olive oil (Ligurian and non-Ligurian) on the basis of near infrared (NIR) or mid infrared (FTIR) spectra. Different classification models (PLS-DA, SVM with mean centre data, GENOPT-SVM) have been tested and statistically compared using McNemar's statistical test. For the two datasets, SVM with optimised pre-processing give models with higher accuracy than the one obtained with PLS-DA on pre-processed data. In the case of the NIR dataset, most of this accuracy improvement (86.3% compared with 82.8% for PLS-DA) occurred using only a single pre-processing step. For the FTIR dataset, three optimised pre-processing steps are required to obtain SVM model with significant accuracy improvement (82.2%) compared to the one obtained with PLS-DA (78.6%). Furthermore, this study demonstrates that even SVM models have to be developed on the basis of well-corrected spectral data in order to obtain higher classification rates. Copyright © 2013 Elsevier Ltd. All rights reserved.
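
    McNemar's test, used above to compare classifiers, looks only at the samples on which two models disagree. The following is a minimal sketch of the chi-square form with continuity correction; the abstract does not say which variant the authors used, and the disagreement counts are hypothetical.

```python
import math

def mcnemar(b, c):
    """McNemar chi-square test with continuity correction.

    b: samples model A classified correctly and model B incorrectly.
    c: samples model B classified correctly and model A incorrectly.
    Returns (statistic, p_value).
    """
    if b + c == 0:
        return 0.0, 1.0                      # the models never disagree
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical disagreement counts between two classifiers on one test set.
stat, p = mcnemar(b=25, c=10)
print(f"chi2={stat:.2f}, p={p:.4f}")
```

    Because both models are evaluated on the same samples, this paired test is more appropriate than comparing two accuracy percentages as if they came from independent sets.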

  10. Geographical traceability of wild Boletus edulis based on data fusion of FT-MIR and ICP-AES coupled with data mining methods (SVM).

    Science.gov (United States)

    Li, Yun; Zhang, Ji; Li, Tao; Liu, Honggao; Li, Jieqing; Wang, Yuanzhong

    2017-04-15

    In this work, the data fusion strategy of Fourier transform mid infrared (FT-MIR) spectroscopy and inductively coupled plasma-atomic emission spectrometry (ICP-AES) was used in combination with Support Vector Machine (SVM) to determine the geographic origin of Boletus edulis collected from nine regions of Yunnan Province in China. Firstly, competitive adaptive reweighted sampling (CARS) was used for selecting an optimal combination of key wavenumbers of second derivative FT-MIR spectra, and thirteen elements were sorted with variable importance in projection (VIP) scores. Secondly, thirteen subsets of multi-elements with the best VIP score were generated and each subset was used to fuse with FT-MIR. Finally, the classification models were established by SVM, and the combination of parameters C and γ (gamma) of the SVM models was calculated by grid search (GS) and genetic algorithm (GA). The results showed that both GS-SVM and GA-SVM models achieved good performance based on the #9 subset, and the prediction accuracies of the two models in the calibration and validation sets were 81.40% and 90.91%, respectively. In conclusion, the results indicate that the data fusion strategy of FT-MIR and ICP-AES coupled with the SVM algorithm can be used as a reliable tool for accurate identification of B. edulis, and it can provide a useful approach to the quality control of edible mushrooms. Copyright © 2017. Published by Elsevier B.V.

  11. A novel application of wavelet based SVM to transient phenomena identification of power transformers

    International Nuclear Information System (INIS)

    Jazebi, S.; Vahidi, B.; Jannati, M.

    2011-01-01

    A novel differential protection approach is introduced in the present paper. The proposed scheme is a combination of Support Vector Machine (SVM) and wavelet transform theories. Two common transients such as magnetizing inrush current and internal fault are considered. A new wavelet feature is extracted which reduces the computational cost and enhances the discrimination accuracy of SVM. Particle swarm optimization technique (PSO) has been applied to tune SVM parameters. The suitable performance of this method is demonstrated by simulation of different faults and switching conditions on a power transformer in PSCAD/EMTDC software. The method has the advantages of high accuracy and low computational burden (less than a quarter of a cycle). The other advantage is that the method is not dependent on a specific threshold. Sympathetic and recovery inrush currents also have been simulated and investigated. Results show that the proposed method could remain stable even in noisy environments.

  12. Practical analysis of specificity-determining residues in protein families.

    Science.gov (United States)

    Chagoyen, Mónica; García-Martín, Juan A; Pazos, Florencio

    2016-03-01

    Determining the residues that are important for the molecular activity of a protein is a topic of broad interest in biomedicine and biotechnology. This knowledge can help in understanding the protein's molecular mechanism as well as in fine-tuning its natural function, eventually with biotechnological or therapeutic implications. Some of the protein residues are essential for the function common to all members of a family of proteins, while others explain the particular specificities of certain subfamilies (like binding on different substrates or cofactors and distinct binding affinities). Owing to the difficulty in experimentally determining them, a number of computational methods were developed to detect these functional residues, generally known as 'specificity-determining positions' (or SDPs), from a collection of homologous protein sequences. These methods are mature enough to be routinely used by molecular biologists in directing experiments aimed at getting insight into the functional specificity of a family of proteins and eventually modifying it. In this review, we summarize some of the recent discoveries achieved through SDP computational identification in a number of relevant protein families, as well as the main approaches and software tools available to perform this type of analysis. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  13. On the analysis of protein-protein interactions via knowledge-based potentials for the prediction of protein-protein docking

    DEFF Research Database (Denmark)

    Feliu, Elisenda; Aloy, Patrick; Oliva, Baldo

    2011-01-01

    Development of effective methods to screen binary interactions obtained by rigid-body protein-protein docking is key for structure prediction of complexes and for elucidating physicochemical principles of protein-protein binding. We have derived empirical knowledge-based potential functions for selecting rigid-body docking poses. These potentials include the energetic component that provides the residues with a particular secondary structure and surface accessibility. These scoring functions have been tested on a state-of-art benchmark dataset and on a decoy dataset of permanent interactions. Our... and with independence of the partner. This information is encoded at the residue level and could be easily incorporated in the initial grid scoring for Fast Fourier Transform rigid-body docking methods.

  14. Sales Growth Rate Forecasting Using Improved PSO and SVM

    Directory of Open Access Journals (Sweden)

    Xibin Wang

    2014-01-01

    Accurate forecasting of the sales growth rate plays a decisive role in determining the amount of advertising investment. In this study, we present a preclassification and later regression based method optimized by improved particle swarm optimization (IPSO) for sales growth rate forecasting. We use a support vector machine (SVM) as a classification model. The nonlinear relationship in sales growth rate forecasting is efficiently represented by the SVM, while IPSO optimizes the training parameters of the SVM. IPSO addresses issues of traditional PSO, such as relapsing into local optima, slow convergence speed, and low convergence precision in the later evolution. We performed two experiments. In the first, three classic benchmark functions are used to verify the validity of the IPSO algorithm against PSO. Having shown that IPSO outperforms PSO in convergence speed, precision, and escaping local optima, in the second experiment we apply IPSO to the proposed model. Sales growth rate forecasting cases are used to test the forecasting performance of the proposed model. According to the requirements and industry knowledge, the sample data were first classified to obtain the types of the test samples. Next, the values of the test samples were forecast using the SVM regression algorithm. The experimental results demonstrate that the proposed model has good forecasting performance.
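
    For orientation, the baseline that IPSO is compared against can be sketched as a plain global-best particle swarm optimizer run on a classic benchmark (the sphere function). This is the standard PSO update, not the paper's improved variant; the inertia and acceleration coefficients, bounds and seed are assumptions chosen for illustration.

```python
import random

def pso(objective, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with standard global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                   # personal bests
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]       # global best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to box
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

def sphere(x):
    """Classic benchmark with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_x, best_f = pso(sphere, bounds=[(-5.0, 5.0)] * 2)
print(best_x, best_f)
```

    To tune an SVM this way, `objective` would be replaced by a cross-validated error as a function of the SVM parameters, with `bounds` set to the search range for each parameter.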

  15. A Method of Particle Swarm Optimized SVM Hyper-spectral Remote Sensing Image Classification

    International Nuclear Information System (INIS)

    Liu, Q J; Jing, L H; Wang, L M; Lin, Q Z

    2014-01-01

    Support Vector Machine (SVM) has proved suitable for the classification of remote sensing images and has been proposed to overcome the Hughes phenomenon. Hyper-spectral sensors are intrinsically designed to discriminate among a broad range of land cover classes, which may lead to high computational time in SVM multi-class algorithms. Model selection for SVM, which involves selecting the kernel and the margin parameter values, is usually time-consuming and greatly affects both the training efficiency of the SVM model and the final classification accuracy of the SVM hyper-spectral remote sensing image classifier. Firstly, based on combinatorial optimization theory and the cross-validation method, a particle swarm algorithm is introduced for the optimal selection of the SVM kernel parameter σ and margin parameter C (PSSVM) to improve the modelling efficiency of the SVM model. Then an experiment classifying AVIRIS data from the Indian Pines site in the USA was performed to evaluate the novel PSSVM as well as a traditional SVM classifier with the general grid-search cross-validation method (GSSVM). Evaluation indexes including SVM model training time, classification Overall Accuracy (OA) and Kappa index of both PSSVM and GSSVM are analyzed quantitatively. It is demonstrated that the OA of PSSVM on the test samples and on the whole image is 85% and 82%, respectively, and the differences from GSSVM are both within 0.08%. The Kappa indexes reach 0.82 and 0.77, and the differences from GSSVM are both within 0.001, while the modelling time of PSSVM can be only 1/10 of that of GSSVM. Therefore, PSSVM is a fast and accurate algorithm for hyper-spectral image classification and is superior to GSSVM.
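
    Overall Accuracy and the Kappa index reported above are both derived from the classification confusion matrix. A minimal sketch using the standard definition of Cohen's kappa follows; the two-class matrix is hypothetical, whereas the study itself involves many land-cover classes.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (OA, kappa) for a square confusion matrix (rows: truth, cols: predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n      # overall accuracy
    # Chance agreement from the products of row and column marginals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical counts for illustration only.
cm = [[40, 10], [5, 45]]
oa, kappa = overall_accuracy_and_kappa(cm)
print(f"OA={oa:.2f}, kappa={kappa:.2f}")
```

    Kappa discounts the agreement expected by chance, so it is typically lower than OA, consistent with the pairs of values (85% versus 0.82, 82% versus 0.77) quoted in the abstract.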

  16. A Soluble, Folded Protein without Charged Amino Acid Residues

    DEFF Research Database (Denmark)

    Højgaard, Casper; Kofoed, Christian; Espersen, Roall

    2016-01-01

    Charges are considered an integral part of protein structure and function, enhancing solubility and providing specificity in molecular interactions. We wished to investigate whether charged amino acids are indeed required for protein biogenesis and whether a protein completely free of titratable side chains can maintain solubility, stability, and function. As a model, we used a cellulose-binding domain from Cellulomonas fimi, which, among proteins of more than 100 amino acids, presently is the least charged in the Protein Data Bank, with a total of only four titratable residues. We find that the protein shows a surprising resilience toward extremes of pH, demonstrating stability and function (cellulose binding) in the pH range from 2 to 11. To ask whether the four charged residues present were required for these properties of this protein, we altered them to nontitratable ones. Remarkably...

  17. A DWT and SVM based method for rolling element bearing fault diagnosis and its comparison with Artificial Neural Networks

    Directory of Open Access Journals (Sweden)

    Sunil Tyagi

    2017-04-01

    A classification technique using a Support Vector Machine (SVM) classifier for detection of rolling element bearing faults is presented here. The SVM was fed with features extracted from vibration signals obtained from an experimental setup consisting of a rotating driveline mounted on rolling element bearings, which were run in normal conditions and with artificially induced faults. The time-domain vibration signals were divided into 40 segments and simple features such as peaks in the time domain and spectrum, along with statistical features such as standard deviation, skewness, kurtosis etc., were extracted. The effectiveness of the SVM classifier was compared with the performance of an Artificial Neural Network (ANN) classifier and it was found that the performance of the SVM classifier is superior to that of the ANN. The effect of pre-processing the vibration signal by Discrete Wavelet Transform (DWT) prior to feature extraction is also studied and it is shown that pre-processing the vibration signal with DWT enhances the effectiveness of both ANN and SVM classifiers. It has been demonstrated from experimental results that the performance of the SVM classifier is better than that of the ANN in detection of bearing condition and that pre-processing the vibration signal with DWT improves the performance of the SVM classifier.
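
    The segment-wise feature extraction described above can be sketched as follows. This is an illustrative sketch only: the function name, the use of population moments, and the synthetic sine signal are assumptions, and a real vibration record would be split into 40 segments as in the abstract rather than the 4 used here.

```python
import math

def segment_features(signal, n_segments):
    """Split a signal into equal segments and compute simple time-domain features."""
    seg_len = len(signal) // n_segments
    feats = []
    for k in range(n_segments):
        seg = signal[k * seg_len:(k + 1) * seg_len]
        n = len(seg)
        mean = sum(seg) / n
        m2 = sum((v - mean) ** 2 for v in seg) / n   # population variance
        m3 = sum((v - mean) ** 3 for v in seg) / n
        m4 = sum((v - mean) ** 4 for v in seg) / n
        std = math.sqrt(m2)
        feats.append({
            "peak": max(abs(v) for v in seg),
            "std": std,
            # Guard against a constant segment, where the ratios are undefined.
            "skewness": m3 / std ** 3 if m2 > 0 else 0.0,
            "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
        })
    return feats

# Synthetic stand-in for a vibration signal: a sine wave with 50 samples per period.
signal = [math.sin(2 * math.pi * i / 50) for i in range(400)]
feats = segment_features(signal, n_segments=4)
```

    Each dictionary would become one feature vector for the classifier; a pure sine has zero skewness and a kurtosis of 1.5, which makes the sketch easy to sanity-check.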

  18. An intelligent framework for medical image retrieval using MDCT and multi SVM.

    Science.gov (United States)

    Balan, J A Alex Rajju; Rajan, S Edward

    2014-01-01

    Large volumes of medical images are rapidly generated in the medical field, and managing them effectively has become a great challenge. This paper studies the development of innovative medical image retrieval based on texture features and accuracy. The objective of the paper is to analyze image retrieval based on diagnosis in healthcare management systems. This paper traces the development of innovative medical image retrieval to estimate both the image texture features and accuracy. The texture features of medical images are extracted using MDCT and multi SVM. Both the theoretical approach and the simulation results revealed interesting observations and they were corroborated using MDCT coefficients and SVM methodology. All attempts to extract the data about the image in response to the query have been computed successfully and perfect image retrieval performance has been obtained. Experimental results on a database of 100 trademark medical images show that an integrated texture feature representation results in 98% of the images being retrieved using MDCT and multi SVM. Thus we have studied a multiclassification technique based on SVM which is suitable for medical images. The results show retrieval accuracies of 98% and 99% for different sets of medical images with respect to the class of image.

  19. DCS-SVM: a novel semi-automated method for human brain MR image segmentation.

    Science.gov (United States)

    Ahmadvand, Ali; Daliri, Mohammad Reza; Hajiali, Mohammadtaghi

    2017-11-27

    In this paper, a novel method is proposed which appropriately segments magnetic resonance (MR) brain images into three main tissues. This paper proposes an extension of our previous work in which we suggested a combination of multiple classifiers (CMC)-based methods named dynamic classifier selection-dynamic local training local Tanimoto index (DCS-DLTLTI) for MR brain image segmentation into three main cerebral tissues. This idea is used here and a novel method is developed that tries to use more complex and accurate classifiers like support vector machine (SVM) in the ensemble. This work is challenging because the CMC-based methods are time consuming, especially on huge datasets like three-dimensional (3D) brain MR images. Moreover, SVM is a powerful method that is used for modeling datasets with complex feature space, but it also has huge computational cost for big datasets, especially those with strong interclass variability problems and with more than two classes such as 3D brain images; therefore, we cannot use SVM in DCS-DLTLTI. Therefore, we propose a novel approach named "DCS-SVM" to use SVM in DCS-DLTLTI to improve the accuracy of segmentation results. The proposed method is applied on well-known datasets of the Internet Brain Segmentation Repository (IBSR) and promising results are obtained.

  20. Characterization of protein and carbohydrate mid-IR spectral features in crop residues

    Science.gov (United States)

    Xin, Hangshu; Zhang, Yonggen; Wang, Mingjun; Li, Zhongyu; Wang, Zhibo; Yu, Peiqiang

    2014-08-01

    To the best of our knowledge, few studies have been conducted on inherent structure spectral traits related to biopolymers of crop residues. The objective of this study was to characterize protein and carbohydrate structure spectral features of three field crop residues (rice straw, wheat straw and millet straw) in comparison with two crop vines (peanut vine and pea vine) by using the Fourier transform infrared spectroscopy (FTIR) technique with attenuated total reflectance (ATR). Also, multivariate analyses were performed on spectral data sets within the regions mainly related to protein and carbohydrate in this study. The results showed that spectral differences existed in mid-IR peak intensities that are mainly related to protein and carbohydrate among these crop residue samples. With regard to protein spectral profile, peanut vine showed the greatest mid-IR band intensities that are related to protein amide and protein secondary structures, followed by pea vine and the remaining three field crop straws. The crop vines had 48-134% higher spectral band intensity than the grain straws in spectral features associated with protein. Similar trends were also found in the bands that are mainly related to structural carbohydrates (such as cellulosic compounds). However, the field crop residues had higher peak intensity in the total carbohydrates region than the crop vines. Furthermore, spectral ratios varied among the residue samples, indicating that these five crop residues had different internal structural conformation. However, multivariate spectral analyses showed that structural similarities were still exhibited among crop residues in the regions associated with protein biopolymers and carbohydrate. Further study is needed to find out whether there is any relationship between spectroscopic information and nutrition supply in various kinds of crop residue when fed to animals.

  1. A Comparison Study of Different Kernel Functions for SVM-Based Classification of Multi-Temporal Polarimetry SAR Data

    Science.gov (United States)

    Yekkehkhany, B.; Safari, A.; Homayouni, S.; Hasanlou, M.

    2014-10-01

    In this paper, a framework is developed based on Support Vector Machines (SVM) for crop classification using polarimetric features extracted from multi-temporal Synthetic Aperture Radar (SAR) imagery. The multi-temporal integration of data not only improves the overall retrieval accuracy but also provides more reliable estimates with respect to single-date data. Several kernel functions are employed and compared in this study for mapping the input space to a higher-dimensional Hilbert space. These kernel functions include linear, polynomial and Radial Basis Function (RBF) kernels. The method is applied to several UAVSAR L-band SAR images acquired over an agricultural area near Winnipeg, Manitoba, Canada. In this research, the temporal alpha features of the H/A/α decomposition method are used in classification. The experimental tests show that an SVM classifier with an RBF kernel for three dates of data increases the Overall Accuracy (OA) by up to 3% in comparison to using a linear kernel function, and by up to 1% in comparison to a 3rd degree polynomial kernel function.
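
    The three kernel families compared above have simple closed forms. The sketch below writes them out in their textbook versions; the parameter values (degree 3, coef0 = 1, gamma = 0.5) and the two feature vectors are arbitrary illustrations, not settings from the study.

```python
import math

def linear_kernel(x, y):
    """k(x, y) = <x, y>"""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=3, coef0=1.0):
    """k(x, y) = (<x, y> + coef0) ** degree"""
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """k(x, y) = exp(-gamma * ||x - y||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Two hypothetical feature vectors.
x, y = [1.0, 2.0], [3.0, 0.5]
k_lin = linear_kernel(x, y)
k_poly = polynomial_kernel(x, y)
k_rbf = rbf_kernel(x, y)
```

    The RBF kernel depends only on the distance between the two vectors and equals 1 when they coincide, whereas the linear and polynomial kernels grow with the inner product.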

  2. Comparative Study on KNN and SVM Based Weather Classification Models for Day Ahead Short Term Solar PV Power Forecasting

    Directory of Open Access Journals (Sweden)

    Fei Wang

    2017-12-01

    Accurate solar photovoltaic (PV) power forecasting is an essential tool for mitigating the negative effects caused by the uncertainty of PV output power in systems with high penetration levels of solar PV generation. Weather classification based modeling is an effective way to increase the accuracy of day-ahead short-term (DAST) solar PV power forecasting because PV output power is strongly dependent on the specific weather conditions in a given time period. However, the accuracy of daily weather classification relies on both the applied classifiers and the training data. This paper aims to reveal how these two factors impact the classification performance and to delineate the relation between classification accuracy and sample dataset scale. Two commonly used classification methods, K-nearest neighbors (KNN) and support vector machines (SVM), are applied to classify the daily local weather types for DAST solar PV power forecasting using the operation data from a grid-connected PV plant in Hohhot, Inner Mongolia, China. We assessed the performance of SVM and KNN approaches, and then investigated the influences of sample scale, the number of categories, and the data distribution in different categories on the daily weather classification results. The simulation results illustrate that SVM performs well with small sample scale, while KNN is more sensitive to the length of the training dataset and can achieve higher accuracy than SVM with sufficient samples.

  3. Assessment of ANN and SVM models for estimating normal direct irradiation (H_b)

    International Nuclear Information System (INIS)

    Santos, Cícero Manoel dos; Escobedo, João Francisco; Teramoto, Érico Tadao; Modenese Gorla da Silva, Silvia Helena

    2016-01-01

    Highlights: • The performance of SVM and ANN in estimating Normal Direct Irradiation (H_b) was evaluated. • 12 models using different input variables are developed (hourly and daily partitions). • The most relevant input variables for DNI are kt, H_sc and insolation ratio (r′ = n/N). • Support Vector Machine (SVM) provides accurate estimates and outperforms the Artificial Neural Network (ANN). - Abstract: This study evaluates the estimation of hourly and daily normal direct irradiation (H_b) using machine learning techniques (ML): Artificial Neural Network (ANN) and Support Vector Machine (SVM). Time series of different meteorological variables measured over thirteen years in Botucatu were used for training and validating ANN and SVM. Seven different sets of input variables were tested and evaluated, which were chosen based on statistical models reported in the literature. Relative Mean Bias Error (rMBE), Relative Root Mean Square Error (rRMSE), determination coefficient (R^2) and “d” Willmott index were used to evaluate ANN and SVM models. When compared to statistical models which use the same set of input variables (R^2 between 0.22 and 0.78), ANN and SVM show higher values of R^2 (hourly models between 0.52 and 0.88; daily models between 0.42 and 0.91). Considering the input variables, atmospheric transmissivity of global radiation (kt), integrated solar constant (H_sc) and insolation ratio (n/N, n is sunshine duration and N is photoperiod) were the most relevant in ANN and SVM models. The rMBE and rRMSE values in the two time partitions of SVM models are lower than those obtained with ANN. Hourly ANN and SVM models have higher rRMSE values than daily models. Optimal performance with hourly models was obtained with ANN4^h (rMBE = 12.24%, rRMSE = 23.99% and “d” = 0.96) and SVM4^h (rMBE = 1.75%, rRMSE = 20.10% and “d” = 0.96). Optimal performance with daily models was obtained with ANN2^d (rMBE = −3.09%, rRMSE = 18.95% and “d” = 0
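
    The error indexes named above can be sketched with their commonly used definitions. The abstract does not spell the formulas out, so the forms below (errors normalised by the mean observation, and the original Willmott index of agreement) are assumptions, and the observed and predicted values are invented for illustration.

```python
import math

def rmbe(obs, pred):
    """Relative mean bias error in percent of the mean observation."""
    n = len(obs)
    return 100.0 * (sum(p - o for o, p in zip(obs, pred)) / n) / (sum(obs) / n)

def rrmse(obs, pred):
    """Relative root mean square error in percent of the mean observation."""
    n = len(obs)
    mse = sum((p - o) ** 2 for o, p in zip(obs, pred)) / n
    return 100.0 * math.sqrt(mse) / (sum(obs) / n)

def willmott_d(obs, pred):
    """Willmott index of agreement, between 0 and 1 (1 is perfect agreement)."""
    obar = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - obar) + abs(o - obar)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den

# Hypothetical observed and estimated irradiation values.
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 41.0]
bias, spread, agreement = rmbe(obs, pred), rrmse(obs, pred), willmott_d(obs, pred)
```

    A positive rMBE indicates overestimation on average, while rRMSE penalises large individual errors regardless of sign.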

  4. Optimization of elution salt concentration in stepwise elution of protein chromatography using linear gradient elution data. Reducing residual protein A by cation-exchange chromatography in monoclonal antibody purification.

    Science.gov (United States)

    Ishihara, Takashi; Kadoya, Toshihiko; Endo, Naomi; Yamamoto, Shuichi

    2006-05-05

    Our simple method for optimization of the elution salt concentration in stepwise elution was applied to the actual protein separation system, which involves several difficulties such as detection of the target. As a model separation system, reducing residual protein A by cation-exchange chromatography in human monoclonal antibody (hMab) purification was chosen. We carried out linear gradient elution experiments and obtained the data for the peak salt concentration of hMab and residual protein A, respectively. An enzyme-linked immunosorbent assay was applied to the measurement of the residual protein A. From these data, we calculated the distribution coefficient of the hMab and the residual protein A as a function of salt concentration. The optimal salt concentration of stepwise elution to reduce the residual protein A from the hMab was determined based on the relationship between the distribution coefficient and the salt concentration. Using the optimized condition, we successfully performed the separation, resulting in high recovery of hMab and the elimination of residual protein A.

  5. CNN-SVM for Microvascular Morphological Type Recognition with Data Augmentation.

    Science.gov (United States)

    Xue, Di-Xiu; Zhang, Rong; Feng, Hui; Wang, Ya-Lei

    2016-01-01

    This paper focuses on the problem of feature extraction and the classification of microvascular morphological types to aid esophageal cancer detection. We present a patch-based system with a hybrid SVM model with data augmentation for intraepithelial papillary capillary loop recognition. A greedy patch-generating algorithm and a specialized CNN named NBI-Net are designed to extract hierarchical features from patches. We investigate a series of data augmentation techniques to progressively improve the prediction invariance of image scaling and rotation. For classifier boosting, SVM is used as an alternative to softmax to enhance generalization ability. The effectiveness of CNN feature representation ability is discussed for a set of widely used CNN models, including AlexNet, VGG-16, and GoogLeNet. Experiments are conducted on the NBI-ME dataset. The recognition rate is up to 92.74% on the patch level with data augmentation and classifier boosting. The results show that the combined CNN-SVM model beats models of traditional features with SVM as well as the original CNN with softmax. The synthesis results indicate that our system is able to assist clinical diagnosis to a certain extent.

  6. Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.

    Science.gov (United States)

    Tuo, Youlin; An, Ning; Zhang, Ming

    2018-03-01

    The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R language. The feature genes between metastasis and non‑metastasis samples were screened under the threshold of PSVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non‑metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin‑dependent kinase 2 (CDK2), myelocytomatosis proto‑oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non‑ATPase 2 and telomeric repeat binding factor 2. The cyclin‑dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed by the top 30 feature genes was able to distinguish metastatic samples from non‑metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non‑metastatic samples) revealed an accuracy of 94.38% and AUROC of 0.958. Cell cycle associated functions and pathways were the most significant terms of the 30 feature genes. An SVM classifier was constructed to assess the possibility of breast cancer metastasis, which presented high accuracy in several

  7. SVM Pixel Classification on Colour Image Segmentation

    Science.gov (United States)

    Barui, Subhrajit; Latha, S.; Samiappan, Dhanalakshmi; Muthu, P.

    2018-04-01

    The aim of image segmentation is to simplify the representation of an image by clustering pixels into something more meaningful to analyze. Segmentation is typically used to locate boundaries and curves in an image, or more precisely to label every pixel in an image so that each pixel has an independent identity. SVM pixel classification for colour image segmentation is the topic highlighted in this paper. It has useful applications in the fields of concept based image retrieval, machine vision, medical imaging and object detection. The process is accomplished step by step. At first we need to recognize the type of colour and the texture used as an input to the SVM classifier. These inputs are extracted via a local spatial similarity measure model and a steerable filter, also known as a Gabor filter. It is then trained by using FCM (Fuzzy C-Means). Both the pixel level information of the image and the ability of the SVM classifier undergo some sophisticated algorithm to form the final image. The method produces a well developed segmented image and is efficient with respect to increased quality and faster processing of the segmented image compared with the other segmentation methods proposed earlier. One of the latest application results is the Light L16 camera.

  8. Intelligent Agent-Based Intrusion Detection System Using Enhanced Multiclass SVM

    Science.gov (United States)

    Ganapathy, S.; Yogesh, P.; Kannan, A.

    2012-01-01

    Intrusion detection systems were used in the past along with various techniques to detect intrusions in networks effectively. However, most of these systems are able to detect the intruders only with high false alarm rate. In this paper, we propose a new intelligent agent-based intrusion detection model for mobile ad hoc networks using a combination of attribute selection, outlier detection, and enhanced multiclass SVM classification methods. For this purpose, an effective preprocessing technique is proposed that improves the detection accuracy and reduces the processing time. Moreover, two new algorithms, namely, an Intelligent Agent Weighted Distance Outlier Detection algorithm and an Intelligent Agent-based Enhanced Multiclass Support Vector Machine algorithm are proposed for detecting the intruders in a distributed database environment that uses intelligent agents for trust management and coordination in transaction processing. The experimental results of the proposed model show that this system detects anomalies with low false alarm rate and high-detection rate when tested with KDD Cup 99 data set. PMID:23056036

  9. Protein-protein interaction inference based on semantic similarity of Gene Ontology terms.

    Science.gov (United States)

    Zhang, Shu-Bo; Tang, Qiang-Rong

    2016-07-21

    Identifying protein-protein interactions is important in molecular biology. Experimental methods for this task have their limitations, and computational approaches have attracted more and more attention from the biological community. The semantic similarity derived from the Gene Ontology (GO) annotation has been regarded as one of the most powerful indicators for protein interaction. However, conventional methods based on GO similarity fail to take advantage of the specificity of GO terms in the ontology graph. We proposed a GO-based method to predict protein-protein interaction by integrating different kinds of similarity measures derived from the intrinsic structure of the GO graph. We extended five existing methods to derive the semantic similarity measures from the descending part of two GO terms in the GO graph, then adopted a feature integration strategy that combines both the ascending and the descending similarity scores derived from the three sub-ontologies to construct various kinds of features to characterize each protein pair. Support vector machines (SVM) were employed as discriminative classifiers, and five-fold cross validation experiments were conducted on both human and yeast protein-protein interaction datasets to evaluate the performance of different kinds of integrated features. The experimental results suggest that the best performance is achieved by the feature that combines information from both the ascending and the descending parts of the three ontologies. Our method is appealing for effective prediction of protein-protein interaction. Copyright © 2016 Elsevier Ltd. All rights reserved.

  10. Comparative study of SVM methods combined with voxel selection for object category classification on fMRI data.

    Science.gov (United States)

    Song, Sutao; Zhan, Zhichao; Long, Zhiying; Zhang, Jiacai; Yao, Li

    2011-02-16

    Support vector machine (SVM) has been widely used as an accurate and reliable method to decipher brain patterns from functional MRI (fMRI) data. Previous studies have not found a clear benefit for non-linear (polynomial kernel) SVM versus linear SVM. Here, a more effective non-linear SVM using a radial basis function (RBF) kernel is compared with linear SVM. Unlike traditional studies, which focused merely on the evaluation of either different types of SVM or the voxel selection methods, we aimed to investigate the overall performance of linear and RBF SVM for fMRI classification together with voxel selection schemes in terms of classification accuracy and time consumption. Six different voxel selection methods were employed to decide which voxels of fMRI data would be included in SVM classifiers with linear and RBF kernels in classifying 4-category objects. Then the overall performances of voxel selection and classification methods were compared. Results showed that: (1) Voxel selection had an important impact on the classification accuracy of the classifiers: in a relatively low-dimensional feature space, RBF SVM outperformed linear SVM significantly; in a relatively high-dimensional space, linear SVM performed better than its counterpart; (2) Considering classification accuracy and time consumption together, linear SVM with relatively more voxels as features and RBF SVM with a small set of voxels (after PCA) could achieve better accuracy in a shorter time. The present work provides the first empirical result of linear and RBF SVM in classification of fMRI data, combined with voxel selection methods. Based on the findings, if only classification accuracy is of concern, RBF SVM with an appropriately small set of voxels and linear SVM with relatively more voxels are the two suggested solutions; if users are more concerned about the computational time, RBF SVM with a relatively small set of voxels, with part of the principal components kept as features, is a better choice.

  11. Time Reversal Reconstruction Algorithm Based on PSO Optimized SVM Interpolation for Photoacoustic Imaging

    Directory of Open Access Journals (Sweden)

    Mingjian Sun

    2015-01-01

    Photoacoustic imaging is an innovative imaging technique to image biomedical tissues. The time reversal reconstruction algorithm, in which a numerical model of the acoustic forward problem is run backwards in time, is widely used. In the paper, a time reversal reconstruction algorithm based on a particle swarm optimization (PSO) optimized support vector machine (SVM) interpolation method is proposed for photoacoustic imaging. Numerical results show that the reconstructed images of the proposed algorithm are more accurate than those of the nearest neighbor interpolation, linear interpolation, and cubic convolution interpolation based time reversal algorithms, and that it can provide higher imaging quality by using significantly fewer measurement positions or scanning times.

  12. Novel Hybrid of LS-SVM and Kalman Filter for GPS/INS Integration

    Science.gov (United States)

    Xu, Zhenkai; Li, Yong; Rizos, Chris; Xu, Xiaosu

    Integration of Global Positioning System (GPS) and Inertial Navigation System (INS) technologies can overcome the drawbacks of the individual systems. One of the advantages is that the integrated solution can provide continuous navigation capability even during GPS outages. However, bridging the GPS outages is still a challenge when Micro-Electro-Mechanical System (MEMS) inertial sensors are used. Methods being currently explored by the research community include applying vehicle motion constraints, optimal smoother, and artificial intelligence (AI) techniques. In the research area of AI, the neural network (NN) approach has been extensively utilised up to the present. In an NN-based integrated system, a Kalman filter (KF) estimates position, velocity and attitude errors, as well as the inertial sensor errors, to output navigation solutions while GPS signals are available. At the same time, an NN is trained to map the vehicle dynamics with corresponding KF states, and to correct INS measurements when GPS measurements are unavailable. To achieve good performance it is critical to select suitable quality and an optimal number of samples for the NN. This is sometimes too rigorous a requirement which limits real world application of NN-based methods. The support vector machine (SVM) approach is based on the structural risk minimisation principle, instead of the minimised empirical error principle that is commonly implemented in an NN. The SVM can avoid local minimisation and over-fitting problems in an NN, and therefore potentially can achieve a higher level of global performance. This paper focuses on the least squares support vector machine (LS-SVM), which can solve highly nonlinear and noisy black-box modelling problems. This paper explores the application of the LS-SVM to aid the GPS/INS integrated system, especially during GPS outages. The paper describes the principles of the LS-SVM and of the KF hybrid method, and introduces the LS-SVM regression algorithm.
Field

  13. Relationships between residue Voronoi volume and sequence conservation in proteins.

    Science.gov (United States)

    Liu, Jen-Wei; Cheng, Chih-Wen; Lin, Yu-Feng; Chen, Shao-Yu; Hwang, Jenn-Kang; Yen, Shih-Chung

    2018-02-01

    Functional and biophysical constraints can cause different levels of sequence conservation in proteins. Previously, structural properties, e.g., relative solvent accessibility (RSA) and packing density of the weighted contact number (WCN), have been found to be related to protein sequence conservation (CS). The Voronoi volume has recently been recognized as a new structural property of the local protein structural environment reflecting CS. However, for surface residues, it is sensitive to water molecules surrounding the protein structure. Herein, we present a simple structural determinant termed the relative space of Voronoi volume (RSV); it uses the Voronoi volume and the van der Waals volume of particular residues to quantify the local structural environment. RSV (range, 0-1) is defined as (Voronoi volume-van der Waals volume)/Voronoi volume of the target residue. The concept of RSV describes the extent of available space for every protein residue. RSV and Voronoi profiles with and without water molecules (RSVw, RSV, VOw, and VO) were compared for 554 non-homologous proteins. RSV (without water) showed better Pearson's correlations with CS than did RSVw, VO, or VOw values. The mean correlation coefficient between RSV and CS was 0.51, which is comparable to the correlation between RSA and CS (0.49) and that between WCN and CS (0.56). RSV is a robust structural descriptor with and without water molecules and can quantitatively reflect evolutionary information in a single protein structure. Therefore, it may represent a practical structural determinant to study protein sequence, structure, and function relationships. Copyright © 2017 Elsevier B.V. All rights reserved.
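
    The RSV definition given above, together with the Pearson correlation used to compare it with conservation scores, can be written as a short sketch. The volumes in the example are made-up numbers in arbitrary units, and the helper names are illustrative only.

```python
import math

def rsv(voronoi_volume, vdw_volume):
    """Relative space of Voronoi volume: (Voronoi - van der Waals) / Voronoi."""
    return (voronoi_volume - vdw_volume) / voronoi_volume

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical residue: Voronoi volume 200, van der Waals volume 150.
example_rsv = rsv(200.0, 150.0)
```

    A tightly packed residue has a Voronoi volume close to its van der Waals volume and therefore an RSV near 0, while a residue with a lot of free space around it approaches 1.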

  14. COMPARISON OF PERFORMANCES OF DIFFERENT SVM IMPLEMENTATIONS WHEN USED FOR AUTOMATED EVALUATION OF DESCRIPTIVE ANSWERS

    Directory of Open Access Journals (Sweden)

    C. Sunil Kumar

    2015-04-01

    In this paper, we studied the performances of models built using various SVM implementations during the multiclass classification task of automated evaluation of descriptive answers. The performances were evaluated on five datasets each with 900 samples and with each of the datasets treated using symmetric uncertainty feature selection filter. We quantitatively analyzed the best SVM implementation technique from amongst the 17 different SVM implementation combinations derived by using various SVM classifier libraries, SVM types and Kernel methods. Accuracy, F Score, Kappa and Area under ROC curve are used as model evaluation metrics in order to evaluate the models and rank them according to their performances. Based on the results, we derived the conclusion that SMO classifier when used with Polynomial kernel is the overall best performing classifier applicable for auto evaluation of descriptive answers.

  15. RSARF: Prediction of residue solvent accessibility from protein sequence using random forest method

    KAUST Repository

    Ganesan, Pugalenthi; Kandaswamy, Krishna Kumar Umar; Chou, Kuochen; Vivekanandan, Saravanan; Kolatkar, Prasanna R.

    2012-01-01

    Prediction of protein structure from its amino acid sequence is still a challenging problem. The complete physicochemical understanding of protein folding is essential for accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights into protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. The training and testing were performed using 120 proteins containing 22006 residues. For each residue, the buried or exposed state was computed using five thresholds (0%, 5%, 10%, 25%, and 50%). The prediction accuracies for the 0%, 5%, 10%, 25%, and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57% and 72.07% respectively. Further, comparison of RSARF with other methods using a benchmark dataset containing 20 proteins shows that our approach is useful for prediction of residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/.
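
    The two-state labelling at five thresholds described above can be sketched as follows. The convention that a residue is exposed when its relative accessibility is strictly greater than the threshold is an assumption made here for illustration; the abstract does not state which side of the threshold counts as buried.

```python
THRESHOLDS = (0, 5, 10, 25, 50)  # percent relative solvent accessibility

def two_state_labels(rsa_percent, thresholds=THRESHOLDS):
    """Label a residue as buried or exposed at each threshold.

    Assumed convention: exposed if RSA > threshold, otherwise buried.
    """
    return {t: ("exposed" if rsa_percent > t else "buried") for t in thresholds}

# Hypothetical residue with 7.5% relative solvent accessibility.
labels = two_state_labels(7.5)
```

    The same residue can therefore carry different labels at different thresholds, which is why a separate accuracy is reported for each one.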

  16. Thermodynamic Effects of Replacements of Pro Residues in Helix Interiors of Maltose-Binding Protein

    OpenAIRE

    Prajapati, RS; Lingaraju, GM; Bacchawat, Kiran; Surolia, Avadhesha; Varadarajan, Raghavan

    2003-01-01

    Introduction of Pro residues into helix interiors results in protein destabilization. It is currently unclear if the converse substitution (i.e., replacement of Pro residues that naturally occur in helix interiors) would be stabilizing. Maltose-binding protein is a large 370-amino acid protein that contains 21 Pro residues. Of these, three nonconserved residues (P48, P133, and P159) occur at helix interiors. Each of the residues was replaced with Ala and Ser. Stabilities were characterized by...

  17. Training set extension for SVM ensemble in P300-speller with familiar face paradigm.

    Science.gov (United States)

    Li, Qi; Shi, Kaiyang; Gao, Ning; Li, Jian; Bai, Ou

    2018-03-27

    P300-spellers are brain-computer interface (BCI)-based character input systems. Support vector machine (SVM) ensembles are trained with large-scale training sets and used as classifiers in these systems. However, the required large-scale training data necessitate a prolonged collection time for each subject, which results in data collected toward the end of the period being contaminated by the subject's fatigue. This study aimed to develop a method for acquiring more training data based on a collected small training set. A new method was developed in which two corresponding training datasets in two sequences are superposed and averaged to extend the training set. The proposed method was tested offline on a P300-speller with the familiar face paradigm. The SVM ensemble with extended training set achieved 85% classification accuracy for the averaged results of four sequences, and 100% for 11 sequences in the P300-speller. In contrast, the conventional SVM ensemble with non-extended training set achieved only 65% accuracy for four sequences, and 92% for 11 sequences. The SVM ensemble with extended training set achieves higher classification accuracies than the conventional SVM ensemble, which verifies that the proposed method effectively improves the classification performance of BCI P300-spellers, thus enhancing their practicality.
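
    The training-set extension described in this abstract, in which corresponding data from two sequences are superposed and averaged, might look roughly like the sketch below. It assumes each epoch is a plain list of samples and that epochs are paired by index across the two sequences; the function names and the choice to keep the original epochs alongside the averaged ones are illustrative assumptions, not the authors' implementation.

```python
def average_epochs(epoch_a, epoch_b):
    """Superpose two equally long epochs and average them sample by sample."""
    if len(epoch_a) != len(epoch_b):
        raise ValueError("epochs must have the same number of samples")
    return [(a + b) / 2.0 for a, b in zip(epoch_a, epoch_b)]


def extend_training_set(seq_a, seq_b):
    """Return the original epochs plus one averaged epoch per index pair.

    Assumption: seq_a[i] and seq_b[i] were recorded for the same stimulus,
    so the averaged epoch can inherit their shared class label.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must contain the same number of epochs")
    averaged = [average_epochs(a, b) for a, b in zip(seq_a, seq_b)]
    return seq_a + seq_b + averaged


# Hypothetical two-sample epochs from two stimulus sequences.
sequence_1 = [[1.0, 2.0], [0.0, 4.0]]
sequence_2 = [[3.0, 2.0], [2.0, 0.0]]
extended = extend_training_set(sequence_1, sequence_2)
```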

  18. A linear-RBF multikernel SVM to classify big text corpora.

    Science.gov (United States)

    Romero, R; Iglesias, E L; Borrajo, L

    2015-01-01

    Support vector machine (SVM) is a powerful technique for classification. However, SVM is not suitable for classification of large datasets or text corpora, because the training complexity of SVMs is highly dependent on the input size. Recent developments in the literature on the SVM and other kernel methods emphasize the need to consider multiple kernels or parameterizations of kernels because they provide greater flexibility. This paper shows a multikernel SVM to manage highly dimensional data, providing an automatic parameterization with low computational cost and improving results against SVMs parameterized under a brute-force search. The model consists in spreading the dataset into cohesive term slices (clusters) to construct a defined structure (multikernel). The new approach is tested on different text corpora. Experimental results show that the new classifier has good accuracy compared with the classic SVM, while the training is significantly faster than several other SVM classifiers.

  19. Analisis Perbandingan KNN dengan SVM untuk Klasifikasi Penyakit Diabetes Retinopati berdasarkan Citra Eksudat dan Mikroaneurisma

    Directory of Open Access Journals (Sweden)

    SUCI AULIA

    2015-01-01

    Research on classifying the severity of diabetic retinopathy using image processing remains an active topic. The images commonly used to detect this disease are optic disc, microaneurysm, exudate, and hemorrhage images derived from fundus images. This study compares the SVM and KNN algorithms for classifying diabetic retinopathy (mild, moderate, severe) based on exudate and microaneurysm images. The wavelet method is used for feature extraction with both classifiers. The study used 160 test images, 40 each for the normal, mild, moderate, and severe classes. KNN achieved higher accuracy than SVM, at 65% versus 62%. KNN gave its best results with K = 9 and the cityblock distance, while SVM gave its best results with the one-against-all parameter. Keywords: Diabetic Retinopathy, KNN, SVM, Wavelet.

  20. Nitrogen-to-Protein Conversion Factors for Crop Residues and Animal Manure Common in China.

    Science.gov (United States)

    Chen, Xueli; Zhao, Guanglu; Zhang, Yang; Han, Lujia; Xiao, Weihua

    2017-10-25

    Accurately determining protein content is essential in exploiting biomass as feed and fuel. A survey of biomass samples in China indicated protein contents from 2.65 to 3.98% for crop residues and from 6.07 to 10.24% for animal manure of dry basis. Conversion factors based on amino acid nitrogen (k_A) ranged from 5.42 to 6.00 for the former and from 4.78 to 5.36 for the latter, indicating that the traditional factor of 6.25 is not suitable for biomass samples. On the other hand, conversion factors from Kjeldahl nitrogen (k_P) ranged from 3.97 to 4.57 and from 2.76 to 4.31 for crop residues and animal manure, respectively. Of note, conversion factors were strongly affected by amino acid composition and levels of nonprotein nitrogen. Thus, k_P values of 4.23 for crop residues, 4.11 for livestock manure, and 3.11 for poultry manure are recommended to better estimate protein content from total nitrogen.
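
    A short worked example may clarify how such conversion factors are applied: protein content is estimated as total nitrogen multiplied by the factor. The recommended factors and the traditional value of 6.25 come from the abstract; the nitrogen content of 0.80% and the function name are hypothetical, chosen only to show the arithmetic.

```python
# Recommended Kjeldahl-nitrogen conversion factors quoted in the abstract.
RECOMMENDED_KP = {
    "crop residue": 4.23,
    "livestock manure": 4.11,
    "poultry manure": 3.11,
}
TRADITIONAL_FACTOR = 6.25


def protein_percent(total_nitrogen_percent, factor):
    """Estimate protein content (% dry basis) as total nitrogen times a factor."""
    return total_nitrogen_percent * factor


# Hypothetical crop residue sample containing 0.80% total nitrogen.
nitrogen = 0.80
estimate = protein_percent(nitrogen, RECOMMENDED_KP["crop residue"])
traditional = protein_percent(nitrogen, TRADITIONAL_FACTOR)
```

    For this hypothetical sample the recommended factor gives about 3.38% protein, while the traditional factor would give 5.00%, which illustrates the overestimate the abstract warns about.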

  1. Sensitivity Analysis Based SVM Application on Automatic Incident Detection of Rural Road in China

    Directory of Open Access Journals (Sweden)

    Xingliang Liu

    2018-01-01

    Traditional automatic incident detection methods such as artificial neural networks, backpropagation neural networks, and Markov chains are not suitable for the incident detection problem on rural roads in China, which have a relatively high accident rate and a low reaction speed because of their small traffic volume. This study applies the support vector machine (SVM) and parameter sensitivity analysis methods to build an accident detection algorithm for rural road conditions, based on real-time data collected in a field experiment. The sensitivity of four parameters (speed, front distance, vehicle group time interval, and free driving ratio) is analyzed, and the data sets of two parameters with a significant sensitivity are chosen to form the traffic state feature vector. The SVM and k-fold cross validation (K-CV) methods are used to build the accident detection algorithm, which shows excellent detection accuracy (98.15% on the training data set and 87.5% on the testing data set). Therefore, the problem of low incident reaction speed on rural roads in China could be solved to some extent.
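
    The k-fold cross validation step named in this abstract can be sketched as follows. This is a generic illustration of how sample indices are partitioned into training and validation folds; the function name is hypothetical, no shuffling is performed, and nothing here reflects the specific settings used in the study.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, test) index pairs.

    Folds are contiguous and unshuffled (an assumption for simplicity);
    the first n_samples % k folds receive one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# Ten samples split into three folds.
folds = k_fold_indices(10, 3)
```

    Each sample appears in exactly one validation fold, so every observation is used for both training and validation across the k rounds.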

  2. Detecting microcalcifications in mammograms by using SVM method for the diagnostics of breast cancer

    Science.gov (United States)

    Wan, Baikun; Wang, Ruiping; Qi, Hongzhi; Cao, Xuchen

    2005-01-01

    Support vector machine (SVM) is a new statistical learning method. Compared with classical machine learning methods, the SVM learning principle is to minimize the structural risk instead of the empirical risk, and it gives better generalization performance. Because training an SVM is a convex quadratic optimization problem, any local optimum is also the global optimum. In this paper an SVM algorithm is applied to detect micro-calcifications (MCCs) in mammograms for the diagnosis of breast cancer, an application that has not been reported before. It was tested with 10 mammograms, and the results show that the algorithm achieves a higher true positive rate than an artificial neural network (ANN) based on empirical risk minimization, and is valuable for further study and application in clinical engineering.

  3. Hybrid NN/SVM Computational System for Optimizing Designs

    Science.gov (United States)

    Rai, Man Mohan

    2009-01-01

    A computational method and system based on a hybrid of an artificial neural network (NN) and a support vector machine (SVM) (see figure) has been conceived as a means of maximizing or minimizing an objective function, optionally subject to one or more constraints. Such maximization or minimization could be performed, for example, to solve a data-regression or data-classification problem or to optimize a design associated with a response function. A response function can be considered as a subset of a response surface, which is a surface in a vector space of design and performance parameters. A typical example of a design problem that the method and system can be used to solve is that of an airfoil, for which a response function could be the spatial distribution of pressure over the airfoil. In this example, the response surface would describe the pressure distribution as a function of the operating conditions and the geometric parameters of the airfoil. The use of NNs to analyze physical objects in order to optimize their responses under specified physical conditions is well known. NN analysis is suitable for multidimensional interpolation of data that lack structure and enables the representation and optimization of a succession of numerical solutions of increasing complexity or increasing fidelity to the real world. NN analysis is especially useful in helping to satisfy multiple design objectives. Feedforward NNs can be used to make estimates based on nonlinear mathematical models. One difficulty associated with use of a feedforward NN arises from the need for nonlinear optimization to determine connection weights among input, intermediate, and output variables. It can be very expensive to train an NN in cases in which it is necessary to model large amounts of information. Less widely known (in comparison with NNs) are support vector machines (SVMs), which were originally applied in statistical learning theory. In terms that are necessarily

  4. Evaluation of certain crop residues for carbohydrate and protein fractions by cornell net carbohydrate and protein system

    Directory of Open Access Journals (Sweden)

    Venkateswarulu Swarna

    2015-06-01

    Four locally available crop residues, viz., jowar stover (JS), maize stover (MS), red gram straw (RGS) and black gram straw (BGS), were evaluated for carbohydrate and protein fractions using the Cornell Net Carbohydrate and Protein (CNCP) system. Lignin (% NDF) was higher in legume straws as compared to cereal stovers, while non-structural carbohydrates (NSC) (% DM) followed the reverse trend. The carbohydrate fractions A and B1 were higher in BGS while B2 was higher in MS as compared to other crop residues. The unavailable cell wall fraction (C) was higher in legume straws when compared to cereal stovers. Among protein fractions, B1 was higher in legume straws when compared to cereal stovers while B2 was higher in cereal stovers as compared to legume straws. Fraction B3, largely bypass protein, was highest in MS as compared to other crop residues. Acid detergent insoluble crude protein (ADICP) (% CP), or unavailable protein fraction C, was lowest in MS and highest in BGS. It is concluded that MS is superior in nutritional value for feeding ruminants as compared to other crop residues.

  5. An SVM model with hybrid kernels for hydrological time series

    Science.gov (United States)

    Wang, C.; Wang, H.; Zhao, X.; Xie, Q.

    2017-12-01

    Support Vector Machine (SVM) models have been widely applied to the forecast of climate/weather and its impact on other environmental variables such as hydrologic response to climate/weather. When using SVM, the choice of the kernel function plays the key role. Conventional SVM models mostly use one single type of kernel function, e.g., radial basis kernel function. Provided that there are several featured kernel functions available, each having its own advantages and drawbacks, a combination of these kernel functions may give more flexibility and robustness to SVM approach, making it suitable for a wide range of application scenarios. This paper presents such a linear combination of radial basis kernel and polynomial kernel for the forecast of monthly flowrate in two gaging stations using SVM approach. The results indicate significant improvement in the accuracy of predicted series compared to the approach with either individual kernel function, thus demonstrating the feasibility and advantages of such hybrid kernel approach for SVM applications.
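
    The linear combination of a radial basis kernel and a polynomial kernel described in this abstract can be sketched as below. This is a minimal illustration under stated assumptions: the mixing weight, `gamma`, `degree` and `coef0` values and all function names are hypothetical, and the abstract does not say how the authors chose or constrained the combination weights.

```python
import math


def rbf_kernel(x, y, gamma):
    """Radial basis kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def poly_kernel(x, y, degree, coef0):
    """Polynomial kernel: (<x, y> + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def hybrid_kernel(x, y, weight=0.5, gamma=1.0, degree=2, coef0=1.0):
    """Convex combination of the RBF and polynomial kernels.

    Assumption: weight lies in [0, 1], so the result is a non-negative
    combination of two valid kernels and is itself a valid kernel.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (weight * rbf_kernel(x, y, gamma)
            + (1.0 - weight) * poly_kernel(x, y, degree, coef0))


def gram_matrix(samples, **params):
    """Pairwise hybrid kernel values for a list of feature vectors."""
    return [[hybrid_kernel(a, b, **params) for b in samples] for a in samples]
```

    A Gram matrix built this way could be passed to an SVM implementation that accepts precomputed kernels, with the mixing weight tuned alongside the other hyperparameters.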

  6. Extraction of prostatic lumina and automated recognition for prostatic calculus image using PCA-SVM.

    Science.gov (United States)

    Wang, Zhuocai; Xu, Xiangmin; Ding, Xiaojun; Xiao, Hui; Huang, Yusheng; Liu, Jian; Xing, Xiaofen; Wang, Hua; Liao, D Joshua

    2011-01-01

    Identification of prostatic calculi is an important basis for determining the tissue origin. Computer-assisted diagnosis of prostatic calculi may have promising potential but is still little studied. We studied the extraction of prostatic lumina and automated recognition for calculus images. Extraction of lumina from prostate histology images was based on local entropy and the Otsu threshold, and recognition used PCA-SVM based on the texture features of prostatic calculi. The SVM classifier showed an average time of 0.1432 seconds, an average training accuracy of 100%, an average test accuracy of 93.12%, a sensitivity of 87.74%, and a specificity of 94.82%. We concluded that the algorithm, based on texture features and PCA-SVM, can recognize the concentric structure and visualized features easily. Therefore, this method is effective for the automated recognition of prostatic calculi.

  7. Extraction of Prostatic Lumina and Automated Recognition for Prostatic Calculus Image Using PCA-SVM

    Science.gov (United States)

    Wang, Zhuocai; Xu, Xiangmin; Ding, Xiaojun; Xiao, Hui; Huang, Yusheng; Liu, Jian; Xing, Xiaofen; Wang, Hua; Liao, D. Joshua

    2011-01-01

    Identification of prostatic calculi is an important basis for determining the tissue origin. Computer-assisted diagnosis of prostatic calculi may have promising potential but is still little studied. We studied the extraction of prostatic lumina and automated recognition for calculus images. Extraction of lumina from prostate histology images was based on local entropy and the Otsu threshold, and recognition used PCA-SVM based on the texture features of prostatic calculi. The SVM classifier showed an average time of 0.1432 seconds, an average training accuracy of 100%, an average test accuracy of 93.12%, a sensitivity of 87.74%, and a specificity of 94.82%. We concluded that the algorithm, based on texture features and PCA-SVM, can recognize the concentric structure and visualized features easily. Therefore, this method is effective for the automated recognition of prostatic calculi. PMID:21461364

  8. SVM-based Partial Discharge Pattern Classification for GIS

    Science.gov (United States)

    Ling, Yin; Bai, Demeng; Wang, Menglin; Gong, Xiaojin; Gu, Chao

    2018-01-01

    Partial discharges (PD) occur when there are localized dielectric breakdowns in small regions of gas insulated substations (GIS). It is highly important to recognize PD patterns, through which we can diagnose the defects caused by different sources so that predictive maintenance can be conducted to prevent unplanned power outages. In this paper, we propose an approach to partial discharge pattern classification. It first recovers the PRPD matrices from the PRPD2D images; statistical features are then extracted from the recovered PRPD matrix and fed into an SVM for classification. Experiments conducted on a dataset containing thousands of images demonstrate the high effectiveness of the method.

  9. Wetting of nonconserved residue-backbones: A feature indicative of aggregation associated regions of proteins.

    Science.gov (United States)

    Pradhan, Mohan R; Pal, Arumay; Hu, Zhongqiao; Kannan, Srinivasaraghavan; Chee Keong, Kwoh; Lane, David P; Verma, Chandra S

    2016-02-01

    Aggregation is an irreversible form of protein complexation and often toxic to cells. The process entails partial or major unfolding that is largely driven by hydration. We model the role of hydration in aggregation using "Dehydrons." "Dehydrons" are unsatisfied backbone hydrogen bonds in proteins that seek shielding from water molecules by associating with ligands or proteins. We find that the residues at aggregation interfaces have hydrated backbones, and in contrast to other forms of protein-protein interactions, are under less evolutionary pressure to be conserved. Combining evolutionary conservation of residues and extent of backbone hydration allows us to distinguish regions on proteins associated with aggregation (non-conserved dehydron-residues) from other interaction interfaces (conserved dehydron-residues). This novel feature can complement the existing strategies used to investigate protein aggregation/complexation. © 2015 Wiley Periodicals, Inc.

  10. Utilization of protein-rich residues in biotechnological processes.

    Science.gov (United States)

    Pleissner, Daniel; Venus, Joachim

    2016-03-01

    A drawback of biotechnological processes, where microorganisms convert biomass constituents, such as starch, cellulose, hemicelluloses, lipids, and proteins, into wanted products, is the economic feasibility. Particularly the cost of nitrogen sources in biotechnological processes can make up a large fraction of total process expenses. To further develop the bioeconomy, it is of considerable interest to substitute cost-intensive by inexpensive nitrogen sources. The aim of this mini-review was to provide a comprehensive insight of utilization methods of protein-rich residues, such as fish waste, green biomass, hairs, and food waste. The methods described include (i) production of enzymes, (ii) recovery of bioactive compounds, and/or (iii) usage as nitrogen source for microorganisms in biotechnological processes. In this aspect, the utilization of protein-rich residues, which are conventionally considered as waste, allows the development of value-adding processes for the production of bioactive compounds, biomolecules, chemicals, and materials.

  11. A Matter of Time: Faster Percolator Analysis via Efficient SVM Learning for Large-Scale Proteomics.

    Science.gov (United States)

    Halloran, John T; Rocke, David M

    2018-05-04

    Percolator is an important tool for greatly improving the results of a database search and subsequent downstream analysis. Using support vector machines (SVMs), Percolator recalibrates peptide-spectrum matches based on the learned decision boundary between targets and decoys. To improve analysis time for large-scale data sets, we update Percolator's SVM learning engine through software and algorithmic optimizations rather than heuristic approaches that necessitate the careful study of their impact on learned parameters across different search settings and data sets. We show that by optimizing Percolator's original learning algorithm, l2-SVM-MFN, large-scale SVM learning requires nearly only a third of the original runtime. Furthermore, we show that by employing the widely used Trust Region Newton (TRON) algorithm instead of l2-SVM-MFN, large-scale Percolator SVM learning is reduced to nearly only a fifth of the original runtime. Importantly, these speedups only affect the speed at which Percolator converges to a global solution and do not alter recalibration performance. The upgraded versions of both l2-SVM-MFN and TRON are optimized within the Percolator codebase for multithreaded and single-thread use and are available under Apache license at bitbucket.org/jthalloran/percolator_upgrade.

  12. Atterberg Limits Prediction Comparing SVM with ANFIS Model

    Directory of Open Access Journals (Sweden)

    Mohammad Murtaza Sherzoy

    2017-03-01

    Support Vector Machine (SVM) and Adaptive Neuro-Fuzzy Inference System (ANFIS) methods are both used to predict the values of the Atterberg limits: the liquid limit, plastic limit and plasticity index. The main objective of this study is to compare the two forecasting methods (SVM and ANFIS). Data from 54 soil samples taken from Peninsular Malaysia were used; the samples were tested for liquid limit, plastic limit, plasticity index and grain size distribution. The input parameters are the grain size fractions, namely the percentages of silt, clay and sand. The actual values of the Atterberg limits and the values predicted by the SVM and ANFIS models are compared using the correlation coefficient R2 and the root mean squared error (RMSE). The results show that the ANFIS model is more accurate than the SVM model for the liquid limit (R2 = 0.987), plastic limit (R2 = 0.949) and plasticity index (R2 = 0.966). The RMSE values obtained for both methods show that the ANFIS model performs better than the SVM model in predicting the Atterberg limits as a whole.
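
    The two comparison metrics named in this abstract can be computed as in the sketch below. It assumes R2 means the coefficient of determination, 1 - SS_res / SS_tot; the study may instead have used the squared Pearson correlation, which can give a different value. The function names and the sample numbers are hypothetical and are not data from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between measured and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted liquid limits (%), for illustration only.
measured = [30.0, 40.0, 50.0]
predicted = [32.0, 40.0, 48.0]
model_rmse = rmse(measured, predicted)
model_r2 = r_squared(measured, predicted)
```

    A model with lower RMSE and higher R2 on the same samples would be ranked as the better predictor, which is the basis of the comparison reported above.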

  13. Analysis and Ranking of Protein-Protein Docking Models Using Inter-Residue Contacts and Inter-Molecular Contact Maps

    KAUST Repository

    Oliva, Romina; Chermak, Edrisse; Cavallo, Luigi

    2015-01-01

    In view of the increasing interest both in inhibitors of protein-protein interactions and in protein drugs themselves, analysis of the three-dimensional structure of protein-protein complexes is assuming greater relevance in drug design. In the many cases where an experimental structure is not available, protein-protein docking becomes the method of choice for predicting the arrangement of the complex. However, reliably scoring protein-protein docking poses is still an unsolved problem. As a consequence, the screening of many docking models is usually required in the analysis step, to possibly single out the correct ones. Here, making use of exemplary cases, we review our recently introduced methods for the analysis of protein complex structures and for the scoring of protein docking poses, based on the use of inter-residue contacts and their visualization in inter-molecular contact maps. We also show that the ensemble of tools we developed can be used in the context of rational drug design targeting protein-protein interactions.

  14. Analysis and Ranking of Protein-Protein Docking Models Using Inter-Residue Contacts and Inter-Molecular Contact Maps

    KAUST Repository

    Oliva, Romina

    2015-07-01

    In view of the increasing interest both in inhibitors of protein-protein interactions and in protein drugs themselves, analysis of the three-dimensional structure of protein-protein complexes is assuming greater relevance in drug design. In the many cases where an experimental structure is not available, protein-protein docking becomes the method of choice for predicting the arrangement of the complex. However, reliably scoring protein-protein docking poses is still an unsolved problem. As a consequence, the screening of many docking models is usually required in the analysis step, to possibly single out the correct ones. Here, making use of exemplary cases, we review our recently introduced methods for the analysis of protein complex structures and for the scoring of protein docking poses, based on the use of inter-residue contacts and their visualization in inter-molecular contact maps. We also show that the ensemble of tools we developed can be used in the context of rational drug design targeting protein-protein interactions.

  15. Prediction of protein subcellular localization using support vector machine with the choice of proper kernel

    Directory of Open Access Journals (Sweden)

    Al Mehedi Hasan

    2017-07-01

    The prediction of subcellular locations of proteins can provide useful hints for revealing their functions as well as for understanding the mechanisms of some diseases and, finally, for developing novel drugs. As the number of newly discovered proteins has been growing exponentially, laboratory-based experiments to determine the location of an uncharacterized protein in a living cell have become both expensive and time-consuming. Consequently, to tackle these challenges, computational methods are being developed as an alternative to help biologists in selecting target proteins and designing related experiments. However, the success of protein subcellular localization prediction is still a complicated and challenging problem, particularly when query proteins may have multi-label characteristics, i.e. their simultaneous existence in more than one subcellular location, or if they move between two or more different subcellular locations as well. At this point, to get rid of this problem, several types of subcellular localization prediction methods with different levels of accuracy have been proposed. The support vector machine (SVM) has been employed to provide potential solutions for problems connected with the prediction of protein subcellular localization. However, the practicability of SVM is affected by difficulties in selecting its appropriate kernel as well as in selecting the parameters of that selected kernel. The literature survey has shown that most researchers apply the radial basis function (RBF) kernel to build a SVM based subcellular localization prediction system. Surprisingly, there are still many other kernel functions which have not yet been applied in the prediction of protein subcellular localization. However, the nature of this classification problem requires the application of different kernels for SVM to ensure an optimal result. From this viewpoint, this paper presents the work to apply different kernels for SVM in protein

  16. A prediction model of drug-induced ototoxicity developed by an optimal support vector machine (SVM) method.

    Science.gov (United States)

    Zhou, Shu; Li, Guo-Bo; Huang, Lu-Yi; Xie, Huan-Zhang; Zhao, Ying-Lan; Chen, Yu-Zong; Li, Lin-Li; Yang, Sheng-Yong

    2014-08-01

    Drug-induced ototoxicity, as a toxic side effect, is an important issue that needs to be considered in drug discovery. Nevertheless, current experimental methods used to evaluate drug-induced ototoxicity are often time-consuming and expensive, indicating that they are not suitable for a large-scale evaluation of drug-induced ototoxicity in the early stage of drug discovery. In this investigation, we therefore established an effective computational prediction model of drug-induced ototoxicity using an optimal support vector machine (SVM) method, GA-CG-SVM. Three GA-CG-SVM models were developed based on three training sets containing agents bearing different risk levels of drug-induced ototoxicity. For comparison, models based on naïve Bayesian (NB) and recursive partitioning (RP) methods were also built on the same training sets. Among all the prediction models, the GA-CG-SVM model II showed the best performance, with prediction accuracies of 85.33% and 83.05% for two independent test sets, respectively. Overall, the good performance of the GA-CG-SVM model II indicates that it could be used for the prediction of drug-induced ototoxicity in the early stage of drug discovery. Copyright © 2014 Elsevier Ltd. All rights reserved.

  17. Large-scale evaluation of dynamically important residues in proteins predicted by the perturbation analysis of a coarse-grained elastic model

    Directory of Open Access Journals (Sweden)

    Tekpinar Mustafa

    2009-07-01

    Background: It is increasingly recognized that protein functions often require intricate conformational dynamics, which involves a network of key amino acid residues that couple spatially separated functional sites. Tremendous efforts have been made to identify these key residues by experimental and computational means. Results: We have performed a large-scale evaluation of the predictions of dynamically important residues by a variety of computational protocols including three based on the perturbation and correlation analysis of a coarse-grained elastic model. This study is performed for two lists of test cases with >500 pairs of protein structures. The dynamically important residues predicted by the perturbation and correlation analysis are found to be strongly or moderately conserved in >67% of test cases. They form a sparse network of residues which are clustered both in 3D space and along protein sequence. Their overall conservation is attributed to their dynamic role rather than ligand binding or high network connectivity. Conclusion: By modeling how the protein structural fluctuations respond to residue-position-specific perturbations, our highly efficient perturbation and correlation analysis can be used to dissect the functional conformational changes in various proteins with a residue level of detail. The predictions of dynamically important residues serve as promising targets for mutational and functional studies.

  18. Comparison of SVM RBF-NN and DT for crop and weed identification based on spectral measurement over corn fields

    Science.gov (United States)

    It is important to find an appropriate pattern-recognition method for in-field plant identification based on spectral measurement in order to classify the crop and weeds accurately. In this study, the method of Support Vector Machine (SVM) was evaluated and compared with two other methods, Decision ...

  19. Comparison of hand-craft feature based SVM and CNN based deep learning framework for automatic polyp classification.

    Science.gov (United States)

    Younghak Shin; Balasingham, Ilangko

    2017-07-01

    Colonoscopy is a standard method for screening polyps by highly trained physicians. Polyps missed in colonoscopy are a potential risk factor for colorectal cancer. In this study, we investigate an automatic polyp classification framework. We aim to compare two different approaches: a hand-crafted feature method and a convolutional neural network (CNN) based deep learning method. Combined shape and color features are used for hand-crafted feature extraction, and the support vector machine (SVM) method is adopted for classification. For the CNN approach, a deep learning framework based on three convolution and pooling layers is used for classification. The proposed framework is evaluated using three public polyp databases. The experimental results show that the CNN based deep learning framework gives better classification performance than the hand-crafted feature based methods. It achieves over 90% classification accuracy, sensitivity, specificity and precision.

  20. Age and gender estimation using Region-SIFT and multi-layered SVM

    Science.gov (United States)

    Kim, Hyunduk; Lee, Sang-Heon; Sohn, Myoung-Kyu; Hwang, Byunghun

    2018-04-01

    In this paper, we propose an age and gender estimation framework using the region-SIFT feature and a multi-layered SVM classifier. The suggested framework entails three processes. The first step is landmark based face alignment. The second step is feature extraction. In this step, we introduce the region-SIFT feature extraction method based on facial landmarks. First, we define sub-regions of the face. We then extract SIFT features from each sub-region. In order to reduce the dimensions of the features we employ Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Finally, we classify age and gender using a multi-layered Support Vector Machine (SVM) for efficient classification. Rather than performing gender estimation and age estimation independently, the use of the multi-layered SVM can improve the classification rate by constructing a classifier that estimates the age according to gender. Moreover, we collect a dataset of face images, called DGIST_C, from the internet. A performance evaluation of the proposed method was performed with the FERET database, CACD database, and DGIST_C database. The experimental results demonstrate that the proposed approach classifies age and performs gender estimation very efficiently and accurately.

  1. Damage Detection of Structures for Ambient Loading Based on Cross Correlation Function Amplitude and SVM

    Directory of Open Access Journals (Sweden)

    Lin-sheng Huo

    2016-01-01

    Full Text Available An effective method for the damage detection of skeletal structures which combines the cross correlation function amplitude (CCFA) with the support vector machine (SVM) is presented in this paper. The proposed method consists of two stages. Firstly, the data features are extracted from the CCFA, which, calculated from dynamic responses and as a representation of the modal shapes of the structure, changes when damage occurs on the structure. The data features are then input into the SVM with the one-against-one (OAO) algorithm to classify the damage status of the structure. The simulation data of the IASC-ASCE benchmark model and a vibration experiment on a truss structure are adopted to verify the feasibility of the proposed method. The results show that the proposed method is suitable for the damage identification of skeletal structures with limited sensors subjected to ambient excitation. As the CCFA based data features are sensitive to damage, the proposed method demonstrates its reliability in the diagnosis of structures with damage, especially for those with minor damage. In addition, the proposed method shows better noise robustness and is more suitable for noisy environments.
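
    The feature-extraction stage described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the choice of a reference sensor (`ref`), the lag window (`max_lag`), the unit-norm scaling and the toy sinusoidal responses are all hypothetical. Each sensor record is cross-correlated with the reference record, the largest absolute value is taken as the amplitude, and the amplitude vector is normalised so that it could serve as an SVM input.

```python
import math

def cross_correlation(x, y, lag):
    """Biased sample cross-correlation of x and y at a non-negative lag."""
    n = len(x)
    return sum(x[t] * y[t + lag] for t in range(n - lag)) / n

def ccfa_features(responses, ref=0, max_lag=20):
    """Return one normalised cross-correlation amplitude per sensor.

    responses: list of equal-length response records, one per sensor.
    ref: index of the reference sensor (an assumption of this sketch).
    """
    amplitudes = []
    for signal in responses:
        # Amplitude = largest absolute cross-correlation over the lag window.
        peak = max(abs(cross_correlation(responses[ref], signal, lag))
                   for lag in range(max_lag + 1))
        amplitudes.append(peak)
    norm = math.sqrt(sum(a * a for a in amplitudes))
    return [a / norm for a in amplitudes]

# Toy data: three sensors observing one 3 Hz mode with different gains.
t = [0.01 * k for k in range(500)]
healthy = [[g * math.sin(2 * math.pi * 3 * tk) for tk in t] for g in (1.0, 0.8, 0.5)]
damaged = [[g * math.sin(2 * math.pi * 3 * tk) for tk in t] for g in (1.0, 0.6, 0.5)]
f_healthy = ccfa_features(healthy)
f_damaged = ccfa_features(damaged)
```

    In this toy case the second component of the feature vector drops when the gain at the second sensor is reduced, which is the kind of change a classifier would be trained to recognise.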

  2. Energetic frustrations in protein folding at residue resolution: a homologous simulation study of Im9 proteins.

    Directory of Open Access Journals (Sweden)

    Yunxiang Sun

    Full Text Available Energetic frustration is becoming an important topic for understanding the mechanisms of protein folding, which is a long-standing major biological problem usually investigated by the free energy landscape theory. Despite the significant advances in probing the effects of folding frustrations on the overall features of protein folding pathways and folding intermediates, detailed characterizations of folding frustrations at an atomic or residue level are still lacking. In addition, how and to what extent folding frustrations interact with protein topology in determining folding mechanisms remains unclear. In this paper, we tried to understand energetic frustrations in the context of protein topology structures or native-contact networks by comparing the energetic frustrations of five homologous Im9 alpha-helix proteins that share very similar topology structures but have a single hydrophilic-to-hydrophobic mutual mutation. The folding simulations were performed using a coarse-grained Gō-like model, while non-native hydrophobic interactions were introduced as energetic frustrations using a Lennard-Jones potential function. Energetic frustrations were then examined at residue level based on φ-value analyses of the transition state ensemble structures and mapped back to native-contact networks. Our calculations show that energetic frustrations have highly heterogeneous influences on the folding of the four helices of the examined structures depending on the local environment of the frustration centers. Also, the closer the introduced frustration is to the center of the native-contact network, the larger the changes in the protein folding. Our findings add a new dimension to the understanding of how topology determines protein folding, in that energetic frustrations work closely with native-contact networks to affect protein folding.

  3. Substantial conformational change mediated by charge-triad residues of the death effector domain in protein-protein interactions.

    Directory of Open Access Journals (Sweden)

    Edward C Twomey

    Full Text Available Protein conformational changes are commonly associated with the formation of protein complexes. The non-catalytic death effector domains (DEDs) mediate protein-protein interactions in a variety of cellular processes, including apoptosis, proliferation and migration, and glucose metabolism. Here, using NMR residual dipolar coupling (RDC) data, we report a conformational change in the DED of the phosphoprotein enriched in astrocytes, 15 kDa (PEA-15) protein in the complex with a mitogen-activated protein (MAP) kinase, extracellular regulated kinase 2 (ERK2), which is essential in regulating ERK2 cellular distribution and function in cell proliferation and migration. The most significant conformational change in PEA-15 happens at helices α2, α3, and α4, which also possess the highest flexibility among the six-helix bundle of the DED. This crucial conformational change is modulated by the D/E-RxDL charge-triad motif, one of the prominent structural features of DEDs, together with a number of other electrostatic and hydrogen bonding interactions on the protein surface. The charge-triad motif promotes the optimal orientation of key residues and expands the binding interface to accommodate protein-protein interactions. However, the charge-triad residues are not directly involved in the binding interface between PEA-15 and ERK2.

  4. Hybrid Model Based on Genetic Algorithms and SVM Applied to Variable Selection within Fruit Juice Classification

    Directory of Open Access Journals (Sweden)

    C. Fernandez-Lozano

    2013-01-01

    Full Text Available Given the background of the use of Neural Networks in problems of apple juice classification, this paper aims at implementing a newly developed method in the field of machine learning: the Support Vector Machine (SVM). Therefore, a hybrid model that combines genetic algorithms and support vector machines is suggested in such a way that, when using SVM as a fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected.

  5. Lex-SVM: exploring the potential of exon expression profiling for disease classification.

    Science.gov (United States)

    Yuan, Xiongying; Zhao, Yi; Liu, Changning; Bu, Dongbo

    2011-04-01

    Exon expression profiling technologies, including exon arrays and RNA-Seq, measure the abundance of every exon in a gene. Compared with gene expression profiling technologies like 3' array, exon expression profiling technologies could detect alterations in both transcription and alternative splicing; therefore they are expected to be more sensitive in diagnosis. However, exon expression profiling also brings higher dimension, more redundancy, and significant correlation among features. Ignoring the correlation structure among exons of a gene, a popular classification method like L1-SVM selects exons individually from each gene and thus is vulnerable to noise. To overcome this limitation, we present in this paper a new variant of SVM named Lex-SVM to incorporate correlation structure among exons and known splicing patterns to promote classification performance. Specifically, we construct a new norm, ex-norm, including our prior knowledge on exon correlation structure to regularize the coefficients of a linear SVM. Lex-SVM can be solved efficiently using standard linear programming techniques. The advantage of Lex-SVM is that it can select features group-wise, force features in a subgroup to take equal weights and exclude the features that contradict the majority in the subgroup. Experimental results suggest that on exon expression profiles, Lex-SVM is more accurate than existing methods. Lex-SVM also generates a more compact model and selects genes more consistently in cross-validation. Unlike L1-SVM, which selects only one exon in a gene, Lex-SVM assigns equal weights to as many exons in a gene as possible, lending itself more readily to further interpretation.

  6. Multiclass Posterior Probability Twin SVM for Motor Imagery EEG Classification.

    Science.gov (United States)

    She, Qingshan; Ma, Yuliang; Meng, Ming; Luo, Zhizeng

    2015-01-01

    Motor imagery electroencephalography is widely used in the brain-computer interface systems. Due to inherent characteristics of electroencephalography signals, accurate and real-time multiclass classification is always challenging. In order to solve this problem, a multiclass posterior probability solution for twin SVM is proposed by the ranking continuous output and pairwise coupling in this paper. First, two-class posterior probability model is constructed to approximate the posterior probability by the ranking continuous output techniques and Platt's estimating method. Secondly, a solution of multiclass probabilistic outputs for twin SVM is provided by combining every pair of class probabilities according to the method of pairwise coupling. Finally, the proposed method is compared with multiclass SVM and twin SVM via voting, and multiclass posterior probability SVM using different coupling approaches. The efficacy on the classification accuracy and time complexity of the proposed method has been demonstrated by both the UCI benchmark datasets and real world EEG data from BCI Competition IV Dataset 2a, respectively.
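
    To illustrate what combining pairwise class probabilities into multiclass posteriors can look like, the sketch below uses one simple closed-form coupling rule. It is not claimed to be the coupling method used in the paper, which compares several approaches; the function name and the toy probabilities are hypothetical. Given pairwise estimates r[i][j] of P(class i | class i or j), the rule computes p_i = 1 / (sum over j != i of 1/r[i][j] - (K - 2)) and renormalises.

```python
def pairwise_coupling(r):
    """Combine pairwise probabilities r[i][j] ~ P(class i | class i or j)
    into one posterior per class using the closed-form rule
    p_i = 1 / (sum_{j != i} 1 / r[i][j] - (K - 2)), then renormalise."""
    k = len(r)
    raw = []
    for i in range(k):
        s = sum(1.0 / r[i][j] for j in range(k) if j != i)
        raw.append(1.0 / (s - (k - 2)))
    total = sum(raw)
    return [p / total for p in raw]

# Build consistent pairwise estimates from known posteriors and recover them.
true_p = [0.5, 0.3, 0.2]
r = [[true_p[i] / (true_p[i] + true_p[j]) if i != j else 0.0
      for j in range(3)] for i in range(3)]
posterior = pairwise_coupling(r)
predicted_class = max(range(3), key=lambda i: posterior[i])
```

    When the pairwise estimates are mutually consistent, as in this toy case, the rule recovers the underlying posteriors exactly; with noisy binary classifiers the renormalisation step absorbs the inconsistency.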

  7. gRINN: a tool for calculation of residue interaction energies and protein energy network analysis of molecular dynamics simulations.

    Science.gov (United States)

    Serçinoglu, Onur; Ozbek, Pemra

    2018-05-25

    Atomistic molecular dynamics (MD) simulations generate a wealth of information related to the dynamics of proteins. If properly analyzed, this information can lead to new insights regarding protein function and assist wet-lab experiments. Aiming to identify interactions between individual amino acid residues and the role played by each in the context of MD simulations, we present a stand-alone software called gRINN (get Residue Interaction eNergies and Networks). gRINN features graphical user interfaces (GUIs) and a command-line interface for generating and analyzing pairwise residue interaction energies and energy correlations from protein MD simulation trajectories. gRINN utilizes the features of NAMD or GROMACS MD simulation packages and automatizes the steps necessary to extract residue-residue interaction energies from user-supplied simulation trajectories, greatly simplifying the analysis for the end-user. A GUI, including an embedded molecular viewer, is provided for visualization of interaction energy time-series, distributions, an interaction energy matrix, interaction energy correlations and a residue correlation matrix. gRINN additionally offers construction and analysis of Protein Energy Networks, providing residue-based metrics such as degrees, betweenness-centralities, closeness centralities as well as shortest path analysis. gRINN is free and open to all users without login requirement at http://grinn.readthedocs.io.

  8. Steady Modeling for an Ammonia Synthesis Reactor Based on a Novel CDEAS-LS-SVM Model

    Directory of Open Access Journals (Sweden)

    Zhuoqian Liu

    2014-01-01

    Full Text Available A steady-state mathematical model is built in order to represent plant behavior under stationary operating conditions. A novel modeling using LS-SVR based on Cultural Differential Evolution with Ant Search is proposed. LS-SVM is adopted to establish the model of the net value of ammonia. The modeling method has fast convergence speed and good global adaptability for identification of the ammonia synthesis process. The LS-SVR model was established using the above-mentioned method. Simulation results verify the validity of the method.

  9. Assessing food allergy risks from residual peanut protein in highly refined vegetable oil

    NARCIS (Netherlands)

    Blom, W.M.; Kruizinga, A.G.; Rubingh, C.M.; Remington, B.C.; Crevel, R.W.R.; Houben, G.F.

    2017-01-01

    Refined vegetable oils including refined peanut oil are widely used in foods. Due to shared production processes, refined non-peanut vegetable oils can contain residual peanut proteins. We estimated the predicted number of allergic reactions to residual peanut proteins using probabilistic risk

  10. Exploring SVM-based intrusion detection through information entropy theory

    Institute of Scientific and Technical Information of China (English)

    朱文杰; 王强; 翟献军

    2013-01-01

    In traditional SVM-based intrusion detection, kernel function construction and feature selection both rely on prior knowledge, which commonly leads to low accuracy and low efficiency. Combining information entropy theory with the SVM algorithm yields an entropy-based SVM intrusion detection algorithm that improves both the accuracy and the efficiency of intrusion detection. The algorithm has two parts. First, sample features are unified according to the user information entropy and variance contained in the samples, and each feature is measured by whether it falls within a confidence interval. The resulting feature confidence vector is used as a construction parameter of the SVM kernel function, which preserves the correspondence between the training set and the optimal separating hyperplane and also yields the maximum classification margin required for intrusion detection. Second, the amount of user information contained in the samples is used as a metric to substantially reduce the feature subset, which both lowers the computational scale and speeds up classifier training. Experiments show that, applied in an intrusion detection system, the algorithm outperforms the traditional SVM algorithm.
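
    The feature-reduction idea, using information content as the yardstick for discarding features, might be sketched as below. This is only an assumption-laden illustration: the abstract does not give the exact metric, so plain Shannon entropy per feature column is used here, and the threshold `min_entropy`, the function names and the toy connection records are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (bits) of a list of discrete feature values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def reduce_features(samples, min_entropy=0.5):
    """Return indices of feature columns whose entropy is at least min_entropy."""
    columns = list(zip(*samples))
    return [i for i, col in enumerate(columns) if shannon_entropy(col) >= min_entropy]

# Toy connection records: (protocol, flag, constant_field)
samples = [
    ("tcp", "SF", 0),
    ("udp", "SF", 0),
    ("tcp", "REJ", 0),
    ("icmp", "SF", 0),
]
kept = reduce_features(samples)
```

    Here the constant third column carries no information and is dropped, so a downstream classifier would be trained on the first two features only.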

  11. Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders.

    Science.gov (United States)

    Subasi, Abdulhamit

    2013-06-01

    Support vector machine (SVM) is an extensively used machine learning method with many biomedical signal classification applications. In this study, a novel PSO-SVM model has been proposed that hybridized the particle swarm optimization (PSO) and SVM to improve the EMG signal classification accuracy. This optimization mechanism involves kernel parameter setting in the SVM training procedure, which significantly influences the classification accuracy. The experiments were conducted on the basis of EMG signal to classify into normal, neurogenic or myopathic. In the proposed method the EMG signals were decomposed into the frequency sub-bands using discrete wavelet transform (DWT) and a set of statistical features were extracted from these sub-bands to represent the distribution of wavelet coefficients. The obtained results obviously validate the superiority of the SVM method compared to conventional machine learning methods, and suggest that further significant enhancements in terms of classification accuracy can be achieved by the proposed PSO-SVM classification system. The PSO-SVM yielded an overall accuracy of 97.41% on 1200 EMG signals selected from 27 subject records against 96.75%, 95.17% and 94.08% for the SVM, the k-NN and the RBF classifiers, respectively. PSO-SVM is developed as an efficient tool so that various SVMs can be used conveniently as the core of PSO-SVM for diagnosis of neuromuscular disorders. Copyright © 2013 Elsevier Ltd. All rights reserved.
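
    The kernel-parameter search by particle swarm optimization can be illustrated with a minimal sketch. Several things here are assumptions rather than details from the abstract: the swarm settings (inertia 0.7, acceleration weights 1.5), the search box over log10(C) and log10(gamma), and above all `surrogate_cv_error`, a hypothetical quadratic standing in for the cross-validated error that a real run would obtain by training an SVM at each candidate point.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iter=100, seed=0):
    """Minimal particle swarm optimiser over a box-constrained search space."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social weights
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c2 * rng.random() * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val

def surrogate_cv_error(params):
    """Hypothetical stand-in for the cross-validated error of an RBF SVM.
    Its minimum sits at log10(C) = 1, log10(gamma) = -2."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

best_params, best_error = pso_minimize(surrogate_cv_error, [(-3, 3), (-5, 1)])
```

    Searching in log space is a common choice for C and gamma because useful values span several orders of magnitude; replacing the surrogate with a real cross-validation routine leaves the optimiser unchanged.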

  12. Feature Selection and Parameters Optimization of SVM Using Particle Swarm Optimization for Fault Classification in Power Distribution Systems.

    Science.gov (United States)

    Cho, Ming-Yuan; Hoang, Thi Thom

    2017-01-01

    Fast and accurate fault classification is essential to power system operations. In this paper, in order to classify electrical faults in radial distribution systems, a particle swarm optimization (PSO) based support vector machine (SVM) classifier has been proposed. The proposed PSO based SVM classifier is able to select appropriate input features and optimize SVM parameters to increase classification accuracy. Further, a time-domain reflectometry (TDR) method with a pseudorandom binary sequence (PRBS) stimulus has been used to generate a dataset for purposes of classification. The proposed technique has been tested on a typical radial distribution network to identify ten different types of faults considering 12 given input features generated by using Simulink software and MATLAB Toolbox. The success rate of the SVM classifier is over 97%, which demonstrates the effectiveness and high efficiency of the developed method.

  13. Feature Selection and Parameters Optimization of SVM Using Particle Swarm Optimization for Fault Classification in Power Distribution Systems

    Directory of Open Access Journals (Sweden)

    Ming-Yuan Cho

    2017-01-01

    Full Text Available Fast and accurate fault classification is essential to power system operations. In this paper, in order to classify electrical faults in radial distribution systems, a particle swarm optimization (PSO) based support vector machine (SVM) classifier has been proposed. The proposed PSO based SVM classifier is able to select appropriate input features and optimize SVM parameters to increase classification accuracy. Further, a time-domain reflectometry (TDR) method with a pseudorandom binary sequence (PRBS) stimulus has been used to generate a dataset for purposes of classification. The proposed technique has been tested on a typical radial distribution network to identify ten different types of faults considering 12 given input features generated by using Simulink software and MATLAB Toolbox. The success rate of the SVM classifier is over 97%, which demonstrates the effectiveness and high efficiency of the developed method.

  14. A Realistic Seizure Prediction Study Based on Multiclass SVM.

    Science.gov (United States)

    Direito, Bruno; Teixeira, César A; Sales, Francisco; Castelo-Branco, Miguel; Dourado, António

    2017-05-01

    A patient-specific algorithm, for epileptic seizure prediction, based on multiclass support-vector machines (SVM) and using multi-channel high-dimensional feature sets, is presented. The feature sets, combined with multiclass classification and post-processing schemes aim at the generation of alarms and reduced influence of false positives. This study considers 216 patients from the European Epilepsy Database, and includes 185 patients with scalp EEG recordings and 31 with intracranial data. The strategy was tested over a total of 16,729.80 h of inter-ictal data, including 1206 seizures. We found an overall sensitivity of 38.47% and a false positive rate per hour of 0.20. The performance of the method achieved statistical significance in 24 patients (11% of the patients). Despite the encouraging results previously reported in specific datasets, the prospective demonstration on long-term EEG recording has been limited. Our study presents a prospective analysis of a large heterogeneous, multicentric dataset. The statistical framework based on conservative assumptions, reflects a realistic approach compared to constrained datasets, and/or in-sample evaluations. The improvement of these results, with the definition of an appropriate set of features able to improve the distinction between the pre-ictal and nonpre-ictal states, hence minimizing the effect of confounding variables, remains a key aspect.

  15. A Soluble, Folded Protein without Charged Amino Acid Residues

    DEFF Research Database (Denmark)

    Højgaard, Casper; Kofoed, Christian; Espersen, Roall

    2016-01-01

    Charges are considered an integral part of protein structure and function, enhancing solubility and providing specificity in molecular interactions. We wished to investigate whether charged amino acids are indeed required for protein biogenesis and whether a protein completely free of titratable...... side chains can maintain solubility, stability, and function. As a model, we used a cellulose-binding domain from Cellulomonas fimi, which, among proteins of more than 100 amino acids, presently is the least charged in the Protein Data Bank, with a total of only four titratable residues. We find...

  16. Recursive SVM biomarker selection for early detection of breast cancer in peripheral blood.

    Science.gov (United States)

    Zhang, Fan; Kaufman, Howard L; Deng, Youping; Drabier, Renee

    2013-01-01

    Breast cancer is worldwide the second most common type of cancer after lung cancer. Traditional mammography and Tissue Microarray have been studied for early cancer detection and cancer prediction. However, there is a need for more reliable diagnostic tools for early detection of breast cancer. This can be a challenge due to a number of factors and logistics. First, obtaining tissue biopsies can be difficult. Second, mammography may not detect small tumors, and is often unsatisfactory for younger women who typically have dense breast tissue. Lastly, breast cancer is not a single homogeneous disease but consists of multiple disease states, each arising from a distinct molecular mechanism and having a distinct clinical progression path which makes the disease difficult to detect and predict in early stages. In this paper, we present a Support Vector Machine based on Recursive Feature Elimination and Cross Validation (SVM-RFE-CV) algorithm for early detection of breast cancer in peripheral blood and show how to use SVM-RFE-CV to model the classification and prediction problem of early detection of breast cancer in peripheral blood. The training set, which consists of 32 healthy and 33 cancer samples, and the testing set, consisting of 31 healthy and 34 cancer samples, were randomly separated from a dataset of peripheral blood of breast cancer that is downloaded from Gene Expression Omnibus. First, we identified the 42 differentially expressed biomarkers between "normal" and "cancer". Then, with the SVM-RFE-CV we extracted 15 biomarkers that yield zero cross validation score. Lastly, we compared the classification and prediction performance of SVM-RFE-CV with that of SVM and SVM Recursive Feature Elimination (SVM-RFE). We found that 1) the SVM-RFE-CV is suitable for analyzing noisy high-throughput microarray data, 2) it outperforms SVM-RFE in the robustness to noise and in the ability to recover informative features, and 3) it can improve the prediction performance (Area Under

  17. Customer and performance rating in QFD using SVM classification

    Science.gov (United States)

    Dzulkifli, Syarizul Amri; Salleh, Mohd Najib Mohd; Leman, A. M.

    2017-09-01

    In a classification problem, each input is associated with one output, and training data are used to create a model that approximates the true function. SVM is a popular method for binary classification owing to its theoretical foundation and good generalization performance. However, when trained with noisy data, the decision hyperplane might deviate from the optimal position because of the sum of misclassification errors in the objective function. In this paper, we introduce a fuzzy weighted learning approach for improving the accuracy of Support Vector Machine (SVM) classification. The main aim of this work is to determine appropriate weights for SVM so as to adjust the parameters of the learning method from a given set of noisy input-output data. Performance and customer rating in Quality Function Deployment (QFD) is used as our case study to determine whether the fuzzy SVM implementation scales to very large data sets and achieves high classification accuracy.

  18. Hadamard Kernel SVM with applications for breast cancer outcome predictions.

    Science.gov (United States)

    Jiang, Hao; Ching, Wai-Ki; Cheung, Wai-Shun; Hou, Wenpin; Yin, Hong

    2017-12-21

    Breast cancer is one of the leading causes of death for women. It is of great necessity to develop effective methods for breast cancer detection and diagnosis. Recent studies have focused on gene-based signatures for outcome predictions. Kernel SVM has attracted a lot of attention for its discriminative power in dealing with small sample pattern recognition problems. But how to select or construct an appropriate kernel for a specified problem still needs further investigation. Here we propose a novel kernel (Hadamard Kernel) in conjunction with Support Vector Machines (SVMs) to address the problem of breast cancer outcome prediction using gene expression data. Hadamard Kernel outperforms the classical kernels and correlation kernel in terms of Area under the ROC Curve (AUC) values where a number of real-world data sets are adopted to test the performance of different methods. Hadamard Kernel SVM is effective for breast cancer predictions, either in terms of prognosis or diagnosis. It may benefit patients by guiding therapeutic options. Apart from that, it would be a valuable addition to the current SVM kernel families. We hope it will contribute to the wider biology and related communities.

  19. Application of EMD-Based SVD and SVM to Coal-Gangue Interface Detection

    Directory of Open Access Journals (Sweden)

    Wei Liu

    2014-01-01

    Full Text Available Coal-gangue interface detection during top-coal caving mining is a challenging problem. This paper proposes a new vibration signal analysis approach to detecting the coal-gangue interface based on singular value decomposition (SVD) techniques and support vector machines (SVMs). Due to the nonstationary characteristics in vibration signals of the tail boom support of the longwall mining machine in this complicated environment, the empirical mode decomposition (EMD) is used to decompose the raw vibration signals into a number of intrinsic mode functions (IMFs) by which the initial feature vector matrices can be formed automatically. By applying the SVD algorithm to the initial feature vector matrices, the singular values of matrices can be obtained and used as the input feature vectors of the SVM classifier. The analysis results of vibration signals from the tail boom support of a longwall mining machine show that the method based on EMD, SVD, and SVM is effective for coal-gangue interface detection even when the number of samples is small.

  20. Are tyrosine residues involved in the photoconversion of the water-soluble chlorophyll-binding protein of Chenopodium album?

    Science.gov (United States)

    Takahashi, S; Seki, Y; Uchida, A; Nakayama, K; Satoh, H

    2015-05-01

    Non-photosynthetic and hydrophilic chlorophyll (Chl) proteins, called water-soluble Chl-binding proteins (WSCPs), are distributed in various species of Chenopodiaceae, Amaranthaceae, Polygonaceae and Brassicaceae. Based on their photoconvertibility, WSCPs are categorised into two classes: Class I (photoconvertible) and Class II (non-photoconvertible). Chenopodium album WSCP (CaWSCP; Class I) is able to convert the chlorin skeleton of Chl a into a bacteriochlorin-like skeleton under light in the presence of molecular oxygen. Potassium iodide (KI) is a strong inhibitor of the photoconversion. Because KI attacks tyrosine residues in proteins, tyrosine residues in CaWSCP are considered to be important amino acid residues for the photoconversion. Recently, we identified the gene encoding CaWSCP and found that the mature region of CaWSCP contained four tyrosine residues: Tyr13, Tyr14, Tyr87 and Tyr134. To gain insight into the effect of the tyrosine residues on the photoconversion, we constructed 15 mutant proteins (Y13A, Y14A, Y87A, Y134A, Y13-14A, Y13-87A, Y13-134A, Y14-87A, Y14-134A, Y87-134A, Y13-14-87A, Y13-14-134A, Y13-87-134A, Y14-87-134A and Y13-14-87-134A) using site-directed mutagenesis. Amazingly, all the mutant proteins retained not only chlorophyll-binding activity, but also photoconvertibility. Furthermore, we found that KI strongly inhibited the photoconversion of Y13-14-87-134A. These findings indicated that the four tyrosine residues are not essential for the photoconversion. © 2014 German Botanical Society and The Royal Botanical Society of the Netherlands.

  1. SVM-Based Control System for a Robot Manipulator

    Directory of Open Access Journals (Sweden)

    Foudil Abdessemed

    2012-12-01

    Full Text Available Real systems are usually non-linear, ill-defined, have variable parameters and are subject to external disturbances. Modelling these systems is often an approximation of the physical phenomena involved. However, it is from this approximate system of representation that we propose - in this paper - to build a robust control, in the sense that it must ensure low sensitivity towards parameters, uncertainties, variations and external disturbances. The computed torque method is a well-established robot control technique which takes account of the dynamic coupling between the robot links. However, its main disadvantage lies on the assumption of an exactly known dynamic model which is not realizable in practice. To overcome this issue, we propose the estimation of the dynamics model of the nonlinear system with a machine learning regression method. The output of this regressor is used in conjunction with a PD controller to achieve the tracking trajectory task of a robot manipulator. In cases where some of the parameters of the plant undergo a change in their values, poor performance may result. To cope with this drawback, a fuzzy precompensator is inserted to reinforce the SVM computed torque-based controller and avoid any deterioration. The theory is developed and the simulation results are carried out on a two-degree of freedom robot manipulator to demonstrate the validity of the proposed approach.

  2. Prediction and analysis of beta-turns in proteins by support vector machine.

    Science.gov (United States)

    Pham, Tho Hoan; Satou, Kenji; Ho, Tu Bao

    2003-01-01

    The tight turn has long been recognized as one of the three important features of proteins after the alpha-helix and beta-sheet. Tight turns play an important role in globular proteins from both the structural and functional points of view. More than 90% of tight turns are beta-turns. Analysis and prediction of beta-turns in particular and tight turns in general are very useful for the design of new molecules such as drugs, pesticides, and antigens. In this paper, we introduce a support vector machine (SVM) approach to prediction and analysis of beta-turns. We have investigated two aspects of applying SVM to the prediction and analysis of beta-turns. First, we developed a new SVM method, called BTSVM, which predicts beta-turns of a protein from its sequence. The prediction results on the dataset of 426 non-homologous protein chains by sevenfold cross-validation technique showed that our method is superior to the other previous methods. Second, we analyzed how amino acid positions support (or prevent) the formation of beta-turns based on the "multivariable" classification model of a linear SVM. This model is more general than the other ones of previous statistical methods. Our analysis results are more comprehensive and easier to use than previously published analysis results.

  3. Thermodynamic effects of replacements of Pro residues in helix interiors of maltose-binding protein.

    Science.gov (United States)

    Prajapati, R S; Lingaraju, G M; Bacchawat, Kiran; Surolia, Avadhesha; Varadarajan, Raghavan

    2003-12-01

    Introduction of Pro residues into helix interiors results in protein destabilization. It is currently unclear if the converse substitution (i.e., replacement of Pro residues that naturally occur in helix interiors) would be stabilizing. Maltose-binding protein is a large 370-amino acid protein that contains 21 Pro residues. Of these, three nonconserved residues (P48, P133, and P159) occur at helix interiors. Each of the residues was replaced with Ala and Ser. Stabilities were characterized by differential scanning calorimetry (DSC) as a function of pH and by isothermal urea denaturation studies as a function of temperature. The P48S and P48A mutants were found to be marginally more stable than the wild-type protein. In the pH range of 5-9, there is an average increase in T(m) values of P48A and P48S of 0.4 degrees C and 0.2 degrees C, respectively, relative to the wild-type protein. The other mutants are less stable than the wild type. Analysis of the effects of such Pro substitutions in MBP and in three other proteins studied to date suggests that substitutions are more likely to be stabilizing if the carbonyl group i-3 or i-4 to the mutation site is not hydrogen bonded in the wild-type protein. Copyright 2003 Wiley-Liss, Inc.

  4. Protein consensus-based surface engineering (ProCoS): a computer-assisted method for directed protein evolution.

    Science.gov (United States)

    Shivange, Amol V; Hoeffken, Hans Wolfgang; Haefner, Stefan; Schwaneberg, Ulrich

    2016-12-01

    Protein consensus-based surface engineering (ProCoS) is a simple and efficient method for directed protein evolution combining computational analysis and molecular biology tools to engineer protein surfaces. ProCoS is based on the hypothesis that conserved residues originated from a common ancestor and that these residues are crucial for the function of a protein, whereas highly variable regions (situated on the surface of a protein) can be targeted for surface engineering to maximize performance. ProCoS comprises four main steps: ( i ) identification of conserved and highly variable regions; ( ii ) protein sequence design by substituting residues in the highly variable regions, and gene synthesis; ( iii ) in vitro DNA recombination of synthetic genes; and ( iv ) screening for active variants. ProCoS is a simple method for surface mutagenesis in which multiple sequence alignment is used for selection of surface residues based on a structural model. To demonstrate the technique's utility for directed evolution, the surface of a phytase enzyme from Yersinia mollaretii (Ymphytase) was subjected to ProCoS. Screening just 1050 clones from ProCoS engineering-guided mutant libraries yielded an enzyme with 34 amino acid substitutions. The surface-engineered Ymphytase exhibited 3.8-fold higher pH stability (at pH 2.8 for 3 h) and retained 40% of the enzyme's specific activity (400 U/mg) compared with the wild-type Ymphytase. The pH stability might be attributed to a significantly increased (20 percentage points; from 9% to 29%) number of negatively charged amino acids on the surface of the engineered phytase.

  5. Prediction of Protein-Protein Interaction By Metasample-Based Sparse Representation

    Directory of Open Access Journals (Sweden)

    Xiuquan Du

    2015-01-01

    Protein-protein interactions (PPIs) play key roles in many cellular processes such as transcription regulation, cell metabolism, and endocrine function. Understanding these interactions greatly advances the study of the pathogenesis and treatment of various diseases. A large amount of data has been generated by experimental techniques; however, most of these data are usually incomplete or noisy, and the current biological experimental techniques are always very time-consuming and expensive. In this paper, we proposed a novel method (metasample-based sparse representation classification, MSRC) for PPI prediction. A group of metasamples is extracted from the original training samples, and the l1-regularized least squares method is then used to express a new testing sample as a linear combination of these metasamples. PPI prediction is achieved by using a discrimination function defined in terms of the representation coefficients. The MSRC is applied to a PPI dataset; it achieves 84.9% sensitivity and 94.55% specificity, which is slightly lower than support vector machine (SVM) and much higher than naive Bayes (NB), neural networks (NN), and k-nearest neighbor (KNN). The result shows that the MSRC is efficient for PPI prediction.

  6. Microbial Physiology of the Conversion of Residual Oil to Methane: A Protein Prospective

    Science.gov (United States)

    Morris, Brandon E. L.; Bastida-Lopez, Felipe; von Bergen, Martin; Richnow, Hans-Hermann; Suflita, Joseph M.

    2010-05-01

    Traditional petroleum recovery techniques are unable to extract the majority of oil in most petroliferous deposits. The recovery of even a fraction of residual hydrocarbon in conventional reserves could represent a substantive energy supply. To this end, the microbial conversion of residual oil to methane has gained increasing relevance in recent years [1,2]. Worldwide demand for methane is expected to increase through 2030 [3], as it is a cleaner-burning alternative to traditional fuels [4]. To investigate the microbial physiology of hydrocarbon-decomposition and ultimate methanogenesis, we initiated a two-pronged approach. First, a model alkane-degrading sulfate-reducing bacterium, Desulfoglaeba alkanexedens, was used to interrogate the predominant metabolic pathway(s) differentially expressed during growth on either n-decane or butyrate. A total of 81 proteins were differentially expressed during bacterial growth on butyrate, while 100 proteins were unique to the alkane-grown condition. Proteins related to alkylsuccinate synthase, or the homologous 1-methyl alkylsuccinate synthase, were identified only in the presence of the hydrocarbon. Secondly, we used a newly developed stable isotope probing technique [5] targeted towards proteins to monitor the flux of carbon through a residual oil-degrading bacterial consortium enriched from a gas-condensate contaminated aquifer [1]. Combined carbon and hydrogen stable isotope fractionation identified acetoclastic methanogenesis as the dominant process in this system. Such findings agree with the previous clone library characterization of the consortium. Furthermore, hydrocarbon activation was determined to be the rate-limiting process during the net conversion of residual oil to methane. References 1. Gieg, L.M., K.E. Duncan, and J.M. Suflita, Bioenergy production via microbial conversion of residual oil to natural gas. Appl Environ Micro, 2008. 74(10): p. 3022-3029. 2. Jones, D.M., et al., Crude-oil biodegradation via

  7. An Improved TA-SVM Method Without Matrix Inversion and Its Fast Implementation for Nonstationary Datasets.

    Science.gov (United States)

    Shi, Yingzhong; Chung, Fu-Lai; Wang, Shitong

    2015-09-01

    Recently, a time-adaptive support vector machine (TA-SVM) was proposed for handling nonstationary datasets. While attractive performance has been reported and the new classifier is distinctive in simultaneously solving several SVM subclassifiers locally and globally by using an elegant SVM formulation in an alternative kernel space, the coupling of subclassifiers requires the computation of a matrix inversion, which imposes a high computational burden in applications with large nonstationary datasets. To overcome this shortcoming, an improved TA-SVM (ITA-SVM) is proposed using a common vector shared by all the SVM subclassifiers involved. ITA-SVM not only keeps an SVM formulation, but also avoids the computation of matrix inversion. Thus, we can realize its fast version, that is, the improved time-adaptive core vector machine (ITA-CVM), for large nonstationary datasets by using the CVM technique. ITA-CVM has the merit of asymptotic linear time complexity for large nonstationary datasets and inherits the advantages of TA-SVM. The effectiveness of the proposed classifiers ITA-SVM and ITA-CVM is also experimentally confirmed.

  8. Structure based alignment and clustering of proteins (STRALCP)

    Science.gov (United States)

    Zemla, Adam T.; Zhou, Carol E.; Smith, Jason R.; Lam, Marisa W.

    2013-06-18

    Disclosed are computational methods of clustering a set of protein structures based on local and pair-wise global similarity values. Pair-wise local and global similarity values are generated based on pair-wise structural alignments for each protein in the set of protein structures. Initially, the protein structures are clustered based on pair-wise local similarity values. The protein structures are then clustered based on pair-wise global similarity values. For each given cluster both a representative structure and spans of conserved residues are identified. The representative protein structure is used to assign newly-solved protein structures to a group. The spans are used to characterize conservation and assign a "structural footprint" to the cluster.

  9. Hardware realization of an SVM algorithm implemented in FPGAs

    Science.gov (United States)

    Wiśniewski, Remigiusz; Bazydło, Grzegorz; Szcześniak, Paweł

    2017-08-01

    The paper proposes a technique of hardware realization of a space vector modulation (SVM) of state function switching in matrix converter (MC), oriented on the implementation in a single field programmable gate array (FPGA). In MC the SVM method is based on the instantaneous space-vector representation of input currents and output voltages. The traditional computation algorithms usually involve digital signal processors (DSPs) which consumes the large number of power transistors (18 transistors and 18 independent PWM outputs) and "non-standard positions of control pulses" during the switching sequence. Recently, hardware implementations have become popular since computed operations may be executed much faster and more efficiently owing to the nature of digital devices (especially concurrency). In the paper, we propose a hardware algorithm of SVM computation. In contrast to the existing techniques, the presented solution applies the COordinate Rotation DIgital Computer (CORDIC) method to solve the trigonometric operations. Furthermore, adequate arithmetic modules (that is, sub-devices) used for intermediate calculations, such as code converters or proper sector selectors (for output voltages and input current), are presented in detail. The proposed technique has been implemented as a design described with the use of the Verilog hardware description language. The preliminary results of logic implementation oriented on the Xilinx FPGA (particularly, a low-cost device from the Xilinx Artix-7 family was used) are also presented.
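
    As a rough illustration of the CORDIC method named in the abstract above, the sketch below computes sine and cosine in rotation mode. It is a floating-point software model, not the authors' Verilog design: a hardware version would use fixed-point values and bit shifts instead of multiplications by powers of two. The function name and iteration count are assumptions.

```python
import math

def cordic_sin_cos(theta, iterations=24):
    """Approximate (sin(theta), cos(theta)) for theta in [-pi/2, pi/2]."""
    # Elementary rotation angles atan(2^-i); hardware keeps these in a small lookup table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i); pre-scale to compensate.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate towards the remaining angle z
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x

s, c = cordic_sin_cos(math.pi / 6)
```

    Each iteration needs only additions, a sign decision and a shift, which is why the method suits FPGA logic where multipliers are comparatively expensive.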

  10. Using the Relevance Vector Machine Model Combined with Local Phase Quantization to Predict Protein-Protein Interactions from Protein Sequences

    Directory of Open Access Journals (Sweden)

    Ji-Yong An

    2016-01-01

    We propose a novel computational method known as RVM-LPQ that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict PPIs from protein sequences. The main improvements are the results of representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. We perform 5-fold cross-validation experiments on Yeast and Human datasets, and we achieve very high accuracies of 92.65% and 97.62%, respectively, which is significantly better than previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that our RVM-LPQ method is obviously better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can be an automatic decision support tool for future proteomics research.

  11. PCA-MLP SVM distinction of salivary Raman spectra of dengue fever infection.

    Science.gov (United States)

    Radzol, A R M; Lee, Khuan Y; Mansor, W; Wong, P S; Looi, I

    2017-07-01

    Dengue fever (DF) is a disease of major concern caused by flavivirus infection. Delayed diagnosis leads to severe stages, which could be deadly. Recently, non-structural protein (NS1) has been acknowledged as a biomarker, alternative to immunoglobulins, for early detection of dengue in blood. Further, non-invasive detection of NS1 in saliva makes the approach more appealing. However, since its concentration in saliva is lower than in blood, a sensitive and specific technique, Surface Enhanced Raman Spectroscopy (SERS), is employed. Our work here intends to define an optimal PCA-SVM (Principal Component Analysis-Support Vector Machine) model with a Multilayer Perceptron (MLP) kernel to distinguish between positive and negative NS1 infected samples from salivary SERS spectra, which, to the best of our knowledge, has never been explored. Salivary samples of DF positive and negative subjects were collected, pre-processed and analyzed. PCA and an SVM classifier were then used to differentiate the SERS analyzed spectra. Since the performance of the model depends on the PCA criterion and MLP parameters, both are examined in tandem. Its performance is also compared to our previous works on simulated NS1 salivary samples. It is found that the best PCA-SVM (MLP) model can be defined by 95 PCs from the CPV criterion with P1 and P2 values of 0.01 and -0.2 respectively. A classification performance of [76.88%, 85.92%, 67.83%] is achieved.

  12. Elman RNN based classification of proteins sequences on account of their mutual information.

    Science.gov (United States)

    Mishra, Pooja; Nath Pandey, Paras

    2012-10-21

    In the present work we have employed the method of estimating residue correlation within the protein sequences, by using the mutual information (MI) of adjacent residues, based on structural and solvent accessibility properties of amino acids. The long range correlation between nonadjacent residues is improved by constructing a mutual information vector (MIV) for a single protein sequence; in this way, each protein sequence is associated with its corresponding MIVs. These MIVs are given to an Elman RNN to obtain the classification of protein sequences. The modeling power of MIV was shown to be significantly better, giving a new approach towards alignment-free classification of protein sequences. We also conclude that MIVs based on sequence structural and solvent accessibility properties are better predictors. Copyright © 2012 Elsevier Ltd. All rights reserved.
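
    The mutual information of adjacent residues mentioned in the abstract above can be sketched as follows. This is a generic plug-in estimate from pair counts, not the authors' exact procedure: the abstract does not say how residues are grouped by structural or solvent accessibility properties, so the function simply accepts any sequence of symbols (for example, property-class labels assigned beforehand). The function name is hypothetical.

```python
import math
from collections import Counter

def adjacent_mutual_information(seq):
    """Mutual information (in bits) between each symbol and its right-hand neighbour."""
    pairs = list(zip(seq, seq[1:]))
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)                # counts of (left, right) pairs
    left = Counter(a for a, _ in pairs)   # marginal counts of the left symbol
    right = Counter(b for _, b in pairs)  # marginal counts of the right symbol
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p(a,b) / (p(a) * p(b)) written in terms of raw counts
        mi += p_ab * math.log2(count * n / (left[a] * right[b]))
    return mi

alternating = adjacent_mutual_information("ABABABAB")
constant = adjacent_mutual_information("AAAA")
```

    A strictly alternating sequence gives a high value because each symbol determines its neighbour, while a constant sequence gives zero. Repeating the calculation for pairs separated by larger gaps would yield a vector of values, which is one plausible reading of the MIV described in the abstract.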

  13. Robust LS-SVM-based adaptive constrained control for a class of uncertain nonlinear systems with time-varying predefined performance

    Science.gov (United States)

    Luo, Jianjun; Wei, Caisheng; Dai, Honghua; Yuan, Jianping

    2018-03-01

    This paper focuses on robust adaptive control for a class of uncertain nonlinear systems subject to input saturation and external disturbance with guaranteed predefined tracking performance. To reduce the limitations of the classical predefined performance control method in the presence of unknown initial tracking errors, a novel predefined performance function with time-varying design parameters is first proposed. Then, aiming at reducing the complexity of nonlinear approximations, only two least-square-support-vector-machine-based (LS-SVM-based) approximators with two design parameters are required through norm form transformation of the original system. Further, a novel LS-SVM-based adaptive constrained control scheme is developed under the time-varying predefined performance using the backstepping technique. To avoid the tedious analysis and repeated differentiations of virtual control laws in the backstepping technique, a simple and robust finite-time-convergent differentiator is devised to only extract its first-order derivative at each step in the presence of external disturbance. In this sense, the inherent demerit of the backstepping technique, the "explosion of terms" brought by the recursive virtual controller design, is conquered. Moreover, an auxiliary system is designed to compensate the control saturation. Finally, three groups of numerical simulations are employed to validate the effectiveness of the newly developed differentiator and the proposed adaptive constrained control scheme.

  14. Multiclass Classification of Cardiac Arrhythmia Using Improved Feature Selection and SVM Invariants.

    Science.gov (United States)

    Mustaqeem, Anam; Anwar, Syed Muhammad; Majid, Muahammad

    2018-01-01

    Arrhythmia is considered a life-threatening disease causing serious health issues in patients, when left untreated. An early diagnosis of arrhythmias would be helpful in saving lives. This study is conducted to classify patients into one of the sixteen subclasses, among which one class represents absence of disease and the other fifteen classes represent electrocardiogram records of various subtypes of arrhythmias. The research is carried out on the dataset taken from the University of California at Irvine Machine Learning Data Repository. The dataset contains a large volume of feature dimensions which are reduced using wrapper based feature selection technique. For multiclass classification, support vector machine (SVM) based approaches including one-against-one (OAO), one-against-all (OAA), and error-correction code (ECC) are employed to detect the presence and absence of arrhythmias. The SVM method results are compared with other standard machine learning classifiers using varying parameters and the performance of the classifiers is evaluated using accuracy, kappa statistics, and root mean square error. The results show that OAO method of SVM outperforms all other classifiers by achieving an accuracy rate of 81.11% when used with 80/20 data split and 92.07% using 90/10 data split option.

  15. A Multi-Classification Method of Improved SVM-based Information Fusion for Traffic Parameters Forecasting

    Directory of Open Access Journals (Sweden)

    Hongzhuan Zhao

    2016-04-01

    With the enrichment of perception methods, the modern transportation system has many physical objects whose states are influenced by many information factors, so that it is a typical Cyber-Physical System (CPS). Thus, the traffic information is generally multi-sourced, heterogeneous and hierarchical. Existing research results show that multi-sourced traffic information, when accurately classified in the process of information fusion, can achieve better parameter forecasting performance. To solve the problem of accurate traffic information classification, by analysing the characteristics of the multi-sourced traffic information and using a redefined binary tree to overcome the shortcomings of the original Support Vector Machine (SVM) classification in information fusion, a multi-classification method using improved SVM in information fusion for traffic parameters forecasting is proposed. An experiment was conducted to examine the performance of the proposed scheme, and the results reveal that the method can get more accurate and practical outcomes.

  16. Measurement of conformational constraints in an elastin-mimetic protein by residue-pair selected solid-state NMR

    International Nuclear Information System (INIS)

    Hong, Mei; McMillan, R. Andrew; Conticello, Vincent P.

    2002-01-01

    We introduce a solid-state NMR technique for selective detection of a residue pair in multiply labeled proteins to obtain site-specific structural constraints. The method exploits the frequency-offset dependence of cross polarization to achieve 13CO(i) → 15N(i) → 13Cα(i) transfer between two residues. A 13C, 15N-labeled elastin mimetic protein (VPGVG)n is used to demonstrate the method. The technique selected the Gly3 Cα signal while suppressing the Gly5 Cα signal, and allowed the measurement of the Gly3 Cα chemical shift anisotropy to derive information on the protein conformation. This residue-pair selection technique should simplify the study of protein structure at specific residues.

  17. A Multifeatures Fusion and Discrete Firefly Optimization Method for Prediction of Protein Tyrosine Sulfation Residues.

    Science.gov (United States)

    Guo, Song; Liu, Chunhua; Zhou, Peng; Li, Yanling

    2016-01-01

    Tyrosine sulfation is one of the ubiquitous protein posttranslational modifications, where some sulfate groups are added to the tyrosine residues. It plays significant roles in various physiological processes in eukaryotic cells. To explore the molecular mechanism of tyrosine sulfation, one of the prerequisites is to correctly identify possible protein tyrosine sulfation residues. In this paper, a novel method was presented to predict protein tyrosine sulfation residues from primary sequences. By means of informative feature construction and elaborate feature selection and parameter optimization scheme, the proposed predictor achieved promising results and outperformed many other state-of-the-art predictors. Using the optimal features subset, the proposed method achieved mean MCC of 94.41% on the benchmark dataset, and a MCC of 90.09% on the independent dataset. The experimental performance indicated that our new proposed method could be effective in identifying the important protein posttranslational modifications and the feature selection scheme would be powerful in protein functional residues prediction research fields.

  18. SVM-Based Dynamic Reconfiguration CPS for Manufacturing System in Industry 4.0

    Directory of Open Access Journals (Sweden)

    Hyun-Jun Shin

    2018-01-01

    Cyber-physical systems (CPS) have potential applications in various fields, such as medical, healthcare, energy, transportation, and defense, as well as Industry 4.0 in Germany. Although studies on equipment aging and the prediction of problems have been done by combining CPS with Industry 4.0, such studies were few in number and the majority of the papers focused primarily on CPS methodology. Therefore, it is necessary to study active self-protection to enable self-management functions, such as self-healing, by applying CPS on the shop floor. In this paper, we have proposed a model of the shop floor and a dynamically reconfigurable CPS scheme that can predict the occurrence of anomalies and provide self-protection in the model. For this purpose, SVM was used as the machine learning technology, and it was possible to restrain overloading in the manufacturing process. In addition, we design a CPS framework based on machine learning for Industry 4.0 and simulate it. Simulation results show that the simulation model autonomously detects the abnormal situation and is dynamically reconfigured through self-healing.

  19. Protein structure modelling and evaluation based on a 4-distance description of side-chain interactions

    Directory of Open Access Journals (Sweden)

    Inbar Yuval

    2010-07-01

    Background: Accurate evaluation and modelling of residue-residue interactions within and between proteins is a key aspect of computational structure prediction including homology modelling, protein-protein docking, refinement of low-resolution structures, and computational protein design. Results: Here we introduce a method for accurate protein structure modelling and evaluation based on a novel 4-distance description of residue-residue interaction geometry. Statistical 4-distance preferences were extracted from high-resolution protein structures and were used as a basis for a knowledge-based potential, called Hunter. We demonstrate that 4-distance description of side chain interactions can be used reliably to discriminate the native structure from a set of decoys. Hunter ranked the native structure as the top one in 217 out of 220 high-resolution decoy sets, in 25 out of 28 "Decoys 'R' Us" decoy sets and in 24 out of 27 high-resolution CASP7/8 decoy sets. The same concept was applied to side chain modelling in protein structures. On a set of very high-resolution protein structures the average RMSD was 1.47 Å for all residues and 0.73 Å for buried residues, which is in the range of attainable accuracy for a model. Finally, we show that Hunter performs as well as or better than other top methods in homology modelling based on results from the CASP7 experiment. The supporting web site http://bioinfo.weizmann.ac.il/hunter/ was developed to enable the use of Hunter and for visualization and interactive exploration of 4-distance distributions. Conclusions: Our results suggest that Hunter can be used as a tool for evaluation and for accurate modelling of residue-residue interactions in protein structures. The same methodology is applicable to other areas involving high-resolution modelling of biomolecules.

  20. SVM-based feature extraction and classification of aflatoxin contaminated corn using fluorescence hyperspectral data

    Science.gov (United States)

    Support Vector Machine (SVM) was used in the Genetic Algorithms (GA) process to select and classify a subset of hyperspectral image bands. The method was applied to fluorescence hyperspectral data for the detection of aflatoxin contamination in Aspergillus flavus infected single corn kernels. In the...

  1. Comparison between SARS CoV and MERS CoV Using Apriori Algorithm, Decision Tree, SVM

    Directory of Open Access Journals (Sweden)

    Jang Seongpil

    2016-01-01

    MERS (Middle East Respiratory Syndrome) is a worldwide disease these days. The number of infected people is 1038 (08/03/2015) in Saudi Arabia and 186 (08/03/2015) in South Korea. MERS has spread across the world, including Europe, East Asia and the Middle East, and the fatality rate is 38.8%. MERS is also known as a cousin of SARS (Severe Acute Respiratory Syndrome) because both diseases show similar symptoms such as high fever and difficulty in breathing. This is why we compared MERS with SARS. We used data of the spike glycoprotein from NCBI. To analyze the protein, the apriori algorithm, decision tree, and SVM were used, and in particular SVM was run with normal, polynomial, and sigmoid kernels. The result was that MERS and SARS are alike but also differ in some ways.

  2. Extended SVM algorithms for multilevel trans-Z-source inverter

    Directory of Open Access Journals (Sweden)

    Aida Baghbany Oskouei

    2016-03-01

    This paper suggests extended algorithms for the multilevel trans-Z-source inverter. These algorithms are based on space vector modulation (SVM), which works with high switching frequency and does not generate the mean value of the desired load voltage in every switching interval. In this topology the output voltage is not limited to the dc voltage source, as it is in the traditional cascaded multilevel inverter, and can be increased with trans-Z-network shoot-through state control. Besides, it is more reliable against short circuit, and owing to the several dc sources in each phase of this topology, it is possible to use it in hybrid renewable energy. The proposed SVM algorithms include the following: a combined modulation algorithm (SVPWM) and a shoot-through implementation in dwell times of voltage vectors algorithm. These algorithms are compared from the viewpoint of simplicity, accuracy, number of switchings, and THD. Simulation and experimental results are presented to demonstrate the expected representations.

  3. A novel transmission line protection using DOST and SVM

    Directory of Open Access Journals (Sweden)

    M. Jaya Bharata Reddy

    2016-06-01

    This paper proposes a smart fault detection, classification and location (SFDCL) methodology for transmission systems with multi-generators using the discrete orthogonal Stockwell transform (DOST). The methodology is based on synchronized current measurements from remote telemetry units (RTUs) installed at both ends of the transmission line. The energy coefficients extracted using DOST from the transient current signals that arise when different types of faults occur are utilized for real-time fault detection and classification. A support vector machine (SVM) has been deployed for locating the fault distance using the extracted coefficients. A comparative study is performed for establishing the superiority of SVM over other popular computational intelligence methods, such as adaptive neuro-fuzzy inference system (ANFIS) and artificial neural network (ANN), for more precise and reliable estimation of fault distance. The results corroborate the effectiveness of the suggested SFDCL algorithm for real-time transmission line fault detection, classification and localization.

  4. SVM-based multisensor data fusion for phase concentration measurement in biomass-coal co-combustion

    Science.gov (United States)

    Wang, Xiaoxin; Hu, Hongli; Jia, Huiqin; Tang, Kaihao

    2018-05-01

    In this paper, the electrical method combines the electrostatic sensor and capacitance sensor to measure the phase concentration of pulverized coal/biomass/air three-phase flow through data fusion technology. In order to eliminate the effects of flow regimes and improve the accuracy of the phase concentration measurement, the mel frequency cepstrum coefficient features extracted from electrostatic signals are used to train the Continuous Gaussian Mixture Hidden Markov Model (CGHMM) for flow regime identification. Support Vector Machine (SVM) is introduced to establish the concentration information fusion model under identified flow regimes. The CGHMM models and SVM models are ported to a digital signal processor (DSP) to realize accurate on-line measurement. The DSP flow regime identification time is 1.4 ms, and the concentration prediction time is 164 μs, which can fully meet the real-time requirement. The average absolute value of the relative error of the pulverized coal is about 1.5% and that of the biomass is about 2.2%.

  5. Biometric identification based on feature fusion with PCA and SVM

    Science.gov (United States)

    Lefkovits, László; Lefkovits, Szidónia; Emerich, Simina

    2018-04-01

    Biometric identification is gaining ground compared to traditional identification methods. Many biometric measurements may be used for secure human identification. The most reliable among them is the iris pattern because of its uniqueness, stability, unforgeability and inalterability over time. The approach presented in this paper is a fusion of different feature descriptor methods such as HOG, LIOP, LBP, used for extracting iris texture information. The classifiers obtained through the SVM and PCA methods demonstrate the effectiveness of our system applied to one and both irises. The performances measured are highly accurate and foreshadow a fusion system with a rate of identification approaching 100% on the UPOL database.

  6. Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data.

    Science.gov (United States)

    Becker, Natalia; Toedt, Grischa; Lichter, Peter; Benner, Axel

    2011-05-09

    Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution. Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error. Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning parameters. The penalized SVM
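
    The combination of SCAD and ridge penalties described in the abstract above can be sketched as a plain function. The SCAD part follows the standard piecewise definition with shape parameter a > 2 (3.7 is a common default). The abstract gives no formula for the combined penalty, so adding the ridge term as `lam2 * w**2` per weight is an assumption, as are the function and parameter names.

```python
def scad_penalty(w, lam, a=3.7):
    """Standard SCAD penalty for a single weight (lam > 0, a > 2)."""
    w = abs(w)
    if w <= lam:
        return lam * w  # behaves like LASSO near zero
    if w <= a * lam:
        # quadratic blend between the linear and constant regions
        return -(w * w - 2.0 * a * lam * w + lam * lam) / (2.0 * (a - 1.0))
    return (a + 1.0) * lam * lam / 2.0  # constant: large weights are not shrunk further

def elastic_scad_penalty(weights, lam1, lam2, a=3.7):
    """Assumed combined penalty: SCAD(lam1) plus a ridge term lam2 * w^2 per weight."""
    return sum(scad_penalty(w, lam1, a) + lam2 * w * w for w in weights)

total = elastic_scad_penalty([0.5, 10.0], lam1=1.0, lam2=0.1)
```

    The SCAD term alone stops penalizing growth once a weight exceeds `a * lam`, whereas the ridge term keeps growing with the weight, which is the sense in which a combined penalty behaves differently from either part.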

  7. Supervised learning methods for pathological arterial pulse wave differentiation: A SVM and neural networks approach.

    Science.gov (United States)

    Paiva, Joana S; Cardoso, João; Pereira, Tânia

    2018-01-01

    The main goal of this study was to develop an automatic method based on supervised learning, able to distinguish healthy from pathologic arterial pulse wave (APW), and those two from noisy waveforms (non-relevant segments of the signal), from the data acquired during a clinical examination with a novel optical system. The APW dataset analysed was composed of signals acquired in a clinical environment from a total of 213 subjects, including healthy volunteers and non-healthy patients. The signals were parameterised by means of 39 pulse features: morphologic, time domain statistics, cross-correlation features, wavelet features. The Multiclass Support Vector Machine Recursive Feature Elimination (SVM RFE) method was used to select the most relevant features. A comparative study was performed in order to evaluate the performance of the two classifiers: Support Vector Machine (SVM) and Artificial Neural Network (ANN). SVM achieved a statistically significant better performance for this problem with an average accuracy of 0.9917±0.0024 and an F-Measure of 0.9925±0.0019, in comparison with ANN, which reached the values of 0.9847±0.0032 and 0.9852±0.0031 for Accuracy and F-Measure, respectively. A significant difference was observed between the performances obtained with the SVM classifier using a different number of features from the original set available. The comparison between SVM and ANN reaffirmed the higher performance of SVM. The results obtained in this study showed the potential of the proposed method to differentiate those three important signal outcomes (healthy, pathologic and noise) and to reduce bias associated with clinical diagnosis of cardiovascular disease using APW. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. Insights into Protein Sequence and Structure-Derived Features Mediating 3D Domain Swapping Mechanism using Support Vector Machine Based Approach

    Directory of Open Access Journals (Sweden)

    Khader Shameer

    2010-06-01

    Full Text Available 3-dimensional domain swapping is a mechanism where two or more protein molecules form higher order oligomers by exchanging identical or similar subunits. Recently, this phenomenon has received much attention in the context of prions and neuro-degenerative diseases, due to its role in the functional regulation, formation of higher oligomers, protein misfolding, aggregation etc. While 3-dimensional domain swap mechanism can be detected from three-dimensional structures, it remains a formidable challenge to derive common sequence or structural patterns from proteins involved in swapping. We have developed a SVM-based classifier to predict domain swapping events using a set of features derived from sequence and structural data. The SVM classifier was trained on features derived from 150 proteins reported to be involved in 3D domain swapping and 150 proteins not known to be involved in swapped conformation or related to proteins involved in swapping phenomenon. The testing was performed using 63 proteins from the positive dataset and 63 proteins from the negative dataset. We obtained 76.33% accuracy from training and 73.81% accuracy from testing. Due to high diversity in the sequence, structure and functions of proteins involved in domain swapping, availability of such an algorithm to predict swapping events from sequence and structure-derived features will be an initial step towards identification of more putative proteins that may be involved in swapping or proteins involved in deposition disease. Further, the top features emerging in our feature selection method may be analysed further to understand their roles in the mechanism of domain swapping.

  9. SVM and ANFIS Models for Precipitation Modeling (Case Study: GonbadKavouse)

    Directory of Open Access Journals (Sweden)

    N. Zabet Pishkhani

    2016-10-01

    Full Text Available Introduction: In recent years, the use of intelligent models as new techniques and tools for hydrological processes such as precipitation forecasting has increased. The ANFIS model has good ability in training, construction and classification, and also has the advantage that it allows the extraction of fuzzy rules from numerical information or knowledge. Another intelligent technique that has been used in various areas in recent years is the support vector machine (SVM). In this paper the ability of artificial intelligence methods, including support vector machine (SVM) and adaptive neuro fuzzy inference system (ANFIS), was analyzed in monthly precipitation prediction. Materials and Methods: The study area was the city of Gonbad in Golestan Province. The city has a temperate climate in the southern highlands and southern plains, mountains and temperate humid, semi-arid and semi-arid in the north of Gorganroud river. In total, the city's climate is temperate and humid. In the present study, monthly precipitation was modeled in Gonbad using ANFIS and SVM, and two different database structures were designed. The first structure: the input layer consisted of mean temperature, relative humidity, pressure and wind speed at Gonbad station. The second structure: according to the Pearson coefficient, monthly precipitation data were used from four stations (Arazkoose, Bahalke, Tamar and Aqqala) which had a higher correlation with Gonbad station precipitation. In this study precipitation data from 1995 to 2012 were used. 80% of the data were used for model training and the remaining 20% for validation. SVM was developed in the 1990s by Vapnik. SVM has been widely recognized as a powerful tool to deal with function fitting problems. An Adaptive Neuro-Fuzzy Inference System (ANFIS) refers, in general, to an adaptive network which performs the function of a fuzzy inference system. The most commonly used fuzzy system in ANFIS architectures is the Sugeno model

  10. Defining an essence of structure determining residue contacts in proteins.

    Science.gov (United States)

    Sathyapriya, R; Duarte, Jose M; Stehr, Henning; Filippis, Ioannis; Lappe, Michael

    2009-12-01

    The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this "structural essence" has remained elusive so far: no algorithmic strategy has been devised to date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Cα RMSD). It is not only of theoretical interest (i.e., for design of advanced statistical potentials) to identify the number and nature of essential native contacts; such a subset of spatial constraints is very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test-set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed "cone-peeling" that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Cα RMSD with as little as 8% of the native contacts (Cα-Cα and Cβ-Cβ). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This "structural essence" opens new avenues in the

  11. The Relationship Between Low-Frequency Motions and Community Structure of Residue Network in Protein Molecules.

    Science.gov (United States)

    Sun, Weitao

    2018-01-01

    The global shape of a protein molecule is believed to be dominant in determining low-frequency deformational motions. However, how structure dynamics relies on residue interactions remains largely unknown. The global residue community structure and the local residue interactions are two important coexisting factors imposing significant effects on low-frequency normal modes. In this work, an algorithm for community structure partition is proposed by integrating Miyazawa-Jernigan empirical potential energy as edge weight. A sensitivity parameter is defined to measure the effect of local residue interaction on low-frequency movement. We show that community structure is a more fundamental feature of residue contact networks. Moreover, we surprisingly find that low-frequency normal mode eigenvectors are sensitive to some local critical residue interaction pairs (CRIPs). A fair amount of CRIPs act as bridges and hold distributed structure components into a unified tertiary structure by bonding nearby communities. Community structure analysis and CRIP detection of 116 catalytic proteins reveal that breaking up of a CRIP can cause low-frequency allosteric movement of a residue at the far side of protein structure. The results imply that community structure and CRIP may be the structural basis for low-frequency motions.

  12. Measurement of conformational constraints in an elastin-mimetic protein by residue-pair selected solid-state NMR

    Energy Technology Data Exchange (ETDEWEB)

    Hong, Mei [Iowa State University, Department of Chemistry (United States)], E-mail: mhong@iastate.edu; McMillan, R. Andrew; Conticello, Vincent P. [Emory University, Department of Chemistry (United States)

    2002-02-15

    We introduce a solid-state NMR technique for selective detection of a residue pair in multiply labeled proteins to obtain site-specific structural constraints. The method exploits the frequency-offset dependence of cross polarization to achieve ¹³COᵢ → ¹⁵Nᵢ → ¹³Cαᵢ transfer between two residues. A ¹³C, ¹⁵N-labeled elastin mimetic protein (VPGVG)ₙ is used to demonstrate the method. The technique selected the Gly3 Cα signal while suppressing the Gly5 Cα signal, and allowed the measurement of the Gly3 Cα chemical shift anisotropy to derive information on the protein conformation. This residue-pair selection technique should simplify the study of protein structure at specific residues.

  13. BacHbpred: Support Vector Machine Methods for the Prediction of Bacterial Hemoglobin-Like Proteins

    Directory of Open Access Journals (Sweden)

    MuthuKrishnan Selvaraj

    2016-01-01

    Full Text Available The recent upsurge in microbial genome data has revealed that hemoglobin-like (HbL) proteins may be widely distributed among bacteria and that some organisms may carry more than one HbL encoding gene. However, the discovery of HbL proteins has been limited to a small number of bacteria only. This study describes the prediction of HbL proteins and their domain classification using a machine learning approach. Support vector machine (SVM) models were developed for predicting HbL proteins based upon amino acid composition (AC), dipeptide composition (DC), hybrid method (AC + DC), and position specific scoring matrix (PSSM). In addition, we introduce for the first time a new prediction method based on max to min amino acid residue (MM) profiles. The average accuracy, standard deviation (SD), false positive rate (FPR), confusion matrix, and receiver operating characteristic (ROC) were analyzed. We also compared the performance of our proposed models in homology detection databases. The performance of the different approaches was estimated using fivefold cross-validation techniques. Prediction accuracy was further investigated through confusion matrix and ROC curve analysis. All experimental results indicate that the proposed BacHbpred can be a promising predictor for determination of HbL related proteins. BacHbpred, a web tool, has been developed for HbL prediction.
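
    The amino acid composition (AC), dipeptide composition (DC) and hybrid (AC + DC) inputs named in this abstract can be illustrated with a minimal sketch. It assumes the usual definitions (residue and adjacent-pair frequencies over the 20 standard amino acids); the function names are hypothetical and the code is not taken from BacHbpred.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs

def _clean(seq):
    # Simplifying assumption: non-standard symbols (X, B, gaps, ...) are dropped.
    return "".join(r for r in seq.upper() if r in AMINO_ACIDS)

def amino_acid_composition(seq):
    """Fraction of each standard residue: a 20-dimensional vector."""
    s = _clean(seq)
    n = len(s)
    return [s.count(a) / n if n else 0.0 for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """Fraction of each adjacent residue pair: a 400-dimensional vector."""
    s = _clean(seq)
    pairs = [s[i:i + 2] for i in range(len(s) - 1)]  # overlapping windows
    n = len(pairs)
    return [pairs.count(d) / n if n else 0.0 for d in DIPEPTIDES]

def hybrid_features(seq):
    """AC followed by DC: a 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)

features = hybrid_features("AACA")
```

    Each vector has a fixed length regardless of sequence length, which is what lets sequences of different sizes be fed to the same SVM.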

  14. Electrostatics of cysteine residues in proteins: Parameterization and validation of a simple model

    Science.gov (United States)

    Salsbury, Freddie R.; Poole, Leslie B.; Fetrow, Jacquelyn S.

    2013-01-01

    One of the most popular and simple models for the calculation of pKas from a protein structure is the semi-macroscopic electrostatic model MEAD. This model requires empirical parameters for each residue to calculate pKas. Analysis of current, widely used empirical parameters for cysteine residues showed that they did not reproduce expected cysteine pKas; thus, we set out to identify parameters consistent with the CHARMM27 force field that capture both the behavior of typical cysteines in proteins and the behavior of cysteines which have perturbed pKas. The new parameters were validated in three ways: (1) calculation across a large set of typical cysteines in proteins (where the calculations are expected to reproduce expected ensemble behavior); (2) calculation across a set of perturbed cysteines in proteins (where the calculations are expected to reproduce the shifted ensemble behavior); and (3) comparison to experimentally determined pKa values (where the calculation should reproduce the pKa within experimental error). Both the general behavior of cysteines in proteins and the perturbed pKa in some proteins can be predicted reasonably well using the newly determined empirical parameters within the MEAD model for protein electrostatics. This study provides the first general analysis of the electrostatics of cysteines in proteins, with specific attention paid to capturing both the behavior of typical cysteines in a protein and the behavior of cysteines whose pKa should be shifted, and validation of force field parameters for cysteine residues. PMID:22777874

  15. A Novel Approach for Protein-Named Entity Recognition and Protein-Protein Interaction Extraction

    Directory of Open Access Journals (Sweden)

    Meijing Li

    2015-01-01

    Full Text Available Many researchers focus on developing protein-named entity recognition (Protein-NER) or PPI extraction systems. However, the studies on these two topics cannot be merged well, so the Protein-NER of existing PPI extraction systems still needs improvement. In this paper, we developed the protein-protein interaction extraction system named PPIMiner based on Support Vector Machine (SVM) and parsing tree. PPIMiner consists of three main models: natural language processing (NLP) model, Protein-NER model, and PPI discovery model. The Protein-NER model, which is named ProNER, identifies the protein names based on two methods: dictionary-based method and machine learning-based method. ProNER is capable of identifying more proteins than dictionary-based Protein-NER model in other existing systems. The final discovered PPIs extracted via PPI discovery model are represented in detail because we showed the protein interaction types and the occurrence frequency through two different methods. In the experiments, the result shows that the performances achieved by our ProNER and PPI discovery model are better than other existing tools. PPIMiner applied this protein-named entity recognition approach and parsing tree based PPI extraction method to improve the performance of PPI extraction. We also provide an easy-to-use interface to access PPIs database and an online system for PPIs extraction and Protein-NER.

  16. An SVM classifier to separate false signals from microcalcifications in digital mammograms

    Energy Technology Data Exchange (ETDEWEB)

    Bazzani, Armando; Bollini, Dante; Brancaccio, Rosa; Campanini, Renato; Riccardi, Alessandro; Romani, Davide [Department of Physics, University of Bologna (Italy); INFN, Bologna (Italy); Lanconelli, Nico [Department of Physics, University of Bologna, and INFN, Bologna (Italy). E-mail: nico.lanconelli@bo.infn.it; Bevilacqua, Alessandro [Department of Electronics, Computer Science and Systems, University of Bologna, and INFN, Bologna (Italy)

    2001-06-01

    In this paper we investigate the feasibility of using an SVM (support vector machine) classifier in our automatic system for the detection of clustered microcalcifications in digital mammograms. SVM is a technique for pattern recognition which relies on the statistical learning theory. It minimizes a function of two terms: the number of misclassified vectors of the training set and a term regarding the generalization classifier capability. We compare the SVM classifier with an MLP (multi-layer perceptron) in the false-positive reduction phase of our detection scheme: a detected signal is considered either microcalcification or false signal, according to the value of a set of its features. The SVM classifier gets slightly better results than the MLP one (Az value of 0.963 against 0.958) in the presence of a high number of training data; the improvement becomes much more evident (Az value of 0.952 against 0.918) in training sets of reduced size. Finally, the setting of the SVM classifier is much easier than the MLP one. (author)
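
    The "function of two terms" described in this abstract is commonly written as the soft-margin SVM primal problem. The form below is the standard textbook formulation, offered as an assumption about what the abstract refers to rather than a formula taken from the paper; the symbols w, b, ξ_i and C are introduced here for illustration only.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n
```

    In this reading, the slack sum upper-bounds the number of misclassified training vectors, the norm term controls the margin and hence generalization, and C sets the trade-off between the two.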

  17. Rapid measurement of residual dipolar couplings for fast fold elucidation of proteins

    Energy Technology Data Exchange (ETDEWEB)

    Rasia, Rodolfo M. [Jean-Pierre Ebel CNRS/CEA/UJF, Institut de Biologie Structurale (France); Lescop, Ewen [CNRS, Institut de Chimie des Substances Naturelles (France); Palatnik, Javier F. [Universidad Nacional de Rosario, Instituto de Biologia Molecular y Celular de Rosario, Facultad de Ciencias Bioquimicas y Farmaceuticas (Argentina); Boisbouvier, Jerome, E-mail: jerome.boisbouvier@ibs.fr; Brutscher, Bernhard, E-mail: Bernhard.brutscher@ibs.fr [Jean-Pierre Ebel CNRS/CEA/UJF, Institut de Biologie Structurale (France)

    2011-11-15

    It has been demonstrated that protein folds can be determined using appropriate computational protocols with NMR chemical shifts as the sole source of experimental restraints. While such approaches are very promising they still suffer from low convergence resulting in long computation times to achieve accurate results. Here we present a suite of time- and sensitivity optimized NMR experiments for rapid measurement of up to six RDCs per residue. Including such an RDC data set, measured in less than 24 h on a single aligned protein sample, greatly improves convergence of the Rosetta-NMR protocol, allowing for overnight fold calculation of small proteins. We demonstrate the performance of our fast fold calculation approach for ubiquitin as a test case, and for two RNA-binding domains of the plant protein HYL1. Structure calculations based on simulated RDC data highlight the importance of an accurate and precise set of several complementary RDCs as additional input restraints for high-quality de novo structure determination.

  18. Glycated Lysine Residues: A Marker for Non-Enzymatic Protein Glycation in Age-Related Diseases

    Directory of Open Access Journals (Sweden)

    Nadeem A. Ansari

    2011-01-01

    Full Text Available Nonenzymatic glycosylation or glycation of macromolecules, especially proteins, leading to their oxidation, plays an important role in diseases. Glycation of proteins primarily results in the formation of an early stage and stable Amadori-lysine product which undergoes further irreversible chemical reactions to form advanced glycation endproducts (AGEs). This review focuses on these products in lysine rich proteins such as collagen and human serum albumin for their role in aging and age-related diseases. Antigenic characteristics of glycated lysine residues in proteins, together with the presence of serum autoantibodies to the glycated lysine products and lysine-rich proteins in diabetes and arthritis patients, indicate that these modified lysine residues may be a novel biomarker for protein glycation in aging and age-related diseases.

  19. Electrostatics of cysteine residues in proteins: parameterization and validation of a simple model.

    Science.gov (United States)

    Salsbury, Freddie R; Poole, Leslie B; Fetrow, Jacquelyn S

    2012-11-01

    One of the most popular and simple models for the calculation of pKas from a protein structure is the semi-macroscopic electrostatic model MEAD. This model requires empirical parameters for each residue to calculate pKas. Analysis of current, widely used empirical parameters for cysteine residues showed that they did not reproduce expected cysteine pKas; thus, we set out to identify parameters consistent with the CHARMM27 force field that capture both the behavior of typical cysteines in proteins and the behavior of cysteines which have perturbed pKas. The new parameters were validated in three ways: (1) calculation across a large set of typical cysteines in proteins (where the calculations are expected to reproduce expected ensemble behavior); (2) calculation across a set of perturbed cysteines in proteins (where the calculations are expected to reproduce the shifted ensemble behavior); and (3) comparison to experimentally determined pKa values (where the calculation should reproduce the pKa within experimental error). Both the general behavior of cysteines in proteins and the perturbed pKa in some proteins can be predicted reasonably well using the newly determined empirical parameters within the MEAD model for protein electrostatics. This study provides the first general analysis of the electrostatics of cysteines in proteins, with specific attention paid to capturing both the behavior of typical cysteines in a protein and the behavior of cysteines whose pKa should be shifted, and validation of force field parameters for cysteine residues. Copyright © 2012 Wiley Periodicals, Inc.

  20. Fault Diagnosis of Complex Industrial Process Using KICA and Sparse SVM

    Directory of Open Access Journals (Sweden)

    Jie Xu

    2013-01-01

    Full Text Available New approaches are proposed for complex industrial process monitoring and fault diagnosis based on kernel independent component analysis (KICA) and sparse support vector machine (SVM). The KICA method is a two-phase algorithm: whitened kernel principal component analysis (KPCA) followed by ICA. The data are first mapped into a high-dimensional feature subspace. Then, the ICA algorithm seeks the projection directions in the KPCA whitened space. Performance monitoring is implemented through constructing the statistical index and control limit in the feature space. If the statistical indexes exceed the predefined control limit, a fault may have occurred. Then, the nonlinear score vectors are calculated and fed into the sparse SVM to identify the faults. The proposed method is applied to the simulation of the Tennessee Eastman (TE) chemical process. The simulation results show that the proposed method can identify various types of faults accurately and rapidly.

  1. An improved conjugate gradient scheme to the solution of least squares SVM.

    Science.gov (United States)

    Chu, Wei; Ong, Chong Jin; Keerthi, S Sathiya

    2005-03-01

    The least square support vector machines (LS-SVM) formulation corresponds to the solution of a linear system of equations. Several approaches to its numerical solutions have been proposed in the literature. In this letter, we propose an improved method to the numerical solution of LS-SVM and show that the problem can be solved using one reduced system of linear equations. Compared with the existing algorithm for LS-SVM, the approach used in this letter is about twice as efficient. Numerical results using the proposed method are provided for comparisons with other existing algorithms.
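
    To make the statement that LS-SVM "corresponds to the solution of a linear system of equations" concrete, here is a minimal pure-Python sketch. It assumes the standard LS-SVM classifier dual system with an RBF kernel and solves the full system by Gaussian elimination; it is not the reduced system or the conjugate gradient scheme proposed in this letter, and all names and parameter values (`gamma`, `width`, the toy data) are illustrative assumptions.

```python
import math

def rbf(a, b, width=1.0):
    """Gaussian (RBF) kernel between two points."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * width ** 2))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def lssvm_train(X, y, gamma=10.0):
    """Build and solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1],
    with Omega_ij = y_i * y_j * K(x_i, x_j)."""
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            k = y[i] * y[j] * rbf(X[i], X[j])
            row.append(k + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias b, multipliers alpha

def lssvm_predict(x, X, y, b, alpha):
    """Sign of sum_i alpha_i * y_i * K(x, x_i) + b."""
    s = b + sum(a * yi * rbf(x, xi) for a, yi, xi in zip(alpha, y, X))
    return 1 if s >= 0 else -1

# Toy data: two well-separated clusters with labels -1 and +1.
X = [(0.0, 0.0), (0.0, 1.0), (3.0, 3.0), (3.0, 4.0)]
y = [-1, -1, 1, 1]
b, alpha = lssvm_train(X, y)
```

    A direct solve like this costs on the order of n³ operations, which is one reason iterative schemes such as conjugate gradient are attractive for larger training sets.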

  2. Receptor-based screening assays for the detection of antibiotics residues - A review.

    Science.gov (United States)

    Ahmed, Saeed; Ning, Jianan; Cheng, Guyue; Ahmad, Ijaz; Li, Jun; Mingyue, Liu; Qu, Wei; Iqbal, Mujahid; Shabbir, M A B; Yuan, Zonghui

    2017-05-01

    Consumers and regulatory agencies are highly concerned about antibiotic residues in food-producing animals, so screening assays that are fast, sensitive, low cost, and require only simple sample preparation are essential for identifying these residues and assuring food safety. Great efforts have been made in recent years to develop high-throughput antibiotic screening assays. This review gives an overview of the availability, advancement and applicability of antibiotic receptor-based screening assays for the safety assessment of antibiotic usage (i.e. radio receptor assay, enzyme labeling assays, colloidal gold receptor assay, enzyme colorimetry assay and biosensor assay). It also sheds light on the selection, preparation and future perspectives of receptor proteins for antibiotic residue detection. These assays have been introduced for the screening of numerous food samples. Receptor-based screening technology for antibiotic detection has high accuracy. It is concluded that it can detect a whole class of drugs for a given receptor and thereby achieve multi-residue detection. These assays offer fast, easy and precise detection of antibiotics. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. SVM-based prediction of propeptide cleavage sites in spider toxins identifies toxin innovation in an Australian tarantula.

    Directory of Open Access Journals (Sweden)

    Emily S W Wong

    Full Text Available Spider neurotoxins are commonly used as pharmacological tools and are a popular source of novel compounds with therapeutic and agrochemical potential. Since venom peptides are inherently toxic, the host spider must employ strategies to avoid adverse effects prior to venom use. It is partly for this reason that most spider toxins encode a protective proregion that upon enzymatic cleavage is excised from the mature peptide. In order to identify the mature toxin sequence directly from toxin transcripts, without resorting to protein sequencing, the propeptide cleavage site in the toxin precursor must be predicted bioinformatically. We evaluated different machine learning strategies (support vector machines, hidden Markov model and decision tree) and developed an algorithm (SpiderP) for prediction of propeptide cleavage sites in spider toxins. Our strategy uses a support vector machine (SVM) framework that combines both local and global sequence information. Our method is superior or comparable to current tools for prediction of propeptide sequences in spider toxins. Evaluation of the SVM method on an independent test set of known toxin sequences yielded 96% sensitivity and 100% specificity. Furthermore, we sequenced five novel peptides (not used to train the final predictor) from the venom of the Australian tarantula Selenotypus plumipes to test the accuracy of the predictor and found 80% sensitivity and 99.6% 8-mer specificity. Finally, we used the predictor together with homology information to predict and characterize seven groups of novel toxins from the deeply sequenced venom gland transcriptome of S. plumipes, which revealed structural complexity and innovations in the evolution of the toxins. The precursor prediction tool (SpiderP) is freely available on ArachnoServer (http://www.arachnoserver.org/spiderP.html), a web portal to a comprehensive relational database of spider toxins. All training data, test data, and scripts used are available from

  4. Parameters Optimization and Application to Glutamate Fermentation Model Using SVM

    OpenAIRE

    Zhang, Xiangsheng; Pan, Feng

    2015-01-01

    Aimed at the parameters optimization in support vector machine (SVM) for glutamate fermentation modelling, a new method is developed. It optimizes the SVM parameters via an improved particle swarm optimization (IPSO) algorithm which has better global searching ability. The algorithm includes detecting and handling the local convergence and exhibits strong ability to avoid being trapped in local minima. The material step of the method was shown. Simulation experiments demonstrate the effective...

  5. Multiplex protein pattern unmixing using a non-linear variable-weighted support vector machine as optimized by a particle swarm optimization algorithm.

    Science.gov (United States)

    Yang, Qin; Zou, Hong-Yan; Zhang, Yan; Tang, Li-Juan; Shen, Guo-Li; Jiang, Jian-Hui; Yu, Ru-Qin

    2016-01-15

    Most proteins localize to more than one organelle in a cell. Unmixing the localization patterns of proteins is critical for understanding protein functions and other vital cellular processes. Herein, a non-linear machine learning technique is proposed for the first time for protein pattern unmixing. Variable-weighted support vector machine (VW-SVM) is a demonstrated robust modeling technique with flexible and rational variable selection. Optimized by a global stochastic optimization technique, the particle swarm optimization (PSO) algorithm, VW-SVM becomes an adaptive parameter-free method for automated unmixing of protein subcellular patterns. Results obtained by pattern unmixing of a set of fluorescence microscope images of cells indicate that VW-SVM as optimized by PSO is able to extract useful pattern features by optimally rescaling each variable for non-linear SVM modeling, consequently leading to improved performance in multiplex protein pattern unmixing compared with conventional SVM and other existing pattern unmixing methods. Copyright © 2015 Elsevier B.V. All rights reserved.

  6. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    Directory of Open Access Journals (Sweden)

    Xiaoxia Yang

    Full Text Available Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  7. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    Science.gov (United States)

    Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong

    2015-01-01

    Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  8. Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome.

    Directory of Open Access Journals (Sweden)

    Huiying Zhao

    As more and more protein sequences are uncovered by increasingly inexpensive sequencing techniques, an urgent task is to find their functions. This work presents a highly reliable computational technique for predicting DNA-binding function at the level of protein-DNA complex structures, rather than the low-resolution two-state prediction of DNA binding that most existing techniques perform. The method first predicts the protein-DNA complex structure by utilizing the template-based structure prediction technique HHblits, followed by binding affinity prediction based on a knowledge-based energy function (Distance-scaled finite ideal-gas reference state for protein-DNA interactions). A leave-one-out cross validation of the method based on 179 DNA-binding and 3797 non-binding protein domains achieves a Matthews correlation coefficient (MCC) of 0.77 with high precision (94%) and high sensitivity (65%). We further found 51% sensitivity for 82 newly determined structures of DNA-binding proteins and 56% sensitivity for the human proteome. In addition, the method provides a reasonably accurate prediction of DNA-binding residues in proteins based on predicted DNA-binding complex structures. Its application to the human proteome leads to more than 300 novel DNA-binding proteins; some of these predicted structures were validated by known structures of homologous proteins in APO forms. The method [SPOT-Seq (DNA)] is available as an on-line server at http://sparks-lab.org.
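
    The reported MCC can be sanity-checked against the other figures in the abstract. The sketch below is only an illustration: the confusion-matrix counts are back-calculated from the stated percentages (the abstract does not report them), and the function name is a hypothetical helper, not part of SPOT-Seq.

```python
import math


def matthews_corrcoef(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from a 2x2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column is empty
    return (tp * tn - fp * fn) / denom


# ASSUMPTION: counts reconstructed from the abstract's figures
# (179 binding and 3797 non-binding domains, 65% sensitivity, 94% precision).
tp = round(0.65 * 179)               # true positives implied by sensitivity
fn = 179 - tp                        # binding domains that were missed
fp = round(tp * (1 - 0.94) / 0.94)   # false positives implied by precision
tn = 3797 - fp                       # remaining non-binding domains

mcc = matthews_corrcoef(tp, fp, fn, tn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn} MCC={mcc:.2f}")
```

    Under these reconstructed counts the three reported figures are mutually consistent, which is all the check is meant to show.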

  9. Maximizing Selective Cleavages at Aspartic Acid and Proline Residues for the Identification of Intact Proteins

    Science.gov (United States)

    Foreman, David J.; Dziekonski, Eric T.; McLuckey, Scott A.

    2018-04-01

    A new approach for the identification of intact proteins has been developed that relies on the generation of relatively few abundant products from specific cleavage sites. This strategy is intended to complement standard approaches that seek to generate many fragments relatively non-selectively. Specifically, this strategy seeks to maximize selective cleavage at aspartic acid and proline residues via collisional activation of precursor ions formed via electrospray ionization (ESI) under denaturing conditions. A statistical analysis of the SWISS-PROT database was used to predict the number of arginine residues for a given intact protein mass and to predict an m/z range where the protein carries a similar charge to the number of arginine residues, thereby enhancing cleavage at aspartic acid residues by limiting proton mobility. Cleavage at aspartic acid residues is predicted to be most favorable in the m/z range of 1500-2500, a range higher than that normally generated by ESI at low pH. Gas-phase proton transfer ion/ion reactions are therefore used for precursor ion concentration from relatively high charge states followed by ion isolation and subsequent generation of precursor ions within the optimal m/z range via a second proton transfer reaction step. It is shown that the majority of product ion abundance is concentrated into cleavages C-terminal to aspartic acid residues and N-terminal to proline residues for ions generated by this process. Implementation of a scoring system that weights both ion fragment type and ion fragment area demonstrated identification of standard proteins, ranging in mass from 8.5 to 29.0 kDa.

  10. Support vector machine regression (LS-SVM)--an alternative to artificial neural networks (ANNs) for the analysis of quantum chemistry data?

    Science.gov (United States)

    Balabin, Roman M; Lomakina, Ekaterina I

    2011-06-28

    A multilayer feed-forward artificial neural network (MLP-ANN) with a single, hidden layer that contains a finite number of neurons can be regarded as a universal non-linear approximator. Today, the ANN method and linear regression (MLR) model are widely used for quantum chemistry (QC) data analysis (e.g., thermochemistry) to improve their accuracy (e.g., Gaussian G2-G4, B3LYP/B3-LYP, X1, or W1 theoretical methods). In this study, an alternative approach based on support vector machines (SVMs) is used, the least squares support vector machine (LS-SVM) regression. It has been applied to ab initio (first principle) and density functional theory (DFT) quantum chemistry data. So, QC + SVM methodology is an alternative to QC + ANN one. The task of the study was to estimate the Møller-Plesset (MPn) or DFT (B3LYP, BLYP, BMK) energies calculated with large basis sets (e.g., 6-311G(3df,3pd)) using smaller ones (6-311G, 6-311G*, 6-311G**) plus molecular descriptors. A molecular set (BRM-208) containing a total of 208 organic molecules was constructed and used for the LS-SVM training, cross-validation, and testing. MP2, MP3, MP4(DQ), MP4(SDQ), and MP4/MP4(SDTQ) ab initio methods were tested. Hartree-Fock (HF/SCF) results were also reported for comparison. Furthermore, constitutional (CD: total number of atoms and mole fractions of different atoms) and quantum-chemical (QD: HOMO-LUMO gap, dipole moment, average polarizability, and quadrupole moment) molecular descriptors were used for the building of the LS-SVM calibration model. Prediction accuracies (MADs) of 1.62 ± 0.51 and 0.85 ± 0.24 kcal mol⁻¹ (1 kcal mol⁻¹ = 4.184 kJ mol⁻¹) were reached for SVM-based approximations of ab initio and DFT energies, respectively. The LS-SVM model was more accurate than the MLR model. A comparison with the artificial neural network approach shows that the accuracy of the LS-SVM method is similar to the accuracy of ANN. The extrapolation and interpolation results show that LS-SVM is

  11. "Active Flux" DTFC-SVM Sensorless Control of IPMSM

    DEFF Research Database (Denmark)

    Boldea, Ion; Codruta Paicu, Mihaela; Gheorghe-Daniel, Andreescu,

    2009-01-01

    This paper proposes an implementation of a motion-sensorless control system over a wide speed range based on an "active flux" observer, and direct torque and flux control with space vector modulation (DTFC-SVM) for the interior permanent magnet synchronous motor (IPMSM), without signal injection....... The concept of "active flux" (or "torque producing flux") turns all the rotor salient-pole ac machines into fully nonsalient-pole ones. A new function for Lq inductance depending on torque is introduced to model the magnetic saturation. Notable simplification in the rotor position and speed estimation...

  12. Prediction of Protein Structural Class Based on Gapped-Dipeptides and a Recursive Feature Selection Approach

    Directory of Open Access Journals (Sweden)

    Taigang Liu

    2015-12-01

    The prior knowledge of protein structural class may offer useful clues on understanding its functionality as well as its tertiary structure. Though various significant efforts have been made to find a fast and effective computational approach to address this problem, it is still a challenging topic in the field of bioinformatics. The position-specific score matrix (PSSM) profile has been shown to provide a useful source of information for improving the prediction performance of protein structural class. However, this information has not been adequately explored. To this end, in this study, we present a feature extraction technique which is based on gapped-dipeptides composition computed directly from PSSM. Then, a careful feature selection technique is performed based on support vector machine-recursive feature elimination (SVM-RFE). These optimal features are selected to construct a final predictor. The results of jackknife tests on four working datasets show that our method obtains satisfactory prediction accuracies by extracting features solely based on PSSM and could serve as a very promising tool to predict protein structural class.

  13. Predicting protein folding rate change upon point mutation using residue-level coevolutionary information.

    Science.gov (United States)

    Mallik, Saurav; Das, Smita; Kundu, Sudip

    2016-01-01

    Change in folding kinetics of globular proteins upon point mutation is crucial to a wide spectrum of biological research, such as protein misfolding, toxicity, and aggregations. Here we seek to address whether residue-level coevolutionary information of globular proteins can be informative to folding rate changes upon point mutations. Generating residue-level coevolutionary networks of globular proteins, we analyze three parameters: relative coevolution order (rCEO), network density (ND), and characteristic path length (CPL). A point mutation is considered to be equivalent to a node deletion of this network and respective percentage changes in rCEO, ND, CPL are found linearly correlated (0.84, 0.73, and -0.61, respectively) with experimental folding rate changes. The three parameters predict the folding rate change upon a point mutation with 0.031, 0.045, and 0.059 standard errors, respectively. © 2015 Wiley Periodicals, Inc.
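
    Two of the three parameters named above, network density (ND) and characteristic path length (CPL), are standard graph quantities, so the node-deletion idea can be sketched on a toy graph. This is a minimal illustration and not the authors' pipeline: the graph, the node labels, and the helper names are hypothetical, and rCEO is omitted because the abstract does not define it.

```python
from collections import deque


def density(adj):
    """Network density: edges present divided by edges possible."""
    n = len(adj)
    if n < 2:
        return 0.0
    edges = sum(len(nb) for nb in adj.values()) / 2
    return 2 * edges / (n * (n - 1))


def char_path_length(adj):
    """Mean shortest-path length over all reachable node pairs (BFS)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def delete_node(adj, node):
    """Model a point mutation as removal of one node and its edges."""
    return {u: {v for v in nb if v != node} for u, nb in adj.items() if u != node}


def pct_change(before, after):
    return 100.0 * (after - before) / before


# Hypothetical 4-residue network: a triangle A-B-C with D attached to C.
net = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
mutant = delete_node(net, "D")
nd_change = pct_change(density(net), density(mutant))
cpl_change = pct_change(char_path_length(net), char_path_length(mutant))
print(f"ND change: {nd_change:+.1f}%  CPL change: {cpl_change:+.1f}%")
```

    In the study these percentage changes are the quantities correlated with experimental folding rate changes; the regression step itself is not reproduced here.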

  14. Comparison of two Classification methods (MLC and SVM) to extract land use and land cover in Johor Malaysia

    Science.gov (United States)

    Rokni Deilmai, B.; Ahmad, B. Bin; Zabihi, H.

    2014-06-01

    Mapping is essential for the analysis of land use and land cover, which influence many environmental processes and properties. When creating land cover maps, it is important to minimize error, because errors propagate into later analyses based on these maps. The reliability of land cover maps derived from remotely sensed data depends on an accurate classification. In this study, we analyzed multispectral data using two different classifiers: the Maximum Likelihood Classifier (MLC) and the Support Vector Machine (SVM). To this end, Landsat Thematic Mapper data and identical field-based training sample datasets from Johor, Malaysia, were used for each classification method, yielding five land cover classes: forest, oil palm, urban area, water, and rubber. Classification results indicate that SVM was more accurate than MLC. With a demonstrated capability to produce reliable land cover results, SVM methods should be especially useful for land cover classification.

  15. Comparison of two Classification methods (MLC and SVM) to extract land use and land cover in Johor Malaysia

    International Nuclear Information System (INIS)

    Deilmai, B Rokni; Ahmad, B Bin; Zabihi, H

    2014-01-01

    Mapping is essential for the analysis of land use and land cover, which influence many environmental processes and properties. When creating land cover maps, it is important to minimize error, because errors propagate into later analyses based on these maps. The reliability of land cover maps derived from remotely sensed data depends on an accurate classification. In this study, we analyzed multispectral data using two different classifiers: the Maximum Likelihood Classifier (MLC) and the Support Vector Machine (SVM). To this end, Landsat Thematic Mapper data and identical field-based training sample datasets from Johor, Malaysia, were used for each classification method, yielding five land cover classes: forest, oil palm, urban area, water, and rubber. Classification results indicate that SVM was more accurate than MLC. With a demonstrated capability to produce reliable land cover results, SVM methods should be especially useful for land cover classification.

  16. [Application of optimized parameters SVM based on photoacoustic spectroscopy method in fault diagnosis of power transformer].

    Science.gov (United States)

    Zhang, Yu-xin; Cheng, Zhi-feng; Xu, Zheng-ping; Bai, Jing

    2015-01-01

    To address the complex operation, carrier gas consumption, and long test period of the traditional power transformer fault diagnosis approach based on dissolved gas analysis (DGA), this paper proposes a new method that uses photoacoustic spectroscopy to measure the content of five characteristic gases in transformer oil (CH4, C2H2, C2H4, C2H6, and H2) and then calculates three ratios: C2H2/C2H4, CH4/H2, and C2H4/C2H6. The support vector machine model was constructed using cross validation under five support vector machine functions and four kernel functions, and heuristic algorithms were used to optimize the penalty factor c and the parameter g, in order to establish the SVM model with the highest fault diagnosis accuracy and fast computing speed. Two heuristic algorithms, particle swarm optimization and the genetic algorithm, were compared for accuracy and speed of optimization. The simulation results show that the SVM model composed of C-SVC, the RBF kernel function, and the genetic algorithm obtains 97.5% accuracy on the test sample set and 98.3333% accuracy on the training sample set, and that the genetic algorithm was about two times faster than particle swarm optimization. The method described in this paper has many advantages, such as simple operation, non-contact measurement, no carrier gas consumption, a short test period, and high stability and sensitivity; the results show that it can replace traditional transformer fault diagnosis by gas chromatography and meets practical engineering needs in transformer fault diagnosis.
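
    The three-ratio feature vector fed to the SVM is simple arithmetic on the five measured gas concentrations. The sketch below only illustrates that step; the function name, the dictionary layout, the sample concentrations, and the handling of a zero denominator are all assumptions, since the abstract gives no such details.

```python
def three_ratios(gas_ppm):
    """Return (C2H2/C2H4, CH4/H2, C2H4/C2H6) from gas concentrations.

    gas_ppm maps gas formula to concentration; a zero denominator yields
    infinity here, which is an illustrative choice rather than the paper's.
    """
    def ratio(num, den):
        d = gas_ppm[den]
        return gas_ppm[num] / d if d > 0 else float("inf")

    return (
        ratio("C2H2", "C2H4"),
        ratio("CH4", "H2"),
        ratio("C2H4", "C2H6"),
    )


# Hypothetical oil sample (values invented for illustration only).
sample = {"H2": 100.0, "CH4": 50.0, "C2H6": 20.0, "C2H4": 40.0, "C2H2": 4.0}
features = three_ratios(sample)
print(features)
```

    A classifier such as a C-SVC with an RBF kernel would then be trained on these three-element vectors, with c and g tuned by the heuristic search the abstract describes.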

  17. Proteomic Investigation of Protein Profile Changes and Amino Acid Residue Level Modification in Cooked Lamb Meat: The Effect of Boiling.

    Science.gov (United States)

    Yu, Tzer-Yang; Morton, James D; Clerens, Stefan; Dyer, Jolon M

    2015-10-21

    Hydrothermal treatment (heating in water) is a common method of general food processing and preparation. For red-meat-based foods, boiling is common; however, how the molecular level effects of this treatment correlate to the overall food properties is not yet well-understood. The effects of differing boiling times on lamb meat and the resultant cooking water were here examined through proteomic evaluation. The longer boiling time was found to result in increased protein aggregation involving particularly proteins such as glyceraldehyde-3-phosphate dehydrogenase, as well as truncation in proteins such as in α-actinin-2. Heat-induced protein backbone cleavage was observed adjacent to aspartic acid and asparagine residues. Side-chain modifications of amino acid residues resulting from the heating, including oxidation of phenylalanine and formation of carboxyethyllysine, were characterized in the cooked samples. Actin and myoglobin bands from the cooked meat per se remained visible on sodium dodecyl sulfate-polyacrylamide gel electrophoresis, even after significant cooking time. These proteins were also found to be the major source of observed heat-induced modifications. This study provides new insights into molecular-level modifications occurring in lamb meat proteins during boiling and a protein chemistry basis for better understanding the effect of this common treatment on the nutritional and functional properties of red-meat-based foods.

  18. A DDoS Attack Detection Method Based on SVM in Software Defined Network

    Directory of Open Access Journals (Sweden)

    Jin Ye

    2018-01-01

    The detection of DDoS attacks is an important topic in the field of network security. The emergence of software defined networks (SDN) (Zhang et al., 2018) brings some novel methods to this topic, in which a deep learning algorithm is adopted to model the attack behavior based on data collected from the SDN controller. However, existing methods such as neural network algorithms are not practical enough to be applied. In this paper, an SDN environment is constructed with the mininet and floodlight (Ning et al., 2014) simulation platforms, 6-tuple characteristic values of the switch flow table are extracted, and a DDoS attack model is then built by combining SVM classification algorithms. The experiments show that the average accuracy rate of our method is 95.24% with a small amount of flow collection. Our work is of good value for the detection of DDoS attacks in SDN.

  19. Design and Evaluation of a Texture Classifier Based on LS-SVM

    Directory of Open Access Journals (Sweden)

    Beitmantt Cárdenas Quintero

    2013-07-01

    Objective: To evaluate the performance and computational cost of different Least Squares Support Vector Machine (LS-SVM) architectures and methodologies for texture-based image segmentation and, from those results, to propose a model of an LS-SVM texture classifier. Methodology: For a binary classification problem represented by the segmentation of 32 images, organized into 4 groups and formed by pairs of typical textures (granite/bark, brick/upholstery, wood/marble, fabric/fur), the performance and computational cost are measured and compared for two kernel types (radial/polynomial), two optimization functions (local minimum/exhaustive search), and two cost functions (random cross-validation/leave-one-out cross-validation) in an LS-SVM that takes as input the pixels forming the cross-shaped neighborhood of the pixel under evaluation (no feature extraction is performed). Results: As a texture classifier, LS-SVM performs better and requires lower computational cost when it uses a radial basis kernel and an optimization function based on a local-minimum search algorithm, together with a cost function that uses random cross-validation.

  20. Weighted Feature Gaussian Kernel SVM for Emotion Recognition.

    Science.gov (United States)

    Wei, Wei; Jia, Qingxuan

    2016-01-01

    Emotion recognition with weighted feature based on facial expression is a challenging research topic and has attracted great attention in the past few years. This paper presents a novel method, utilizing subregion recognition rate to weight kernel function. First, we divide the facial expression image into some uniform subregions and calculate corresponding recognition rate and weight. Then, we get a weighted feature Gaussian kernel function and construct a classifier based on Support Vector Machine (SVM). At last, the experimental results suggest that the approach based on weighted feature Gaussian kernel function has good performance on the correct rate in emotion recognition. The experiments on the extended Cohn-Kanade (CK+) dataset show that our method has achieved encouraging recognition results compared to the state-of-the-art methods.
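
    One common way to weight features inside a Gaussian kernel is to scale each squared difference by a per-feature weight. The sketch below assumes that form and assumes the weights are the subregion recognition rates normalized to sum to one; the abstract does not give the exact formula, so both choices and both function names are hypothetical.

```python
import math


def weights_from_rates(rates):
    """Normalize subregion recognition rates into weights that sum to 1."""
    total = sum(rates)
    return [r / total for r in rates]


def weighted_gaussian_kernel(x, y, weights, gamma=1.0):
    """K(x, y) = exp(-gamma * sum_i w_i * (x_i - y_i)^2)."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    sq_dist = sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights))
    return math.exp(-gamma * sq_dist)


# Hypothetical recognition rates for three facial subregions.
w = weights_from_rates([0.9, 0.6, 0.5])
k_same = weighted_gaussian_kernel([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], w)
k_diff = weighted_gaussian_kernel([1.0, 2.0, 3.0], [2.0, 2.0, 3.0], w)
print(w, k_same, k_diff)
```

    With non-negative weights this is an ordinary Gaussian kernel on rescaled features, so it remains a valid kernel for an SVM.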

  1. Effects of lysine residues on structural characteristics and stability of tau proteins

    International Nuclear Information System (INIS)

    Lee, Myeongsang; Baek, Inchul; Choi, Hyunsung; Kim, Jae In; Na, Sungsoo

    2015-01-01

    Pathological amyloid proteins have been implicated in neuro-degenerative diseases, specifically Alzheimer's, Parkinson's, Lewy-body diseases and prion related diseases. In prion related diseases, functional tau proteins can be transformed into pathological agents by environmental factors, including oxidative stress, inflammation, Aβ-mediated toxicity and covalent modification. These pathological agents are stable under physiological conditions and are not easily degraded. This un-degradable characteristic of tau proteins enables their utilization as functional materials for capturing carbon dioxide. For the efficient utilization of amyloid proteins as functional materials, a basic study of their structural characteristics is necessary. Here, we investigated the basic tau protein structure of the wild type (WT) and of tau proteins with a lysine mutation at a glutamic residue (Q2K) at the atomistic scale. We also report the size effect of both the WT and Q2K structures, which allowed us to identify the stability of those amyloid structures. - Highlights: • Lysine mutation effect alters the structure conformation and characteristic of tau. • Over the 15 layers both WT and Q2K models, both tau proteins undergo fractions. • Lysine mutation causes the increment of non-bonded energy and solvent accessible surface area. • Structural instability of Q2K model was proved by the number of hydrogen bonds analysis.

  2. Effects of lysine residues on structural characteristics and stability of tau proteins

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Myeongsang; Baek, Inchul; Choi, Hyunsung; Kim, Jae In; Na, Sungsoo, E-mail: nass@korea.ac.kr

    2015-10-23

    Pathological amyloid proteins have been implicated in neuro-degenerative diseases, specifically Alzheimer's, Parkinson's, Lewy-body diseases and prion related diseases. In prion related diseases, functional tau proteins can be transformed into pathological agents by environmental factors, including oxidative stress, inflammation, Aβ-mediated toxicity and covalent modification. These pathological agents are stable under physiological conditions and are not easily degraded. This un-degradable characteristic of tau proteins enables their utilization as functional materials for capturing carbon dioxide. For the efficient utilization of amyloid proteins as functional materials, a basic study of their structural characteristics is necessary. Here, we investigated the basic tau protein structure of the wild type (WT) and of tau proteins with a lysine mutation at a glutamic residue (Q2K) at the atomistic scale. We also report the size effect of both the WT and Q2K structures, which allowed us to identify the stability of those amyloid structures. - Highlights: • Lysine mutation effect alters the structure conformation and characteristic of tau. • Over the 15 layers both WT and Q2K models, both tau proteins undergo fractions. • Lysine mutation causes the increment of non-bonded energy and solvent accessible surface area. • Structural instability of Q2K model was proved by the number of hydrogen bonds analysis.

  3. Predicting beta-turns in proteins using support vector machines with fractional polynomials.

    Science.gov (United States)

    Elbashir, Murtada; Wang, Jianxin; Wu, Fang-Xiang; Wang, Lusheng

    2013-11-07

    β-turns are a secondary structure type that has an essential role in molecular recognition, protein folding, and stability. They are the most common type of non-repetitive structure, since 25% of amino acids in protein structures are situated on them. Their prediction is considered to be one of the crucial problems in bioinformatics and molecular biology, and can provide valuable insights and inputs for fold recognition and drug design. We propose an approach that combines support vector machines (SVMs) and logistic regression (LR) in a hybrid prediction method, which we call H-SVM-LR, to predict β-turns in proteins. Fractional polynomials are used for LR modeling. We utilize position specific scoring matrices (PSSMs) and predicted secondary structure (PSS) as features. Our simulation studies show that H-SVM-LR achieves Qtotal of 82.87%, 82.84%, and 82.32% on the BT426, BT547, and BT823 datasets respectively. These values are the highest among β-turn prediction methods that are based on PSSMs and secondary structure information. H-SVM-LR also achieves favorable performance in predicting β-turns as measured by the Matthews correlation coefficient (MCC) on these datasets. Furthermore, H-SVM-LR shows good performance when considering shape strings as additional features. In this paper, we present a comprehensive approach for β-turn prediction. Experiments show that our proposed approach achieves better performance compared to other competing prediction methods.

  4. Application of Machine Learning Approaches for Protein-protein Interactions Prediction.

    Science.gov (United States)

    Zhang, Mengying; Su, Qiang; Lu, Yi; Zhao, Manman; Niu, Bing

    2017-01-01

    Proteomics endeavors to study the structures, functions and interactions of proteins. Information on protein-protein interactions (PPIs) helps to improve our knowledge of the functions and the 3D structures of proteins. Thus determining PPIs is essential for the study of proteomics. In this review, in order to study the application of machine learning in predicting PPIs, some machine learning approaches such as support vector machine (SVM), artificial neural networks (ANNs) and random forest (RF) were selected, and examples of their applications to PPIs are listed. SVM and RF are two commonly used methods. Nowadays, more researchers predict PPIs by combining more than two methods. This review presents the application of machine learning approaches in predicting PPIs. Many examples of success in identification and prediction in the area of PPI prediction are discussed, and PPI research is still in progress. Copyright© Bentham Science Publishers.

  5. Parameters Optimization and Application to Glutamate Fermentation Model Using SVM

    Directory of Open Access Journals (Sweden)

    Xiangsheng Zhang

    2015-01-01

    Aimed at parameter optimization in the support vector machine (SVM) for glutamate fermentation modelling, a new method is developed. It optimizes the SVM parameters via an improved particle swarm optimization (IPSO) algorithm, which has better global searching ability. The algorithm includes detecting and handling local convergence and exhibits a strong ability to avoid being trapped in local minima. The main steps of the method are given. Simulation experiments demonstrate the effectiveness of the proposed algorithm.

  6. Comparison of two methods forecasting binding rate of plasma protein.

    Science.gov (United States)

    Hongjiu, Liu; Yanrong, Hu

    2014-01-01

    Using descriptors calculated from the molecular structure, the binding rates of plasma protein (BRPP) for seventy diverse drugs are modeled by a quantitative structure-activity relationship (QSAR) technique. Two algorithms, a heuristic algorithm (HA) and the support vector machine (SVM), are used to establish linear and nonlinear models to forecast BRPP. Empirical analysis shows that both HA and SVM perform well, with cross-validation correlation coefficients R²cv of 0.80 and 0.83, respectively. Comparing HA with SVM, it was found that SVM is more stable and more robust in forecasting BRPP.

  7. A Method to Integrate GMM, SVM and DTW for Speaker Recognition

    Directory of Open Access Journals (Sweden)

    Ing-Jr Ding

    2014-01-01

    This paper develops an effective and efficient scheme to integrate the Gaussian mixture model (GMM), support vector machine (SVM), and dynamic time warping (DTW) for automatic speaker recognition. GMM and SVM are two popular classifiers for speaker recognition applications. DTW is a fast and simple template matching method, and it is frequently seen in speech recognition applications. In this work, DTW is not used to perform speech recognition; instead, it is employed as a verifier of valid speakers. The proposed combination scheme of GMM, SVM and DTW, called SVMGMM-DTW, is a two-phase verification process comprising GMM-SVM verification in the first phase and DTW verification in the second phase. Because the identity of a speaker is checked twice, it is difficult for imposters to pass the security protection; therefore, the safety of speaker recognition systems is greatly increased. A series of experiments designed on door access control applications demonstrated the superiority of the developed SVMGMM-DTW in speaker recognition accuracy.
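
    The template matching that DTW performs can be sketched with the textbook dynamic-programming recurrence. This is a generic illustration on one-dimensional sequences, not the SVMGMM-DTW implementation: real systems compare frames of acoustic feature vectors, and the function name and sample sequences here are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative alignment cost of a[:i] and b[:j],
    using absolute difference as the local distance.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


template = [1.0, 2.0, 3.0]
stretched = [1.0, 2.0, 2.0, 3.0]   # same shape, spoken more slowly
shifted = [2.0, 3.0, 4.0]          # different values

print(dtw_distance(template, stretched))  # → 0.0
print(dtw_distance(template, shifted))    # → 2.0
```

    A second-phase verifier in the spirit of the abstract would accept a speaker only when the distance to the enrolled template falls below a threshold; how that threshold is chosen is not stated in the abstract.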

  8. Analysis of core-periphery organization in protein contact networks reveals groups of structurally and functionally critical residues.

    Science.gov (United States)

    Isaac, Arnold Emerson; Sinha, Sitabhra

    2015-10-01

    The representation of proteins as networks of interacting amino acids, referred to as protein contact networks (PCN), and their subsequent analyses using graph theoretic tools, can provide novel insights into the key functional roles of specific groups of residues. We have characterized the networks corresponding to the native states of 66 proteins (belonging to different families) in terms of their core-periphery organization. The resulting hierarchical classification of the amino acid constituents of a protein arranges the residues into successive layers - having higher core order - with increasing connection density, ranging from a sparsely linked periphery to a densely intra-connected core (distinct from the earlier concept of protein core defined in terms of the three-dimensional geometry of the native state, which has least solvent accessibility). Our results show that residues in the inner cores are more conserved than those at the periphery. Underlining the functional importance of the network core, we see that the receptor sites for known ligand molecules of most proteins occur in the innermost core. Furthermore, the association of residues with structural pockets and cavities in binding or active sites increases with the core order. From mutation sensitivity analysis, we show that the probability of deleterious or intolerant mutations also increases with the core order. We also show that stabilization centre residues are in the innermost cores, suggesting that the network core is critically important in maintaining the structural stability of the protein. A publicly available Web resource for performing core-periphery analysis of any protein whose native state is known has been made available by us at http://www.imsc.res.in/~sitabhra/proteinKcore/index.html.
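
    A layered core-periphery ordering of this kind is commonly obtained by k-core decomposition, in which the lowest-degree node is peeled off repeatedly. The sketch below assumes that "core order" corresponds to the k-core number, which the abstract does not state explicitly; the function name and the toy contact network are hypothetical.

```python
def core_numbers(adj):
    """Return the k-core number of every node by repeated peeling.

    adj maps each node to the set of its neighbors (undirected graph).
    A node's core number is the largest k such that it belongs to a
    subgraph in which every node has at least k neighbors.
    """
    degree = {u: len(nb) for u, nb in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        u = min(remaining, key=degree.get)  # current lowest-degree node
        k = max(k, degree[u])               # core order never decreases
        core[u] = k
        remaining.remove(u)
        for v in adj[u]:
            if v in remaining:
                degree[v] -= 1
    return core


# Hypothetical contact network: a 4-clique (A-D) with a two-residue tail (E, F).
contacts = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C"},
    "E": {"A", "F"},
    "F": {"E"},
}
orders = core_numbers(contacts)
print(orders)
```

    Here the clique residues form the innermost core (order 3) and the tail residues the periphery (order 1), mirroring the sparse-periphery to dense-core layering described above.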

  9. Orientation-dependent backbone-only residue pair scoring functions for fixed backbone protein design

    Directory of Open Access Journals (Sweden)

    Bordner Andrew J

    2010-04-01

    Background: Empirical scoring functions have proven useful in protein structure modeling. Most such scoring functions depend on protein side chain conformations. However, backbone-only scoring functions do not require computationally intensive structure optimization and so are well suited to protein design, which requires fast score evaluation. Furthermore, scoring functions that account for the distinctive relative position and orientation preferences of residue pairs are expected to be more accurate than those that depend only on the separation distance. Results: Residue pair scoring functions for fixed backbone protein design were derived using only backbone geometry. Unlike previous studies that used spherical harmonics to fit 2D angular distributions, Gaussian Mixture Models were used to fit the full 3D (position only) and 6D (position and orientation) distributions of residue pairs. The performance of the 1D (residue separation only), 3D, and 6D scoring functions was compared by their ability to identify correct threading solutions for a non-redundant benchmark set of protein backbone structures. The threading accuracy was found to steadily increase with increasing dimension, with the 6D scoring function achieving the highest accuracy. Furthermore, the 3D and 6D scoring functions were shown to outperform side chain-dependent empirical potentials from three other studies. Next, two computational methods that take advantage of the speed and pairwise form of these new backbone-only scoring functions were investigated. The first is a procedure that exploits available sequence data by averaging scores over threading solutions for homologs. This was evaluated by applying it to the challenging problem of identifying interacting transmembrane alpha-helices and found to further improve prediction accuracy. The second is a protein design method for determining the optimal sequence for a backbone structure by applying Belief Propagation

  10. Fuzzy Pruning Based LS-SVM Modeling Development for a Fermentation Process

    Directory of Open Access Journals (Sweden)

    Weili Xiong

    2014-01-01

    Due to the complexity and uncertainty of microbial fermentation processes, data coming from the plants often contain some outliers. However, these data may be treated as normal support vectors, which deteriorates the performance of soft sensor modeling. Since the outliers also contaminate the correlation structure of the least square support vector machine (LS-SVM), the fuzzy pruning method is provided to deal with the problem. Furthermore, by assigning different fuzzy membership scores to data samples, the sensitivity of the model to the outliers can be reduced greatly. The effectiveness and efficiency of the proposed approach are demonstrated through two numerical examples as well as a simulator case of a penicillin fermentation process.

  11. Improvement of hydrogen bond geometry in protein NMR structures by residual dipolar couplings - an assessment of the interrelation of NMR restraints

    Energy Technology Data Exchange (ETDEWEB)

    Jensen, Pernille Rose; Axelsen, Jacob Bock [University of Copenhagen, Institute of Molecular Biology (Denmark); Lerche, Mathilde Hauge [Amersham Health (Sweden); Poulsen, Flemming M. [University of Copenhagen, Institute of Molecular Biology (Denmark)], E-mail: fmp@apk.molbio.ku.dk

    2004-01-15

    We have examined how the hydrogen bond geometry in three different proteins is affected when structural restraints based on measurements of residual dipolar couplings are included in the structure calculations. The study shows that including restraints based solely on ¹Hᴺ-¹⁵N residual dipolar couplings has a pronounced impact on the backbone rmsd and Ramachandran plot but does not improve the hydrogen bond geometry. In the case of chymotrypsin inhibitor 2, the addition of ¹³CO-¹³Cα and ¹⁵N-¹³CO one-bond dipolar couplings as restraints in the structure calculations improved the hydrogen bond geometry to a quality comparable to that obtained in the 1.8 Å resolution X-ray structure of this protein. A systematic restraint study was performed, in which four types of restraints, residual dipolar couplings, hydrogen bonds, TALOS angles and NOEs, were allowed in two states. This study revealed the importance of using several types of residual dipolar couplings to get good hydrogen bond geometry. The study also showed that using a small set of NOEs derived only from the amide protons, together with a full set of residual dipolar couplings, resulted in structures of very high quality. When reducing the NOE set, it is mainly the side-chain to side-chain NOEs that are removed. Despite this, the effect on the side-chain packing is very small when a reduced NOE set is used, which implies that the overall fold of a protein structure is mainly determined by correct folding of the backbone.

  12. Modeling of SVM Diode Clamping Three-Level Inverter Connected to Grid

    DEFF Research Database (Denmark)

    Guo, Yougui; Zeng, Ping; Zhu, Jieqiong

    2011-01-01

    PLECS is used to model the diode clamping three-level inverter connected to the grid, and good results are obtained. First, the output voltage SVM is described for the diode clamping three-level inverter with Y-connected loads. Then the output voltage SVM of the diode clamping three-level inverter is simply analyzed with △-connected loads, but it will be further researched in the future. Third, PLECS is briefly introduced. Fourth, the modeling of the diode clamping three-level inverter is briefly presented with PLECS. Finally, a series of simulations are carried out. The simulation results show that PLECS is a very powerful tool for real power circuits and that it is very easy to simulate them. They have also verified that the SVM control strategy is feasible for controlling the diode clamping three-level inverter.

  13. A comparative QSAR study on the estrogenic activities of persistent organic pollutants by PLS and SVM

    Directory of Open Access Journals (Sweden)

    Fei Li

    2015-11-01

    Quantitative structure-activity relationships (QSARs) were determined using partial least squares (PLS) and support vector machine (SVM). The values predicted by the final QSAR models were in good agreement with the corresponding experimental values. Chemical estrogenic activities are related to atomic properties (atomic Sanderson electronegativities, van der Waals volumes and polarizabilities). Comparing the results obtained from the two models, the SVM method exhibited better overall performance. In addition, three PLS models were constructed for some specific families based on their chemical structures. These predictive models should be useful to rapidly identify potential estrogenic endocrine disrupting chemicals.

  14. Residue 182 influences the second step of protein-tyrosine phosphatase-mediated catalysis

    DEFF Research Database (Denmark)

    Pedersen, A.K.; Guo, X.; Møller, K.B.

    2004-01-01

    Previous enzyme kinetic and structural studies have revealed a critical role for Asp(181) (PTP1B numbering) in PTP (protein-tyrosine phosphatase)-mediated catalysis. In the E-P (phosphoenzyme) formation step, Asp(181) functions as a general acid, while in the E-P hydrolysis step it acts as a general base. Most of our understanding of the role of Asp(181) is derived from studies with the Yersinia PTP and the mammalian PTP1B, and to some extent also TC (T-cell)-PTP and the related PTPalpha and PTPepsilon. The neighbouring residue 182 is a phenylalanine in these four mammalian enzymes and a glutamine in Yersinia PTP. Surprisingly, little attention has been paid to the fact that this residue is a histidine in most other mammalian PTPs. Using a reciprocal single-point mutational approach with introduction of His(182) in PTP1B and Phe(182) in PTPH1, we demonstrate here that His(182)-PTPs...

  15. [Environment of tryptophan residues in proteins--a factor for stability to oxidative nitrosylation. I. Analysis of primary structure].

    Science.gov (United States)

    Beda, N V; Nedospasov, A A

    2001-01-01

    Micellar catalysis under aerobic conditions effectively accelerates oxidative nitrosylation because of solubilization of NO and O2 by protein membranes and hydrophobic nuclei. Nitrosylating intermediates NOx (NO2, N2O3, N2O4) form mainly in the hydrophobic phase, and therefore their solubility in aqueous phase is low and hydrolysis is rapid, the local concentration of NOx in the hydrophobic phase being substantially higher than in the aqueous phase. Tryptophan is a hydrophobic residue and can be nitrosylated with the formation of isomeric N-nitrosotryptophans (NOW). Without a denitrosylation mechanism, NOW would accumulate steadily in proteins of NO-synthesizing organisms, and long-lived proteins would contain substantial amounts of NOW, which is however not the case. Using the Protein Data Bank (more than 78,000 sequences), we investigated the distribution of the environment of tryptophan residues (22 residues on each side along the polypeptide chain) in proteins with known primary structure. Charged and polar residues (D, H, K, N, Q, R, S) are more frequent in the immediate surroundings of tryptophan (positions -6, -5, -2, -1, 1, 2, 4) and hydrophobic residues (A, F, I, L, V, Y) are rarer there than in remote positions. Hence, a substantial fraction of tryptophan residues is situated in a hydrophilic environment, which decreases the nitrosylation rate because of the lower NOx concentration in the aqueous phase and allows denitrosylation reactions to proceed via nitrosonium ion transfer to nucleophiles of functional groups of proteins and low-molecular compounds in the aqueous phase.
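
    The positional analysis of tryptophan neighbourhoods described above can be sketched as a simple counting procedure. This is a minimal illustration under stated assumptions, not the authors' code: the function names `window_counts` and `polar_fraction`, the toy sequences, and the shortened flank are hypothetical; only the polar residue set (D, H, K, N, Q, R, S) comes from the abstract.

```python
from collections import Counter, defaultdict

POLAR = set("DHKNQRS")  # charged and polar residues listed in the abstract

def window_counts(sequences, center="W", flank=22):
    """Count residues at each offset (-flank..+flank, excluding 0) around `center`."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i, aa in enumerate(seq):
            if aa != center:
                continue
            for offset in range(-flank, flank + 1):
                j = i + offset
                if offset != 0 and 0 <= j < len(seq):
                    counts[offset][seq[j]] += 1
    return counts

def polar_fraction(counts, offset):
    """Fraction of residues at `offset` that belong to the POLAR set."""
    total = sum(counts[offset].values())
    polar = sum(n for aa, n in counts[offset].items() if aa in POLAR)
    return polar / total if total else 0.0

# Hypothetical toy sequences; a real analysis would scan a sequence database.
toy_sequences = ["ADWKL", "SWR"]
toy_counts = window_counts(toy_sequences, flank=2)
frac_minus1 = polar_fraction(toy_counts, -1)  # D and S precede the two W residues
frac_plus2 = polar_fraction(toy_counts, 2)    # only L is observed at +2
```

    Comparing such fractions between near positions and remote positions is one way to express the enrichment of polar residues around tryptophan that the abstract reports.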

  16. Comparison of ANN and SVM for classification of eye movements in EOG signals

    Science.gov (United States)

    Qi, Lim Jia; Alias, Norma

    2018-03-01

    Nowadays, the electrooculogram is regarded as one of the most important biomedical signals for measuring and analyzing eye movement patterns. Thus, it is helpful in designing an EOG-based Human Computer Interface (HCI). In this research, electrooculography (EOG) data were obtained from five volunteers. The EOG data were then preprocessed before feature extraction methods were employed to further reduce the dimensionality of the data. Three feature extraction approaches were put forward, namely statistical parameters, autoregressive (AR) coefficients using the Burg method, and power spectral density (PSD) using the Yule-Walker method. These features then became the input to both an artificial neural network (ANN) and a support vector machine (SVM). The performance of the combinations of different feature extraction methods and classifiers was presented and analyzed. It was found that statistical parameters + SVM achieved the highest classification accuracy of 69.75%.

  17. Fast subcellular localization by cascaded fusion of signal-based and homology-based methods

    Directory of Open Access Journals (Sweden)

    Wang Wei

    2011-10-01

    Background: The functions of proteins are closely related to their subcellular locations. In the post-genomics era, the amount of gene and protein data grows exponentially, which necessitates the prediction of subcellular localization by computational means. Results: This paper proposes mitigating the computation burden of alignment-based approaches to subcellular localization prediction by a cascaded fusion of cleavage site prediction and profile alignment. Specifically, the informative segments of protein sequences are identified by a cleavage site predictor using the information in their N-terminal sorting signals. Then, the sequences are truncated at the cleavage site positions, and the shortened sequences are passed to PSI-BLAST for computing their profiles. Subcellular localizations are subsequently predicted by a profile-to-profile alignment support-vector-machine (SVM) classifier. To further reduce the training and recognition time of the classifier, the SVM classifier is replaced by a new kernel method based on the perturbational discriminant analysis (PDA). Conclusions: Experimental results on a new dataset based on Swiss-Prot Release 57.5 show that the method can make use of the best property of signal- and homology-based approaches and can attain an accuracy comparable to that achieved by using full-length sequences. Analysis of profile-alignment score matrices suggests that both profile creation time and profile alignment time can be reduced without significant reduction in subcellular localization accuracy. It was found that PDA enjoys a short training time as compared to the conventional SVM. We advocate that the method will be important for biologists to conduct large-scale protein annotation or for bioinformaticians to perform preliminary investigations on new algorithms that involve pairwise alignments.

  18. Study on specificity of colon carcinoma-associated serum markers and establishment of SVM prediction model

    Directory of Open Access Journals (Sweden)

    Lu Li

    2017-03-01

    We aimed to evaluate the specificity of 12 tumor markers related to colon carcinoma and identify the most sensitive index. Logistic regression and Bhattacharyya distance were used to evaluate the index. Then, different index combinations were used to establish a support vector machine (SVM) diagnosis model of malignant colon carcinoma. The accuracy of the model was checked. High accuracy was assumed to indicate high specificity of the index. Through logistic regression, three indexes, CEA, HSP60 and CA199, were screened out. Using Bhattacharyya distance, the four indexes with the largest Bhattacharyya distance were screened out: CEA, NSE, AFP, and CA724. The specificity of the combination of the above six indexes was higher than that of other combinations, as was the accuracy of the established SVM identification model. Using logistic regression and Bhattacharyya distance for detection and establishing an SVM model based on different serum marker combinations can increase diagnostic accuracy, providing a theoretical basis for the application of mathematical models in cancer diagnosis.
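
    As an illustration of ranking a marker by Bhattacharyya distance, the sketch below uses the closed form for two univariate Gaussian distributions. The Gaussian assumption, the function name `bhattacharyya_gaussian`, and the sample values are assumptions made here for illustration; the abstract does not state how the distance was estimated.

```python
import math
from statistics import fmean, pvariance

def bhattacharyya_gaussian(x, y):
    """Bhattacharyya distance between two samples, each modelled as a 1-D Gaussian."""
    mx, my = fmean(x), fmean(y)
    vx, vy = pvariance(x), pvariance(y)
    # Term driven by the separation of the class means.
    term_mean = 0.25 * (mx - my) ** 2 / (vx + vy)
    # Term driven by the mismatch of the class variances.
    term_var = 0.5 * math.log((vx + vy) / (2.0 * math.sqrt(vx * vy)))
    return term_mean + term_var

# Hypothetical marker levels in two groups (not data from the study).
controls = [1.0, 2.0, 3.0]
patients = [11.0, 12.0, 13.0]
d_same = bhattacharyya_gaussian(controls, controls)
d_diff = bhattacharyya_gaussian(controls, patients)
```

    A marker whose two group distributions overlap completely gives a distance of zero, and the distance grows as the groups separate, so a larger value suggests a more discriminative marker.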

  19. Protein Coexpression Using FMDV 2A: Effect of “Linker” Residues

    Directory of Open Access Journals (Sweden)

    Ekaterina Minskaia

    2013-01-01

    Many biomedical applications absolutely require, or are substantially enhanced by, coexpression of multiple proteins from a single vector. Foot-and-mouth disease virus 2A (F2A) and “2A-like” sequences (e.g., Thosea asigna virus 2A; T2A) are used widely for this purpose since multiple proteins can be coexpressed by linking open reading frames (ORFs) to form a single cistron. The activity of F2A “cleavage” may, however, be compromised by both the use of shorter versions of F2A and the sequences (derived from multiple-purpose cloning sites) used to link F2A to the upstream protein. To characterise these effects, different lengths of F2A and T2A were inserted between green and cherry fluorescent proteins. Mutations were introduced in the linker region immediately upstream of both F2A- and T2A-based constructs and activities determined using both cell-free translation systems and transfected cells. In shorter versions of F2A, activity may be affected by both the C-terminal sequence of the protein upstream and, equally strikingly, the residues immediately upstream introduced during cloning. Mutations significantly improved activity for shorter versions of F2A but could decrease activity in the case of T2A. These data will aid the design of cloning strategies for the co-expression of multiple proteins in biomedical/biotechnological applications.

  20. Intragenic suppressor of Osiaa23 revealed a conserved tryptophan residue crucial for protein-protein interactions.

    Directory of Open Access Journals (Sweden)

    Jun Ni

    The Auxin/Indole-3-Acetic Acid (Aux/IAA) and Auxin Response Factor (ARF) are two important families that play key roles in auxin signal transduction. Both of the families contain a similar carboxyl-terminal domain (Domain III/IV) that facilitates interactions between these two families. In spite of the importance of protein-protein interactions among these transcription factors, the mechanisms involved in these interactions are largely unknown. In this study, we isolated six intragenic suppressors of an auxin insensitive mutant, Osiaa23. Among these suppressors, Osiaa23-R5 successfully rescued all the defects of the mutant. Sequence analysis revealed that an amino acid substitution occurred in the Tryptophan (W) residue in Domain IV of Osiaa23. Yeast two-hybrid experiments showed that the mutation in Domain IV prevents the protein-protein interactions between Osiaa23 and OsARFs. Phylogenetic analysis revealed that the W residue is conserved in both OsIAAs and OsARFs. Next, we performed site-specific amino acid substitutions within Domain IV of OsARFs, and the conserved W in Domain IV was exchanged by Serine (S). The mutated OsARF(WS)s can be released from the inhibition of Osiaa23 and maintain the transcriptional activities. Expression of OsARF(WS)s in the Osiaa23 mutant rescued different defects of the mutant. Our results suggest a previously unknown importance of Domain IV in both families and provide an indirect way to investigate functions of OsARFs.

  1. Effect of solvent on the structure of a protein (H3.1) with a coarse-grained model with knowledge-based interactions

    Science.gov (United States)

    Pandey, Ras; Farmer, Barry

    2013-03-01

    Quality of solvent plays a critical role in modulating the structure of a protein along with the temperature. Using a coarse-grained Monte Carlo simulation based on three knowledge-based contact potentials (MJ, BT, BFKV) we examine the structure and dynamics of a histone (H3.1). The empty lattice sites constitute the effective solvent medium in which the protein is embedded. Residue-solvent characteristic interaction is based on the hydropathy index while the residue-residue interaction is used from the knowledge-based contact matrices derived from ensembles of protein structures in the protein data bank. Large scale simulations are performed to analyze the structure of protein for a range of residue-solvent interaction strength, a measure of the solvent quality with each potential. Unlike the monotonic thermal response, the radius of gyration of the protein exhibits non-monotonic dependence of the solvent strength. Quantitative comparison of the structure and dynamics emerging from three knowledge-based potentials will be presented in this talk. This work is supported by Air Force Research Laboratory.
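
    The radius of gyration used above as the structural observable can be computed from residue coordinates as the root-mean-square distance from the centre of mass. The sketch below assumes equal-mass residues, as is common for coarse-grained chains; the function name `radius_of_gyration` and the toy coordinates are hypothetical.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of equal-mass residues from their centre of mass."""
    n = len(coords)
    center = [sum(c[i] for c in coords) / n for i in range(3)]
    squared = sum(sum((c[i] - center[i]) ** 2 for i in range(3)) for c in coords)
    return math.sqrt(squared / n)

# Hypothetical two-residue chain on a lattice, separated by two lattice units.
toy_chain = [(0, 0, 0), (2, 0, 0)]
rg = radius_of_gyration(toy_chain)
```

    Tracking this quantity across a range of residue-solvent interaction strengths is how a non-monotonic response of the kind reported in the abstract would show up.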

  2. Effectively identifying compound-protein interactions by learning from positive and unlabeled examples.

    Science.gov (United States)

    Cheng, Zhanzhan; Zhou, Shuigeng; Wang, Yang; Liu, Hui; Guan, Jihong; Chen, Yi-Ping Phoebe

    2016-05-18

    Prediction of compound-protein interactions (CPIs) aims to find new compound-protein pairs where a protein is targeted by at least one compound, which is a crucial step in new drug design. Currently, a number of machine learning based methods have been developed to predict new CPIs in the literature. However, as there is not yet any publicly available set of validated negative CPIs, most existing machine learning based approaches use the unknown interactions (not validated CPIs) selected randomly as the negative examples to train classifiers for predicting new CPIs. Obviously, this is not quite reasonable and unavoidably impacts the CPI prediction performance. In this paper, we simply take the unknown CPIs as unlabeled examples, and propose a new method called PUCPI (the abbreviation of PU learning for Compound-Protein Interaction identification) that employs biased-SVM (Support Vector Machine) to predict CPIs using only positive and unlabeled examples. PU learning is a class of learning methods that learns from positive and unlabeled (PU) samples. To the best of our knowledge, this is the first work that identifies CPIs using only positive and unlabeled examples. We first collect known CPIs as positive examples and then randomly select compound-protein pairs not in the positive set as unlabeled examples. For each CPI/compound-protein pair, we extract protein domains as protein features and compound substructures as chemical features, then take the tensor product of the corresponding compound features and protein features as the feature vector of the CPI/compound-protein pair. After that, biased-SVM is employed to train classifiers on different datasets of CPIs and compound-protein pairs. Experiments over various datasets show that our method outperforms six typical classifiers, including random forest, L1- and L2-regularized logistic regression, naive Bayes, SVM and k-nearest neighbor (kNN), and three types of existing CPI prediction models. 
Source code, datasets and
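
    The tensor-product feature construction mentioned in the abstract above can be sketched in a few lines. This is an illustration only: the function name `tensor_product_features` and the binary toy vectors are hypothetical, and the biased-SVM training step is not shown.

```python
def tensor_product_features(compound, protein):
    """Flattened outer product: one feature per (substructure, domain) combination."""
    return [c * p for c in compound for p in protein]

# Hypothetical binary indicators: 3 compound substructures, 2 protein domains.
compound_bits = [1, 0, 1]
protein_bits = [0, 1]
pair_vector = tensor_product_features(compound_bits, protein_bits)
```

    Each entry of the resulting vector is nonzero only when both the corresponding substructure and the corresponding domain are present, so the pair representation has length equal to the product of the two feature counts.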

  3. A RLS-SVM Aided Fusion Methodology for INS during GPS Outages.

    Science.gov (United States)

    Yao, Yiqing; Xu, Xiaosu

    2017-02-24

    In order to maintain a relatively high accuracy of navigation performance during global positioning system (GPS) outages, a novel robust least squares support vector machine (LS-SVM)-aided fusion methodology is explored to provide the pseudo-GPS position information for the inertial navigation system (INS). The relationship between the yaw, specific force, velocity, and the position increment is modeled. Rather than share the same weight in the traditional LS-SVM, the proposed algorithm allocates various weights for different data, which makes the system immune to the outliers. Field test data was collected to evaluate the proposed algorithm. The comparison results indicate that the proposed algorithm can effectively provide position corrections for standalone INS during the 300 s GPS outage, which outperforms the traditional LS-SVM method. Historical information is also involved to better represent the vehicle dynamics.

  4. A RLS-SVM Aided Fusion Methodology for INS during GPS Outages

    Directory of Open Access Journals (Sweden)

    Yiqing Yao

    2017-02-01

    In order to maintain a relatively high accuracy of navigation performance during global positioning system (GPS) outages, a novel robust least squares support vector machine (LS-SVM)-aided fusion methodology is explored to provide the pseudo-GPS position information for the inertial navigation system (INS). The relationship between the yaw, specific force, velocity, and the position increment is modeled. Rather than share the same weight in the traditional LS-SVM, the proposed algorithm allocates various weights for different data, which makes the system immune to the outliers. Field test data was collected to evaluate the proposed algorithm. The comparison results indicate that the proposed algorithm can effectively provide position corrections for standalone INS during the 300 s GPS outage, which outperforms the traditional LS-SVM method. Historical information is also involved to better represent the vehicle dynamics.

  5. SVM-Based System for Prediction of Epileptic Seizures from iEEG Signal

    Science.gov (United States)

    Cherkassky, Vladimir; Lee, Jieun; Veber, Brandon; Patterson, Edward E.; Brinkmann, Benjamin H.; Worrell, Gregory A.

    2017-01-01

    Objective This paper describes a data-analytic modeling approach for prediction of epileptic seizures from intracranial electroencephalogram (iEEG) recording of brain activity. Even though it is widely accepted that statistical characteristics of iEEG signal change prior to seizures, robust seizure prediction remains a challenging problem due to subject-specific nature of data-analytic modeling. Methods Our work emphasizes understanding of clinical considerations important for iEEG-based seizure prediction, and proper translation of these clinical considerations into data-analytic modeling assumptions. Several design choices during pre-processing and post-processing are considered and investigated for their effect on seizure prediction accuracy. Results Our empirical results show that the proposed SVM-based seizure prediction system can achieve robust prediction of preictal and interictal iEEG segments from dogs with epilepsy. The sensitivity is about 90–100%, and the false-positive rate is about 0–0.3 times per day. The results also suggest good prediction is subject-specific (dog or human), in agreement with earlier studies. Conclusion Good prediction performance is possible only if the training data contain sufficiently many seizure episodes, i.e., at least 5–7 seizures. Significance The proposed system uses subject-specific modeling and unbalanced training data. This system also utilizes three different time scales during training and testing stages. PMID:27362758

  6. Comparison of SVM, RF and ELM on an Electronic Nose for the Intelligent Evaluation of Paraffin Samples

    Directory of Open Access Journals (Sweden)

    Hong Men

    2018-01-01

    Paraffin odor intensity is an important quality indicator when a paraffin inspection is performed. Currently, paraffin odor level assessment is mainly dependent on an artificial sensory evaluation. In this paper, we developed a paraffin odor analysis system to classify and grade four kinds of paraffin samples. The original feature set was optimized using Principal Component Analysis (PCA) and Partial Least Squares (PLS). Support Vector Machine (SVM), Random Forest (RF), and Extreme Learning Machine (ELM) were applied to three different feature data sets for classification and level assessment of paraffin. For classification, the model based on SVM, with an accuracy rate of 100%, was superior to that based on RF, with an accuracy rate of 98.33–100%, and ELM, with an accuracy rate of 98.01–100%. For level assessment, the R² related to the training set was above 0.97 and the R² related to the test set was above 0.87. Through comprehensive comparison, the generalization of the model based on ELM was superior to those based on SVM and RF. The scoring errors for the three models were 0.0016–0.3494, lower than the error of 0.5–1.0 measured by industry standard experts, meaning these methods have a higher prediction accuracy for scoring paraffin level.

  7. Critical lysine residues of Klf4 required for protein stabilization and degradation

    Energy Technology Data Exchange (ETDEWEB)

    Lim, Key-Hwan; Kim, So-Ra; Ramakrishna, Suresh; Baek, Kwang-Hyun, E-mail: baek@cha.ac.kr

    2014-01-24

    Highlights: • Klf4 undergoes the 26S proteasomal degradation by ubiquitination on its multiple lysine residues. • Essential Klf4 ubiquitination sites are accumulated between 190–263 amino acids. • A mutation of lysine at 232 on Klf4 elongates protein turnover. • Klf4 mutants dramatically suppress p53 expression both under normal and UV irradiated conditions. - Abstract: The transcription factor, Krüppel-like factor 4 (Klf4) plays a crucial role in generating induced pluripotent stem cells (iPSCs). As the ubiquitination and degradation of the Klf4 protein have been suggested to play an important role in its function, the identification of specific lysine sites that are responsible for protein degradation is of prime interest to improve protein stability and function. However, the molecular mechanism regulating proteasomal degradation of the Klf4 is poorly understood. In this study, both the analysis of Klf4 ubiquitination sites using several Klf4 deletion fragments and bioinformatics predictions showed that the lysine sites which are signaling for Klf4 protein degradation lie in its N-terminal domain (aa 1–296). The results also showed that Lys32, 52, 232, and 252 of Klf4 are responsible for the proteolysis of the Klf4 protein. These results suggest that Klf4 undergoes proteasomal degradation and that these lysine residues are critical for Klf4 ubiquitination.

  8. Identification of residue pairing in interacting β-strands from a predicted residue contact map.

    Science.gov (United States)

    Mao, Wenzhi; Wang, Tong; Zhang, Wenxuan; Gong, Haipeng

    2018-04-19

    Despite the rapid progress of protein residue contact prediction, predicted residue contact maps frequently contain many errors. However, information of residue pairing in β strands could be extracted from a noisy contact map, due to the presence of characteristic contact patterns in β-β interactions. This information may benefit the tertiary structure prediction of mainly β proteins. In this work, we propose a novel ridge-detection-based β-β contact predictor to identify residue pairing in β strands from any predicted residue contact map. Our algorithm RDb2C adopts ridge detection, a well-developed technique in computer image processing, to capture consecutive residue contacts, and then utilizes a novel multi-stage random forest framework to integrate the ridge information and additional features for prediction. Starting from the predicted contact map of CCMpred, RDb2C remarkably outperforms all state-of-the-art methods on two conventional test sets of β proteins (BetaSheet916 and BetaSheet1452), and achieves F1-scores of ~62% and ~76% at the residue level and strand level, respectively. Taking the prediction of the more advanced RaptorX-Contact as input, RDb2C achieves impressively higher performance, with F1-scores reaching ~76% and ~86% at the residue level and strand level, respectively. In a test of structural modeling using the top 1 L predicted contacts as constraints, for 61 mainly β proteins, the average TM-score achieves 0.442 when using the raw RaptorX-Contact prediction, but increases to 0.506 when using the improved prediction by RDb2C. Our method can significantly improve the prediction of β-β contacts from any predicted residue contact maps. Prediction results of our algorithm could be directly applied to effectively facilitate the practical structure prediction of mainly β proteins. All source data and codes are available at http://166.111.152.91/Downloads.html or the GitHub address of https://github.com/wzmao/RDb2C.

  9. Cancer Classification Based on Support Vector Machine Optimized by Particle Swarm Optimization and Artificial Bee Colony.

    Science.gov (United States)

    Gao, Lingyun; Ye, Mingquan; Wu, Changrong

    2017-11-29

    Intelligent optimization algorithms have advantages in dealing with complex nonlinear problems accompanied by good flexibility and adaptability. In this paper, the FCBF (Fast Correlation-Based Feature selection) method is used to filter irrelevant and redundant features in order to improve the quality of cancer classification. Then, we perform classification based on SVM (Support Vector Machine) optimized by PSO (Particle Swarm Optimization) combined with ABC (Artificial Bee Colony) approaches, which is represented as PA-SVM. The proposed PA-SVM method is applied to nine cancer datasets, including five datasets of outcome prediction and a protein dataset of ovarian cancer. By comparison with other classification methods, the results demonstrate the effectiveness and the robustness of the proposed PA-SVM method in handling various types of data for cancer classification.

  10. PredPPCrys: accurate prediction of sequence cloning, protein production, purification and crystallization propensity from protein sequences using multi-step heterogeneous feature fusion and selection.

    Directory of Open Access Journals (Sweden)

    Huilin Wang

    X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed 'PredPPCrys' using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. In addition, the predicted crystallization

  11. Comparison Between Wind Power Prediction Models Based on Wavelet Decomposition with Least-Squares Support Vector Machine (LS-SVM) and Artificial Neural Network (ANN)

    Directory of Open Access Journals (Sweden)

    Maria Grazia De Giorgi

    2014-08-01

    Full Text Available A high penetration of wind energy into the electricity market requires a parallel development of efficient wind power forecasting models. Different hybrid forecasting methods were applied to wind power prediction, using historical data and numerical weather predictions (NWP). A comparative study was carried out for the prediction of the power production of a wind farm located in complex terrain. The performances of Least-Squares Support Vector Machine (LS-SVM) with Wavelet Decomposition (WD) were evaluated at different time horizons and compared to hybrid Artificial Neural Network (ANN)-based methods. It is acknowledged that hybrid methods based on LS-SVM with WD mostly outperform other methods. A decomposition of the commonly known root mean square error was beneficial for a better understanding of the origin of the differences between prediction and measurement and to compare the accuracy of the different models. A sensitivity analysis was also carried out in order to underline the impact that each input had in the network training process for ANN. In the case of ANN with the WD technique, the sensitivity analysis was repeated on each component obtained by the decomposition.
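
    The abstract does not say which decomposition of the root mean square error was used. As a hedged illustration only, the sketch below shows one decomposition that is common in wind power forecasting, rmse^2 = bias^2 + sdbias^2 + disp^2, where sdbias is the difference of the standard deviations and disp is the dispersion term. The function name and the sample numbers are assumptions made for this example, not taken from the paper.

```python
import math

def rmse_decomposition(pred, meas):
    """Return (rmse, bias, sdbias, disp) with rmse^2 = bias^2 + sdbias^2 + disp^2.

    Assumed decomposition (not confirmed by the abstract):
      bias   = mean(pred) - mean(meas)
      sdbias = std(pred) - std(meas)           (population standard deviations)
      disp   = sqrt(2 * std(pred) * std(meas) * (1 - r)),  r = correlation
    """
    n = len(pred)
    mean_p = sum(pred) / n
    mean_m = sum(meas) / n
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred) / n)
    std_m = math.sqrt(sum((m - mean_m) ** 2 for m in meas) / n)
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(pred, meas)) / n
    # Correlation is undefined for a constant series; disp is zero in that case anyway.
    r = cov / (std_p * std_m) if std_p > 0 and std_m > 0 else 0.0
    bias = mean_p - mean_m
    sdbias = std_p - std_m
    disp = math.sqrt(max(0.0, 2.0 * std_p * std_m * (1.0 - r)))
    rmse = math.sqrt(sum((p - m) ** 2 for p, m in zip(pred, meas)) / n)
    return rmse, bias, sdbias, disp

# Hypothetical forecast and measurement values (arbitrary units).
rmse, bias, sdbias, disp = rmse_decomposition([1.0, 2.0, 4.0, 3.0], [1.5, 2.5, 3.0, 4.5])
print(rmse, bias, sdbias, disp)
```

    Under this assumption, a large bias points to a systematic offset, a large sdbias to a wrongly scaled variability, and a large disp to phase or timing errors.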

  12. Reciprocally coupled residues crucial for protein kinase Pak2 activity calculated by statistical coupling analysis.

    Directory of Open Access Journals (Sweden)

    Yuan-Hao Hsu

    2010-03-01

    Full Text Available Regulation of Pak2 activity involves at least two mechanisms: (i) phosphorylation of the conserved Thr(402) in the activation loop and (ii) interaction of the autoinhibitory domain (AID) with the catalytic domain. We collected 482 human protein kinase sequences from the kinome database and globally mapped the evolutionary interactions of the residues in the catalytic domain with Thr(402) by sequence-based statistical coupling analysis (SCA). Perturbation of Thr(402) (34.6%) suggests a communication pathway between Thr(402) in the activation loop, and Phe(387) (DeltaDeltaE(387F,402T) = 2.80) in the magnesium positioning loop, Trp(427) (DeltaDeltaE(427W,402T) = 3.12) in the F-helix, and Val(404) (DeltaDeltaE(404V,402T) = 4.43) and Gly(405) (DeltaDeltaE(405G,402T) = 2.95) in the peptide positioning loop. When compared to the cAMP-dependent protein kinase (PKA) and Src, the perturbation pattern of threonine phosphorylation in the activation loop of Pak2 is similar to that of PKA, and different from the tyrosine phosphorylation pattern of Src. Reciprocal coupling analysis by SCA showed the residues perturbed by Thr(402) and the reciprocal coupling pairs formed a network centered at Trp(427) in the F-helix. Nine pairs of reciprocal coupling residues crucial for enzymatic activity and structural stabilization were identified. Pak2, PKA and Src share four pairs. Reciprocal coupling residues exposed to the solvent line up as an activation groove. This is the inhibitor (PKI) binding region in PKA and the activation groove for Pak2. This indicates these evolutionary conserved residues are crucial for the catalytic activity of PKA and Pak2.

  13. Elicitin-induced distal systemic resistance in plants is mediated through the protein-protein interactions influenced by selected lysine residues

    Directory of Open Access Journals (Sweden)

    Hana Uhlíková

    2016-02-01

    Full Text Available Elicitins are a family of small proteins with sterol-binding activity that are secreted by Phytophthora and Pythium spp. classified as oomycete PAMPs. Although alpha- and beta-elicitins bind with the same affinity to one high affinity binding site on the plasma membrane, beta-elicitins (possessing 6-7 lysine residues) are generally 50- to 100-fold more active at inducing distal HR and systemic resistance than the alpha-isoforms (with only 1-3 lysine residues). To examine the role of lysine residues in elicitin biological activity, we employed site-directed mutagenesis to prepare a series of beta-elicitin cryptogein variants with mutations on specific lysine residues. In contrast to direct infiltration of protein into leaves, application to the stem revealed a rough correlation between the protein’s charge and biological activity, resulting in protection against Phytophthora parasitica. A detailed analysis of the proteins’ movement in plants showed no substantial differences in distribution through phloem, indicating differences in consequent apoplastic or symplastic transport. In this process, an important role of homodimer formation together with the ability to form a heterodimer with a potential partner represented by endogenous plant LTPs is suggested. Our work demonstrates a key role of selected lysine residues in these interactions and stresses the importance of processes preceding elicitin recognition responsible for induction of distal systemic resistance.

  14. QuaBingo: A Prediction System for Protein Quaternary Structure Attributes Using Block Composition

    Directory of Open Access Journals (Sweden)

    Chi-Hua Tung

    2016-01-01

    Full Text Available Background. Quaternary structures of proteins are closely relevant to gene regulation, signal transduction, and many other biological functions of proteins. In the current study, a new method based on protein-conserved motif composition in block format for feature extraction is proposed, which is termed block composition. Results. The protein quaternary assembly states prediction system which combines blocks with functional domain composition, called QuaBingo, is constructed from three layers of classifiers that can categorize quaternary structural attributes of monomer, homooligomer, and heterooligomer. The first layer classifier is built with support vector machines (SVM) based on blocks and functional domains of proteins, and the second layer SVM is used to process the outputs of the first layer. Finally, the result is determined by the Random Forest of the third layer. We compared the effectiveness of the combination of block composition, functional domain composition, and pseudoamino acid composition of the model. In the 11 kinds of functional protein families, QuaBingo achieves a Matthews Correlation Coefficient (MCC) 23% higher than the existing prediction system. The results also revealed the biological characterization of the top five block compositions. Conclusions. QuaBingo provides better predictive ability for predicting the quaternary structural attributes of proteins.
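
    For readers unfamiliar with the metric quoted above, the following minimal sketch computes the Matthews Correlation Coefficient from a binary confusion matrix. It is a generic illustration, not QuaBingo's evaluation code; for a three-class problem such as monomer, homooligomer and heterooligomer, one would presumably apply it one class at a time (one-vs-rest), which is an assumption here. The counts are hypothetical.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect).
    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts for one class treated as the positive class.
print(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10))
```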

  15. LS-SVM: a new chemometric tool for multivariate regression. Comparison of LS-SVM and PLS regression for determination of common adulterants in powdered milk by NIR spectroscopy

    Directory of Open Access Journals (Sweden)

    Marco F. Ferrão

    2007-08-01

    Full Text Available Least-squares support vector machines (LS-SVM) were used as an alternative multivariate calibration method for the simultaneous quantification of some common adulterants found in powdered milk samples, using near-infrared spectroscopy. Excellent models were built using LS-SVM, as assessed by their R², RMSECV and RMSEP values. LS-SVMs show superior performance for quantifying starch, whey and sucrose in powdered milk samples in relation to PLSR. This study shows that it is possible to determine precisely the amount of one and two common adulterants simultaneously in powdered milk samples using LS-SVM and NIR spectra.

  16. Oligomeric protein structure networks: insights into protein-protein interactions

    Directory of Open Access Journals (Sweden)

    Brinda KV

    2005-12-01

    Full Text Available Background: Protein-protein association is essential for a variety of cellular processes and hence a large number of investigations are being carried out to understand the principles of protein-protein interactions. In this study, oligomeric protein structures are viewed from a network perspective to obtain new insights into protein association. Structure graphs of proteins have been constructed from a non-redundant set of protein oligomer crystal structures by considering amino acid residues as nodes and the edges are based on the strength of the non-covalent interactions between the residues. The analysis of such networks has been carried out in terms of amino acid clusters and hubs (highly connected residues) with special emphasis on protein interfaces. Results: A variety of interactions such as hydrogen bonds, salt bridges, aromatic and hydrophobic interactions, which occur at the interfaces, are identified in a consolidated manner as amino acid clusters at the interface. Moreover, the characterization of the highly connected hub-forming residues at the interfaces and their comparison with the hubs from the non-interface regions and the non-hubs in the interface regions show that there is a predominance of charged interactions at the interfaces. Further, strong and weak interfaces are identified on the basis of the interaction strength between amino acid residues and the sizes of the interface clusters, which also show that many protein interfaces are stronger than their monomeric protein cores. The interface strengths evaluated based on the interface clusters and hubs also correlate well with experimentally determined dissociation constants for known complexes. Finally, the interface hubs identified using the present method correlate very well with experimentally determined hotspots in the interfaces of protein complexes obtained from the Alanine Scanning Energetics database (ASEdb). A few predictions of interface hot

  17. Methods for discriminating gas-liquid two phase flow patterns based on gray neural networks and SVM

    International Nuclear Information System (INIS)

    Li Jingjing; Zhou Tao; Duan Jun; Zhang Lei

    2013-01-01

    Background: The flow patterns of two-phase flow directly influence the heat and mass transfer of the flow. Purpose: By wavelet analysis of the pressure drop experimental data, the wavelet coefficients at different frequencies can be obtained. Methods: The wavelet energy is computed and used to train a BP neural network model to distinguish the flow patterns. The implanted gray neural network model is introduced and used for two-phase flow for the first time. In parallel, a method is set up for training a support vector machine on the pressure data and wavelet energy data. Results: Through treatment by the gray layer, the result of the neural network is more accurate, and the effect of data marginalization is clearly reduced. The accuracy of the pressure drop Lib-SVM method is 95.2%. Conclusions: The results show that all three methods can distinguish among the different flow patterns; the Lib-SVM method gives the best result, followed by the gray neural network and then the BP neural network. (authors)

  18. PCA criterion for SVM (MLP) classifier for flavivirus biomarker from salivary SERS spectra at febrile stage.

    Science.gov (United States)

    Radzol, A R M; Lee, Khuan Y; Mansor, W; Omar, I S

    2016-08-01

    Non-structural protein 1 (NS1) has been recognized as one of the biomarkers for flavivirus, which causes diseases with life-threatening consequences. NS1 is an antigen that allows detection of the illness at the febrile stage, currently mostly from blood samples. Our work here intends to define an optimum model for PCA-SVM with MLP kernel for classification of the flavivirus biomarker, the NS1 molecule, from SERS spectra of saliva, which to the best of our knowledge has never been explored. Since performance of the model depends on the PCA criterion and MLP parameters, both are examined in tandem. The input vector to the classifier determined by each PCA criterion is subjected to exhaustive brute-force tuning of the MLP parameters. Its performance is also compared to our previous works where a Linear and RBF kernel are used. It is found that the best PCA-SVM (MLP) model can be defined by 5 PCs from Cattell's scree test for PCA, together with P1 and P2 values of 0.1 and -0.2 respectively, with a classification performance of [96.9%, 93.8%, 100.0%].

  19. A fast learning method for large scale and multi-class samples of SVM

    Science.gov (United States)

    Fan, Yu; Guo, Huiming

    2017-06-01

    A fast learning method for multi-class SVM (Support Vector Machine) classification based on a binary tree is presented to address the low learning efficiency of SVM when processing large-scale multi-class samples. The method builds the binary tree hierarchy bottom-up; following the resulting hierarchy, each sub-classifier learns from the samples corresponding to its node. During learning, several class clusters are generated by a first clustering of the training samples. First, central points are extracted from those class clusters that contain only one type of sample. For clusters that contain two types of samples, the cluster numbers of their positive and negative samples are set according to their degree of mixture, a secondary clustering is performed, and central points are then extracted from the resulting sub-class clusters. Sub-classifiers are obtained by learning from the reduced sample set formed by combining the extracted central points. Simulation experiments show that this fast learning method, which is based on multi-level clustering, can guarantee higher classification accuracy, greatly reduce the number of samples and effectively improve learning efficiency.

  20. Application of SVM on satellite images to detect hotspots in Jharia coal field region of India

    Energy Technology Data Exchange (ETDEWEB)

    Gautam, R.S.; Singh, D.; Mittal, A.; Sajin, P. [Indian Institute for Technology, Roorkee (India)]

    2008-07-01

    The present paper deals with the application of Support Vector Machine (SVM) and image analysis techniques on NOAA/AVHRR satellite image to detect hotspots on the Jharia coal field region of India. One of the major advantages of using these satellite data is that the data are free with very good temporal resolution; while, one drawback is that these have low spatial resolution (i.e., approximately 1.1 km at nadir). Therefore, it is important to do research by applying some efficient optimization techniques along with the image analysis techniques to rectify these drawbacks and use satellite images for efficient hotspot detection and monitoring. For this purpose, SVM and multi-threshold techniques are explored for hotspot detection. The multi-threshold algorithm is developed to remove the cloud coverage from the land coverage. This algorithm also highlights the hotspots or fire spots in the suspected regions. SVM has the advantage over multi-thresholding technique that it can learn patterns from the examples and therefore is used to optimize the performance by removing the false points which are highlighted in the threshold technique. Both approaches can be used separately or in combination depending on the size of the image. The RBF (Radial Basis Function) kernel is used in training of three sets of inputs: brightness temperature of channel 3, Normalized Difference Vegetation Index (NDVI) and Global Environment Monitoring Index (GEMI), respectively. This makes a classified image in the output that highlights the hotspot and non-hotspot pixels. The performance of the SVM is also compared with the performance obtained from the neural networks and SVM appears to detect hotspots more accurately (greater than 91% classification accuracy) with lesser false alarm rate. The results obtained are found to be in good agreement with the ground based observations of the hotspots.
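
    One of the three SVM inputs named above, NDVI, has a simple standard definition that can be sketched as follows. The assumption here is that NDVI is computed per pixel from AVHRR channel 1 (visible red) and channel 2 (near-infrared) reflectances; the function name and the sample reflectances are illustrative and do not come from the paper.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - RED) / (NIR + RED).

    nir and red are surface reflectances for a single pixel; for AVHRR these
    are assumed to be channel 2 and channel 1 respectively.
    Returns 0.0 when both reflectances are zero to avoid division by zero.
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total

# Hypothetical pixels: dense vegetation versus bare or burning ground.
print(ndvi(0.50, 0.10))  # high NDVI, vegetated
print(ndvi(0.20, 0.18))  # NDVI close to zero, sparse vegetation
```

    A per-pixel feature vector for the classifier would then combine this value with the channel 3 brightness temperature and GEMI, as the abstract describes.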

  1. Application of ANFIS and SVM Systems in Order to Estimate Monthly Reference Crop Evapotranspiration in the Northwest of Iran

    Directory of Open Access Journals (Sweden)

    F. Ahmadi

    2016-10-01

    Full Text Available Introduction: Crop evapotranspiration is mainly modeled with empirical, aerodynamic and energy balance methods. In these methods, evapotranspiration is calculated from the average values of meteorological parameters at different time steps. Linear models have not performed well in this field because of the high variability of evapotranspiration, and researchers have turned to nonlinear and intelligent models. Accurate estimation of this hydrologic variable requires much time and money to measure many data (19). Materials and Methods: Recently, new hybrid methods, called soft computing and intelligent systems, have been developed by combining methods such as artificial neural networks, fuzzy logic and evolutionary computation. These soft techniques are used in various fields of engineering. A neuro-fuzzy system is a hybrid system that combines the decision ability of fuzzy logic with the computational ability of a neural network, which provides a high capability for modeling and estimation. Basically, the fuzzy part classifies the input data set and determines the degree of membership (a number between 0 and 1), and decisions for the next activity are made based on a set of rules before moving to the next stage. The Adaptive Neuro-Fuzzy Inference System (ANFIS) includes parts of a typical fuzzy expert system, in which the calculations at each step are performed by the hidden-layer neurons, and the learning ability of the neural network is used to increase the system information (9). SVM is a supervised learning method used for classification and regression. It was developed by Vapnik (15) based on statistical learning theory. The SVM is a method for binary classification in an arbitrary feature space, so it is suitable for prediction problems (12). The SVM is originally a two-class classifier that separates the classes

  2. Prediction of protein interaction hot spots using rough set-based multiple criteria linear programming.

    Science.gov (United States)

    Chen, Ruoying; Zhang, Zhiwang; Wu, Di; Zhang, Peng; Zhang, Xinyang; Wang, Yong; Shi, Yong

    2011-01-21

    Protein-protein interactions are fundamentally important in many biological processes and there is a pressing need to understand the principles of protein-protein interactions. Mutagenesis studies have found that only a small fraction of surface residues, known as hot spots, are responsible for the physical binding in protein complexes. However, revealing hot spots by mutagenesis experiments is usually time consuming and expensive. In order to complement the experimental efforts, we propose a new computational approach in this paper to predict hot spots. Our method, Rough Set-based Multiple Criteria Linear Programming (RS-MCLP), integrates rough sets theory and multiple criteria linear programming to choose dominant features and computationally predict hot spots. Our approach is benchmarked by a dataset of 904 alanine-mutated residues and the results show that our RS-MCLP method performs better than other methods, e.g., MCLP, Decision Tree, Bayes Net, and the existing HotSprint database. In addition, we reveal several biological insights based on our analysis. We find that four features (the change of accessible surface area, percentage of the change of accessible surface area, size of a residue, and atomic contacts) are critical in predicting hot spots. Furthermore, we find that three residues (Tyr, Trp, and Phe) are abundant in hot spots through analyzing the distribution of amino acids. Copyright © 2010 Elsevier Ltd. All rights reserved.

  3. Characterization of Proteins in Filtrate from Biodegradation of Crop Residue

    Science.gov (United States)

    Horton, Wileatha; Trotman, A. A.

    1997-01-01

    Biodegradation of plant biomass is a feasible path for transformation of crop residue and recycling of nutrients for crop growth. The need to model the effects of factors associated with recycling of plant biomass resulting from hydroponic sweet potato production has led to investigation of natural soil isolates with the capacity for starch hydrolysis. This study sought to use nondenaturing gel electrophoresis to characterize the proteins present in filtered effluent from bioreactors seeded with starch hydrolyzing bacterial culture used in the biodegradation of senesced sweet potato biomass. The study determined the relative molecular weight of proteins in sampled effluent and the protein banding pattern was characterized. The protein profiles of effluent were similar for samples taken from independent runs under similar conditions of starch hydrolysis. The method can be used as a quality control tool for confirmation of starch hydrolysis of crop biomass. In addition, this method will allow monitoring for presence of contaminants within the system-protein profiles indicative of new enzymes in the bioreactors.

  4. The system evaluation for report writing skills of summary by HGA-SVM with Ontology: Medical case study in problem based learning

    Science.gov (United States)

    Yenaeng, Sasikanchana; Saelee, Somkid; Samai, Wirachai

    2018-01-01

    A system for evaluating summary report writing skills using a Hybrid Genetic Algorithm-Support Vector Machine (HGA-SVM) with an ontology, applied to medical case studies in Problem Based Learning (PBL), was developed as a scoring guideline for facilitators and medical teachers. The essay answers come from third-year medical students taking the nervous system motion and Behavior I and II subject, in 20 groups of 9-10 people, at the Faculty of Medicine, Prince of Songkla University (PSU). The audit committee considered the ratings of individual facilitators inadequate, and this system was developed to address that problem. The mean machine learning scores and human (facilitator) scores did not differ at the .05 significance level for any of the 3 essay parts: the problem part, the hypothesis part and the learning objective part. The results show that the average score over all 3 essay parts likewise did not differ significantly from the human rating at the .05 significance level.

  5. Residue Geometry Networks: A Rigidity-Based Approach to the Amino Acid Network and Evolutionary Rate Analysis

    Science.gov (United States)

    Fokas, Alexander S.; Cole, Daniel J.; Ahnert, Sebastian E.; Chin, Alex W.

    2016-01-01

    Amino acid networks (AANs) abstract the protein structure by recording the amino acid contacts and can provide insight into protein function. Herein, we describe a novel AAN construction technique that employs the rigidity analysis tool, FIRST, to build the AAN, which we refer to as the residue geometry network (RGN). We show that this new construction can be combined with network theory methods to include the effects of allowed conformal motions and local chemical environments. Importantly, this is done without costly molecular dynamics simulations required by other AAN-related methods, which allows us to analyse large proteins and/or data sets. We have calculated the centrality of the residues belonging to 795 proteins. The results display a strong, negative correlation between residue centrality and the evolutionary rate. Furthermore, among residues with high closeness, those with low degree were particularly strongly conserved. Random walk simulations using the RGN were also successful in identifying allosteric residues in proteins involved in GPCR signalling. The dynamic function of these residues largely remains hidden in the traditional distance-cutoff construction technique. Despite being constructed from only the crystal structure, the results in this paper suggest that the RGN can identify residues that fulfil a dynamical function. PMID:27623708
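
    The two centrality measures mentioned above, degree and closeness, can be illustrated on a toy residue graph. This is a generic sketch on an unweighted graph, not the RGN construction itself (which relies on the FIRST rigidity analysis); the five-residue path graph and all names are hypothetical.

```python
from collections import deque

def degree(graph, node):
    """Number of direct neighbours (contacts) of a residue."""
    return len(graph[node])

def closeness(graph, source):
    """Closeness centrality: reachable nodes divided by the sum of
    shortest-path lengths from source, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical five-residue chain A-B-C-D-E stored as adjacency sets.
toy_graph = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
for residue in toy_graph:
    print(residue, degree(toy_graph, residue), round(closeness(toy_graph, residue), 3))
```

    In this toy chain the middle residue C has the highest closeness while keeping a modest degree, the kind of combination the abstract reports as strongly conserved.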

  6. Cry1Ab protein from Bt transgenic rice does not residue in rhizosphere soil

    International Nuclear Information System (INIS)

    Wang Haiyan; Ye Qingfu; Wang Wei; Wu Licheng; Wu Weixiang

    2006-01-01

    Expression of Cry1Ab protein in Bt transgenic rice (KMD) and its residue in the rhizosphere soil during the whole growth in field, as well as degradation of the protein from KMD straw in five soils under laboratory incubation were studied. The residue of Cry1Ab protein in KMD rhizosphere soil was undetectable (below the limit of 0.5 ng/g air-dried soil). The Cry1Ab protein contents in the shoot and root of KMD were 3.23-8.22 and 0.68-0.89 μg/g (fresh weight), respectively. The half-lives of the Cry1Ab protein in the soils amended with KMD straw (4%, w/w) ranged from 11.5 to 34.3 d. The residence time of the protein varied significantly in a Fluvio-marine yellow loamy soil amended with KMD straw at the rate of 3, 4 and 7%, with half-lives of 9.9, 13.8 and 18 d, respectively. In addition, an extraction method for Cry1Ab protein in soil was developed, with extraction efficiencies of 46.4-82.3%. - Cry1Ab protein was not detected in the rhizosphere soil of field-grown Bt transgenic rice
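
    The half-lives reported above can be turned into rate constants and remaining fractions if one assumes simple first-order decay, which the abstract does not state explicitly. The sketch below makes that assumption; the function names are illustrative.

```python
import math

def rate_constant(half_life_days):
    """First-order rate constant k (per day), from t_half = ln(2) / k."""
    return math.log(2) / half_life_days

def fraction_remaining(days, half_life_days):
    """Fraction of the initial protein left after `days`, assuming
    first-order decay C(t) = C0 * exp(-k * t)."""
    return math.exp(-rate_constant(half_life_days) * days)

# Half-lives quoted in the abstract for straw-amended soils (days).
for half_life in (11.5, 34.3):
    k = rate_constant(half_life)
    left = fraction_remaining(30, half_life)
    print(f"t_half={half_life} d  k={k:.4f} /d  remaining after 30 d={left:.1%}")
```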

  7. Kernel based machine learning algorithm for the efficient prediction of type III polyketide synthase family of proteins

    Directory of Open Access Journals (Sweden)

    Mallika V

    2010-03-01

    Full Text Available Type III polyketide synthases (PKS) are a family of proteins considered to play a significant role in the biosynthesis of various polyketides in plants, fungi and bacteria. As these proteins show positive effects on human health, more research is being carried out on them. Developing a tool to estimate the probability that a sequence is a type III polyketide synthase will reduce time consumption and manpower efforts. In this approach, we have designed and implemented PKSIIIpred, a high performance prediction server for type III PKS where the classifier is a Support Vector Machine (SVM). Based on the limited training dataset, the tool efficiently predicts the type III PKS superfamily of proteins with high sensitivity and specificity. PKSIIIpred is available at http://type3pks.in/prediction/. We expect that this tool may serve as a useful resource for type III PKS researchers. Work is currently in progress to further improve prediction accuracy by including more sequence features in the training dataset.

  8. Effect of the sequence data deluge on the performance of methods for detecting protein functional residues.

    Science.gov (United States)

    Garrido-Martín, Diego; Pazos, Florencio

    2018-02-27

    The exponential accumulation of new sequences in public databases is expected to improve the performance of all the approaches for predicting protein structural and functional features. Nevertheless, this was never assessed or quantified for some widely used methodologies, such as those aimed at detecting functional sites and functional subfamilies in protein multiple sequence alignments. Using raw protein sequences as only input, these approaches can detect fully conserved positions, as well as those with a family-dependent conservation pattern. Both types of residues are routinely used as predictors of functional sites and, consequently, understanding how the sequence content of the databases affects them is relevant and timely. In this work we evaluate how the growth and change with time in the content of sequence databases affect five sequence-based approaches for detecting functional sites and subfamilies. We do that by recreating historical versions of the multiple sequence alignments that would have been obtained in the past based on the database contents at different time points, covering a period of 20 years. Applying the methods to these historical alignments allows quantifying the temporal variation in their performance. Our results show that the number of families to which these methods can be applied sharply increases with time, while their ability to detect potentially functional residues remains almost constant. These results are informative for the methods' developers and final users, and may have implications in the design of new sequencing initiatives.

  9. Identification of coevolving residues and coevolution potentials emphasizing structure, bond formation and catalytic coordination in protein evolution.

    Directory of Open Access Journals (Sweden)

    Daniel Y Little

    Full Text Available The structure and function of a protein is dependent on coordinated interactions between its residues. The selective pressures associated with a mutation at one site should therefore depend on the amino acid identity of interacting sites. Mutual information has previously been applied to multiple sequence alignments as a means of detecting coevolutionary interactions. Here, we introduce a refinement of the mutual information method that: (1) removes a significant, non-coevolutionary bias and (2) accounts for heteroscedasticity. Using a large, non-overlapping database of protein alignments, we demonstrate that predicted coevolving residue-pairs tend to lie in close physical proximity. We introduce coevolution potentials as a novel measure of the propensity for the 20 amino acids to pair amongst predicted coevolutionary interactions. Ionic, hydrogen, and disulfide bond-forming pairs exhibited the highest potentials. Finally, we demonstrate that pairs of catalytic residues have a significantly increased likelihood to be identified as coevolving. These correlations to distinct protein features verify the accuracy of our algorithm and are consistent with a model of coevolution in which selective pressures towards preserving residue interactions act to shape the mutational landscape of a protein by restricting the set of admissible neutral mutations.
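
    As background for the method described above, the sketch below computes plain mutual information between two columns of a multiple sequence alignment. It is the unrefined baseline only: the bias removal and heteroscedasticity correction introduced by the authors are not reproduced, and the toy columns are hypothetical.

```python
import math
from collections import Counter

def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a and col_b hold one residue per sequence, in the same sequence order.
    MI = sum over observed pairs of p(a,b) * log2(p(a,b) / (p(a) * p(b))).
    """
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi

# Toy columns from four aligned sequences.
print(mutual_information("AAVV", "DDKK"))  # columns vary together
print(mutual_information("AAVV", "DKDK"))  # columns vary independently
```

    Raw scores like these are known to be inflated by phylogeny and by column entropy, which is the kind of non-coevolutionary bias the refinement in the abstract targets.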

  10. SVM-Maj: a majorization approach to linear support vector machines with different hinge errors

    NARCIS (Netherlands)

    P.J.F. Groenen (Patrick); G.I. Nalbantov (Georgi); J.C. Bioch (Cor)

    2007-01-01

    Support vector machines (SVM) are becoming increasingly popular for the prediction of a binary dependent variable. SVMs perform very well with respect to competing techniques. Often, the solution of an SVM is obtained by switching to the dual. In this paper, we stick to the primal

  11. Determination of protein global folds using backbone residual dipolar coupling and long-range NOE restraints

    International Nuclear Information System (INIS)

    Giesen, Alexander W.; Homans, Steve W.; Brown, Jonathan Miles

    2003-01-01

    We report the determination of the global fold of human ubiquitin using protein backbone NMR residual dipolar coupling and long-range nuclear Overhauser effect (NOE) data as conformational restraints. Specifically, by use of a maximum of three backbone residual dipolar couplings per residue ($N_i$-$H^N_i$, $N_i$-$C'_{i-1}$, $H^N_i$-$C'_{i-1}$) in two tensor frames and only backbone $H^N$-$H^N$ NOEs, a global fold of ubiquitin can be derived with a backbone root-mean-square deviation of 1.4 Å with respect to the crystal structure. This degree of accuracy is more than adequate for use in databases of structural motifs, and suggests a general approach for the determination of protein global folds using conformational restraints derived only from backbone atoms

  12. The myeloperoxidase-derived oxidant hypothiocyanous acid inhibits protein tyrosine phosphatases via oxidation of key cysteine residues

    DEFF Research Database (Denmark)

    Cook, Naomi L.; Moeke, Cassidy H.; Fantoni, Luca I.

    2016-01-01

    Phosphorylation of protein tyrosine residues is critical to cellular processes, and is regulated by kinases and phosphatases (PTPs). PTPs contain a redox-sensitive active site Cys residue, which is readily oxidized. Myeloperoxidase, released from activated leukocytes, catalyzes thiocyanate ion (SCN...

  13. Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition.

    Science.gov (United States)

    Hayat, Maqsood; Khan, Asifullah

    2011-02-21

    Membrane proteins are a vital type of protein that serve as channels, receptors, and energy transducers in a cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of membrane protein types provides valuable information for predicting novel examples of the membrane protein types. However, classification of membrane protein types can be both time consuming and susceptible to errors due to the inherent similarity of membrane protein types. In this paper, a neural network-based membrane protein type prediction system is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence, which include seven feature sets: amino acid composition, sequence length, 2-gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and R-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), generalized regression neural network, and support vector machine (SVM) are used as classifiers. A high success rate of 86.01% is obtained using SVM for the jackknife test. In the independent dataset test, PNN yields the highest accuracy of 95.73%. These classifiers exhibit improved performance using other performance measures such as sensitivity, specificity, Matthews correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported so far. This performance improvement may largely be credited to the learning capabilities of neural networks and the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at http://111.68.99.218/Mem-Predictor. Copyright © 2010 Elsevier Ltd. All rights reserved.
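    To make the first two of the seven feature sets concrete, the following minimal sketch builds an amino acid composition vector and appends the sequence length. It is not the CPSR implementation from the paper; the helper names `amino_acid_composition` and `simple_features` and the toy sequence are hypothetical, and the remaining five feature sets, the PCA step, and the classifiers are left out.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    total = len(residues)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


def simple_features(sequence):
    """Composition (20 values) followed by sequence length (1 value)."""
    return amino_acid_composition(sequence) + [float(len(sequence))]


# Hypothetical toy sequence, used only to show the shape of the output.
features = simple_features("MKKLAV")
```

    The resulting 21-value vector is the kind of fixed-length input that a dimensionality reduction step and a classifier could then consume.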

  14. The conserved basic residues and the charged amino acid residues at the α-helix of the zinc finger motif regulate the nuclear transport activity of triple C2H2 zinc finger proteins

    Science.gov (United States)

    Lin, Chih-Ying

    2018-01-01

    Zinc finger (ZF) motifs on proteins are frequently recognized as a structure for DNA binding. Accumulated reports indicate that ZF motifs contain nuclear localization signal (NLS) to facilitate the transport of ZF proteins into the nucleus. We investigated the critical factors that facilitate the nuclear transport of triple C2H2 ZF proteins. Three conserved basic residues (hot spots) were identified among the ZF sequences of triple C2H2 ZF proteins that reportedly have NLS function. Additional basic residues can be found on the α-helix of the ZFs. Using the ZF domain (ZFD) of Egr-1 as a template, various mutants were constructed and expressed in cells. The nuclear transport activity of various mutants was estimated by analyzing the proportion of protein localized in the nucleus. Mutation at any hot spot of the Egr-1 ZFs reduced the nuclear transport activity. Changes of the basic residues at the α-helical region of the second ZF (ZF2) of the Egr-1 ZFD abolished the NLS activity. However, this activity can be restored by substituting the acidic residues at the homologous positions of ZF1 or ZF3 with basic residues. The restored activity dropped again when the hot spots at ZF1 or the basic residues in the α-helix of ZF3 were mutated. The variations in nuclear transport activity are linked directly to the binding activity of the ZF proteins with importins. This study was extended to other triple C2H2 ZF proteins. SP1 and KLF families, similar to Egr-1, have charged amino acid residues at the second (α2) and the third (α3) positions of the α-helix. Replacing the amino acids at α2 and α3 with acidic residues reduced the NLS activity of the SP1 and KLF6 ZFD. The reduced activity can be restored by substituting the α3 with histidine at any SP1 and KLF6 ZFD. The results show again the interchangeable role of ZFs and charged residues in the α-helix in regulating the NLS activity of triple C2H2 ZF proteins. PMID:29381770

  15. Static Voltage Stability Analysis by Using SVM and Neural Network

    Directory of Open Access Journals (Sweden)

    Mehdi Hajian

    2013-01-01

    Voltage stability is an important problem in power system networks. This paper investigates, in terms of static voltage stability, the application of Neural Networks (NN) and Support Vector Machines (SVM) for estimating the voltage stability margin (VSM) and predicting voltage collapse. The paper considers voltage stability in power systems in two parts. The first part calculates the static voltage stability margin with a Radial Basis Function Neural Network (RBFNN). The advantage of this method is high accuracy in online detection of the VSM. In the second part, voltage collapse analysis of the power system is performed by a Probabilistic Neural Network (PNN) and SVM. The results indicate that the training time and the number of training samples of the SVM are less than those of the NN. A new model of training samples for the detection system, using the normal distribution load curve at each load feeder, has been used. Voltage stability is assessed with the well-known L and VSM indexes. To demonstrate the validity of the proposed methods, the IEEE 14-bus grid and the actual network of Yazd Province are used.

  16. A Fast SVM-Based Tongue's Colour Classification Aided by k-Means Clustering Identifiers and Colour Attributes as Computer-Assisted Tool for Tongue Diagnosis

    Science.gov (United States)

    Ooi, Chia Yee; Kawanabe, Tadaaki; Odaguchi, Hiroshi; Kobayashi, Fuminori

    2017-01-01

    In tongue diagnosis, colour information of tongue body has kept valuable information regarding the state of disease and its correlation with the internal organs. Qualitatively, practitioners may have difficulty in their judgement due to the instable lighting condition and naked eye's ability to capture the exact colour distribution on the tongue especially the tongue with multicolour substance. To overcome this ambiguity, this paper presents a two-stage tongue's multicolour classification based on a support vector machine (SVM) whose support vectors are reduced by our proposed k-means clustering identifiers and red colour range for precise tongue colour diagnosis. In the first stage, k-means clustering is used to cluster a tongue image into four clusters of image background (black), deep red region, red/light red region, and transitional region. In the second-stage classification, red/light red tongue images are further classified into red tongue or light red tongue based on the red colour range derived in our work. Overall, true rate classification accuracy of the proposed two-stage classification to diagnose red, light red, and deep red tongue colours is 94%. The number of support vectors in SVM is improved by 41.2%, and the execution time for one image is recorded as 48 seconds. PMID:29065640

  17. A Fast SVM-Based Tongue's Colour Classification Aided by k-Means Clustering Identifiers and Colour Attributes as Computer-Assisted Tool for Tongue Diagnosis.

    Science.gov (United States)

    Kamarudin, Nur Diyana; Ooi, Chia Yee; Kawanabe, Tadaaki; Odaguchi, Hiroshi; Kobayashi, Fuminori

    2017-01-01

    In tongue diagnosis, colour information of tongue body has kept valuable information regarding the state of disease and its correlation with the internal organs. Qualitatively, practitioners may have difficulty in their judgement due to the instable lighting condition and naked eye's ability to capture the exact colour distribution on the tongue especially the tongue with multicolour substance. To overcome this ambiguity, this paper presents a two-stage tongue's multicolour classification based on a support vector machine (SVM) whose support vectors are reduced by our proposed k-means clustering identifiers and red colour range for precise tongue colour diagnosis. In the first stage, k-means clustering is used to cluster a tongue image into four clusters of image background (black), deep red region, red/light red region, and transitional region. In the second-stage classification, red/light red tongue images are further classified into red tongue or light red tongue based on the red colour range derived in our work. Overall, true rate classification accuracy of the proposed two-stage classification to diagnose red, light red, and deep red tongue colours is 94%. The number of support vectors in SVM is improved by 41.2%, and the execution time for one image is recorded as 48 seconds.

  18. An SVM-Based Classifier for Estimating the State of Various Rotating Components in Agro-Industrial Machinery with a Vibration Signal Acquired from a Single Point on the Machine Chassis

    Directory of Open Access Journals (Sweden)

    Ruben Ruiz-Gonzalez

    2014-11-01

    The goal of this article is to assess the feasibility of estimating the state of various rotating components in agro-industrial machinery by employing just one vibration signal acquired from a single point on the machine chassis. To do so, a Support Vector Machine (SVM)-based system is employed. Experimental tests evaluated this system by acquiring vibration data from a single point of an agricultural harvester, while varying several of its working conditions. The whole process included two major steps. Initially, the vibration data were preprocessed through twelve feature extraction algorithms, after which the Exhaustive Search method selected the most suitable features. Secondly, the SVM-based system accuracy was evaluated by using Leave-One-Out cross-validation, with the selected features as the input data. The results of this study provide evidence that (i) accurate estimation of the status of various rotating components in agro-industrial machinery is possible by processing the vibration signal acquired from a single point on the machine structure; (ii) the vibration signal can be acquired with a uniaxial accelerometer, the orientation of which does not significantly affect the classification accuracy; and (iii) when using an SVM classifier, an 85% mean cross-validation accuracy can be reached, which only requires a maximum of seven features as its input, and no significant improvements are noted between the use of either nonlinear or linear kernels.
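    The two-step procedure in this abstract (exhaustive feature-subset search scored by leave-one-out cross-validation) can be sketched as follows. To stay dependency-free, a nearest-centroid classifier is swapped in where the article uses an SVM; all function names and the toy data are assumptions for illustration, not the authors' code.

```python
from itertools import combinations


def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier: predict the class whose mean vector is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((s - c) ** 2 for s, c in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(x, y, feature_idx):
    """Leave-one-out accuracy using only the columns in feature_idx."""
    reduced = [[row[j] for j in feature_idx] for row in x]
    hits = 0
    for i in range(len(reduced)):
        # Hold out sample i, train on everything else.
        train_x = reduced[:i] + reduced[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, reduced[i]) == y[i]
    return hits / len(reduced)


def exhaustive_search(x, y, max_features):
    """Score every feature subset up to max_features; keep the first best."""
    best_subset, best_score = None, -1.0
    for k in range(1, max_features + 1):
        for subset in combinations(range(len(x[0])), k):
            score = loo_accuracy(x, y, subset)
            if score > best_score:  # ties keep the smaller, earlier subset
                best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.9, 1.5], [1.0, 4.5], [1.1, 5.5]]
Y = ["ok", "ok", "ok", "fault", "fault", "fault"]
best_subset, best_score = exhaustive_search(X, Y, max_features=2)
```

    Exhaustive search grows combinatorially with the number of candidate features, which is workable for a dozen features as in the article but would need a cheaper strategy for much larger sets.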

  19. Dates fruits classification using SVM

    Science.gov (United States)

    Alzu'bi, Reem; Anushya, A.; Hamed, Ebtisam; Al Sha'ar, Eng. Abdelnour; Vincy, B. S. Angela

    2018-04-01

    In this paper, we used SVM to classify various types of dates from their images. Dates have distinctive characteristics that can be used to distinguish and determine a particular date type. These characteristics include shape, texture, and color. A system that achieves 100% accuracy was built to classify dates as edible or inedible. The system helps the food industry and customers classify dates according to specific quality measures, giving the best performance with specific types of dates.

  20. KFC2: a knowledge-based hot spot prediction method based on interface solvation, atomic density, and plasticity features.

    Science.gov (United States)

    Zhu, Xiaolei; Mitchell, Julie C

    2011-09-01

    Hot spots constitute a small fraction of protein-protein interface residues, yet they account for a large fraction of the binding affinity. Based on our previous method (KFC), we present two new methods (KFC2a and KFC2b) that outperform other methods at hot spot prediction. A number of improvements were made in developing these new methods. First, we created a training data set that contained a similar number of hot spot and non-hot spot residues. In addition, we generated 47 different features, and different numbers of features were used to train the models to avoid over-fitting. Finally, two feature combinations were selected: One (used in KFC2a) is composed of eight features that are mainly related to solvent accessible surface area and local plasticity; the other (KFC2b) is composed of seven features, only two of which are identical to those used in KFC2a. The two models were built using support vector machines (SVM). The two KFC2 models were then tested on a mixed independent test set, and compared with other methods such as Robetta, FOLDEF, HotPoint, MINERVA, and KFC. KFC2a showed the highest predictive accuracy for hot spot residues (True Positive Rate: TPR = 0.85); however, the false positive rate was somewhat higher than for other models. KFC2b showed the best predictive accuracy for hot spot residues (True Positive Rate: TPR = 0.62) among all methods other than KFC2a, and the False Positive Rate (FPR = 0.15) was comparable with other highly predictive methods. Copyright © 2011 Wiley-Liss, Inc.
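    The true positive rate (TPR) and false positive rate (FPR) quoted in this abstract follow from a binary confusion matrix. The sketch below shows the arithmetic on made-up labels; the function name `tpr_fpr` and the label lists are hypothetical and do not reproduce the KFC2 test set.

```python
def tpr_fpr(true_labels, predicted_labels):
    """Return (TPR, FPR) for binary labels where True marks a hot spot."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth and pred:
            tp += 1  # hot spot correctly predicted
        elif truth and not pred:
            fn += 1  # hot spot missed
        elif pred:
            fp += 1  # non-hot spot wrongly flagged
        else:
            tn += 1  # non-hot spot correctly rejected
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical labels: four hot spots followed by four non-hot spots.
truth = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]
tpr, fpr = tpr_fpr(truth, predicted)
```

    Reporting both rates together matters here because, as the abstract notes for KFC2a, a higher TPR can come at the cost of a higher FPR.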

  1. Design and Construction of a PIC 18F4431 Microcontroller-Based SVM Inverter for VSD Systems

    OpenAIRE

    Tarmizi; Muyassar

    2013-01-01

    A motor speed control system is called a Variable Speed Drive (VSD) system. A VSD system for an induction motor uses an inverter to regulate the motor's supply frequency. To obtain a motor supply frequency that is close to sinusoidal, the inverter needs to be switched with a particular method. In this research, the three-phase inverter is switched using the SVM (Space Vector Modulation) method, controlled by a PIC18F4431 microcontroller. Before the experiments were conducted, this SVM inverter ...

  2. An IPSO-SVM algorithm for security state prediction of mine production logistics system

    Science.gov (United States)

    Zhang, Yanliang; Lei, Junhui; Ma, Qiuli; Chen, Xin; Bi, Runfang

    2017-06-01

    To reveal the laws behind the security state in mine production logistics, this work provides a theoretical basis for regulating corporate security warnings and resources. Because the mine production logistics system is complex and its variables are difficult to acquire, a security status prediction model based on improved particle swarm optimization and a support vector machine (IPSO-SVM) is proposed. First, the inertia weight and learning weights are adjusted linearly to enhance convergence speed and search accuracy, with the aim of coping with changeable complexity and difficult data acquisition. The improved particle swarm optimization (IPSO) is then introduced to resolve the problem of parameter setting in traditional support vector machines (SVM). At the same time, a security status index system is built to determine the classification standards for safety status. The feasibility and effectiveness of the method are finally verified using experimental results.
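    The linear adjustment of inertia weight and learning weights mentioned in this abstract can be sketched as a small particle swarm optimizer. The schedules below (inertia 0.9 to 0.4, cognitive weight 2.5 to 0.5, social weight 0.5 to 2.5) are common textbook choices assumed for this example, not values from the paper, and a simple quadratic stands in for the SVM validation error that the authors would minimize over the SVM parameters.

```python
import random


def ipso_minimize(objective, bounds, n_particles=20, n_iter=100, seed=0):
    """Particle swarm search with linearly varying inertia and learning weights."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(n_iter):
        frac = t / max(1, n_iter - 1)  # 0 at the first step, 1 at the last
        w = 0.9 - 0.5 * frac           # inertia weight: 0.9 -> 0.4 (assumed)
        c1 = 2.5 - 2.0 * frac          # cognitive weight: 2.5 -> 0.5 (assumed)
        c2 = 0.5 + 2.0 * frac          # social weight: 0.5 -> 2.5 (assumed)
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the particle to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def stand_in_error(params):
    """Hypothetical stand-in for a cross-validated SVM error surface."""
    return (params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2


best_params, best_error = ipso_minimize(stand_in_error, [(-5.0, 5.0), (-5.0, 5.0)])
```

    In an SVM setting the two coordinates would typically be the penalty parameter and a kernel parameter, and `stand_in_error` would be replaced by a cross-validation error estimate; that substitution is left out here to keep the sketch self-contained.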

  3. A Research of Speech Emotion Recognition Based on Deep Belief Network and SVM

    Directory of Open Access Journals (Sweden)

    Chenchen Huang

    2014-01-01

    Feature extraction is a very important part of speech emotion recognition. To address feature extraction in speech emotion recognition, this paper proposes a new method that uses DBNs in a DNN to extract emotional features from the speech signal automatically. A five-layer-deep DBN is trained to extract speech emotion features, and multiple consecutive frames are incorporated to form a high-dimensional feature. The features produced by the trained DBNs are the input to a nonlinear SVM classifier, yielding a multiple-classifier system for speech emotion recognition. The speech emotion recognition rate of the system reached 86.5%, which was 7% higher than that of the original method.

  4. Residue contacts predicted by evolutionary covariance extend the application of ab initio molecular replacement to larger and more challenging protein folds

    Directory of Open Access Journals (Sweden)

    Felix Simkovic

    2016-07-01

    For many protein families, the deluge of new sequence information together with new statistical protocols now allow the accurate prediction of contacting residues from sequence information alone. This offers the possibility of more accurate ab initio (non-homology-based) structure prediction. Such models can be used in structure solution by molecular replacement (MR) where the target fold is novel or is only distantly related to known structures. Here, AMPLE, an MR pipeline that assembles search-model ensembles from ab initio structure predictions ('decoys'), is employed to assess the value of contact-assisted ab initio models to the crystallographer. It is demonstrated that evolutionary covariance-derived residue–residue contact predictions improve the quality of ab initio models and, consequently, the success rate of MR using search models derived from them. For targets containing β-structure, decoy quality and MR performance were further improved by the use of a β-strand contact-filtering protocol. Such contact-guided decoys achieved 14 structure solutions from 21 attempted protein targets, compared with nine for simple Rosetta decoys. Previously encountered limitations were superseded in two key respects. Firstly, much larger targets of up to 221 residues in length were solved, which is far larger than the previously benchmarked threshold of 120 residues. Secondly, contact-guided decoys significantly improved success with β-sheet-rich proteins. Overall, the improved performance of contact-guided decoys suggests that MR is now applicable to a significantly wider range of protein targets than were previously tractable, and points to a direct benefit to structural biology from the recent remarkable advances in sequencing.

  5. Application of SVM methods for mid-term load forecasting

    Directory of Open Access Journals (Sweden)

    Božić Miloš

    2011-01-01

    This paper presents an approach for medium-term load forecasting using Support Vector Machines (SVMs). The proposed SVM model was employed to predict the maximum daily load demand for the period of a month. The available data were analyzed and the most important features for constructing the SVM model were selected. It was shown that the size and the structure of the training set may significantly affect the accuracy of predictions. The presented model was tested by applying it to real-life load data obtained from the distribution company 'ED Jugoistok' for the territory of the city of Niš and its surroundings. Experimental results show that the proposed approach gives acceptable results for the entire prediction period, comparable to those of other solutions in this area.

  6. CONSRANK: a server for the analysis, comparison and ranking of docking models based on inter-residue contacts

    KAUST Repository

    Chermak, Edrisse

    2014-12-21

    Summary: Herein, we present CONSRANK, a web tool for analyzing, comparing and ranking protein–protein and protein–nucleic acid docking models, based on the conservation of inter-residue contacts and its visualization in 2D and 3D interactive contact maps.

  7. CONSRANK: a server for the analysis, comparison and ranking of docking models based on inter-residue contacts

    KAUST Repository

    Chermak, Edrisse; Petta, A.; Serra, L.; Vangone, A.; Scarano, V.; Cavallo, Luigi; Oliva, R.

    2014-01-01

    Summary: Herein, we present CONSRANK, a web tool for analyzing, comparing and ranking protein–protein and protein–nucleic acid docking models, based on the conservation of inter-residue contacts and its visualization in 2D and 3D interactive contact maps.

  8. A rapid, ensemble and free energy based method for engineering protein stabilities.

    Science.gov (United States)

    Naganathan, Athi N

    2013-05-02

    Engineering the conformational stabilities of proteins through mutations has immense potential in biotechnological applications. It is, however, an inherently challenging problem given the weak noncovalent nature of the stabilizing interactions. In this regard, we present here a robust and fast strategy to engineer protein stabilities through mutations involving charged residues using a structure-based statistical mechanical model that accounts for the ensemble nature of folding. We validate the method by predicting the absolute changes in stability for 138 experimental mutations from 16 different proteins and enzymes with a correlation of 0.65 and importantly with a success rate of 81%. Multiple point mutants are predicted with a higher success rate (90%) that is validated further by comparing mesophile-thermophile protein pairs. In parallel, we devise a methodology to rapidly engineer mutations in silico which we benchmark against experimental mutations of ubiquitin (correlation of 0.95) and check for its feasibility on a larger therapeutic protein DNase I. We expect the method to be of importance as a first and rapid step to screen for protein mutants with specific stability in the biotechnology industry, in the construction of stability maps at the residue level (i.e., hot spots), and as a robust tool to probe for mutations that enhance the stability of protein-based drugs.

  9. How Does Alkali Aid Protein Extraction in Green Tea Leaf Residue: A Basis for Integrated Biorefinery of Leaves

    Science.gov (United States)

    Zhang, Chen; Sanders, Johan P. M.; Xiao, Ting T.; Bruins, Marieke E.

    2015-01-01

    Leaf protein can be obtained cost-efficiently by alkaline extraction, but overuse of chemicals and low quality of (denatured) protein limits its application. The research objective was to investigate how alkali aids protein extraction of green tea leaf residue, and use these results for further improvements in alkaline protein biorefinery. Protein extraction yield was studied for correlation to morphology of leaf tissue structure, protein solubility and hydrolysis degree, and yields of non-protein components obtained at various conditions. Alkaline protein extraction was not facilitated by increased solubility or hydrolysis of protein, but positively correlated to leaf tissue disruption. HG pectin, RGII pectin, and organic acids were extracted before protein extraction, which was followed by the extraction of cellulose and hemi-cellulose. RGI pectin and lignin were both linear to protein yield. The yields of these two components were 80% and 25% respectively when 95% protein was extracted, which indicated that RGI pectin is more likely to be the key limitation to leaf protein extraction. An integrated biorefinery was designed based on these results. PMID:26200774

  10. How Does Alkali Aid Protein Extraction in Green Tea Leaf Residue: A Basis for Integrated Biorefinery of Leaves.

    Directory of Open Access Journals (Sweden)

    Chen Zhang

    Leaf protein can be obtained cost-efficiently by alkaline extraction, but overuse of chemicals and low quality of (denatured) protein limits its application. The research objective was to investigate how alkali aids protein extraction of green tea leaf residue, and use these results for further improvements in alkaline protein biorefinery. Protein extraction yield was studied for correlation to morphology of leaf tissue structure, protein solubility and hydrolysis degree, and yields of non-protein components obtained at various conditions. Alkaline protein extraction was not facilitated by increased solubility or hydrolysis of protein, but positively correlated to leaf tissue disruption. HG pectin, RGII pectin, and organic acids were extracted before protein extraction, which was followed by the extraction of cellulose and hemi-cellulose. RGI pectin and lignin were both linear to protein yield. The yields of these two components were 80% and 25% respectively when 95% protein was extracted, which indicated that RGI pectin is more likely to be the key limitation to leaf protein extraction. An integrated biorefinery was designed based on these results.

  11. Density functional calculations of backbone 15N shielding tensors in beta-sheet and turn residues of protein G

    International Nuclear Information System (INIS)

    Cai Ling; Kosov, Daniel S.; Fushman, David

    2011-01-01

    We performed density functional calculations of backbone 15N shielding tensors in the regions of beta-sheet and turns of protein G. The calculations were carried out for all twenty-four beta-sheet residues and eight beta-turn residues in the protein GB3 and the results were compared with the available experimental data from solid-state and solution NMR measurements. Together with the alpha-helix data, our calculations cover 39 out of the 55 residues (or 71%) in GB3. The applicability of several computational models developed previously (Cai et al. in J Biomol NMR 45:245–253, 2009) to compute 15N shielding tensors of alpha-helical residues is assessed. We show that the proposed quantum chemical computational model is capable of predicting isotropic 15N chemical shifts for an entire protein that are in good correlation with experimental data. However, the individual components of the predicted 15N shielding tensor agree with experiment less well: the computed values show much larger spread than the experimental data, and there is a profound difference in the behavior of the tensor components for alpha-helix/turns and beta-sheet residues. We discuss possible reasons for this.

  12. Comparison of sensorless FOC and SVM-DTFC of PMSM for low-speed applications

    DEFF Research Database (Denmark)

    Basar, M. Sertug; Bech, Michael Møller; Andersen, Torben Ole

    2013-01-01

    This article presents the performance analysis of Field Oriented Control (FOC) and Space Vector Modulation (SVM) Direct Torque and Flux Control (DTFC) of a Non-Salient Permanent Magnet Synchronous Machine (PMSM) under sensorless control within low speed region. The high-frequency alternating...... with a commercially available PMSM machine. Both controllers show satisfactory sensorless performance. FOC provides smoother and more accurate response while SVM-DTFC has the advantage of faster control....

  13. Comparison of sensorless FOC and SVM-DTFC of PMSM for low-speed applications

    DEFF Research Database (Denmark)

    Basar, Mehmet Sertug

    2016-01-01

    This article presents the performance analysis of Field Oriented Control (FOC) and Space Vector Modulation (SVM) Direct Torque and Flux Control (DTFC) of a Non-Salient Permanent Magnet Synchronous Machine (PMSM) under sensorless control within low speed region. The high-frequency alternating...... with a commercially available PMSM machine. Both controllers show satisfactory sensorless performance. FOC provides smoother and more accurate response while SVM-DTFC has the advantage of faster control....

  14. A WFS-SVM Model for Soil Salinity Mapping in Keriya Oasis, Northwestern China Using Polarimetric Decomposition and Fully PolSAR Data

    Directory of Open Access Journals (Sweden)

    Ilyas Nurmemet

    2018-04-01

    Timely monitoring and mapping of salt-affected areas are essential for the prevention of land degradation and sustainable soil management in arid and semi-arid regions. The main objective of this study was to develop Synthetic Aperture Radar (SAR) polarimetry techniques for improved soil salinity mapping in the Keriya Oasis in the Xinjiang Uyghur Autonomous Region (Xinjiang), China, where salinized soil appears to be a major threat to local agricultural productivity. Multiple polarimetric target decomposition, optimal feature subset selection (wrapper feature selector, WFS), and support vector machine (SVM) algorithms were used for optimal soil salinization classification using quad-polarized PALSAR-2 data. A threefold exercise was conducted. First, 16 polarimetric decomposition methods were implemented and a wide range of polarimetric parameters and SAR discriminators were derived in order to mine hidden information in PolSAR data. Second, the optimal polarimetric feature subset that constitutes 19 polarimetric elements was selected adopting the WFS approach; optimum classification parameters were identified, and the optimal SVM classification model was obtained by employing a cross-validation method. Third, the WFS-SVM classification model was constructed, optimized, and implemented based on the optimal match of polarimetric features and optimum classification parameters. Soils with different salinization degrees (i.e., highly, moderately and slightly salinized soils) were extracted. Finally, classification results were compared with the Wishart supervised classification and conventional SVM classification to examine the performance of the proposed method for salinity mapping. Detailed field investigations and ground data were used for the validation of the adopted methods. The overall accuracy and kappa coefficient of the proposed WFS-SVM model were 87.57% and 0.85, respectively, which were much higher than those obtained by the Wishart supervised

  15. A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models.

    Science.gov (United States)

    Bernardes, Juliana S; Carbone, Alessandra; Zaverucha, Gerson

    2011-03-23

    Remote homology detection is a hard computational problem. Most approaches have trained computational models by using either full protein sequences or multiple sequence alignments (MSA), including all positions. However, when we deal with proteins in the "twilight zone" we can observe that only some segments of sequences (motifs) are conserved. We introduce a novel logical representation that allows us to represent physico-chemical properties of sequences, conserved amino acid positions and conserved physico-chemical positions in the MSA. From this, Inductive Logic Programming (ILP) finds the most frequent patterns (motifs) and uses them to train propositional models, such as decision trees and support vector machines (SVM). We use the SCOP database to perform our experiments by evaluating protein recognition within the same superfamily. Our results show that our methodology when using SVM performs significantly better than some of the state of the art methods, and comparable to other. However, our method provides a comprehensible set of logical rules that can help to understand what determines a protein function. The strategy of selecting only the most frequent patterns is effective for the remote homology detection. This is possible through a suitable first-order logical representation of homologous properties, and through a set of frequent patterns, found by an ILP system, that summarizes essential features of protein functions.

  16. The efficacy of support vector machines (SVM)

    Indian Academy of Sciences (India)

    (2006) by applying an SVM statistical learning machine on the time-scale wavelet decomposition methods. We used the data of 108 events in central Japan with magnitude ranging from 3 to 7.4 recorded at KiK-net network stations, for a source–receiver distance of up to 150 km during the period 1998–2011. We applied a ...

  17. A Fast SVM-Based Tongue’s Colour Classification Aided by k-Means Clustering Identifiers and Colour Attributes as Computer-Assisted Tool for Tongue Diagnosis

    Directory of Open Access Journals (Sweden)

    Nur Diyana Kamarudin

    2017-01-01

    In tongue diagnosis, the colour of the tongue body carries valuable information regarding the state of disease and its correlation with the internal organs. Qualitatively, practitioners may have difficulty in their judgement due to unstable lighting conditions and the naked eye’s limited ability to capture the exact colour distribution on the tongue, especially a tongue with multicolour substance. To overcome this ambiguity, this paper presents a two-stage multicolour tongue classification based on a support vector machine (SVM) whose support vectors are reduced by our proposed k-means clustering identifiers and red colour range for precise tongue colour diagnosis. In the first stage, k-means clustering is used to cluster a tongue image into four clusters: image background (black), deep red region, red/light red region, and transitional region. In the second-stage classification, red/light red tongue images are further classified into red tongue or light red tongue based on the red colour range derived in our work. Overall, the true rate classification accuracy of the proposed two-stage classification to diagnose red, light red, and deep red tongue colours is 94%. The number of support vectors in SVM is improved by 41.2%, and the execution time for one image is recorded as 48 seconds.

  18. Do cysteine residues regulate transient receptor potential canonical type 6 (TRPC6) channel protein expression?

    DEFF Research Database (Denmark)

    Thilo, Florian; Liu, Ying; Krueger, Katharina

    2012-01-01

    The regulation of calcium influx through transient receptor potential canonical type 6 channel is mandatory for the activity of human monocytes. We submit the first evidence that cysteine residues of homocysteine or acetylcysteine affect TRPC6 expression in human monocytes. We observed that patients with chronic renal failure had significantly elevated homocysteine levels and TRPC6 mRNA expression levels in monocytes compared to control subjects. We further observed that administration of homocysteine or acetylcysteine significantly increased TRPC6 channel protein expression compared to control conditions. We therefore hypothesize that cysteine residues increase TRPC6 channel protein expression in humans.

  19. Computational Prediction of Hot Spot Residues

    Science.gov (United States)

    Morrow, John Kenneth; Zhang, Shuxing

    2013-01-01

    Most biological processes involve multiple proteins interacting with each other. It has been recently discovered that certain residues in these protein-protein interactions, which are called hot spots, contribute more significantly to binding affinity than others. Hot spot residues have unique and diverse energetic properties that make them challenging yet important targets in the modulation of protein-protein complexes. Design of therapeutic agents that interact with hot spot residues has proven to be a valid methodology in disrupting unwanted protein-protein interactions. Using biological methods to determine which residues are hot spots can be costly and time consuming. Recent advances in computational approaches to predict hot spots have incorporated a myriad of features, and have shown increasing predictive successes. Here we review the state of knowledge around protein-protein interactions, hot spots, and give an overview of multiple in silico prediction techniques of hot spot residues. PMID:22316154

  20. Common voltage eliminating of SVM diode clamping three-level inverter connected to grid

    DEFF Research Database (Denmark)

    Guo, Yougui; Zeng, Ping; Zhu, Jieqiong

    2011-01-01

    A novel method of common voltage elimination is put forward for an SVM diode clamping three-level inverter connected to the grid, based on calculating the common voltage of its various switching states. PLECS is used to model this grid-connected three-level inverter and good results are obtained. First, the common mode voltage for the switching states of the diode clamping three-level inverter is analysed in detail. Second, the common mode voltage eliminating control strategy of SVM is described for the diode clamping three-level inverter. Third, PLECS is briefly introduced. Fourth, the modeling of the diode clamping three-level inverter is presented with PLECS. Finally, a series of simulations are carried out. The simulation results show that PLECS is a very powerful tool for modeling real power circuits. They have also verified that the proposed common mode voltage eliminating control strategy of SVM is feasible...

  1. SCOWLP: a web-based database for detailed characterization and visualization of protein interfaces

    Directory of Open Access Journals (Sweden)

    Schroeder Michael

    2006-03-01

    Background: Currently there is a strong need for methods that help to obtain an accurate description of protein interfaces in order to be able to understand the principles that govern molecular recognition and protein function. Many of the recent efforts to computationally identify and characterize protein networks extract protein interaction information at atomic resolution from the PDB. However, they pay little or no attention to small protein ligands and solvent. These are key components and mediators of protein interactions and fundamental for a complete description of protein interfaces. Interactome profiling requires the development of computational tools to extract and analyze protein-protein, protein-ligand and detailed solvent interaction information from the PDB in an automatic and comparative fashion. Adding this information to the existing one on protein-protein interactions will allow us to better understand protein interaction networks and protein function. Description: SCOWLP (Structural Characterization Of Water, Ligands and Proteins) is a user-friendly and publicly accessible web-based relational database for detailed characterization and visualization of the PDB protein interfaces. The SCOWLP database includes proteins, peptidic-ligands and interface water molecules as descriptors of protein interfaces. It currently contains 74,907 protein interfaces and 2,093,976 residue-residue interactions formed by 60,664 structural units (protein domains and peptidic-ligands) and their interacting solvent. The SCOWLP web-server allows detailed structural analysis and comparisons of protein interfaces at atomic level by text query of PDB codes and/or by navigating a SCOP-based tree. It includes a visualization tool to interactively display the interfaces and label interacting residues and interface solvent by atomic physicochemical properties. SCOWLP is automatically updated with every SCOP release. Conclusion: SCOWLP enriches

  2. Classification of pseudo pairs between nucleotide bases and amino acids by analysis of nucleotide-protein complexes.

    Science.gov (United States)

    Kondo, Jiro; Westhof, Eric

    2011-10-01

    Nucleotide bases are recognized by amino acid residues in a variety of DNA/RNA binding and nucleotide binding proteins. In this study, a total of 446 crystal structures of nucleotide-protein complexes are analyzed manually and pseudo pairs together with single and bifurcated hydrogen bonds observed between bases and amino acids are classified and annotated. Only 5 of the 20 usual amino acid residues, Asn, Gln, Asp, Glu and Arg, are able to orient in a coplanar fashion in order to form pseudo pairs with nucleotide bases through two hydrogen bonds. The peptide backbone can also form pseudo pairs with nucleotide bases and presents a strong bias for binding to the adenine base. The Watson-Crick side of the nucleotide bases is the major interaction edge participating in such pseudo pairs. Pseudo pairs between the Watson-Crick edge of guanine and Asp are frequently observed. The Hoogsteen edge of the purine bases is a good discriminatory element in recognition of nucleotide bases by protein side chains through the pseudo pairing: the Hoogsteen edge of adenine is recognized by various amino acids while the Hoogsteen edge of guanine is only recognized by Arg. The sugar edge is rarely recognized by either the side-chain or peptide backbone of amino acid residues.

  3. Classification of pseudo pairs between nucleotide bases and amino acids by analysis of nucleotide–protein complexes

    Science.gov (United States)

    Kondo, Jiro; Westhof, Eric

    2011-01-01

    Nucleotide bases are recognized by amino acid residues in a variety of DNA/RNA binding and nucleotide binding proteins. In this study, a total of 446 crystal structures of nucleotide–protein complexes are analyzed manually and pseudo pairs together with single and bifurcated hydrogen bonds observed between bases and amino acids are classified and annotated. Only 5 of the 20 usual amino acid residues, Asn, Gln, Asp, Glu and Arg, are able to orient in a coplanar fashion in order to form pseudo pairs with nucleotide bases through two hydrogen bonds. The peptide backbone can also form pseudo pairs with nucleotide bases and presents a strong bias for binding to the adenine base. The Watson–Crick side of the nucleotide bases is the major interaction edge participating in such pseudo pairs. Pseudo pairs between the Watson–Crick edge of guanine and Asp are frequently observed. The Hoogsteen edge of the purine bases is a good discriminatory element in recognition of nucleotide bases by protein side chains through the pseudo pairing: the Hoogsteen edge of adenine is recognized by various amino acids while the Hoogsteen edge of guanine is only recognized by Arg. The sugar edge is rarely recognized by either the side-chain or peptide backbone of amino acid residues. PMID:21737431

  4. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    KAUST Repository

    Chen, Peng

    2015-12-03

    Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most of the successful protein-ligand binding predictions were based on known structures. However, structural information is not largely available in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures

  5. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    KAUST Repository

    Chen, Peng; Hu, ShanShan; Zhang, Jun; Gao, Xin; Li, Jinyan; Xia, Junfeng; Wang, Bing

    2015-01-01

    Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most of the successful protein-ligand binding predictions were based on known structures. However, structural information is not largely available in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures

  6. Adaptive predictors based on probabilistic SVM for real time disruption mitigation on JET

    Science.gov (United States)

    Murari, A.; Lungaroni, M.; Peluso, E.; Gaudio, P.; Vega, J.; Dormido-Canto, S.; Baruzzo, M.; Gelfusa, M.; Contributors, JET

    2018-05-01

    Detecting disruptions with sufficient anticipation time is essential to undertake any form of remedial strategy, mitigation or avoidance. Traditional predictors based on machine learning techniques can perform very well, if properly optimised, but do not provide a natural estimate of the quality of their outputs and they typically age very quickly. In this paper a new set of tools, based on probabilistic extensions of support vector machines (SVM), is introduced and applied for the first time to JET data. The probabilistic output constitutes a natural qualification of the prediction quality and provides additional flexibility. An adaptive training strategy ‘from scratch’ has also been devised, which allows preserving the performance even when the experimental conditions change significantly. Large JET databases of disruptions, covering entire campaigns and thousands of discharges, have been analysed, both for the case of the graphite and the ITER Like Wall. Performance significantly better than any previous predictor using adaptive training has been achieved, satisfying even the requirements of the next generation of devices. The adaptive approach to the training has also provided unique information about the evolution of the operational space. The fact that the developed tools give the probability of disruption improves the interpretability of the results, provides an estimate of the predictor quality and gives new insights into the physics. Moreover, the probabilistic treatment makes it easier to insert these classifiers into general decision support and control systems.

  7. PCVMZM: Using the Probabilistic Classification Vector Machines Model Combined with a Zernike Moments Descriptor to Predict Protein-Protein Interactions from Protein Sequences.

    Science.gov (United States)

    Wang, Yanbin; You, Zhuhong; Li, Xiao; Chen, Xing; Jiang, Tonghai; Zhang, Jingting

    2017-05-11

    Protein-protein interactions (PPIs) are essential for the processes of most living organisms. Thus, detecting PPIs is extremely important to understand the molecular mechanisms of biological systems. Although many PPI data have been generated by high-throughput technologies for a variety of organisms, the whole interactome is still far from complete. In addition, the high-throughput technologies for detecting PPIs have some unavoidable defects, including time consumption, high cost, and high error rate. In recent years, with the development of machine learning, computational methods have been broadly used to predict PPIs, and can achieve good prediction rates. In this paper, we present PCVMZM, a computational method based on a Probabilistic Classification Vector Machines (PCVM) model and a Zernike moments (ZM) descriptor for predicting PPIs from protein amino acid sequences. Specifically, a Zernike moments (ZM) descriptor is used to extract protein evolutionary information from the Position-Specific Scoring Matrix (PSSM) generated by Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). Then, the PCVM classifier is used to infer the interactions among proteins. When performed on PPI datasets of Yeast and H. pylori, the proposed method can achieve average prediction accuracies of 94.48% and 91.25%, respectively. In order to further evaluate the performance of the proposed method, the state-of-the-art support vector machines (SVM) classifier is used and compared with the PCVM model. Experimental results on the Yeast dataset show that the performance of the PCVM classifier is better than that of the SVM classifier. The experimental results indicate that our proposed method is robust, powerful and feasible, and can be used as a helpful tool for proteomics research.

  8. Applications of Support Vector Machine (SVM) Learning in Cancer Genomics.

    Science.gov (United States)

    Huang, Shujun; Cai, Nianguang; Pacheco, Pedro Penzuti; Narrandes, Shavira; Wang, Yang; Xu, Wayne

    2018-01-01

    Machine learning with maximization (support) of separating margin (vector), called support vector machine (SVM) learning, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advancements in high-throughput technologies lead to production of large amounts of genomic and epigenomic data, the classification feature of SVMs is expanding its use in cancer genomics, leading to the discovery of new biomarkers, new drug targets, and a better understanding of cancer driver genes. Herein we reviewed the recent progress of SVMs in cancer genomic studies. We intend to comprehend the strength of the SVM learning and its future perspective in cancer genomic applications.

  9. Classification of EEG-P300 Signals Extracted from Brain Activities in BCI Systems Using ν-SVM and BLDA Algorithms

    Directory of Open Access Journals (Sweden)

    Ali MOMENNEZHAD

    2014-06-01

    In this paper, a linear predictive coding (LPC) model is used to improve classification accuracy, convergence speed to maximum accuracy, and maximum bitrates in a brain computer interface (BCI) system based on extracting EEG-P300 signals. First, the EEG signal is filtered in order to eliminate high frequency noise. Then, the parameters of the filtered EEG signal are extracted using the LPC model. Finally, the samples are reconstructed from the LPC coefficients and two classifiers, (a) Bayesian linear discriminant analysis (BLDA) and (b) the ν-support vector machine (ν-SVM), are applied for classification. The performance of the proposed algorithm is compared with Fisher linear discriminant analysis (FLDA). Results show that the efficiency of our algorithm in improving classification accuracy and convergence speed to maximum accuracy is much better. For example, with the 8-electrode configuration for subject S1, the proposed algorithms, BLDA with LPC model and ν-SVM with LPC model, improve the total classification accuracy by 9.4% and 1.7%, respectively. Also, for subject 7, the BLDA and ν-SVM with LPC model algorithms (LPC+BLDA and LPC+ν-SVM) converged to maximum accuracy after the 11th block, whereas the FLDA algorithm did not converge to maximum accuracy (with the same configuration). So, it can be used as a promising tool in designing BCI systems.

  10. Protein microarray: sensitive and effective immunodetection for drug residues

    Directory of Open Access Journals (Sweden)

    Zer Cindy

    2010-02-01

    Background: Veterinary drugs such as clenbuterol (CL) and sulfamethazine (SM2) are low molecular weight (... Results: The artificial antigens were spotted on microarray slides. Standard concentrations of the compounds were added to compete with the spotted antigens for binding to the antisera to determine the IC50. Our microarray assay showed that the IC50 values were 39.6 ng/ml for CL and 48.8 ng/ml for SM2, while the traditional competitive indirect-ELISA (ci-ELISA) showed that the IC50 values were 190.7 ng/ml for CL and 156.7 ng/ml for SM2. We further validated the two methods with CL-fortified chicken muscle tissues, and the protein microarray assay showed 90% recovery while the ci-ELISA had a 76% recovery rate. When tested with CL-fed chicken muscle tissues, the protein microarray assay had higher sensitivity (0.9 ng/g) than the ci-ELISA (0.1 ng/g) for detection of CL residues. Conclusions: The protein microarrays showed 4.5 and 3.5 times lower IC50 than the ci-ELISA detection for CL and SM2, respectively, suggesting that immunodetection of small molecules with protein microarray is a better approach than the traditional ELISA technique.

  11. Computational Identification of Protein Pupylation Sites by Using Profile-Based Composition of k-Spaced Amino Acid Pairs.

    Directory of Open Access Journals (Sweden)

    Md Mehedi Hasan

    Prokaryotic proteins are regulated by pupylation, a type of post-translational modification that contributes to cellular function in bacterial organisms. In the pupylation process, prokaryotic ubiquitin-like protein (Pup) tagging is functionally analogous to ubiquitination in that it tags target proteins for proteasomal degradation. To date, several experimental methods have been developed to identify pupylated proteins and their pupylation sites, but these experimental methods are generally laborious and costly. Therefore, computational methods that can accurately predict potential pupylation sites based on protein sequence information are highly desirable. In this paper, a novel predictor termed pbPUP has been developed for accurate prediction of pupylation sites. In particular, a sophisticated sequence encoding scheme [i.e. the profile-based composition of k-spaced amino acid pairs (pbCKSAAP)] is used to represent the sequence patterns and evolutionary information of the sequence fragments surrounding pupylation sites. Then, a Support Vector Machine (SVM) classifier is trained using the pbCKSAAP encoding scheme. The final pbPUP predictor achieves an AUC value of 0.849 in 10-fold cross-validation tests and outperforms other existing predictors on a comprehensive independent test dataset. The proposed method is anticipated to be a helpful computational resource for the prediction of pupylation sites. The web server and curated datasets in this study are freely available at http://protein.cau.edu.cn/pbPUP/.

  12. APPLICATION OF FUSION WITH SAR AND OPTICAL IMAGES IN LAND USE CLASSIFICATION BASED ON SVM

    Directory of Open Access Journals (Sweden)

    C. Bao

    2012-07-01

    With the growth of multi-source remote sensing data at multiple spatial and spectral resolutions, data fusion technologies have been widely used in geological fields. Synthetic Aperture Radar (SAR) and optical cameras are the two most common sensors at present. Multi-spectral optical images express spectral features of ground objects, while SAR images express backscatter information. The accuracy of image classification can be effectively improved by fusing the two kinds of images. In this paper, Terra SAR-X images and ALOS multi-spectral images were fused for land use classification. After preprocessing such as geometric rectification, radiometric rectification, noise suppression and so on, the two kinds of images were fused, and then an SVM model identification method was used for land use classification. Two different fusion methods were used: one joins the SAR image to the multi-spectral images as one band, and the other directly fuses the two kinds of images. The former can raise the resolution and preserve the texture information, and the latter can preserve spectral feature information and improve the capability of identifying different features. The experimental results showed that the accuracy of classification using fused images is better than using multi-spectral images alone. The accuracy of classification for roads, habitation and water bodies was significantly improved. Compared to traditional classification methods, the method of this paper for fused images with an SVM classifier could achieve better results in identifying complicated land use classes, especially for small ground features.

  13. Highly sensitive rapid fluorescence detection of protein residues on surgical instruments

    International Nuclear Information System (INIS)

    Kovalev, Valeri I; Bartona, James S; Richardson, Patricia R; Jones, Anita C

    2006-01-01

    There is a risk of contamination of surgical instruments by infectious protein residues, in particular, prions which are the agents for Creutzfeldt-Jakob Disease in humans. They are exceptionally resistant to conventional sterilization, therefore it is important to detect their presence as contaminants so that alternative cleaning procedures can be applied. We describe the development of an optimized detection system for fluorescently labelled protein, suitable for in-hospital use. We show that under optimum conditions the technique can detect ∼10 attomole/cm² with a scan speed of ∼3-10 cm²/s of the test instrument's surface. A theoretical analysis and experimental measurements will be discussed.

  14. DisArticle: a web server for SVM-based discrimination of articles on traditional medicine.

    Science.gov (United States)

    Kim, Sang-Kyun; Nam, SeJin; Kim, SangHyun

    2017-01-28

    Much research has been done in Northeast Asia to show the efficacy of traditional medicine. While MEDLINE contains many biomedical articles including those on traditional medicine, it does not categorize those articles by specific research area. The aim of this study was to provide a method that searches for articles only on traditional medicine in Northeast Asia, including traditional Chinese medicine, from among the articles in MEDLINE. This research established an SVM-based classifier model to identify articles on traditional medicine. The TAK + HM classifier, trained with the features of title, abstract, keywords, herbal data, and MeSH, has a precision of 0.954 and a recall of 0.902. In particular, the feature of herbal data significantly increased the performance of the classifier. By using the TAK + HM classifier, a total of about 108,000 articles were discriminated as articles on traditional medicine from among all articles in MEDLINE. We also built a web server called DisArticle ( http://informatics.kiom.re.kr/disarticle ), in which users can search for the articles and obtain statistical data. Because much evidence-based research on traditional medicine has been published in recent years, it has become necessary to search for articles on traditional medicine exclusively in literature databases. DisArticle can help users to search for and analyze the research trends in traditional medicine.
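
The precision and recall quoted above can be combined into a single F1 score (their harmonic mean). The sketch below is only an illustration: the abstract itself does not report an F1 value, and the function name `f1_score` is a hypothetical helper, not part of DisArticle.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the TAK + HM classifier in the abstract above.
# The F1 figure is derived here and is not stated in the source.
print(round(f1_score(0.954, 0.902), 3))  # prints 0.927
```

Because the harmonic mean is pulled towards the smaller of the two inputs, the derived value sits closer to the recall than to the precision.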

  15. Support vector machines for prediction and analysis of beta and gamma-turns in proteins.

    Science.gov (United States)

    Pham, Tho Hoan; Satou, Kenji; Ho, Tu Bao

    2005-04-01

    Tight turns have long been recognized as one of the three important features of proteins, together with alpha-helix and beta-sheet. Tight turns play an important role in globular proteins from both the structural and functional points of view. More than 90% of tight turns are beta-turns and most of the rest are gamma-turns. Analysis and prediction of beta-turns and gamma-turns is very useful for design of new molecules such as drugs, pesticides, and antigens. In this paper we investigated two aspects of applying support vector machine (SVM), a promising machine learning method for bioinformatics, to prediction and analysis of beta-turns and gamma-turns. First, we developed two SVM-based methods, called BTSVM and GTSVM, which predict beta-turns and gamma-turns in a protein from its sequence. When compared with other methods, BTSVM has a superior performance and GTSVM is competitive. Second, we used SVMs with a linear kernel to estimate the support of amino acids for the formation of beta-turns and gamma-turns depending on their position in a protein. Our analysis results are more comprehensive and easier to use than the previous results in designing turns in proteins.

  16. Exploring QSARs of the interaction of flavonoids with GABA (A) receptor using MLR, ANN and SVM techniques.

    Science.gov (United States)

    Deeb, Omar; Shaik, Basheerulla; Agrawal, Vijay K

    2014-10-01

    Quantitative Structure-Activity Relationship (QSAR) models for binding affinity constants (log Ki) of 78 flavonoid ligands towards the benzodiazepine site of GABA (A) receptor complex were calculated using the machine learning methods: artificial neural network (ANN) and support vector machine (SVM) techniques. The models obtained were compared with those obtained using multiple linear regression (MLR) analysis. The descriptor selection and model building were performed with 10-fold cross-validation using the training data set. The SVM and MLR coefficient of determination values are 0.944 and 0.879, respectively, for the training set and are higher than those of ANN models. Though the SVM model shows improvement of training set fitting, the ANN model was superior to SVM and MLR in predicting the test set. Randomization test is employed to check the suitability of the models.
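
As a reminder of what the coefficient of determination quoted above measures, here is a minimal sketch using the common definition R² = 1 − SS_res / SS_tot. This is an assumption for illustration: the abstract does not say which formula or software the authors used, and `r_squared` is a hypothetical helper name.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after applying the model.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1 - ss_res / ss_tot
```

Under this definition a model that always predicts the mean of the observations scores 0, and a perfect fit scores 1, which is why a high training-set value alone does not guarantee good test-set prediction.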

  17. Residual dipolar couplings : a new technique for structure determination of proteins in solution

    NARCIS (Netherlands)

    van Lune, Frouktje Sapke

    2004-01-01

    The aim of the work described in this thesis was to investigate how residual dipolar couplings can be used to resolve or refine the three-dimensional structure of one of the proteins of the phosphoenol-pyruvate phosphotransferase system (PTS), the main transport system for carbohydrates in

  18. Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs

    Directory of Open Access Journals (Sweden)

    Ruan Jishou

    2007-04-01

    Background: Traditionally, it is believed that the native structure of a protein corresponds to a global minimum of its free energy. However, with the growing number of known tertiary (3D) protein structures, researchers have discovered that some proteins can alter their structures in response to a change in their surroundings or with the help of other proteins or ligands. Such structural shifts play a crucial role with respect to the protein function. To this end, we propose a machine learning method for the prediction of the flexible/rigid regions of proteins (referred to as FlexRP); the method is based on a novel sequence representation and feature selection. Knowledge of the flexible/rigid regions may provide insights into the protein folding process and the 3D structure prediction. Results: The flexible/rigid regions were defined based on a dataset, which includes protein sequences that have multiple experimental structures, and which was previously used to study the structural conservation of proteins. Sequences drawn from this dataset were represented based on feature sets that were proposed in prior research, such as PSI-BLAST profiles, composition vector and binary sequence encoding, and a newly proposed representation based on frequencies of k-spaced amino acid pairs. These representations were processed by feature selection to reduce the dimensionality. Several machine learning methods for the prediction of flexible/rigid regions and two recently proposed methods for the prediction of conformational changes and unstructured regions were compared with the proposed method. The FlexRP method, which applies Logistic Regression and collocation-based representation with 95 features, obtained 79.5% accuracy. The two runner-up methods, which apply the same sequence representation and Support Vector Machines (SVM) and Naïve Bayes classifiers, obtained 79.2% and 78.4% accuracy, respectively. The remaining considered methods are
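
The k-spaced amino acid pair representation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the FlexRP implementation: the function name, the 20-letter alphabet constant, the choice to skip non-standard residues, and the normalisation by the number of valid pairs are all hypothetical choices made here.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def k_spaced_pair_frequencies(sequence: str, k: int) -> dict:
    """Frequency of each residue pair separated by exactly k other residues.

    Returns a dict with 400 entries (one per ordered pair), so the
    feature vector has a fixed length regardless of sequence length.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(sequence) - k - 1):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:  # skip pairs with non-standard residues, e.g. 'X'
            counts[pair] += 1
            total += 1
    if total == 0:
        return {p: 0.0 for p in pairs}
    return {p: c / total for p, c in counts.items()}


# k = 0 gives adjacent pairs; k = 1 skips one residue between the pair.
features = k_spaced_pair_frequencies("ACAC", 1)
print(features["AA"], features["CC"])  # prints 0.5 0.5
```

Concatenating the 400-element vectors for several values of k yields a longer feature vector, which is one reason a feature selection step such as the one described in the abstract becomes useful.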

  19. Knowledge base and neural network approach for protein secondary structure prediction.

    Science.gov (United States)

    Patel, Maulika S; Mazumdar, Himanshu S

    2014-11-21

    Protein structure prediction is of great relevance given the abundant genomic and proteomic data generated by the genome sequencing projects. Protein secondary structure prediction is addressed as a subtask in determining the protein tertiary structure and function. In this paper, a novel algorithm, KB-PROSSP-NN, which is a combination of knowledge base and modeling of the exceptions in the knowledge base using neural networks for protein secondary structure prediction (PSSP), is proposed. The knowledge base is derived from a proteomic sequence-structure database and consists of the statistics of association between the 5-residue words and corresponding secondary structure. The predicted results obtained using the knowledge base are refined with a backpropagation neural network algorithm. The neural net models the exceptions of the knowledge base. Q3 accuracies of 90% and 82% are achieved on the RS126 and CB396 test sets respectively, which suggests improvement over existing state-of-the-art methods.
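
For readers unfamiliar with the Q3 measure cited above, it is the percentage of residues whose predicted three-state label (commonly H for helix, E for strand, C for coil) matches the observed one. The sketch below assumes that standard definition; the function name and the example strings are hypothetical and are not taken from the paper.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Toy example: three of the four residue states agree.
print(q3_accuracy("HHEC", "HHCC"))  # prints 75.0
```

Q3 is a per-residue score, so it says nothing about whether whole helices or strands are recovered as contiguous segments.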

  20. Highly sensitive rapid fluorescence detection of protein residues on surgical instruments

    Energy Technology Data Exchange (ETDEWEB)

    Kovalev, Valeri I [School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh EH14 4AS (United Kingdom)]; Bartona, James S [School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh EH14 4AS (United Kingdom)]; Richardson, Patricia R [School of Chemistry, University of Edinburgh, Edinburgh, EH9 3JJ (United Kingdom)]; Jones, Anita C [School of Chemistry, University of Edinburgh, Edinburgh, EH9 3JJ (United Kingdom)]

    2006-07-15

    There is a risk of contamination of surgical instruments by infectious protein residues, in particular, prions which are the agents for Creutzfeldt-Jakob Disease in humans. They are exceptionally resistant to conventional sterilization; therefore, it is important to detect their presence as contaminants so that alternative cleaning procedures can be applied. We describe the development of an optimized detection system for fluorescently labelled protein, suitable for in-hospital use. We show that under optimum conditions the technique can detect ~10 attomole/cm² with a scan speed of ~3-10 cm²/s of the test instrument's surface. A theoretical analysis and experimental measurements will be discussed.

  1. Fasting, but Not Aging, Dramatically Alters the Redox Status of Cysteine Residues on Proteins in Drosophila melanogaster

    Directory of Open Access Journals (Sweden)

    Katja E. Menger

    2015-06-01

    Full Text Available Altering the redox state of cysteine residues on protein surfaces is an important response to environmental challenges. Although aging and fasting alter many redox processes, the role of cysteine residues is uncertain. To address this, we used a redox proteomic technique, oxidative isotope-coded affinity tags (OxICAT), to assess cysteine-residue redox changes in Drosophila melanogaster during aging and fasting. This approach enabled us to simultaneously identify and quantify the redox state of several hundred cysteine residues in vivo. Cysteine residues within young flies had a bimodal distribution with peaks at ∼10% and ∼85% reversibly oxidized. Surprisingly, these cysteine residues did not become more oxidized with age. In contrast, 24 hr of fasting dramatically oxidized cysteine residues that were reduced under fed conditions while also reducing cysteine residues that were initially oxidized. We conclude that fasting, but not aging, dramatically alters cysteine-residue redox status in D. melanogaster.

  2. LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone

    KAUST Repository

    Chen, Peng

    2014-12-03

    Background Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in the computational prediction of protein-ligand binding sites, the state-of-the-art methods search for similar, known structures of the query and predict the binding sites based on the solved structures. However, such structural information is not commonly available. Results In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows in the process of encoding input feature vectors. Moreover, because the samples are highly imbalanced between ligand-binding and non-ligand-binding sites, we construct several balanced data sets, for each of which a random forest (RF)-based classifier is trained. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. Conclusions Experimental results on CASP9 and CASP8 data sets demonstrate that our method compares favorably with the state-of-the-art protein-ligand binding site prediction methods.
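
    The balanced-data-set idea in this abstract might be sketched as follows. The abstract does not say how the balanced sets are constructed, so the strategy used here (shuffle the majority class and cut it into chunks the size of the minority class) is an assumption, as are the function names; the random forest training itself is left out, and only the final vote over per-classifier predictions is shown.

```python
import random


def balanced_subsets(positives, negatives, seed=0):
    """Pair all minority (positive) samples with successive, non-overlapping
    chunks of the shuffled majority (negative) samples of the same size."""
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    size = len(positives)
    subsets = []
    for start in range(0, len(shuffled) - size + 1, size):
        subsets.append((list(positives), shuffled[start:start + size]))
    return subsets


def majority_vote(votes):
    """Combine 0/1 predictions from the per-subset classifiers."""
    return int(2 * sum(votes) > len(votes))


# Hypothetical sample identifiers: 3 binding residues, 10 non-binding ones.
subsets = balanced_subsets(["b1", "b2", "b3"], [f"n{i}" for i in range(10)])
print(len(subsets), [len(neg) for _, neg in subsets])
```

    One classifier would then be trained per subset; leftover majority samples that do not fill a whole chunk are simply dropped in this sketch.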

  3. Tsetse salivary gland proteins 1 and 2 are high affinity nucleic acid binding proteins with residual nuclease activity.

    Directory of Open Access Journals (Sweden)

    Guy Caljon

    Full Text Available Analysis of the tsetse fly salivary gland EST database revealed the presence of a highly enriched cluster of putative endonuclease genes, including tsal1 and tsal2. Tsal proteins are the major components of tsetse fly (G. morsitans morsitans) saliva where they are present as monomers as well as high molecular weight complexes with other saliva proteins. We demonstrate that the recombinant tsetse salivary gland proteins 1&2 (Tsal1&2) display DNA/RNA non-specific, high affinity nucleic acid binding with K_D values in the low nanomolar range and a non-exclusive preference for duplex. These Tsal proteins exert only a residual nuclease activity with a preference for dsDNA in a broad pH range. Knockdown of Tsal expression by in vivo RNA interference in the tsetse fly revealed a partially impaired blood digestion phenotype as evidenced by higher gut nucleic acid, hematin and protein contents.

  4. A New Hybrid Model FPA-SVM Considering Cointegration for Particular Matter Concentration Forecasting: A Case Study of Kunming and Yuxi, China.

    Science.gov (United States)

    Li, Weide; Kong, Demeng; Wu, Jinran

    2017-01-01

    Air pollution in China is becoming more serious, especially for particulate matter (PM), because of rapid economic growth and fast urban expansion. To address the growing environmental problems, daily PM2.5 and PM10 concentration data from January 1, 2015, to August 23, 2016, in Kunming and Yuxi (two important cities in Yunnan Province, China) are used to present a new hybrid model, CI-FPA-SVM, to forecast air PM2.5 and PM10 concentration. The proposed model involves two parts. Firstly, to address the deficiency in assessing the possible correlation between different variables, cointegration theory is introduced to get the input-output relationship, and the nonlinear dynamical system is then obtained with a support vector machine (SVM), in which the parameters c and g are optimized by the flower pollination algorithm (FPA). Six benchmark models, including FPA-SVM, CI-SVM, CI-GA-SVM, CI-PSO-SVM, CI-FPA-NN, and a multiple linear regression model, are considered to verify the superiority of the proposed hybrid model. The empirical results demonstrate that the proposed CI-FPA-SVM model is remarkably superior to all considered benchmark models in prediction accuracy, and that applying the model to forecasting can support effective monitoring and management of future air quality.

  5. Identifying inter-residue resonances in crowded 2D ¹³C-¹³C chemical shift correlation spectra of membrane proteins by solid-state MAS NMR difference spectroscopy

    Energy Technology Data Exchange (ETDEWEB)

    Miao Yimin; Cross, Timothy A. [Florida State University, Department of Chemistry and Biochemistry (United States)]; Fu Riqiang, E-mail: rfu@magnet.fsu.edu [National High Magnet Field Lab (United States)]

    2013-07-15

    The feasibility of using difference spectroscopy, i.e. subtraction of two correlation spectra at different mixing times, for substantially enhanced resolution in crowded two-dimensional ¹³C-¹³C chemical shift correlation spectra is presented. With the analyses of ¹³C-¹³C spin diffusion in simple spin systems, difference spectroscopy is proposed to partially separate the spin diffusion resonances of relatively short intra-residue distances from the longer inter-residue distances, leading to a better identification of the inter-residue resonances. Here solid-state magic-angle-spinning NMR spectra of the full length M2 protein embedded in synthetic lipid bilayers have been used to illustrate the resolution enhancement in the difference spectra. The integral membrane M2 protein of Influenza A virus assembles as a tetrameric bundle to form a proton-conducting channel that is activated by low pH and is essential for the viral lifecycle. Based on known amino acid resonance assignments from amino acid specific labeled samples of truncated M2 sequences or from time-consuming 3D experiments of uniformly labeled samples, some inter-residue resonances of the full length M2 protein can be identified in the difference spectra of uniformly ¹³C labeled protein that are consistent with the high resolution structure of the M2 (22-62) protein (Sharma et al., Science 330(6003):509-512, 2010)

  6. Estimation of hydraulic jump characteristics of channels with sudden diverging side walls via SVM.

    Science.gov (United States)

    Roushangar, Kiyoumars; Valizadeh, Reyhaneh; Ghasempour, Roghayeh

    2017-10-01

    Sudden diverging channels are one type of energy dissipater that can dissipate most of the kinetic energy of the flow through a hydraulic jump. An accurate prediction of hydraulic jump characteristics is an important step in designing hydraulic structures. This paper focuses on the capability of the support vector machine (SVM) as a meta-model approach for predicting hydraulic jump characteristics in different sudden diverging stilling basins (i.e. basins with and without appurtenances). In this regard, different models were developed and tested using 1,018 experimental data points. The results demonstrated the capability of the SVM technique in predicting hydraulic jump characteristics, and the models developed for a channel with a central block performed better than models for channels without appurtenances or with a negative step. For the length of the hydraulic jump, the best performance was obtained by the model with parameters F₁ (Froude number) and (h₂ - h₁)/h₁ (h₁ and h₂ are the upstream and downstream sequent depths, respectively). For the relative energy dissipation and sequent depth ratio, the model with parameters F₁ and h₁/B (B is expansion ratio) led to the best results. According to the sensitivity analysis, the Froude number had the most significant effect on the modeling. A comparison between SVM and empirical equations also indicated the strong performance of the SVM.

  7. Lactobacillus plantarum BL011 cultivation in industrial isolated soybean protein acid residue

    Directory of Open Access Journals (Sweden)

    Chaline Caren Coghetto

    Full Text Available Abstract In this study, physiological aspects of Lactobacillus plantarum BL011 growing in a new, all-animal free medium in bioreactors were evaluated aiming at the production of this important lactic acid bacterium. Cultivations were performed in submerged batch bioreactors using the Plackett-Burman methodology to evaluate the influence of temperature, aeration rate and stirring speed as well as the concentrations of liquid acid protein residue of soybean, soy peptone, corn steep liquor, and raw yeast extract. The results showed that all variables, except for corn steep liquor, significantly influenced biomass production. The best condition was applied to bioreactor cultures, which produced a maximal biomass of 17.87 g L⁻¹, whereas lactic acid, the most important lactic acid bacteria metabolite, peaked at 37.59 g L⁻¹, corresponding to a productivity of 1.46 g L⁻¹ h⁻¹. This is the first report on the use of liquid acid protein residue of soybean medium for L. plantarum growth. These results support the industrial use of this system as an alternative to produce probiotics without animal-derived ingredients to obtain high biomass concentrations in batch bioreactors.

  8. Forecasting Dry Bulk Freight Index with Improved SVM

    Directory of Open Access Journals (Sweden)

    Qianqian Han

    2014-01-01

    Full Text Available This paper presents an improved SVM model to forecast the dry bulk freight index (BDI), a powerful tool for operators and investors to follow the market trend and avoid price risk in the shipping industry. The BDI is influenced by many factors, especially random incidents in the dry bulk market, which makes it difficult to forecast. To eliminate the impact of these random incidents, a wavelet transform is adopted to denoise the BDI data series, and a combined model of wavelet transform and support vector machine is developed to forecast the BDI. BDI data from 2005 to 2012 are used to test the proposed model: the first 84 consecutive monthly values are the inputs of the model, and the last 12 monthly values are the outputs. The parameters of the model are optimized by a genetic algorithm, and the final model is confirmed through SVM training. The forecasting results of the proposed method are compared with those of three other forecasting methods. The results show that the proposed method has higher accuracy and could be used to forecast the short-term trend of the BDI.
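
    The preprocessing described here (denoise the monthly series with a wavelet transform, then hold out the last 12 of 96 months) can be sketched roughly as below. The abstract does not name the wavelet family, decomposition depth, or thresholding rule, so the one-level Haar transform with soft thresholding is purely an assumption, and the series is synthetic rather than real BDI data. SVM training and the genetic-algorithm parameter search are not shown.

```python
import math


def haar_denoise(signal, threshold):
    """One-level Haar transform, soft-threshold the detail coefficients,
    then invert. Assumes an even number of samples."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    # Soft thresholding shrinks small (presumably noisy) details to zero.
    detail = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    restored = []
    for a, d in zip(approx, detail):
        restored.extend([(a + d) / root2, (a - d) / root2])
    return restored


# Synthetic stand-in for 96 monthly index values (2005 to 2012).
series = [1000.0 + 10.0 * month + (25.0 if month % 2 else -25.0) for month in range(96)]
smooth = haar_denoise(series, threshold=50.0)
train, test = smooth[:84], smooth[84:]
print(len(train), len(test))
```

    With a threshold of zero the function returns the input unchanged (up to rounding), which is a convenient sanity check before tuning the threshold.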

  9. Optimization of the protein concentration process from residual peanut oil-cake

    Directory of Open Access Journals (Sweden)

    Gayol, M. F.

    2013-12-01

    Full Text Available The objective of this study was to find the best process conditions for preparing protein concentrate from residual peanut oil-cake (POC). The study was carried out on POC from industrial peanut oil extraction. Different protein extraction and precipitation conditions were used: water/flour ratio (10:1, 20:1 and 30:1), pH (8, 9 and 10), NaCl concentration (0 and 0.5 M), extraction time (30, 60 and 120 min), temperature (25, 40 and 60 °C), extraction stages (1, 2 and 3), and precipitation pH (4, 4.5 and 5). The extraction and precipitation conditions which showed the highest protein yield were 10:1 water/flour ratio, extraction at pH 9, no NaCl, 2 extraction stages of 30 min at 40 °C and precipitation at pH 4.5. Under these conditions, the peanut protein concentrate (PC) contained 86.22% protein, while the initial POC had 38.04%. POC is an alternative source of protein that can be used for human consumption or animal nutrition. Therefore, it adds value to an industry residue.

  10. Structural protein descriptors in 1-dimension and their sequence-based predictions.

    Science.gov (United States)

    Kurgan, Lukasz; Disfani, Fatemeh Miri

    2011-09-01

    The last few decades have seen an increasing interest in the development and application of 1-dimensional (1D) descriptors of protein structure. These descriptors project 3D structural features onto 1D strings of residue-wise structural assignments. They cover a wide range of structural aspects including conformation of the backbone, burying depth/solvent exposure and flexibility of residues, and inter-chain residue-residue contacts. We perform a first-of-its-kind comprehensive comparative review of the existing 1D structural descriptors. We define, review and categorize ten structural descriptors and we also describe, summarize and contrast over eighty computational models that are used to predict these descriptors from the protein sequences. We show that the majority of the recent sequence-based predictors utilize machine learning models, with the most popular being neural networks, support vector machines, hidden Markov models, and support vector and linear regressions. These methods provide high-throughput predictions and most of them are accessible to a non-expert user via web servers and/or stand-alone software packages. We empirically evaluate several recent sequence-based predictors of secondary structure, disorder, and solvent accessibility descriptors using a benchmark set based on CASP8 targets. Our analysis shows that the secondary structure can be predicted with over 80% accuracy and segment overlap (SOV), disorder with over 0.9 AUC, 0.6 Matthews Correlation Coefficient (MCC), and 75% SOV, and relative solvent accessibility with PCC of 0.7 and MCC of 0.6 (0.86 when homology is used). We demonstrate that the secondary structure predicted from sequence without the use of homology modeling is as good as the structure extracted from the 3D folds predicted by top-performing template-based methods.
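
    For readers unfamiliar with the Matthews Correlation Coefficient quoted in this abstract, the standard definition for binary labels can be written out in a few lines. This is a generic illustration with made-up labels, not the evaluation code used in the review; the function names and the convention of returning 0.0 when the denominator vanishes are assumptions.

```python
import math


def confusion_counts(predicted, actual):
    """Return (tp, tn, fp, fn) for binary 0/1 labels."""
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    tn = sum(p == 0 and a == 0 for p, a in zip(predicted, actual))
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
    fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; ranges from -1 to +1."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical per-residue disorder labels (1 = disordered).
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 1]
print(mcc(*confusion_counts(predicted, actual)))
```

    Unlike plain accuracy, the coefficient stays near zero for a predictor that ignores the minority class, which is why it is commonly reported for imbalanced residue-level tasks.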

  11. ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval.

    Science.gov (United States)

    Wang, Jingyan; Gao, Xin; Wang, Quanquan; Li, Yongping

    2012-05-08

    The need to retrieve or classify protein molecules using structure or sequence-based similarity measures underlies a wide range of biomedical applications. Traditional protein search methods rely on a pairwise dissimilarity/similarity measure for comparing a pair of proteins. Such pairwise measures neglect the distribution of other proteins and thus cannot satisfy the need for high accuracy in retrieval systems. Recent work in the machine learning community has shown that exploiting the global structure of the database and learning contextual dissimilarity/similarity measures can improve the retrieval performance significantly. However, most existing contextual dissimilarity/similarity learning algorithms work in an unsupervised manner, which does not utilize the known class labels of proteins in the database. In this paper, we propose a novel protein-protein dissimilarity learning algorithm, ProDis-ContSHC. ProDis-ContSHC regularizes an existing dissimilarity measure dij by considering the contextual information of the proteins. The context of a protein is defined by its neighboring proteins. The basic idea is that, for a pair of proteins (i, j), if their contexts N(i) and N(j) are similar to each other, the two proteins should also have a high similarity. We implement this idea by regularizing dij by a factor learned from the contexts N(i) and N(j). Moreover, we divide the context into hierarchical sub-contexts and obtain the contextual dissimilarity vector for each protein pair. Using the class label information of the proteins, we select the relevant (a pair of proteins that has the same class labels) and irrelevant (with different labels) protein pairs, and train an SVM model to distinguish between their contextual dissimilarity vectors. The SVM model is further used to learn a supervised regularizing factor. Finally, with the new Supervised learned Dissimilarity measure, we update the Protein Hierarchial

  12. [The importance of C-terminal aspartic acid residue (D141) to the antirestriction activity of the ArdB (R64) protein].

    Science.gov (United States)

    Kudryavtseva, A A; Osetrova, M S; Livinyuk, V Ya; Manukhov, I V; Zavilgelsky, G B

    2017-01-01

    Antirestriction proteins of the ArdB/KlcA family are specific inhibitors of restriction (endonuclease) activity of type-I restriction/modification enzymes. The effect of conserved amino acid residues on the antirestriction activity of the ArdB protein encoded by the transmissible R64 (IncI1) plasmid has been investigated. An analysis of the amino acid sequences of ArdB homologues demonstrated the presence of four groups of conserved residues ((1) R16, E32, and W51; (2) Y46 and G48; (3) S81, D83 and E132, and (4) N77, L(I)140, and D141) on the surface of the protein globule. Amino acid residues of the fourth group showed a unique localization pattern with the terminal residue protruding beyond the globule surface. The replacement of two conserved amino acids (D141 and N77) located in the close vicinity of each other on the globule surface showed that the C-terminal D141 is essential for the antirestriction activity of ArdB. The deletion of this residue, as well as replacement by a hydrophobic threonine residue (D141T), completely abolished the antirestriction activity of ArdB. The synonymous replacement of D141 by a glutamic acid residue (D141E) caused an approximately 30-fold decrease of the antirestriction activity of ArdB, and the point mutation N77A caused an approximately 20-fold decrease in activity. The residues D141 and N77 located on the surface of the protein globule are presumably essential for the formation of a contact between ArdB and a currently unknown factor that modulates the activity of type-I restriction/modification enzymes.

  13. NAPS: Network Analysis of Protein Structures

    Science.gov (United States)

    Chakrabarty, Broto; Parekh, Nita

    2016-01-01

    Traditionally, protein structures have been analysed by the secondary structure architecture and fold arrangement. An alternative approach that has shown promise is modelling proteins as a network of non-covalent interactions between amino acid residues. The network representation of proteins provides a systems approach to topological analysis of complex three-dimensional structures irrespective of secondary structure and fold type, and provides insights into the structure-function relationship. We have developed a web server for network based analysis of protein structures, NAPS, that facilitates quantitative and qualitative (visual) analysis of residue–residue interactions in single chains, protein complexes, modelled protein structures and trajectories (e.g. from molecular dynamics simulations). The user can specify the atom type for network construction, the distance range (in Å) and the minimal amino acid separation along the sequence. NAPS lets users select node(s) and their neighbourhoods based on centrality measures, physicochemical properties of amino acids or clusters of well-connected residues (k-cliques) for further analysis. Visual analysis of interacting domains and protein chains, and shortest path lengths between pairs of residues, are additional features that aid in functional analysis. NAPS supports various analyses and visualization views for identifying functional residues and provides insight into mechanisms of protein folding and into domain-domain and protein–protein interactions, for understanding communication within and between proteins. URL: http://bioinf.iiit.ac.in/NAPS/. PMID:27151201
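
    The network construction this abstract describes (nodes are residues, edges join residues within a distance range, subject to a minimal sequence separation) might look like the following sketch. It does not reproduce the NAPS server's code or API: the function names, the 7 Å upper cutoff, and the toy coordinates are assumptions chosen for illustration.

```python
import math
from collections import deque


def contact_network(coords, lower=0.0, upper=7.0, min_separation=1):
    """Link residues i and j when their distance lies in [lower, upper]
    and they are at least min_separation (>= 1) apart in sequence."""
    adjacency = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if lower <= math.dist(coords[i], coords[j]) <= upper:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def shortest_path_length(adjacency, source, target):
    """Breadth-first search; returns None when no path exists."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, steps = queue.popleft()
        if node == target:
            return steps
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, steps + 1))
    return None


def degree_centrality(adjacency):
    """Fraction of the other residues each residue is connected to."""
    n = len(adjacency)
    return {node: (len(nbrs) / (n - 1) if n > 1 else 0.0)
            for node, nbrs in adjacency.items()}


# Hypothetical C-alpha coordinates: four residues on a line, 3.8 Å apart.
coords = [(3.8 * i, 0.0, 0.0) for i in range(4)]
network = contact_network(coords)
print(shortest_path_length(network, 0, 3), degree_centrality(network))
```

    With these coordinates only sequence neighbours fall inside the cutoff, so the network is a simple chain; real structures produce long-range edges wherever the fold brings distant residues together.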

  14. Using Generalized Entropies and OC-SVM with Mahalanobis Kernel for Detection and Classification of Anomalies in Network Traffic

    Directory of Open Access Journals (Sweden)

    Jayro Santiago-Paz

    2015-09-01

    Full Text Available Network anomaly detection and classification is an important open issue in network security. Several approaches and systems based on different mathematical tools have been studied and developed, among them, the Anomaly-Network Intrusion Detection System (A-NIDS), which monitors network traffic and compares it against an established baseline of a “normal” traffic profile. Then, it is necessary to characterize the “normal” Internet traffic. This paper presents an approach for anomaly detection and classification based on Shannon, Rényi and Tsallis entropies of selected features, and the construction of regions from entropy data employing the Mahalanobis distance (MD), and One Class Support Vector Machine (OC-SVM) with different kernels (Radial Basis Function (RBF) and Mahalanobis Kernel (MK)) for “normal” and abnormal traffic. Regular and non-regular regions built from “normal” traffic profiles allow anomaly detection, while the classification is performed under the assumption that regions corresponding to the attack classes have been previously characterized. Although this approach allows the use of as many features as required, only four well-known significant features were selected in our case. In order to evaluate our approach, two different data sets were used: one set of real traffic obtained from an Academic Local Area Network (LAN), and the other a subset of the 1998 MIT-DARPA set. For these data sets, a True positive rate up to 99.35%, a True negative rate up to 99.83% and a False negative rate at about 0.16% were yielded. Experimental results show that certain q-values of the generalized entropies and the use of OC-SVM with RBF kernel improve the detection rate in the detection stage, while the novel inclusion of MK kernel in OC-SVM and k-temporal nearest neighbors improve accuracy in classification. In addition, the results show that using the Box-Cox transformation, the Mahalanobis distance yielded high detection rates with

  15. Estimation of Costs and Durations of Construction of Urban Roads Using ANN and SVM

    Directory of Open Access Journals (Sweden)

    Igor Peško

    2017-01-01

    Full Text Available Offer preparation has always been a specific part of the building process, with a significant impact on company business. Because income greatly depends on the offer’s precision and on the balance between planned costs (both direct and overhead) and the desired profit, it is necessary to prepare a precise offer within the required time and with the available resources, which are always insufficient. The paper investigates the precision that can be achieved when using artificial intelligence to estimate cost and duration in construction projects. Both artificial neural networks (ANNs) and support vector machines (SVM) are analysed and compared. When estimating costs, the best SVM showed higher precision, with a mean absolute percentage error (MAPE) of 7.06%, compared with 25.38% for the most precise ANN. Estimating the duration of works proved to be more difficult. The best MAPEs were 22.77% and 26.26% for SVM and ANN, respectively.
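
    The error measure reported in this abstract, mean absolute percentage error, has a simple standard definition that can be spelled out as a worked example. The cost figures below are invented for illustration and are not taken from the paper; the function name is likewise an assumption.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.
    Every actual value must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical realised vs. estimated project costs (arbitrary currency units).
realised = [100.0, 200.0, 400.0]
estimated = [110.0, 180.0, 400.0]
print(round(mape(realised, estimated), 2))
```

    Because each error is scaled by the realised value, a small project and a large project contribute equally, which suits comparisons across projects of very different size.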

  16. Image Segmentation Using Support Vector Machine (SVM) and Ellipsoid Region Search Strategy (ERSS) Arimoto Entropy Based on Color and Texture Features

    Directory of Open Access Journals (Sweden)

    Lukman Hakim

    2016-02-01

    Full Text Available Abstract Image segmentation is an important method in digital image processing that divides an image into several homogeneous regions based on certain similarity criteria, regions that ideally should be meaningful for a certain purpose. One of the main requirements of an image segmentation method is that it produces an optimal boundary image. To meet this requirement, a segmentation method needs a pixel classification that can separate pixels both linearly and non-linearly. In this study, we propose a color image segmentation method using Support Vector Machine (SVM) classification and ERSS-based Arimoto entropy thresholding that is robust to noise and has low complexity, in order to produce an optimal boundary image. First, color features are extracted using local homogeneity and texture features using the Gray Level Co-occurrence Matrix (GLCM), which yields several features. Second, labeling with ERSS-based Arimoto entropy provides the classes used in classification. Third, the extracted features are classified according to these labels with the trained SVM. The experiments show less than optimal segmentation results, with an accuracy of 69%. Feature reduction is needed to produce well-segmented images. Keywords: image segmentation, support vector machine, ERSS Arimoto Entropy, feature extraction.

  17. A New Hybrid Model FPA-SVM Considering Cointegration for Particular Matter Concentration Forecasting: A Case Study of Kunming and Yuxi, China

    Directory of Open Access Journals (Sweden)

    Weide Li

    2017-01-01

    Full Text Available Air pollution in China is becoming more serious, especially for particulate matter (PM), because of rapid economic growth and fast urban expansion. To address the growing environmental problems, daily PM2.5 and PM10 concentration data from January 1, 2015, to August 23, 2016, in Kunming and Yuxi (two important cities in Yunnan Province, China) are used to present a new hybrid model, CI-FPA-SVM, to forecast air PM2.5 and PM10 concentration. The proposed model involves two parts. Firstly, to address the deficiency in assessing the possible correlation between different variables, cointegration theory is introduced to get the input-output relationship, and the nonlinear dynamical system is then obtained with a support vector machine (SVM), in which the parameters c and g are optimized by the flower pollination algorithm (FPA). Six benchmark models, including FPA-SVM, CI-SVM, CI-GA-SVM, CI-PSO-SVM, CI-FPA-NN, and a multiple linear regression model, are considered to verify the superiority of the proposed hybrid model. The empirical results demonstrate that the proposed CI-FPA-SVM model is remarkably superior to all considered benchmark models in prediction accuracy, and that applying the model to forecasting can support effective monitoring and management of future air quality.

  18. OPTIMIZATION OF SUPPORT VECTOR MACHINE (SVM) FOR K-MEANS-BASED CLASSIFICATION OF FINAL PROJECT THEMES

    Directory of Open Access Journals (Sweden)

    Oman Somantri

    2017-01-01

    Full Text Available Determining the classification of students' final project themes is a difficulty often experienced by colleges. The purpose of this study is to provide decision support for policy makers in the study program so that each student's theme accords with their own competence. In this research, text mining using Support Vector Machine (SVM) combined with K-Means produced a better accuracy rate, 86.21%, compared with 85.38% for SVM without K-Means.

  19. Selective effects of charge on G protein activation by FSH-receptor residues 551-555 and 650-653.

    Science.gov (United States)

    Grasso, P; Deziel, M R; Reichert, L E

    1995-01-01

    Two cytosolic regions of the rat testicular FSH receptor (FSHR), residues 533-555 and 645-653, have been identified as G protein-coupling domains. We localized the activity in these domains to their C-terminal sequences, residues 551-555 (KIAKR, net charge +3) and 650-653 (RKSH, net charge +3), and examined the effects of charge on G protein activation by the C-terminal peptides, using synthetic analogs containing additions, through alanine (A) linkages, of arginine (R, +), histidine (H, +) or both. RA-KIAKR (net charge +4) mimicked the effect of FSHR-(551-555) on guanine nucleotide exchange in rat testis membranes, but reduced its ability to inhibit FSH-stimulated estradiol biosynthesis in cultured rat Sertoli cells. Further increasing net charge by the addition of H (HARA-KIAKR, net charge +5) increased guanosine 5'-triphosphate (GTP) binding, but eliminated FSHR-(551-555) effects on FSH-stimulated steroidogenesis. HA-RKSH (net charge +4) significantly inhibited guanine nucleotide exchange in rat testis membranes, but stimulated basal and potentiated FSH-induced estradiol biosynthesis in cultured rat Sertoli cells. Addition of two H residues (HAHA-RKSH, net charge +5) restored GTP binding and further potentiated basal and FSH-stimulated steroidogenesis. These results suggest that positive charges in G protein-coupling domains of the FSHR play a role in modulating G protein activation and postbinding effects of FSH, such as steroidogenesis.
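
    The net charges quoted for the peptides in this abstract follow from a simple residue count, shown below as a worked check. The sketch adopts the abstract's own convention of counting lysine (K), arginine (R) and histidine (H) as +1 each; counting aspartate (D) and glutamate (E) as -1 is an added assumption that does not affect these particular sequences, and the function name is hypothetical. Terminal groups and pH-dependent protonation are ignored.

```python
POSITIVE = set("KRH")  # H counted as +1, following the abstract's convention
NEGATIVE = set("DE")   # assumed; none of the peptides below contain D or E


def net_charge(peptide):
    """Formal net charge from side chains only; hyphens are linker marks."""
    residues = peptide.replace("-", "")
    plus = sum(residue in POSITIVE for residue in residues)
    minus = sum(residue in NEGATIVE for residue in residues)
    return plus - minus


for name in ["KIAKR", "RA-KIAKR", "HARA-KIAKR", "RKSH", "HA-RKSH", "HAHA-RKSH"]:
    print(name, net_charge(name))
```

    The counts reproduce the values given in the abstract: +3 for both parent peptides, +4 for the single additions and +5 for the double additions.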

  20. Stationary Wavelet Transform and AdaBoost with SVM Based Pathological Brain Detection in MRI Scanning.

    Science.gov (United States)

    Nayak, Deepak Ranjan; Dash, Ratnakar; Majhi, Banshidhar

    2017-01-01

    This paper presents an automatic classification system for segregating pathological brains from normal brains in magnetic resonance imaging scanning. The proposed system employs a contrast limited adaptive histogram equalization scheme to enhance the diseased region in brain MR images. A two-dimensional stationary wavelet transform is used to extract features from the preprocessed images. The feature vector is constructed using the energy and entropy values computed from the level-2 SWT coefficients. Then, the relevant and uncorrelated features are selected using a symmetric uncertainty ranking filter. Subsequently, the selected features are given as input to the proposed AdaBoost with support vector machine classifier, where SVM is used as the base classifier of the AdaBoost algorithm. To validate the proposed system, three standard MR image datasets, Dataset-66, Dataset-160, and Dataset-255, have been utilized. The results of 5 runs of k-fold stratified cross validation indicate that the suggested scheme offers better performance than other existing schemes in terms of accuracy and number of features. The proposed system achieves ideal classification on Dataset-66 and Dataset-160, whereas for Dataset-255 an accuracy of 99.45% is achieved. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  1. High resolution tempo-spatial ozone prediction with SVM and LSTM

    Science.gov (United States)

    Gao, D.; Zhang, Y.; Qu, Z.; Sadighi, K.; Coffey, E.; LIU, Q.; Hannigan, M.; Henze, D. K.; Dick, R.; Shang, L.; Lv, Q.

    2017-12-01

    To investigate and predict the exposure of ozone and other pollutants in urban areas, we utilize data from various infrastructures including EPA, NOAA and RIITS from the government of Los Angeles and construct statistical models to conduct ozone concentration prediction in Los Angeles areas at finer spatial and temporal granularity. Our work involves cyber data such as traffic, roads and population data as features for prediction. Two statistical models, Support Vector Machine (SVM) and Long Short-term Memory (LSTM, deep learning method) are used for prediction. Our experiments show that kernelized SVM gains better prediction performance when taking traffic counts, road density and population density as features, with a prediction RMSE of 7.99 ppb for all-time ozone and 6.92 ppb for peak-value ozone. With simulated NOx from a Chemical Transport Model (CTM) as features, SVM generates even better prediction performance, with a prediction RMSE of 6.69 ppb. We also build LSTM, which has shown great advantages at dealing with temporal sequences, to predict ozone concentration by treating ozone concentration as spatial-temporal sequences. Trained by ozone concentration measurements from the 13 EPA stations in LA area, the model achieves 4.45 ppb RMSE. Besides, we build a variant of this model which adds spatial dynamics into the model in the form of transition matrix that reveals new knowledge on pollutant transition. The forgetting gate of the trained LSTM is consistent with the delay effect of ozone concentration and the trained transition matrix shows spatial consistency with the common direction of winds in LA area.
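
    The abstract compares models by root-mean-square error (RMSE) in ppb. As a reminder of what that metric measures, here is a minimal sketch; the function name `rmse` and the sample values are made up for illustration and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Hypothetical ozone readings and predictions in ppb (not from the paper).
    observed = [40.0, 50.0, 60.0]
    predicted = [43.0, 46.0, 60.0]
    print(f"RMSE = {rmse(observed, predicted):.3f} ppb")
```

    Because errors are squared before averaging, a few large misses raise the RMSE more than many small ones, which is worth keeping in mind when comparing the all-time and peak-value figures.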

  2. Vehicle Detection with Occlusion Handling, Tracking, and OC-SVM Classification: A High Performance Vision-Based System

    Science.gov (United States)

    Velazquez-Pupo, Roxana; Sierra-Romero, Alberto; Torres-Roman, Deni; Shkvarko, Yuriy V.; Romero-Delgado, Misael

    2018-01-01

    This paper presents a high performance vision-based system with a single static camera for traffic surveillance, for moving vehicle detection with occlusion handling, tracking, counting, and One Class Support Vector Machine (OC-SVM) classification. In this approach, moving objects are first segmented from the background using the adaptive Gaussian Mixture Model (GMM). After that, several geometric features are extracted, such as vehicle area, height, width, centroid, and bounding box. As occlusion is present, an algorithm was implemented to reduce it. The tracking is performed with adaptive Kalman filter. Finally, the selected geometric features: estimated area, height, and width are used by different classifiers in order to sort vehicles into three classes: small, midsize, and large. Extensive experimental results in eight real traffic videos with more than 4000 ground truth vehicles have shown that the improved system can run in real time under an occlusion index of 0.312 and classify vehicles with a global detection rate or recall, precision, and F-measure of up to 98.190%, and an F-measure of up to 99.051% for midsize vehicles. PMID:29382078
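
    The recall, precision and F-measure reported here follow from counts of true positives, false positives and false negatives. The sketch below shows the standard definitions; the function name and the counts in the example are hypothetical and are not taken from the paper's videos.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F-measure) from detection counts.

    tp: vehicles correctly detected, fp: spurious detections,
    fn: ground-truth vehicles that were missed.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
    print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

    The F-measure is the harmonic mean of precision and recall, so it is high only when both are high.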

  3. Metals in proteins: correlation between the metal-ion type, coordination number and the amino-acid residues involved in the coordination.

    Science.gov (United States)

    Dokmanić, Ivan; Sikić, Mile; Tomić, Sanja

    2008-03-01

    Metal ions are constituents of many metalloproteins, in which they have either catalytic (metalloenzymes) or structural functions. In this work, the characteristics of various metals were studied (Cu, Zn, Mg, Mn, Fe, Co, Ni, Cd and Ca in proteins with known crystal structure) as well as the specificity of their environments. The analysis was performed on two data sets: the set of protein structures in the Protein Data Bank (PDB) determined with resolution metal ion and its electron donors and the latter was used to assess the preferred coordination numbers and common combinations of amino-acid residues in the neighbourhood of each metal. Although the metal ions considered predominantly had a valence of two, their preferred coordination number and the type of amino-acid residues that participate in the coordination differed significantly from one metal ion to the next. This study concentrates on finding the specificities of a metal-ion environment, namely the distribution of coordination numbers and the amino-acid residue types that frequently take part in coordination. Furthermore, the correlation between the coordination number and the occurrence of certain amino-acid residues (quartets and triplets) in a metal-ion coordination sphere was analysed. The results obtained are of particular value for the identification and modelling of metal-binding sites in protein structures derived by homology modelling. Knowledge of the geometry and characteristics of the metal-binding sites in metalloproteins of known function can help to more closely determine the biological activity of proteins of unknown function and to aid in design of proteins with specific affinity for certain metals.

  4. Fine-grained leukocyte classification with deep residual learning for microscopic images.

    Science.gov (United States)

    Qin, Feiwei; Gao, Nannan; Peng, Yong; Wu, Zizhao; Shen, Shuying; Grudtsin, Artur

    2018-08-01

    Leukocyte classification and cytometry have wide applications in the medical domain, and previous research has usually exploited machine learning techniques to classify leukocytes automatically. However, these methods are constrained by the past development of machine learning techniques: for example, extracting distinctive features from raw microscopic images is difficult, and the widely used SVM classifier has relatively few parameters to tune. As a result, they cannot efficiently handle fine-grained classification cases in which the white blood cells have up to 40 categories. Based on deep learning theory, a systematic study of finer leukocyte classification is conducted in this paper. First, a deep residual neural network based leukocyte classifier is constructed, which can imitate the domain expert's cell recognition process and extract salient features robustly and automatically. Then the deep neural network classifier's topology is adjusted according to prior knowledge of the white blood cell test. After that, a microscopic image dataset with almost one hundred thousand labeled leukocytes belonging to 40 categories is built, and combined training strategies are adopted to give the designed classifier good generalization ability. The proposed deep residual neural network based classifier was tested on the microscopic image dataset with 40 leukocyte categories. It achieves top-1 accuracy of 77.80% and top-5 accuracy of 98.75% during the training procedure. The average accuracy on the test set is nearly 76.84%. This paper presents a fine-grained leukocyte classification method for microscopic images, based on deep residual learning theory and medical domain knowledge. Experimental results validate the feasibility and effectiveness of our approach. Extended experiments support that the fine-grained leukocyte classifier could be used in real medical applications to assist doctors in diagnosing diseases and reduce manual effort significantly. Copyright © 2018 Elsevier B.V. All rights reserved.

  5. Support vector machine regression (SVR/LS-SVM)--an alternative to neural networks (ANN) for analytical chemistry? Comparison of nonlinear methods on near infrared (NIR) spectroscopy data.

    Science.gov (United States)

    Balabin, Roman M; Lomakina, Ekaterina I

    2011-04-21

    In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.

  6. Conversion of lignocellulosic agave residues into liquid biofuels using an AFEX™-based biorefinery.

    Science.gov (United States)

    Flores-Gómez, Carlos A; Escamilla Silva, Eleazar M; Zhong, Cheng; Dale, Bruce E; da Costa Sousa, Leonardo; Balan, Venkatesh

    2018-01-01

    Agave-based alcoholic beverage companies generate thousands of tons of solid residues per year in Mexico. These agave residues might be used for biofuel production due to their abundance and favorable sustainability characteristics. In this work, agave leaf and bagasse residues from species Agave tequilana and Agave salmiana were subjected to pretreatment using the ammonia fiber expansion (AFEX) process. The pretreatment conditions were optimized using a response surface design methodology. We also identified commercial enzyme mixtures that maximize sugar yields for AFEX-pretreated agave bagasse and leaf matter, at ~ 6% glucan (w/w) loading enzymatic hydrolysis. Finally, the pretreated agave hydrolysates (at a total solids loading of ~ 20%) were used for ethanol fermentation using the glucose- and xylose-consuming strain Saccharomyces cerevisiae 424A (LNH-ST), to determine ethanol yields at industrially relevant conditions. Low-severity AFEX pretreatment conditions are required (100-120 °C) to enable efficient enzymatic deconstruction of the agave cell wall. These studies showed that AFEX-pretreated A. tequilana bagasse, A. tequilana leaf fiber, and A. salmiana bagasse gave ~ 85% sugar conversion during enzyme hydrolysis and over 90% metabolic yields of ethanol during fermentation without any washing step or nutrient supplementation. On the other hand, although lignocellulosic A. salmiana leaf gave high sugar conversions, the hydrolysate could not be fermented at high solids loadings, apparently due to the presence of natural inhibitory compounds. These results show that AFEX-pretreated agave residues can be effectively hydrolyzed at high solids loading using an optimized commercial enzyme cocktail (at 25 mg protein/g glucan) producing > 85% sugar conversions and over 40 g/L bioethanol titers. These results show that AFEX technology has considerable potential to convert lignocellulosic agave residues to bio-based fuels and chemicals in a biorefinery.

  7. Model-based leakage localization in drinking water distribution networks using structured residuals

    OpenAIRE

    Puig Cayuela, Vicenç; Rosich, Albert

    2013-01-01

    In this paper, a new model-based approach to leakage localization in drinking water networks is proposed, based on generating a set of structured residuals. The residual evaluation uses a numerical method based on an enhanced Newton-Raphson algorithm. The proposed method is suitable for water network systems because the non-linearities of the model make it impossible to derive analytical residuals. Furthermore, the computed residuals are designed so that leaks are decoupled, which impro...

  8. Improving the performance of DomainDiscovery of protein domain boundary assignment using inter-domain linker index

    Directory of Open Access Journals (Sweden)

    Zomaya Albert Y

    2006-12-01

    Full Text Available Abstract. Background: Knowledge of protein domain boundaries is critical for the characterisation and understanding of protein function. The ability to identify domains without knowledge of the structure, by using sequence information only, is an essential step in many types of protein analyses. In this present study, we demonstrate that the performance of DomainDiscovery is improved significantly by including the inter-domain linker index value for domain identification from sequence-based information. Improved DomainDiscovery uses a Support Vector Machine (SVM) approach and a unique training dataset built on the principle of consensus among experts in defining domains in protein structure. The SVM was trained using a PSSM (Position Specific Scoring Matrix), secondary structure, solvent accessibility information and inter-domain linker index to detect possible domain boundaries for a target sequence. Results: Improved DomainDiscovery is compared with other methods by benchmarking against a structurally non-redundant dataset and also CASP5 targets. Improved DomainDiscovery achieves 70% accuracy for domain boundary identification in multi-domain proteins. Conclusion: Improved DomainDiscovery compares favourably to the performance of other methods and excels in the identification of domain boundaries for multi-domain proteins as a result of introducing support vector machine with benchmark_2 dataset.

  9. Identification of aspartate-184 as an essential residue in the catalytic subunit of cAMP-dependent protein kinase

    International Nuclear Information System (INIS)

    Buechler, J.A.; Taylor, S.S.

    1988-01-01

    The hydrophobic carbodiimide dicyclohexylcarbodiimide (DCCD) was previously shown to be an irreversible inhibitor of the catalytic subunit of cAMP-dependent protein kinase, and MgATP protected against inactivation. This inhibition by DCCD indicated that an essential carboxyl group was present at the active site of the enzyme even though identification of that carboxyl group was not possible. This presumably was because a nucleophile on the protein cross-linked to the electrophilic intermediate formed when the carbodiimide reacted with the carboxyl group. To circumvent this problem, the catalytic subunit first was treated with acetic anhydride to block accessible lysine residues, thus preventing intramolecular cross-linking. The DCCD reaction then was carried out in the presence of [14C]glycine ethyl ester in order to trap any electrophilic intermediates that were generated by DCCD. The modified protein was treated with trypsin, and the resulting peptides were separated by HPLC. Two major radioactive peptides were isolated as well as one minor peptide. MgATP protected all three peptides from covalent modification. The two major peaks contained the same modified carboxyl group, which corresponded to Asp-184. The minor peak contained a modified glutamic acid, Glu-91. Both of these acidic residues are conserved in all protein kinases, which is consistent with their playing essential roles. The positions of Asp-184 and Glu-91 have been correlated with the overall domain structure of the molecule. Asp-184 may participate as a general base catalyst at the active site. A third carboxyl group, Glu-230, also was identified.

  10. Hydrolysis of insoluble fish protein residue from whitemouth croaker (Micropogonias furnieri) by fungi

    Directory of Open Access Journals (Sweden)

    Vilásia Guimarães Martins

    2014-02-01

    Full Text Available A significant amount of insoluble fibrous protein, in the form of feathers, hair, scales, skin and others, is available as co-products of agro-industrial processing. These wastes are rich in keratin and collagen. This study evaluated different fungi for the hydrolysis of insoluble fish protein residues. Proteins recovered from Micropogonias furnieri wastes through a pH-shifting process were dried and milled for fermentation for 96 h. This resulted in the production of keratinolytic enzymes in the medium. Trichoderma sp. on alkaline substrate (28.99 U mL-1) and Penicillium sp. on acidic substrate (31.20 U mL-1) showed the highest proteolytic activities. Penicillium sp. showed the largest free amino acid solubilization (0.146 mg mL-1) and Fusarium sp. the highest protein solubilization (6.17 mg mL-1).

  11. Introduction of potential helix-capping residues into an engineered helical protein.

    Science.gov (United States)

    Parker, M H; Hefford, M A

    1998-08-01

    MB-1 is an engineered protein that was designed to incorporate high percentages of four amino acid residues and to fold into a four-alpha-helix bundle motif. Mutations were made in the putative loop I and III regions of this protein with the aim of increasing the stability of the helix ends. Four variants, MB-3, MB-5, MB-11 and MB-13, have replacements intended to promote formation of an 'N-capping box'. The loop I and III sequences of MB-3 (both GDLST) and MB-11 (GGDST) were designed to cause alphaL C-terminal 'capping' motifs to form in helices I and III. MB-5 has a sequence, GPDST, that places proline in a favourable position for forming beta-turns, whereas MB-13 (GLDST) has the potential to form Schellman C-capping motifs. Size-exclusion chromatography suggested that MB-1, MB-3, MB-5, MB-11 and MB-13 all form dimers, or possibly trimers. Free energies for the unfolding of each of these variants were determined by urea denaturation, with the loss of secondary structure followed by CD spectroscopy. Assuming an equilibrium between folded dimer and unfolded monomer, MB-13 had the highest apparent stability (40.5 kJ/mol, with +/-2.5 kJ/mol 95% confidence limits), followed by MB-11 (39.3+/-5.9 kJ/mol), MB-3 (36.4+/-1.7 kJ/mol), MB-5 (34.7+/-2.1 kJ/mol) and MB-1 (29.3+/-1.3 kJ/mol); the same relative stabilities of the variants were found when a folded trimer to unfolded monomer model was used to calculate stabilities. All of the variants were relatively unstable for dimeric proteins, but were significantly more stable than MB-1. These findings suggest that it might be possible to increase the stability of a protein for which the three-dimensional structure is unknown by placing amino acid residues in positions that have the potential to form helix- and turn-stabilizing motifs.

  12. Model-checking techniques based on cumulative residuals.

    Science.gov (United States)

    Lin, D Y; Wei, L J; Ying, Z

    2002-03-01

    Residuals have long been used for graphical and numerical examinations of the adequacy of regression models. Conventional residual analysis based on the plots of raw residuals or their smoothed curves is highly subjective, whereas most numerical goodness-of-fit tests provide little information about the nature of model misspecification. In this paper, we develop objective and informative model-checking techniques by taking the cumulative sums of residuals over certain coordinates (e.g., covariates or fitted values) or by considering some related aggregates of residuals, such as moving sums and moving averages. For a variety of statistical models and data structures, including generalized linear models with independent or dependent observations, the distributions of these stochastic processes under the assumed model can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be easily generated by computer simulation. Each observed process can then be compared, both graphically and numerically, with a number of realizations from the Gaussian process. Such comparisons enable one to assess objectively whether a trend seen in a residual plot reflects model misspecification or natural variation. The proposed techniques are particularly useful in checking the functional form of a covariate and the link function. Illustrations with several medical studies are provided.
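
    The core object in this abstract, a cumulative sum of residuals taken over a covariate, can be sketched in a few lines. The code below is a simplified illustration and not the authors' implementation: it assumes the observed process is the running sum of residuals ordered by covariate value and scaled by 1/sqrt(n), it does not merge tied covariate values, and it omits the simulation of the zero-mean Gaussian reference processes, which depends on the fitted model. The function names are hypothetical.

```python
import math


def cumulative_residual_process(covariate, residuals):
    """Return [(x, W(x))] where W(x) = n**-0.5 * sum of residuals with covariate <= x.

    Assumes `covariate` and `residuals` are equal-length sequences from a fitted model.
    """
    n = len(residuals)
    if n == 0 or n != len(covariate):
        raise ValueError("inputs must be non-empty and of equal length")
    order = sorted(range(n), key=lambda i: covariate[i])
    running = 0.0
    path = []
    for i in order:
        running += residuals[i]
        path.append((covariate[i], running / math.sqrt(n)))
    return path


def sup_statistic(path):
    """Largest absolute excursion of the process, a common numerical summary."""
    return max(abs(value) for _, value in path)


if __name__ == "__main__":
    # Made-up covariate values and residuals for illustration.
    path = cumulative_residual_process([3, 1, 2, 4], [1.0, -1.0, 2.0, -2.0])
    print(path, sup_statistic(path))
```

    In the approach the abstract describes, the observed path and its supremum would then be compared with many simulated realizations under the assumed model to judge whether the excursion is unusually large.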

  13. Development of an energy-protein for animal food based crop residues pear (Pyrus communis)

    Directory of Open Access Journals (Sweden)

    Néstor Julián Pulido-Suárez

    2016-01-01

    Full Text Available Pear (Pyrus communis) is the fruit of a deciduous species, widely consumed worldwide for its high quality energy. However, pear itself does not provide the amount of protein required for cattle feeding, so alternatives to improve its nutritional quality have been studied. On these grounds, the objective of this study was to evaluate the solid state fermentation parameters and the compositional energy value of a protein food based on pears (Pyrus communis) with apparent physical damage. A completely random design was used to evaluate three treatments, corresponding to percentages of inclusion of calcium carbonate (0.25, 0.50, 0.75) in an already established formulation (40 % pear, 25 % rice flour, 25 % wheat bran and 10 % urea). The parameters evaluated were pH, ashes (CZ), crude protein (CP) and crude fiber (CF), recorded at 0, 24, 48 and 72 hours. The pH dropped gradually for each treatment and at each sampling period; however, there were no significant differences. The lower value at the end of the process was recorded for T2 (0.25) with 4.66, followed by T3 (0.50) with 4.50; the ash reached values of up to 6 % with T3, and T2 (0.50) reached the highest percentages in fiber and crude protein. Finally, decreasing the fermentation variables ensures a food with no presence of undesirable microorganisms and stable over time.

  14. Direct determination of the redox status of cysteine residues in proteins in vivo

    Energy Technology Data Exchange (ETDEWEB)

    Hara, Satoshi [Chemical Resources Laboratory, Tokyo Institute of Technology, Nagatsuta 4259-R1-8, Midori-ku, Yokohama 226-8503 (Japan); Tatenaka, Yuki; Ohuchi, Yuya [Dojindo Laboratories, 2025-5 Tabaru, Mashiki-machi, Kumamoto 861-2202 (Japan); Hisabori, Toru, E-mail: thisabor@res.titech.ac.jp [Chemical Resources Laboratory, Tokyo Institute of Technology, Nagatsuta 4259-R1-8, Midori-ku, Yokohama 226-8503 (Japan); Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency (JST), Tokyo 102-0075 (Japan)

    2015-01-02

    Highlights: • A new DNA-maleimide which is cleaved by UV irradiation, DNA-PCMal, was developed. • DNA-PCMal can be used like DNA-Mal to analyze the redox state of cysteine residues. • It is useful for detecting the thiol redox status of a protein in vivo by Western blotting method. • Thus, DNA-PCMal can be a powerful tool for redox proteomics analysis. - Abstract: The redox states of proteins in cells are key factors in many cellular processes. To determine the redox status of cysteinyl thiol groups in proteins in vivo, we developed a new maleimide reagent, a photocleavable maleimide-conjugated single stranded DNA (DNA-PCMal). The DNA moiety of DNA-PCMal is easily removed by UV-irradiation, allowing DNA-PCMal to be used in Western blotting applications. Thereby the state of thiol groups in intracellular proteins can be directly evaluated. This new maleimide compound can provide information concerning redox proteins in vivo, which is important for our understanding of redox networks in the cell.

  15. The Replacement of 10 Non-Conserved Residues in the Core Protein of JFH-1 Hepatitis C Virus Improves Its Assembly and Secretion.

    Directory of Open Access Journals (Sweden)

    Loïc Etienne

    Full Text Available Hepatitis C virus (HCV) assembly is still poorly understood. It is thought that trafficking of the HCV core protein to the lipid droplet (LD) surface is essential for its multimerization and association with newly synthesized HCV RNA to form the viral nucleocapsid. We carried out a mapping analysis of several complete HCV genomes of all genotypes, and found that the genotype 2 JFH-1 core protein contained 10 residues different from those of other genotypes. The replacement of these 10 residues of the JFH-1 strain sequence with the most conserved residues deduced from sequence alignments greatly increased virus production. Confocal microscopy of the modified JFH-1 strain in cell culture showed that the mutated JFH-1 core protein, C10M, was present mostly at the endoplasmic reticulum (ER) membrane, but not at the surface of the LDs, even though its trafficking to these organelles was possible. The non-structural 5A protein of HCV was also redirected to ER membranes and colocalized with the C10M core protein. Using a Semliki forest virus vector to overproduce core protein, we demonstrated that the C10M core protein was able to form HCV-like particles, unlike the native JFH-1 core protein. Thus, the substitution of a few selected residues in the JFH-1 core protein modified the subcellular distribution and assembly properties of the protein. These findings suggest that the early steps of HCV assembly occur at the ER membrane rather than at the LD surface. The C10M-JFH-1 strain will be a valuable tool for further studies of HCV morphogenesis.

  16. Robust Non-Linear Direct Torque and Flux Control of Adjustable Speed Sensorless PMSM Drive Based on SVM Using a PI Predictive Controller

    Directory of Open Access Journals (Sweden)

    F. Naceri

    2010-01-01

    Full Text Available This paper presents a new sensorless direct torque control method for a voltage inverter-fed PMSM. The control method uses a modified Direct Torque Control scheme with constant inverter switching frequency using Space Vector Modulation (DTC-SVM). The variation of stator and rotor resistance due to changes in temperature or frequency deteriorates the performance of the DTC-SVM controller by introducing errors in the estimated flux linkage and the electromagnetic torque. As a result, this approach will not be suitable for high power drives such as those used in traction, as they require good torque control performance at considerably lower frequency. A novel stator resistance estimator is proposed. The estimation method is implemented using the Extended Kalman Filter. Finally, extensive simulation results are presented to validate the proposed technique. The system is tested at different speeds and a very satisfactory performance has been achieved.

  17. Lactobacillus plantarum BL011 cultivation in industrial isolated soybean protein acid residue.

    Science.gov (United States)

    Coghetto, Chaline Caren; Vasconcelos, Carolina Bettker; Brinques, Graziela Brusch; Ayub, Marco Antônio Záchia

    In this study, physiological aspects of Lactobacillus plantarum BL011 growing in a new, all-animal free medium in bioreactors were evaluated aiming at the production of this important lactic acid bacterium. Cultivations were performed in submerged batch bioreactors using the Plackett-Burman methodology to evaluate the influence of temperature, aeration rate and stirring speed as well as the concentrations of liquid acid protein residue of soybean, soy peptone, corn steep liquor, and raw yeast extract. The results showed that all variables, except for corn steep liquor, significantly influenced biomass production. The best condition was applied to bioreactor cultures, which produced a maximal biomass of 17.87 g L-1, whereas lactic acid, the most important lactic acid bacteria metabolite, peaked at 37.59 g L-1, corresponding to a productivity of 1.46 g L-1 h-1. This is the first report on the use of liquid acid protein residue of soybean medium for L. plantarum growth. These results support the industrial use of this system as an alternative to produce probiotics without animal-derived ingredients to obtain high biomass concentrations in batch bioreactors. Copyright © 2016 Sociedade Brasileira de Microbiologia. Published by Elsevier Editora Ltda. All rights reserved.

  18. kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets

    Science.gov (United States)

    Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S.; Beer, Michael A.

    2013-01-01

    Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167–80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org. PMID:23771147
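
    The "kmer sequence features" mentioned in this abstract are, at their simplest, counts of every length-k substring of a DNA sequence. The sketch below illustrates that idea only; it is not the kmer-SVM server's code, the function names are hypothetical, and details such as reverse-complement handling and the SVM training step itself are omitted.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in a DNA sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_feature_vector(sequence: str, k: int):
    """Return normalized k-mer frequencies in a fixed A/C/G/T vocabulary order.

    A vector like this could serve as one input row for an SVM.
    """
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = kmer_counts(sequence, k)
    total = sum(counts[kmer] for kmer in vocabulary) or 1
    return [counts[kmer] / total for kmer in vocabulary]


if __name__ == "__main__":
    print(kmer_counts("ACGTAC", 2))
    print(kmer_feature_vector("ACGTAC", 2))
```

    Fixing the vocabulary order means every sequence maps to a vector of the same length (4**k entries), which is what a kernel classifier needs.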

  19. Adaptive image denoising based on support vector machine and wavelet description

    Science.gov (United States)

    An, Feng-Ping; Zhou, Xian-Wei

    2017-12-01

    The adaptive image denoising method decomposes the original image into a series of basic pattern feature images on the basis of wavelet description and constructs a support vector machine regression function to realize the wavelet description of the original image. The support vector machine method allows the linear expansion of the signal to be expressed as a nonlinear function of the parameters associated with the SVM. Using the radial basis kernel function of the SVM, the original image can be expanded into a MEXICAN function and a residual trend. This MEXICAN function represents a basic image feature pattern. If the residual does not fluctuate, it can also be represented as a characteristic pattern. If the residual fluctuates significantly, it is treated as a new image and the same decomposition process is repeated until the residuals obtained by the decomposition no longer fluctuate significantly. Experimental results show that the proposed method performs well and, in particular, handles image noise removal satisfactorily. It may provide a new tool and method for image denoising.

  20. FreeContact: fast and free software for protein contact prediction from residue co-evolution.

    Science.gov (United States)

    Kaján, László; Hopf, Thomas A; Kalaš, Matúš; Marks, Debora S; Rost, Burkhard

    2014-03-26

    Twenty years of improved technology and growing numbers of sequences now render residue-residue contact constraints, inferred from correlated mutations in large protein families, accurate enough to drive de novo predictions of protein three-dimensional structure. The method EVfold broke new ground using mean-field Direct Coupling Analysis (EVfold-mfDCA); the method PSICOV applied a related concept by estimating a sparse inverse covariance matrix. Both methods (EVfold-mfDCA and PSICOV) are publicly available, but both require too much CPU time for interactive applications. In addition, EVfold-mfDCA depends on proprietary software. Here, we present FreeContact, a fast, open source implementation of EVfold-mfDCA and PSICOV. On a test set of 140 proteins, FreeContact was almost eight times faster than PSICOV without decreasing prediction performance. The EVfold-mfDCA implementation of FreeContact was over 220 times faster than PSICOV with negligible performance decrease. EVfold-mfDCA was unavailable for testing due to its dependency on proprietary software. FreeContact is implemented as the free C++ library "libfreecontact", complete with command line tool "freecontact", as well as Perl and Python modules. All components are available as Debian packages. FreeContact supports the BioXSD format for interoperability. FreeContact provides the opportunity to compute reliable contact predictions in any environment (desktop or cloud).
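
    To make the notion of correlated mutations concrete, the sketch below scores a pair of alignment columns by their mutual information. This is a deliberately simpler stand-in, not mfDCA or PSICOV and not the FreeContact API: it ignores indirect couplings, sequence weighting and gaps, which are the issues those methods address. The function names and the toy alignments are hypothetical.

```python
from collections import Counter
from math import log2


def column(msa, j):
    """Return column j of an alignment given as equal-length strings."""
    return [seq[j] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    Illustrative assumption: every sequence has equal weight and gap
    characters are treated like any other symbol.
    """
    n = len(msa)
    col_i, col_j = column(msa, i), column(msa, j)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


if __name__ == "__main__":
    # Toy alignments (hypothetical data): columns co-vary in the first,
    # and vary independently in the second.
    covarying = ["AC", "AC", "GT", "GT"]
    independent = ["AC", "AT", "GC", "GT"]
    print(mutual_information(covarying, 0, 1))
    print(mutual_information(independent, 0, 1))
```

    High-scoring column pairs are candidates for residue-residue contacts, but raw mutual information conflates direct and indirect correlations, which is the limitation that motivates the global approaches named in the abstract.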