WorldWideScience

Sample records for sampling-based structured prediction

  1. Structural changes and out-of-sample prediction of realized range-based variance in the stock market

    Science.gov (United States)

    Gong, Xu; Lin, Boqiang

    2018-03-01

    This paper examines the effects of structural changes on forecasting the realized range-based variance in the stock market. Accounting for structural changes in variance in the stock market, we develop the HAR-RRV-SC model on the basis of the HAR-RRV model. Subsequently, the HAR-RRV and HAR-RRV-SC models are used to forecast the realized range-based variance of the S&P 500 Index. We find many structural changes in variance in the U.S. stock market, and the period after the financial crisis contains more structural change points than the period before it. The out-of-sample results show that the HAR-RRV-SC model significantly outperforms the HAR-RRV model when the two are employed to forecast the 1-day, 1-week, and 1-month realized range-based variances, which means that accounting for structural changes can improve out-of-sample prediction of realized range-based variance. The out-of-sample results remain robust to the alternative rolling fixed window, the alternative threshold value in the ICSS algorithm, and the alternative benchmark models. More importantly, we believe that considering structural changes can help improve the out-of-sample performance of most other existing HAR-RRV-type models in addition to the models used in this paper.
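
    The forecasting setup described above belongs to the HAR family of models. A minimal sketch is given below, assuming a daily series of realized range-based variance; the 1-day/5-day/22-day lag structure and the regime dummies standing in for the structural-change terms are illustrative assumptions, not the authors' exact HAR-RRV-SC specification.

        import numpy as np
        from sklearn.linear_model import LinearRegression

        def har_features(rrv, change_points=()):
            # Daily, weekly (5-day) and monthly (22-day) averages of realized
            # range-based variance, plus dummy regressors marking hypothetical
            # structural-change regimes (stand-in for the "-SC" terms).
            X, y = [], []
            for t in range(21, len(rrv) - 1):
                daily = rrv[t]
                weekly = rrv[t - 4:t + 1].mean()
                monthly = rrv[t - 21:t + 1].mean()
                dummies = [1.0 if t >= cp else 0.0 for cp in change_points]
                X.append([daily, weekly, monthly, *dummies])
                y.append(rrv[t + 1])
            return np.array(X), np.array(y)

        rrv = np.abs(np.random.default_rng(0).normal(size=500)) * 1e-4   # toy variance series
        X, y = har_features(rrv, change_points=(250,))
        model = LinearRegression().fit(X[:-60], y[:-60])                 # in-sample fit
        mae = np.abs(model.predict(X[-60:]) - y[-60:]).mean()            # 1-day-ahead OOS error
        print(f"out-of-sample MAE: {mae:.2e}")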

  2. A probabilistic fragment-based protein structure prediction algorithm.

    Directory of Open Access Journals (Sweden)

    David Simoncini

    Full Text Available Conformational sampling is one of the bottlenecks in fragment-based protein structure prediction approaches. They generally start with a coarse-grained optimization where mainchain atoms and centroids of side chains are considered, followed by a fine-grained optimization with an all-atom representation of proteins. It is during this coarse-grained phase that fragment-based methods intensely sample the conformational space. If the native-like region is sampled more, the accuracy of the final all-atom predictions may be improved accordingly. In this work we present EdaFold, a new method for fragment-based protein structure prediction based on an Estimation of Distribution Algorithm. Fragment-based approaches build protein models by assembling short fragments from known protein structures. Whereas the probability mass functions over the fragment libraries are uniform in the usual case, we propose an algorithm that learns from previously generated decoys and steers the search toward native-like regions. A comparison with the Rosetta AbInitio protocol shows that EdaFold is able to generate models with lower energies and to enhance the percentage of near-native coarse-grained decoys on a benchmark of [Formula: see text] proteins. The best coarse-grained models produced by both methods were refined into all-atom models and used in molecular replacement. All-atom decoys produced from EdaFold's decoy set reach high enough accuracy to solve the crystallographic phase problem by molecular replacement for some test proteins. EdaFold showed a higher success rate in molecular replacement when compared to Rosetta. Our study suggests that improving low-resolution coarse-grained decoys allows computational methods to avoid subsequent sampling issues during all-atom refinement and to produce better all-atom models. EdaFold can be downloaded from http://www.riken.jp/zhangiru/software.html [corrected].
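
    The estimation-of-distribution idea can be illustrated with a small sketch: re-estimate the per-position fragment probabilities from the lowest-energy decoys of the previous iteration and mix them with the current distribution. The function names, elite fraction and mixing rate below are assumptions for illustration, not EdaFold's actual implementation.

        import numpy as np

        def eda_update(fragment_probs, decoys, energies, elite_frac=0.3, mix=0.5):
            # fragment_probs: (n_positions, n_fragments) probability mass functions
            # decoys: (n_decoys, n_positions) fragment index chosen at each position
            # energies: (n_decoys,) decoy scores (lower is better)
            elite = decoys[np.argsort(energies)[: max(1, int(elite_frac * len(decoys)))]]
            counts = np.zeros_like(fragment_probs)
            for choice in elite:
                counts[np.arange(fragment_probs.shape[0]), choice] += 1.0
            empirical = counts / counts.sum(axis=1, keepdims=True).clip(min=1.0)
            updated = (1.0 - mix) * fragment_probs + mix * empirical
            return updated / updated.sum(axis=1, keepdims=True)

        rng = np.random.default_rng(0)
        probs = np.full((100, 25), 1.0 / 25)                 # uniform fragment library
        decoys = rng.integers(0, 25, size=(200, 100))
        energies = rng.normal(size=200)
        print(eda_update(probs, decoys, energies).shape)     # (100, 25)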

  3. Blind Test of Physics-Based Prediction of Protein Structures

    Science.gov (United States)

    Shell, M. Scott; Ozkan, S. Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A.

    2009-01-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences. PMID:19186130

  4. Ensemble-based prediction of RNA secondary structures.

    Science.gov (United States)

    Aghaeepour, Nima; Hoos, Holger H

    2013-04-24

    Accurate structure prediction methods play an important role in the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that has been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches. In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures. Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach. In addition, AveRNA allows an intuitive and effective control of the trade-off between
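
    A toy illustration of the ensemble idea, assuming each component method returns a set of predicted base pairs; the weighted-vote rule and the threshold are placeholders, not AveRNA's published combination scheme.

        from collections import Counter

        def combine_predictions(predictions, weights=None, threshold=0.5):
            # predictions: list of sets of (i, j) base pairs, one per component method
            weights = weights or [1.0] * len(predictions)
            votes = Counter()
            for pairs, w in zip(predictions, weights):
                for pair in pairs:
                    votes[pair] += w
            total = sum(weights)
            return {pair for pair, v in votes.items() if v / total >= threshold}

        methods = [{(1, 20), (2, 19), (5, 15)}, {(1, 20), (2, 19)}, {(1, 20), (6, 14)}]
        print(sorted(combine_predictions(methods)))   # keeps pairs supported by a majority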

  5. Distance matrix-based approach to protein structure prediction.

    Science.gov (United States)

    Kloczkowski, Andrzej; Jernigan, Robert L; Wu, Zhijun; Song, Guang; Yang, Lei; Kolinski, Andrzej; Pokarowski, Piotr

    2009-03-01

    Much structural information is encoded in the internal distances; a distance matrix-based approach can be used to predict protein structure and dynamics, and for structural refinement. Our approach is based on the squared distance matrix D = [r_ij^2] containing all squared distances between residues in proteins. This distance matrix contains more information than the contact matrix C, whose elements are either 0 or 1 depending on whether the distance r_ij is greater or less than a cutoff value r_cutoff. We have performed spectral decomposition of the distance matrices, D = sum_k lambda_k v_k v_k^T, in terms of eigenvalues lambda_k and the corresponding eigenvectors v_k, and found that it contains at most five nonzero terms. The dominant eigenvector is proportional to r^2, the squared distance of points from the center of mass, with the next three being the principal components of the system of points. By predicting r^2 from the sequence we can approximate the distance matrix of a protein with an expected RMSD value of about 7.3 Å, and by combining it with the prediction of the first principal component we can improve this approximation to 4.0 Å. We can also explain the role of hydrophobic interactions in protein structure, because r is highly correlated with the hydrophobicity profile of the sequence. Moreover, r is highly correlated with several sequence profiles that are useful in protein structure prediction, such as the contact number, the residue-wise contact order (RWCO) or mean-square fluctuations (i.e., crystallographic temperature factors). We have also shown that the next three components are related to the spatial directionality of the secondary structure elements, and they may also be predicted from the sequence, improving overall structure prediction. We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the
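
    The at-most-five-term decomposition follows from D = a 1^T + 1 a^T - 2 X X^T, where X holds the 3D coordinates and a their squared norms, so D has rank at most 5. The toy check below, with random coordinates standing in for residue positions, confirms this numerically.

        import numpy as np

        rng = np.random.default_rng(1)
        coords = rng.normal(size=(60, 3))                  # surrogate residue coordinates
        diff = coords[:, None, :] - coords[None, :, :]
        D = (diff ** 2).sum(axis=-1)                       # squared distance matrix D_ij = |r_i - r_j|^2

        eigvals = np.linalg.eigvalsh(D)
        nonzero = int((np.abs(eigvals) > 1e-8 * np.abs(eigvals).max()).sum())
        print("nonzero eigenvalues:", nonzero)             # at most 5 for points in 3D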

  6. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures

    Directory of Open Access Journals (Sweden)

    Scheid Anika

    2012-07-01

    Full Text Available Abstract Background Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. Results In this work, we will consider the SCFG-based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst

  7. Performance of local information-based link prediction: a sampling perspective

    Science.gov (United States)

    Zhao, Jichang; Feng, Xu; Dong, Li; Liang, Xiao; Xu, Ke

    2012-08-01

    Link prediction is widely employed to uncover missing links in snapshots of real-world networks, which are usually obtained through different kinds of sampling methods. In the previous literature, in order to evaluate prediction performance, known edges in the sampled snapshot are divided into a training set and a probe set at random, without considering the underlying sampling approach. However, different sampling methods might lead to different missing links, especially when the sampling is biased. For this reason, random partition-based evaluation is no longer convincing once the sampling method is taken into account. In this paper, we re-evaluate the performance of local information-based link prediction by dividing the training set and the probe set according to the sampling method. Interestingly, we find that each prediction approach performs unevenly across different sampling methods. Moreover, most of these predictors perform poorly when the sampling method is biased, which indicates that their performance might have been overestimated in prior work.
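
    A sketch of the sampling-aware evaluation idea, assuming an undirected snapshot: the training set is the part of the graph reached by a biased sampler (here a random walk), and the probe set is whatever the sampler missed, instead of a uniformly random partition. The graph generator, the 80% target and the use of networkx are arbitrary choices for illustration.

        import random
        import networkx as nx

        def walk_based_split(G, train_frac=0.8, seed=0):
            rng = random.Random(seed)
            node = rng.choice(list(G.nodes))
            train = set()
            target = int(train_frac * G.number_of_edges())
            while len(train) < target:                      # random-walk edge sampling
                nxt = rng.choice(list(G[node]))
                train.add(frozenset((node, nxt)))
                node = nxt
            probe = {frozenset(e) for e in G.edges()} - train
            return train, probe

        G = nx.barabasi_albert_graph(300, 3, seed=1)
        train, probe = walk_based_split(G)
        print(len(train), len(probe))                       # biased split for re-evaluation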

  8. GARN: Sampling RNA 3D Structure Space with Game Theory and Knowledge-Based Scoring Strategies.

    Science.gov (United States)

    Boudard, Mélanie; Bernauer, Julie; Barth, Dominique; Cohen, Johanne; Denise, Alain

    2015-01-01

    Cellular processes involve large numbers of RNA molecules. The functions of these RNA molecules and their binding to molecular machines are highly dependent on their 3D structures. One of the key challenges in RNA structure prediction and modeling is predicting the spatial arrangement of the various structural elements of RNA. As RNA folding is generally hierarchical, methods involving coarse-grained models hold great promise for this purpose. We present here a novel coarse-grained method for sampling, based on game theory and knowledge-based potentials. This strategy, GARN (Game Algorithm for RNa sampling), is often much faster than previously described techniques and generates large sets of solutions closely resembling the native structure. GARN is thus a suitable starting point for the molecular modeling of large RNAs, particularly those with experimental constraints. GARN is available from: http://garn.lri.fr/.

  9. Stochastic sampling of the RNA structural alignment space.

    Science.gov (United States)

    Harmanci, Arif Ozgun; Sharma, Gaurav; Mathews, David H

    2009-07-01

    A novel method is presented for predicting the common secondary structures and alignment of two homologous RNA sequences by sampling the 'structural alignment' space, i.e. the joint space of their alignments and common secondary structures. The structural alignment space is sampled according to a pseudo-Boltzmann distribution based on a pseudo-free energy change that combines base pairing probabilities from a thermodynamic model and alignment probabilities from a hidden Markov model. By virtue of the implicit comparative analysis between the two sequences, the method offers an improvement over single sequence sampling of the Boltzmann ensemble. A cluster analysis shows that the samples obtained from joint sampling of the structural alignment space cluster more closely than samples generated by the single sequence method. On average, the representative (centroid) structure and alignment of the most populated cluster in the sample of structures and alignments generated by joint sampling are more accurate than single sequence sampling and alignment based on sequence alone, respectively. The 'best' centroid structure that is closest to the known structure among all the centroids is, on average, more accurate than structure predictions of other methods. Additionally, cluster analysis identifies, on average, a few clusters, whose centroids can be presented as alternative candidates. The source code for the proposed method can be downloaded at http://rna.urmc.rochester.edu.
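
    The combination of the two probability sources can be written as a pseudo-free-energy change; the additive log form and the weighting below are illustrative and may differ from the published parametrization.

        import math

        def pseudo_energy(p_base_pair, p_alignment, alignment_weight=1.0):
            # Lower pseudo-energy = more favorable joint structure/alignment choice.
            return -(math.log(p_base_pair) + alignment_weight * math.log(p_alignment))

        print(pseudo_energy(0.8, 0.6))    # combines thermodynamic and HMM evidence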

  10. Predicting Drug-Target Interactions Based on Small Positive Samples.

    Science.gov (United States)

    Hu, Pengwei; Chan, Keith C C; Hu, Yanxing

    2018-01-01

    A basic task in drug discovery is to find new medication in the form of candidate compounds that act on a target protein. In other words, a drug has to interact with a target, and such drug-target interaction (DTI) is not expected to be random. Significant and interesting patterns are expected to be hidden in them. If these patterns can be discovered, new drugs are expected to be more easily discoverable. Currently, a number of computational methods have been proposed to predict DTIs based on their similarity. However, such an approach does not allow biochemical features to be directly considered. As a result, some methods have been proposed to try to discover patterns in physicochemical interactions. Since the number of potential negative DTIs is very high both in absolute terms and in comparison to that of the known ones, these methods are rather computationally expensive and they can only rely on subsets, rather than the full set, of negative DTIs for training and validation. As there is always a relatively high chance for negative DTIs to be falsely identified, and as only a partial subset of such DTIs is considered, existing approaches can be further improved to better predict DTIs. In this paper, we present a novel approach, called ODT (one-class drug-target interaction prediction), for this purpose. One main task of ODT is to discover association patterns between interacting drugs and proteins from the chemical structure of the former and the protein sequence network of the latter. ODT does so in two phases. First, the DTI network is transformed to a representation by structural properties. Second, it applies a one-class classification algorithm to build a prediction model based only on known positive interactions. We compared the best AUROC scores of ODT with several state-of-the-art approaches on gold-standard data. The prediction accuracy of ODT is superior to that of all the other methods on the GPCR and ion channel datasets. Performance
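
    A minimal one-class sketch in the spirit of ODT, trained only on known positive pairs; the feature construction (random vectors standing in for drug-structure plus protein-sequence descriptors) and the SVM hyperparameters are assumptions, not the authors' pipeline.

        import numpy as np
        from sklearn.svm import OneClassSVM

        rng = np.random.default_rng(0)
        positives = rng.normal(loc=1.0, size=(300, 64))    # known drug-target pairs only
        candidates = rng.normal(loc=0.0, size=(80, 64))    # unlabeled pairs to score

        clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(positives)
        scores = clf.decision_function(candidates)          # higher = more interaction-like
        print("candidates flagged as interactions:", int((scores > 0).sum()))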

  11. Knowledge base and neural network approach for protein secondary structure prediction.

    Science.gov (United States)

    Patel, Maulika S; Mazumdar, Himanshu S

    2014-11-21

    Protein structure prediction is of great relevance given the abundant genomic and proteomic data generated by genome sequencing projects. Protein secondary structure prediction is addressed as a subtask in determining the protein tertiary structure and function. In this paper, a novel algorithm, KB-PROSSP-NN, which combines a knowledge base with neural-network modeling of the exceptions in that knowledge base, is proposed for protein secondary structure prediction (PSSP). The knowledge base is derived from a proteomic sequence-structure database and consists of the statistics of association between 5-residue words and the corresponding secondary structure. The predictions obtained from the knowledge base are refined with a backpropagation neural network algorithm; the neural network models the exceptions of the knowledge base. Q3 accuracies of 90% and 82% are achieved on the RS126 and CB396 test sets, respectively, which suggests an improvement over existing state-of-the-art methods. Copyright © 2014 Elsevier Ltd. All rights reserved.
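
    A toy version of the knowledge-base step, assuming three-state (H/E/C) labels: count how often each 5-residue word's central residue adopts each state, then predict by majority. The neural-network refinement of the exceptions is omitted, and all names are illustrative.

        from collections import defaultdict, Counter

        def build_kb(sequences, structures, k=5):
            kb = defaultdict(Counter)
            for seq, ss in zip(sequences, structures):
                for i in range(len(seq) - k + 1):
                    kb[seq[i:i + k]][ss[i + k // 2]] += 1   # word -> central-residue state counts
            return kb

        def predict(seq, kb, k=5, default="C"):
            pred = [default] * len(seq)
            for i in range(len(seq) - k + 1):
                counts = kb.get(seq[i:i + k])
                if counts:
                    pred[i + k // 2] = counts.most_common(1)[0][0]
            return "".join(pred)

        kb = build_kb(["MKTAYIAKQR"], ["CCHHHHHHCC"])
        print(predict("MKTAYIAKQR", kb))                    # reproduces the training labels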

  12. Structure-based sampling and self-correcting machine learning for accurate calculations of potential energy surfaces and vibrational levels

    Science.gov (United States)

    Dral, Pavlo O.; Owens, Alec; Yurchenko, Sergei N.; Thiel, Walter

    2017-06-01

    We present an efficient approach for generating highly accurate molecular potential energy surfaces (PESs) using self-correcting, kernel ridge regression (KRR) based machine learning (ML). We introduce structure-based sampling to automatically assign nuclear configurations from a pre-defined grid to the training and prediction sets, respectively. Accurate high-level ab initio energies are required only for the points in the training set, while the energies for the remaining points are provided by the ML model with negligible computational cost. The proposed sampling procedure is shown to be superior to random sampling and also eliminates the need for training several ML models. Self-correcting machine learning has been implemented such that each additional layer corrects errors from the previous layer. The performance of our approach is demonstrated in a case study on a published high-level ab initio PES of methyl chloride with 44 819 points. The ML model is trained on sets of different sizes and then used to predict the energies for tens of thousands of nuclear configurations within seconds. The resulting datasets are utilized in variational calculations of the vibrational energy levels of CH3Cl. By using both structure-based sampling and self-correction, the size of the training set can be kept small (e.g., 10% of the points) without any significant loss of accuracy. In ab initio rovibrational spectroscopy, it is thus possible to reduce the number of computationally costly electronic structure calculations through structure-based sampling and self-correcting KRR-based machine learning by up to 90%.
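
    A two-layer sketch of self-correcting kernel ridge regression on a toy one-dimensional potential, where the second layer is trained on the residuals of the first; the grid, the evenly spaced subsample standing in for structure-based sampling, and the kernel parameters are illustrative assumptions, not the published CH3Cl setup.

        import numpy as np
        from sklearn.kernel_ridge import KernelRidge

        grid = np.linspace(0.9, 3.0, 2000)[:, None]              # pre-defined grid of geometries
        energies = 4.0 * (grid[:, 0] ** -12 - grid[:, 0] ** -6)  # toy Lennard-Jones "PES"

        train = np.linspace(0, len(grid) - 1, 200, dtype=int)     # small training subset (10%)
        layer1 = KernelRidge(kernel="rbf", gamma=10.0, alpha=1e-8).fit(grid[train], energies[train])
        residual = energies[train] - layer1.predict(grid[train])  # errors of the first layer
        layer2 = KernelRidge(kernel="rbf", gamma=100.0, alpha=1e-8).fit(grid[train], residual)

        prediction = layer1.predict(grid) + layer2.predict(grid)  # self-corrected ML PES
        print(f"max abs error: {np.abs(prediction - energies).max():.3e}")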

  13. A sampling-based computational strategy for the representation of epistemic uncertainty in model predictions with evidence theory.

    Energy Technology Data Exchange (ETDEWEB)

    Johnson, J. D. (Prostat, Mesa, AZ); Oberkampf, William Louis; Helton, Jon Craig (Arizona State University, Tempe, AZ); Storlie, Curtis B. (North Carolina State University, Raleigh, NC)

    2006-10-01

    Evidence theory provides an alternative to probability theory for the representation of epistemic uncertainty in model predictions that derives from epistemic uncertainty in model inputs, where the descriptor epistemic is used to indicate uncertainty that derives from a lack of knowledge with respect to the appropriate values to use for various inputs to the model. The potential benefit, and hence appeal, of evidence theory is that it allows a less restrictive specification of uncertainty than is possible within the axiomatic structure on which probability theory is based. Unfortunately, the propagation of an evidence theory representation for uncertainty through a model is more computationally demanding than the propagation of a probabilistic representation for uncertainty, with this difficulty constituting a serious obstacle to the use of evidence theory in the representation of uncertainty in predictions obtained from computationally intensive models. This presentation describes and illustrates a sampling-based computational strategy for the representation of epistemic uncertainty in model predictions with evidence theory. Preliminary trials indicate that the presented strategy can be used to propagate uncertainty representations based on evidence theory in analysis situations where naive sampling-based (i.e., unsophisticated Monte Carlo) procedures are impracticable due to computational cost.
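
    A naive Monte Carlo illustration of how an evidence-theory input specification can be propagated: each focal element (an interval with a basic probability assignment) is sampled, and the sampled responses approximate belief and plausibility for an output event. The model, intervals and BPA values are invented for illustration; the strategy described in the report is designed to be more efficient than this brute-force version.

        import numpy as np

        def model(x):
            return np.sin(x) + 0.1 * x        # stand-in for an expensive simulation

        focal_elements = [((0.0, 1.0), 0.3),  # (input interval, basic probability assignment)
                          ((0.5, 2.0), 0.5),
                          ((1.5, 3.0), 0.2)]
        threshold = 1.0                        # event of interest: model output > threshold
        rng = np.random.default_rng(0)

        belief = plausibility = 0.0
        for (lo, hi), bpa in focal_elements:
            y = model(rng.uniform(lo, hi, size=2000))     # sampling-based propagation
            if y.min() > threshold:
                belief += bpa                  # the whole focal element implies the event
            if y.max() > threshold:
                plausibility += bpa            # the focal element is consistent with the event
        print(f"Bel = {belief:.2f}, Pl = {plausibility:.2f}")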

  14. Structural protein descriptors in 1-dimension and their sequence-based predictions.

    Science.gov (United States)

    Kurgan, Lukasz; Disfani, Fatemeh Miri

    2011-09-01

    The last few decades have seen an increasing interest in the development and application of 1-dimensional (1D) descriptors of protein structure. These descriptors project 3D structural features onto 1D strings of residue-wise structural assignments. They cover a wide range of structural aspects including conformation of the backbone, burying depth/solvent exposure and flexibility of residues, and inter-chain residue-residue contacts. We perform a first-of-its-kind, comprehensive comparative review of the existing 1D structural descriptors. We define, review and categorize ten structural descriptors, and we also describe, summarize and contrast over eighty computational models that are used to predict these descriptors from protein sequences. We show that the majority of the recent sequence-based predictors utilize machine learning models, with the most popular being neural networks, support vector machines, hidden Markov models, and support vector and linear regressions. These methods provide high-throughput predictions and most of them are accessible to a non-expert user via web servers and/or stand-alone software packages. We empirically evaluate several recent sequence-based predictors of secondary structure, disorder, and solvent accessibility descriptors using a benchmark set based on CASP8 targets. Our analysis shows that secondary structure can be predicted with over 80% accuracy and segment overlap (SOV), disorder with over 0.9 AUC, 0.6 Matthews Correlation Coefficient (MCC), and 75% SOV, and relative solvent accessibility with a PCC of 0.7 and an MCC of 0.6 (0.86 when homology is used). We demonstrate that the secondary structure predicted from sequence without the use of homology modeling is as good as the structure extracted from the 3D folds predicted by top-performing template-based methods.

  15. Two sample Bayesian prediction intervals for order statistics based on the inverse exponential-type distributions using right censored sample

    Directory of Open Access Journals (Sweden)

    M.M. Mohie El-Din

    2011-10-01

    Full Text Available In this paper, two-sample Bayesian prediction intervals for order statistics (OS) are obtained. This prediction is based on a certain class of inverse exponential-type distributions using a right-censored sample. A general class of prior density functions is used and the predictive cumulative function is obtained in the two-sample case. The class of inverse exponential-type distributions includes several important distributions such as the inverse Weibull distribution, the inverse Burr distribution, the log-logistic distribution, the inverse Pareto distribution and the inverse paralogistic distribution. Special cases of the inverse Weibull model such as the inverse exponential model and the inverse Rayleigh model are considered.

  16. Analysis of energy-based algorithms for RNA secondary structure prediction

    Directory of Open Access Journals (Sweden)

    Hajiaghayi Monir

    2012-02-01

    Full Text Available Abstract Background RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or use pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. Results We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range) with high confidence of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms
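
    The bootstrap percentile method mentioned above can be sketched in a few lines; the surrogate per-RNA F-measures below are random numbers, used only to show the mechanics.

        import numpy as np

        def bootstrap_percentile_ci(values, n_boot=10000, alpha=0.05, seed=0):
            rng = np.random.default_rng(seed)
            means = [rng.choice(values, size=len(values), replace=True).mean()
                     for _ in range(n_boot)]
            return np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])

        f_measures = np.random.default_rng(1).beta(6, 3, size=2000)   # surrogate accuracies
        low, high = bootstrap_percentile_ci(f_measures)
        print(f"mean F-measure {f_measures.mean():.3f}, 95% CI [{low:.3f}, {high:.3f}]")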

  17. Bridge Structure Deformation Prediction Based on GNSS Data Using Kalman-ARIMA-GARCH Model.

    Science.gov (United States)

    Xin, Jingzhou; Zhou, Jianting; Yang, Simon X; Li, Xiaoqing; Wang, Yu

    2018-01-19

    Bridges are an essential part of the ground transportation system. Health monitoring is fundamentally important for the safety and service life of bridges. A large amount of structural information is obtained from various sensors using sensing technology, and the data processing has become a challenging issue. To improve the prediction accuracy of bridge structure deformation based on data mining and to accurately evaluate the time-varying characteristics of bridge structure performance evolution, this paper proposes a new method for bridge structure deformation prediction, which integrates the Kalman filter, autoregressive integrated moving average model (ARIMA), and generalized autoregressive conditional heteroskedasticity (GARCH). Firstly, the raw deformation data is directly pre-processed using the Kalman filter to reduce the noise. After that, the linear recursive ARIMA model is established to analyze and predict the structure deformation. Finally, the nonlinear recursive GARCH model is introduced to further improve the accuracy of the prediction. Simulation results based on measured sensor data from the Global Navigation Satellite System (GNSS) deformation monitoring system demonstrated that: (1) the Kalman filter is capable of denoising the bridge deformation monitoring data; (2) the prediction accuracy of the proposed Kalman-ARIMA-GARCH model is satisfactory, where the mean absolute error increases only from 3.402 mm to 5.847 mm with the increment of the prediction step; and (3) in comparison to the Kalman-ARIMA model, the Kalman-ARIMA-GARCH model results in superior prediction accuracy as it includes partial nonlinear characteristics (heteroscedasticity); the mean absolute error of five-step prediction using the proposed model is improved by 10.12%. This paper provides a new way for structural behavior prediction based on data processing, which can lay a foundation for the early warning of bridge health monitoring system based on sensor data using sensing
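
    A compact sketch of the three-stage pipeline on a toy deformation series; the scalar random-walk Kalman filter, the ARIMA order, and the use of the statsmodels and arch packages are illustrative assumptions rather than the paper's implementation.

        import numpy as np
        from statsmodels.tsa.arima.model import ARIMA
        from arch import arch_model

        def kalman_denoise(y, q=1e-4, r=1.0):
            # Scalar random-walk Kalman filter used purely as a denoiser.
            x, p, out = y[0], 1.0, []
            for z in y:
                p += q                       # predict
                k = p / (p + r)              # Kalman gain
                x += k * (z - x)             # update with the new measurement
                p *= 1.0 - k
                out.append(x)
            return np.array(out)

        rng = np.random.default_rng(0)
        raw = np.cumsum(rng.normal(0, 0.5, size=600)) + rng.normal(0, 2.0, size=600)

        smooth = kalman_denoise(raw)                                  # step 1: denoise
        arima = ARIMA(smooth, order=(1, 1, 1)).fit()                  # step 2: linear trend
        garch = arch_model(arima.resid, vol="Garch", p=1, q=1).fit(disp="off")  # step 3
        print(arima.forecast(5))                                      # five-step deformation forecast
        print(garch.forecast(horizon=5).variance.values[-1])          # forecast residual variance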

  18. Bridge Structure Deformation Prediction Based on GNSS Data Using Kalman-ARIMA-GARCH Model

    Directory of Open Access Journals (Sweden)

    Jingzhou Xin

    2018-01-01

    Full Text Available Bridges are an essential part of the ground transportation system. Health monitoring is fundamentally important for the safety and service life of bridges. A large amount of structural information is obtained from various sensors using sensing technology, and the data processing has become a challenging issue. To improve the prediction accuracy of bridge structure deformation based on data mining and to accurately evaluate the time-varying characteristics of bridge structure performance evolution, this paper proposes a new method for bridge structure deformation prediction, which integrates the Kalman filter, autoregressive integrated moving average model (ARIMA), and generalized autoregressive conditional heteroskedasticity (GARCH). Firstly, the raw deformation data is directly pre-processed using the Kalman filter to reduce the noise. After that, the linear recursive ARIMA model is established to analyze and predict the structure deformation. Finally, the nonlinear recursive GARCH model is introduced to further improve the accuracy of the prediction. Simulation results based on measured sensor data from the Global Navigation Satellite System (GNSS) deformation monitoring system demonstrated that: (1) the Kalman filter is capable of denoising the bridge deformation monitoring data; (2) the prediction accuracy of the proposed Kalman-ARIMA-GARCH model is satisfactory, where the mean absolute error increases only from 3.402 mm to 5.847 mm with the increment of the prediction step; and (3) in comparison to the Kalman-ARIMA model, the Kalman-ARIMA-GARCH model results in superior prediction accuracy as it includes partial nonlinear characteristics (heteroscedasticity); the mean absolute error of five-step prediction using the proposed model is improved by 10.12%. This paper provides a new way for structural behavior prediction based on data processing, which can lay a foundation for the early warning of bridge health monitoring system based on sensor data

  19. Predicting sexual infidelity in a population-based sample of married individuals.

    Science.gov (United States)

    Whisman, Mark A; Gordon, Kristina Coop; Chatav, Yael

    2007-06-01

    Predictors of 12-month prevalence of sexual infidelity were examined in a population-based sample of married individuals (N = 2,291). Predictor variables were organized in terms of involved-partner (e.g., personality, religiosity), marital (e.g., marital dissatisfaction, partner affair), and extradyadic (e.g., parenting) variables. Annual prevalence of infidelity was 2.3%. Controlling for marital dissatisfaction and demographic variables, infidelity was predicted by greater neuroticism and lower religiosity; wives' pregnancy also increased the risk of infidelity for husbands. In comparison, self-esteem and partners' suspected affair were predictive of infidelity when controlling for demographic variables but were not uniquely predictive of infidelity when also controlling for marital dissatisfaction. Religiosity and wives' pregnancy moderated the association between marital dissatisfaction and infidelity.

  20. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-05-25

    The number of available protein sequences in public databases is increasing exponentially. However, a significant fraction of these sequences lack functional annotation which is essential to our understanding of how biological systems and processes operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching these predicted models, using global and local similarities, through three independent enzyme commission (EC) and gene ontology (GO) function libraries. The method was tested on 250 “hard” proteins, which lack homologous templates in both structure and function libraries. The results show that this method outperforms the conventional prediction methods based on sequence similarity or threading. Additionally, our method could be improved even further by incorporating protein-protein interaction information. Overall, the method we use provides an efficient approach for automated functional annotation of non-homologous proteins, starting from their sequence.

  1. Cow-specific diet digestibility predictions based on near-infrared reflectance spectroscopy scans of faecal samples.

    Science.gov (United States)

    Mehtiö, T; Rinne, M; Nyholm, L; Mäntysaari, P; Sairanen, A; Mäntysaari, E A; Pitkänen, T; Lidauer, M H

    2016-04-01

    This study was designed to obtain information on prediction of diet digestibility from near-infrared reflectance spectroscopy (NIRS) scans of faecal spot samples from dairy cows at different stages of lactation and to develop a faecal sampling protocol. NIRS was used to predict diet organic matter digestibility (OMD) and indigestible neutral detergent fibre content (iNDF) from faecal samples, and dry matter digestibility (DMD) using iNDF in feed and faecal samples as an internal marker. Acid-insoluble ash (AIA) as an internal digestibility marker was used as a reference method to evaluate the reliability of NIRS predictions. Feed and composite faecal samples were collected from 44 cows at approximately 50, 150 and 250 days in milk (DIM). The estimated standard deviation for cow-specific organic matter digestibility analysed by AIA was 12.3 g/kg, which is small considering that the average was 724 g/kg. The phenotypic correlation between direct faecal OMD prediction by NIRS and OMD by AIA over the lactation was 0.51. The low repeatability and small variability estimates for direct OMD predictions by NIRS were not accurate enough to quantify small differences in OMD between cows. In contrast to OMD, the repeatability estimates for DMD by iNDF and especially for direct faecal iNDF predictions were 0.32 and 0.46, respectively, indicating that developing of NIRS predictions for cow-specific digestibility is possible. A data subset of 20 cows with daily individual faecal samples was used to develop an on-farm sampling protocol. Based on the assessment of correlations between individual sample combinations and composite samples as well as repeatability estimates for individual sample combinations, we found that collecting up to three individual samples yields a representative composite sample. Collection of samples from all the cows of a herd every third month might be a good choice, because it would yield a better accuracy. © 2015 Blackwell Verlag GmbH.

  2. Predictive Methods for Dense Polymer Networks: Combating Bias with Bio-Based Structures

    Science.gov (United States)

    2016-03-16

    [Report documentation page fragments; full abstract not available.] Author: Andrew J. Guenthner. Distribution unlimited (PA Clearance 16152). Topics include architectural bias, a comparison of petroleum-based and bio-based chemical architectures, and continuing research on structure-property relationships.

  3. Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments.

    Science.gov (United States)

    Zheng, Ce; Kurgan, Lukasz

    2008-10-10

    The beta-turn is a secondary protein structure type that plays a significant role in protein folding, stability, and molecular recognition. To date, several methods for prediction of beta-turns from protein sequences have been developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based beta-turn predictor stems from the usage of window-based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates recent studies. The Asn, Asp, Gly, and Pro indicate potential beta-turns, while the remaining four amino acids are useful to predict non-beta-turns. Empirical evaluation using three nonredundant datasets shows favorable Qtotal, Qpredicted and MCC values when compared with over a dozen modern competing methods. Our method is the first to break the 80% Qtotal barrier and achieves Qtotal = 80.9%, MCC = 0.47, and Qpredicted higher by over 6% when compared with the second-best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second and two third-best (with respect to MCC) competing methods, respectively. Experiments show that the proposed method constitutes an improvement over the competing prediction
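
    The window-based feature construction can be sketched as below, assuming one predicted three-state secondary structure string and a PSSM array; the window width, the one-hot layout and the zero padding are illustrative, and in the published method features from four secondary structure predictors feed an SVM.

        import numpy as np

        def window_features(pssm, ss_pred, i, w=3):
            # pssm: (sequence_length, 20) position-specific scoring matrix
            # ss_pred: predicted secondary structure string over 'H', 'E', 'C'
            feats = []
            for j in range(i - w, i + w + 1):
                if 0 <= j < len(ss_pred):
                    feats.extend(pssm[j])                               # evolutionary profile
                    feats.extend(float(ss_pred[j] == s) for s in "HEC") # one-hot structure state
                else:
                    feats.extend([0.0] * (pssm.shape[1] + 3))           # pad at the termini
            return np.array(feats)

        pssm = np.random.default_rng(0).normal(size=(30, 20))
        print(window_features(pssm, "C" * 10 + "H" * 10 + "E" * 10, i=12).shape)   # (161,)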

  4. Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments

    Directory of Open Access Journals (Sweden)

    Kurgan Lukasz

    2008-10-01

    Full Text Available Abstract Background The β-turn is a secondary protein structure type that plays a significant role in protein folding, stability, and molecular recognition. To date, several methods for prediction of β-turns from protein sequences have been developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based β-turn predictor stems from the usage of window-based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. Results We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates recent studies. The Asn, Asp, Gly, and Pro indicate potential β-turns, while the remaining four amino acids are useful to predict non-β-turns. Empirical evaluation using three nonredundant datasets shows favorable Qtotal, Qpredicted and MCC values when compared with over a dozen modern competing methods. Our method is the first to break the 80% Qtotal barrier and achieves Qtotal = 80.9%, MCC = 0.47, and Qpredicted higher by over 6% when compared with the second-best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second and two third-best (with respect to MCC) competing methods, respectively. Conclusion Experiments show that the proposed method constitutes an

  5. Sequential search leads to faster, more efficient fragment-based de novo protein structure prediction.

    Science.gov (United States)

    de Oliveira, Saulo H P; Law, Eleanor C; Shi, Jiye; Deane, Charlotte M

    2018-04-01

    Most current de novo structure prediction methods randomly sample protein conformations and thus require large amounts of computational resource. Here, we consider a sequential sampling strategy, building on ideas from recent experimental work which shows that many proteins fold cotranslationally. We have investigated whether a pseudo-greedy search approach, which begins sequentially from one of the termini, can improve the performance and accuracy of de novo protein structure prediction. We observed that our sequential approach converges when fewer than 20 000 decoys have been produced, fewer than commonly expected. Using our software, SAINT2, we also compared the run time and quality of models produced in a sequential fashion against a standard, non-sequential approach. Sequential prediction produces an individual decoy 1.5-2.5 times faster than non-sequential prediction. When considering the quality of the best model, sequential prediction led to a better model being produced for 31 out of 41 soluble protein validation cases and for 18 out of 24 transmembrane protein cases. Correct models (TM-Score > 0.5) were produced for 29 of these cases by the sequential mode and for only 22 by the non-sequential mode. Our comparison reveals that a sequential search strategy can be used to drastically reduce computational time of de novo protein structure prediction and improve accuracy. Data are available for download from: http://opig.stats.ox.ac.uk/resources. SAINT2 is available for download from: https://github.com/sauloho/SAINT2. saulo.deoliveira@dtc.ox.ac.uk. Supplementary data are available at Bioinformatics online.

  6. Introducing etch kernels for efficient pattern sampling and etch bias prediction

    Science.gov (United States)

    Weisbuch, François; Lutich, Andrey; Schatz, Jirka

    2018-01-01

    Successful patterning requires good control of the photolithography and etch processes. While compact litho models, mainly based on rigorous physics, can predict very well the contours printed in photoresist, pure empirical etch models are less accurate and more unstable. Compact etch models are based on geometrical kernels to compute the litho-etch biases that measure the distance between litho and etch contours. The definition of the kernels, as well as the choice of calibration patterns, is critical to get a robust etch model. This work proposes to define a set of independent and anisotropic etch kernels ("internal", "external", "curvature", "Gaussian", "z_profile") designed to represent the finest details of the resist geometry and to precisely characterize the etch bias at any point along a resist contour. By evaluating the etch kernels on various structures, it is possible to map their etch signatures in a multidimensional space and analyze them to find an optimal sampling of structures. The etch kernels evaluated on these structures were combined with experimental etch bias derived from scanning electron microscope contours to train artificial neural networks to predict etch bias. The method applied to contact and line/space layers shows an improvement in etch model prediction accuracy over a standard etch model. This work emphasizes the importance of the etch kernel definition to characterize and predict complex etch effects.

  7. Dynameomics: Data-driven methods and models for utilizing large-scale protein structure repositories for improving fragment-based loop prediction

    Science.gov (United States)

    Rysavy, Steven J; Beck, David AC; Daggett, Valerie

    2014-01-01

    Protein function is intimately linked to protein structure and dynamics, yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due to protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as those contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼25–75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. PMID:25142412

  8. Dynameomics: data-driven methods and models for utilizing large-scale protein structure repositories for improving fragment-based loop prediction.

    Science.gov (United States)

    Rysavy, Steven J; Beck, David A C; Daggett, Valerie

    2014-11-01

    Protein function is intimately linked to protein structure and dynamics, yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due to protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as those contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼ 25-75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. © 2014 The Protein Society.

  9. MEGADOCK-Web: an integrated database of high-throughput structure-based protein-protein interaction predictions.

    Science.gov (United States)

    Hayashi, Takanori; Matsuzaki, Yuri; Yanagisawa, Keisuke; Ohue, Masahito; Akiyama, Yutaka

    2018-05-08

    Protein-protein interactions (PPIs) play several roles in living cells, and computational PPI prediction is a major focus of many researchers. The three-dimensional (3D) structure and binding surface are important for the design of PPI inhibitors. Therefore, rigid-body protein-protein docking calculations for two protein structures are expected to allow elucidation of PPIs different from known complexes in terms of 3D structures because known PPI information is not explicitly required. We have developed rapid PPI prediction software based on protein-protein docking, called MEGADOCK. In order to fully utilize the benefits of computational PPI predictions, it is necessary to construct a comprehensive database to gather prediction results and their predicted 3D complex structures and to make them easily accessible. Although several databases exist that provide predicted PPIs, the previous databases do not contain a sufficient number of entries for the purpose of discovering novel PPIs. In this study, we constructed an integrated database of MEGADOCK PPI predictions, named MEGADOCK-Web. MEGADOCK-Web provides more than 10 times as many PPI predictions as previous databases and enables users to find PPI predictions that are not available in conventional PPI prediction databases. In MEGADOCK-Web, there are 7528 protein chains and 28,331,628 predicted PPIs from all possible combinations of those proteins. Each protein structure is annotated with PDB ID, chain ID, UniProt AC, related KEGG pathway IDs, and known PPI pairs. Additionally, MEGADOCK-Web provides four powerful functions: 1) searching precalculated PPI predictions, 2) providing annotations for each predicted protein pair with an experimentally known PPI, 3) visualizing candidates that may interact with the query protein on biochemical pathways, and 4) visualizing predicted complex structures through a 3D molecular viewer. MEGADOCK-Web provides a huge amount of comprehensive PPI predictions based on

  10. Sampling Realistic Protein Conformations Using Local Structural Bias

    DEFF Research Database (Denmark)

    Hamelryck, Thomas Wim; Kent, John T.; Krogh, A.

    2006-01-01

    The prediction of protein structure from sequence remains a major unsolved problem in biology. The most successful protein structure prediction methods make use of a divide-and-conquer strategy to attack the problem: a conformational sampling method generates plausible candidate structures, which are subsequently accepted or rejected using an energy function. Conceptually, this often corresponds to separating local structural bias from the long-range interactions that stabilize the compact, native state. However, sampling protein conformations that are compatible with the local structural bias encoded in a given protein sequence is a long-standing open problem, especially in continuous space. We describe an elegant and mathematically rigorous method to do this, and show that it readily generates native-like protein conformations simply by enforcing compactness. Our results have far-reaching implications...

  11. Prediction of molecular crystal structures

    International Nuclear Information System (INIS)

    Beyer, Theresa

    2001-01-01

    The ab initio prediction of molecular crystal structures is a scientific challenge. Reliability of first-principle prediction calculations would show a fundamental understanding of crystallisation. Crystal structure prediction is also of considerable practical importance as different crystalline arrangements of the same molecule in the solid state (polymorphs) are likely to have different physical properties. A method of crystal structure prediction based on lattice energy minimisation has been developed in this work. The choice of the intermolecular potential and of the molecular model is crucial for the results of such studies and both of these criteria have been investigated. An empirical atom-atom repulsion-dispersion potential for carboxylic acids has been derived and applied in a crystal structure prediction study of formic acid, benzoic acid and the polymorphic system of tetrolic acid. As many experimental crystal structure determinations at different temperatures are available for the polymorphic system of paracetamol (acetaminophen), the influence of the variations of the molecular model on the crystal structure lattice energy minima has also been studied. The general problem of prediction methods based on the assumption that the experimental thermodynamically stable polymorph corresponds to the global lattice energy minimum is that more hypothetical low lattice energy structures are found within a few kJ mol-1 of the global minimum than are likely to be experimentally observed polymorphs. This is illustrated by the results for molecule I, 3-oxabicyclo(3.2.0)hepta-1,4-diene, studied for the first international blind test for small organic crystal structures organised by the Cambridge Crystallographic Data Centre (CCDC) in May 1999. To reduce the number of predicted polymorphs, factors additional to thermodynamic criteria have to be considered. Therefore the elastic constants and vapour growth morphologies have been calculated for the lowest lattice energy

  12. Prediction of molecular crystal structures

    Energy Technology Data Exchange (ETDEWEB)

    Beyer, Theresa

    2001-07-01

    The ab initio prediction of molecular crystal structures is a scientific challenge. Reliability of first-principle prediction calculations would show a fundamental understanding of crystallisation. Crystal structure prediction is also of considerable practical importance as different crystalline arrangements of the same molecule in the solid state (polymorphs) are likely to have different physical properties. A method of crystal structure prediction based on lattice energy minimisation has been developed in this work. The choice of the intermolecular potential and of the molecular model is crucial for the results of such studies and both of these criteria have been investigated. An empirical atom-atom repulsion-dispersion potential for carboxylic acids has been derived and applied in a crystal structure prediction study of formic acid, benzoic acid and the polymorphic system of tetrolic acid. As many experimental crystal structure determinations at different temperatures are available for the polymorphic system of paracetamol (acetaminophen), the influence of the variations of the molecular model on the crystal structure lattice energy minima has also been studied. The general problem of prediction methods based on the assumption that the experimental thermodynamically stable polymorph corresponds to the global lattice energy minimum is that more hypothetical low lattice energy structures are found within a few kJ mol-1 of the global minimum than are likely to be experimentally observed polymorphs. This is illustrated by the results for molecule I, 3-oxabicyclo(3.2.0)hepta-1,4-diene, studied for the first international blind test for small organic crystal structures organised by the Cambridge Crystallographic Data Centre (CCDC) in May 1999. To reduce the number of predicted polymorphs, factors additional to thermodynamic criteria have to be considered. Therefore the elastic constants and vapour growth morphologies have been calculated for the lowest lattice energy

  13. Utilizing knowledge base of amino acids structural neighborhoods to predict protein-protein interaction sites.

    Science.gov (United States)

    Jelínek, Jan; Škoda, Petr; Hoksza, David

    2017-12-06

    Protein-protein interactions (PPI) play a key role in the investigation of various biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect. We have developed a novel prediction method called INSPiRE which benefits from a knowledge base built from data available in the Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. The structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to the Matthews correlation coefficient. In this paper we introduce a new knowledge-based method for identification of protein-protein interaction sites called INSPiRE. Its knowledge base utilizes structural patterns of known interaction sites in the Protein Data Bank, which are then used for PPI prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing PPI approaches.
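
    The following is a minimal, purely illustrative sketch of the kind of knowledge-base lookup described above: structural neighborhoods are encoded as bit strings, and a query residue is labeled by how often the same pattern occurs at interface versus non-interface positions in the knowledge base. All class and function names are hypothetical and the toy 4-bit features are placeholders; this is not the INSPiRE code.

```python
from collections import defaultdict

def encode_neighborhood(features):
    """Pack a fixed-length tuple of binary features into a bit string."""
    return "".join("1" if f else "0" for f in features)

class NeighborhoodKnowledgeBase:
    def __init__(self):
        # pattern -> [interface_count, non_interface_count]
        self.counts = defaultdict(lambda: [0, 0])

    def add(self, features, is_interface):
        key = encode_neighborhood(features)
        self.counts[key][0 if is_interface else 1] += 1

    def predict(self, features, threshold=0.5):
        key = encode_neighborhood(features)
        pos, neg = self.counts.get(key, (0, 0))
        if pos + neg == 0:
            return False  # pattern never seen: default to non-interface
        return pos / (pos + neg) >= threshold

# Toy usage with made-up 4-bit neighborhood encodings
kb = NeighborhoodKnowledgeBase()
kb.add((1, 0, 1, 1), True)
kb.add((1, 0, 1, 1), True)
kb.add((0, 0, 1, 0), False)
print(kb.predict((1, 0, 1, 1)))  # True
```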

  14. De novo protein structure prediction by dynamic fragment assembly and conformational space annealing.

    Science.gov (United States)

    Lee, Juyong; Lee, Jinhyuk; Sasaki, Takeshi N; Sasai, Masaki; Seok, Chaok; Lee, Jooyoung

    2011-08-01

    Ab initio protein structure prediction is a challenging problem that requires both an accurate energetic representation of a protein structure and an efficient conformational sampling method for successful protein modeling. In this article, we present an ab initio structure prediction method which combines a recently suggested novel way of fragment assembly, dynamic fragment assembly (DFA), with the conformational space annealing (CSA) algorithm. In DFA, model structures are scored by continuous functions constructed based on short- and long-range structural restraint information from a fragment library. Here, DFA is combined with the all-atom CHARMM model supplemented by the empirical DFIRE potential. The relative contributions between various energy terms are optimized using linear programming. The conformational sampling was carried out with the CSA algorithm, which can find low energy conformations more efficiently than the simulated annealing used in the existing DFA study. The newly introduced DFA energy function and CSA sampling algorithm are implemented into CHARMM. Test results on 30 small single-domain proteins and 13 template-free modeling targets of the 8th Critical Assessment of protein Structure Prediction show that the current method provides prediction results that are comparable and complementary to those of existing top methods. Copyright © 2011 Wiley-Liss, Inc.

  15. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM

    Directory of Open Access Journals (Sweden)

    Yunyun Liang

    2015-01-01

    Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class, and it mainly uses the protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction based solely on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. A 700-dimensional (700D) feature vector is constructed, and the dimension is reduced to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. This offers an important complement to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.

  16. LoopIng: a template-based tool for predicting the structure of protein loops.

    KAUST Repository

    Messih, Mario Abdel

    2015-08-06

    Predicting the structure of protein loops is very challenging, mainly because they are not necessarily subject to strong evolutionary pressure. This implies that, unlike the rest of the protein, standard homology modeling techniques are not very effective in modeling their structure. However, loops are often involved in protein function, hence inferring their structure is important for predicting protein structure as well as function. We describe a method, LoopIng, based on the Random Forest automated learning technique, which, given a target loop, selects a structural template for it from a database of loop candidates. Compared to the most recently available methods, LoopIng is able to achieve similar accuracy for short loops (4-10 residues) and significant enhancements for long loops (11-20 residues). The quality of the predictions is robust to errors that unavoidably affect the stem regions when these are modeled. The method returns a confidence score for the predicted template loops and has the advantage of being very fast (on average: 1 min/loop). Availability: www.biocomputing.it/looping. Contact: anna.tramontano@uniroma1.it. Supplementary data are available at Bioinformatics online.

  17. Structural prediction in aphasia

    Directory of Open Access Journals (Sweden)

    Tessa Warren

    2015-05-01

    PWA who had varying levels of sentence-comprehension impairment read sentences where an upcoming disjunction either could (1b) or could not (1a) be predicted, based on the presence of either (Staub & Clifton, 2006); see Figure 1 for an example. If either spurs PWA to predict an upcoming disjunction and this prediction is facilitative, then reading times on the or and second disjunct (or a beautiful portrait) should be faster in the Either condition than in the No Either condition. Results confirmed this prediction (see Figure 1; β=352, t=2.36). The magnitude of this facilitation was related to overall language-impairment severity on the Comprehensive Aphasia Test (CAT; Swinburn et al., 2004): PWA with milder language impairments showed more facilitation for either than PWA with more severe language impairments (r=.594, p<.05). This finding represents strong and novel evidence that PWA can use a lexical cue to predict the structural form of upcoming material during comprehension. However, the lack of relation between these PWA's degree of structural facilitation and their sentence comprehension ability may indicate that structural predictions could speed reading without improving comprehension.

  18. Kaolin Quality Prediction from Samples: A Bayesian Network Approach

    International Nuclear Information System (INIS)

    Rivas, T.; Taboada, J.; Ordonez, C.; Matias, J. M.

    2009-01-01

    We describe the results of an expert system applied to the evaluation of samples of kaolin for industrial use in paper or ceramic manufacture. Different machine learning techniques - classification trees, support vector machines and Bayesian networks - were applied with the aim of evaluating and comparing their interpretability and prediction capacities. The predictive capacity of these models for the samples analyzed was highly satisfactory, both for ceramic quality and paper quality. However, Bayesian networks generally proved to be the most useful technique for our study: this approach combines good predictive capacity with excellent interpretability of the kaolin quality structure, since it graphically represents relationships between variables and facilitates what-if analyses.

  19. Gaussian process based intelligent sampling for measuring nano-structure surfaces

    Science.gov (United States)

    Sun, L. J.; Ren, M. J.; Yin, Y. H.

    2016-09-01

    Nanotechnology is the science and engineering of manipulating matter at the nanoscale, which can be used to create many new materials and devices with a vast range of applications. As nanotech products increasingly enter the commercial marketplace, nanometrology becomes a stringent and enabling technology for the manipulation and quality control of nanotechnology. However, many measuring instruments, for instance scanning probe microscopes, are limited to a relatively small area of hundreds of micrometres and have very low efficiency. Intelligent sampling strategies are therefore required to improve the scanning efficiency when measuring large areas. This paper presents a Gaussian process based intelligent sampling method to address this problem. The method uses Gaussian process based Bayesian regression as a mathematical foundation to represent the surface geometry, and the posterior estimate of the Gaussian process is computed by combining the prior probability distribution with the maximum likelihood function. Each sampling point is then adaptively selected as the candidate position most likely to lie outside the required tolerance zone and is inserted to update the model iteratively. Simulations on both the nominal surface and a manufactured surface have been conducted on nano-structure surfaces to verify the validity of the proposed method. The results imply that the proposed method significantly improves the measurement efficiency in measuring large-area structured surfaces.
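
    A hedged sketch of the adaptive-sampling loop described above, using scikit-learn's Gaussian process regression. The surface function, tolerance half-width and candidate grid are invented for illustration, and the nominal surface is assumed to be zero, so each new point is the candidate most likely to lie outside the ±tol zone according to the GP posterior.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def surface(x):                      # hypothetical measured profile
    return 0.05 * np.sin(4 * x) + 0.02 * x

tol = 0.03                           # assumed tolerance-zone half-width
candidates = np.linspace(0.0, 2.0, 200).reshape(-1, 1)
X = np.array([[0.0], [1.0], [2.0]])  # initial sample points
y = surface(X).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-6)
for _ in range(10):
    gp.fit(X, y)
    mean, std = gp.predict(candidates, return_std=True)
    # probability that the true height lies outside [-tol, tol]
    p_out = norm.cdf(-tol, mean, std) + 1.0 - norm.cdf(tol, mean, std)
    nxt = candidates[np.argmax(p_out)]
    X = np.vstack([X, [nxt]])
    y = np.append(y, surface(nxt)[0])

print(f"{len(X)} points sampled adaptively")
```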

  20. Prediction of Protein Structural Class Based on Gapped-Dipeptides and a Recursive Feature Selection Approach

    Directory of Open Access Journals (Sweden)

    Taigang Liu

    2015-12-01

    The prior knowledge of protein structural class may offer useful clues on understanding its functionality as well as its tertiary structure. Though various significant efforts have been made to find a fast and effective computational approach to address this problem, it is still a challenging topic in the field of bioinformatics. The position-specific score matrix (PSSM) profile has been shown to provide a useful source of information for improving the prediction performance of protein structural class. However, this information has not been adequately explored. To this end, in this study, we present a feature extraction technique based on the gapped-dipeptide composition computed directly from the PSSM. Then, a careful feature selection technique is performed based on support vector machine-recursive feature elimination (SVM-RFE). The selected optimal features are used to construct the final predictor. The results of jackknife tests on four working datasets show that our method obtains satisfactory prediction accuracies by extracting features solely from the PSSM and could serve as a very promising tool to predict protein structural class.
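
    As an illustration of the SVM-RFE step mentioned above, the following scikit-learn sketch ranks and prunes features with a linear SVM. The feature matrix and labels are random placeholders standing in for the gapped-dipeptide/PSSM features, and the feature counts are arbitrary.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 400))   # placeholder gapped-dipeptide features from PSSM
y = rng.integers(0, 2, size=200)  # placeholder binary structural-class labels

# Recursive feature elimination driven by a linear SVM's weights
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=100, step=10)
selector.fit(X, y)
X_selected = selector.transform(X)
print(X_selected.shape)           # (200, 100)
```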

  1. Structure-based prediction of subtype selectivity of histamine H3 receptor selective antagonists in clinical trials.

    Science.gov (United States)

    Kim, Soo-Kyung; Fristrup, Peter; Abrol, Ravinder; Goddard, William A

    2011-12-27

    Histamine receptors (HRs) are excellent drug targets for the treatment of diseases, such as schizophrenia, psychosis, depression, migraine, allergies, asthma, ulcers, and hypertension. Among them, the human H(3) histamine receptor (hH(3)HR) antagonists have been proposed for specific therapeutic applications, including treatment of Alzheimer's disease, attention deficit hyperactivity disorder (ADHD), epilepsy, and obesity. However, many of these drug candidates cause undesired side effects through the cross-reactivity with other histamine receptor subtypes. In order to develop improved selectivity and activity for such treatments, it would be useful to have the three-dimensional structures for all four HRs. We report here the predicted structures of four HR subtypes (H(1), H(2), H(3), and H(4)) using the GEnSeMBLE (GPCR ensemble of structures in membrane bilayer environment) Monte Carlo protocol, sampling ∼35 million combinations of helix packings to predict the 10 most stable packings for each of the four subtypes. Then we used these 10 best protein structures with the DarwinDock Monte Carlo protocol to sample ∼50 000 × 10(20) poses to predict the optimum ligand-protein structures for various agonists and antagonists. We find that E206(5.46) contributes most in binding H(3) selective agonists (5, 6, 7) in agreement with experimental mutation studies. We also find that conserved E5.46/S5.43 in both of hH(3)HR and hH(4)HR are involved in H(3)/ H(4) subtype selectivity. In addition, we find that M378(6.55) in hH(3)HR provides additional hydrophobic interactions different from hH(4)HR (the corresponding amino acid of T323(6.55) in hH(4)HR) to provide additional subtype bias. From these studies, we developed a pharmacophore model based on our predictions for known hH(3)HR selective antagonists in clinical study [ABT-239 1, GSK-189,254 2, PF-3654746 3, and BF2.649 (tiprolisant) 4] that suggests critical selectivity directing elements are: the basic proton

  2. Protein structure based prediction of catalytic residues.

    Science.gov (United States)

    Fajardo, J Eduardo; Fiser, Andras

    2013-02-22

    Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function, when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. These 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues over the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. We found that several of the graph-based measures utilize the same underlying feature of protein structures, which can be captured more simply and effectively with the distance-to-GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile, sequence conservation remains by far the most influential feature in identifying functional residues. We also found that due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
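
    A small sketch of how the simple geometric feature highlighted above, the distance of each residue to the structure's general center of mass (GCM), could be computed. The C-alpha coordinates are made up for illustration; in practice they would be parsed from a PDB file (e.g. with Biopython), and this feature would then be combined with residue type and sequence conservation in a trained classifier.

```python
import numpy as np

# Hypothetical C-alpha coordinates, one row per residue, in angstroms
ca = np.array([
    [ 0.0,  0.0,  0.0],
    [ 3.8,  0.5,  0.2],
    [ 7.5,  1.1, -0.3],
    [10.9,  2.0,  1.4],
    [13.2,  5.1,  2.0],
])

gcm = ca.mean(axis=0)                        # general center of mass (C-alpha centroid)
dist_to_gcm = np.linalg.norm(ca - gcm, axis=1)
print(np.round(dist_to_gcm, 2))              # one geometric feature per residue
```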

  3. Sparse Power-Law Network Model for Reliable Statistical Predictions Based on Sampled Data

    Directory of Open Access Journals (Sweden)

    Alexander P. Kartun-Giles

    2018-04-01

    A projective network model is a model that enables predictions to be made based on a subsample of the network data, with the predictions remaining unchanged if a larger sample is taken into consideration. An exchangeable model is a model that does not depend on the order in which nodes are sampled. Despite a large variety of non-equilibrium (growing) and equilibrium (static) sparse complex network models that are widely used in network science, how to reconcile sparseness (constant average degree) with the desired statistical properties of projectivity and exchangeability is currently an outstanding scientific problem. Here we propose a network process with hidden variables which is projective and can generate sparse power-law networks. Despite the model not being exchangeable, it can be closely related to exchangeable uncorrelated networks, as indicated by its information theory characterization and its network entropy. The use of the proposed network process as a null model is here tested on real data, indicating that the model offers a promising avenue for statistical network modelling.

  4. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening.

    Science.gov (United States)

    Ain, Qurrat Ul; Aleksandrova, Antoniya; Roessler, Florian D; Ballester, Pedro J

    2015-01-01

    Docking tools to predict whether and how a small molecule binds to a target can be applied if a structural model of that target is available. The reliability of docking depends, however, on the accuracy of the adopted scoring function (SF). Despite intense research over the years, improving the accuracy of SFs for structure-based binding affinity prediction or virtual screening has proven to be a challenging task for any class of method. New SFs based on modern machine-learning regression models, which do not impose a predetermined functional form and thus are able to exploit effectively much larger amounts of experimental data, have recently been introduced. These machine-learning SFs have been shown to outperform a wide range of classical SFs at both binding affinity prediction and virtual screening. The emerging picture from these studies is that the classical approach of using linear regression with a small number of expert-selected structural features can be strongly improved by a machine-learning approach based on nonlinear regression allied with comprehensive data-driven feature selection. Furthermore, the performance of classical SFs does not grow with larger training datasets, and hence this performance gap is expected to widen as more training data becomes available in the future. Other topics covered in this review include predicting the reliability of an SF on a particular target class, generating synthetic data to improve predictive performance, and modeling guidelines for SF development. WIREs Comput Mol Sci 2015, 5:405-424. doi: 10.1002/wcms.1225
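
    To make the contrast with classical linear scoring functions concrete, here is a hedged sketch of a nonlinear machine-learning scoring function in the spirit described above (an RF-Score-like setup): a random forest regressor trained on simple protein-ligand descriptors to predict binding affinity. The descriptor matrix and affinities are synthetic placeholders, not real complex data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.poisson(5.0, size=(500, 36)).astype(float)         # e.g. counts of atom-type contact pairs
y = 0.1 * X[:, :5].sum(axis=1) + rng.normal(0, 0.5, 500)   # synthetic binding affinities (pKd-like)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
sf = RandomForestRegressor(n_estimators=500, random_state=0)
sf.fit(X_tr, y_tr)                                          # nonlinear, data-driven scoring function
print("test R^2:", round(sf.score(X_te, y_te), 3))
```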

  5. Why Is There a Glass Ceiling for Threading Based Protein Structure Prediction Methods?

    Science.gov (United States)

    Skolnick, Jeffrey; Zhou, Hongyi

    2017-04-20

    Despite their different implementations, comparison of the best threading approaches to the prediction of evolutionary distant protein structures reveals that they tend to succeed or fail on the same protein targets. This is true despite the fact that the structural template library has good templates for all cases. Thus, a key question is why are certain protein structures threadable while others are not. Comparison with threading results on a set of artificial sequences selected for stability further argues that the failure of threading is due to the nature of the protein structures themselves. Using a new contact map based alignment algorithm, we demonstrate that certain folds are highly degenerate in that they can have very similar coarse grained fractions of native contacts aligned and yet differ significantly from the native structure. For threadable proteins, this is not the case. Thus, contemporary threading approaches appear to have reached a plateau, and new approaches to structure prediction are required.

  6. Adaptive Neuro-Fuzzy Inference System Models for Force Prediction of a Mechatronic Flexible Structure

    DEFF Research Database (Denmark)

    Achiche, S.; Shlechtingen, M.; Raison, M.

    2016-01-01

    This paper presents the results obtained from a research work investigating the performance of different Adaptive Neuro-Fuzzy Inference System (ANFIS) models developed to predict excitation forces on a dynamically loaded flexible structure. For this purpose, a flexible structure is equipped...... obtained from applying a random excitation force on the flexible structure. The performance of the developed models is evaluated by analyzing the prediction capabilities based on a normalized prediction error. The frequency domain is considered to analyze the similarity of the frequencies in the predicted...... of the sampling frequency and sensor location on the model performance is investigated. The results obtained in this paper show that ANFIS models can be used to set up reliable force predictors for dynamically loaded flexible structures, when a certain degree of inaccuracy is accepted. Furthermore, the comparison...

  7. Combining sequence-based prediction methods and circular dichroism and infrared spectroscopic data to improve protein secondary structure determinations

    Directory of Open Access Journals (Sweden)

    Lees Jonathan G

    2008-01-01

    Background A number of sequence-based methods exist for protein secondary structure prediction. Protein secondary structures can also be determined experimentally from circular dichroism and infrared spectroscopic data using empirical analysis methods. It has been proposed that comparable accuracy can be obtained from sequence-based predictions as from these biophysical measurements. Here we have examined the secondary structure determination accuracies of sequence prediction methods against the empirically determined values from the spectroscopic data on datasets of proteins for which both crystal structures and spectroscopic data are available. Results In this study we show that the sequence prediction methods have accuracies nearly comparable to those of spectroscopic methods. However, we also demonstrate that combining the spectroscopic and sequence-based techniques produces significant overall improvements in secondary structure determinations. In addition, combining the extra information content available from synchrotron radiation circular dichroism data with sequence methods also shows improvements. Conclusion Combining sequence prediction with experimentally determined spectroscopic methods for protein secondary structure content significantly enhances the accuracy of the overall results obtained.

  8. RNA-SSPT: RNA Secondary Structure Prediction Tools.

    Science.gov (United States)

    Ahmad, Freed; Mahboob, Shahid; Gulzar, Tahsin; Din, Salah U; Hanif, Tanzeela; Ahmad, Hifza; Afzal, Muhammad

    2013-01-01

    The prediction of RNA structure is useful for understanding evolution in both in silico and in vitro studies. Physical methods such as NMR studies for determining RNA secondary structure are expensive and difficult. Computational RNA secondary structure prediction is easier. Comparative sequence analysis provides the best solution, but secondary structure prediction from a single RNA sequence is challenging. RNA-SSPT is a tool that computationally predicts the secondary structure of a single RNA sequence. Most RNA secondary structure prediction tools do not allow pseudoknots in the structure or are unable to locate them. The Nussinov dynamic programming algorithm has been implemented in RNA-SSPT. The current study shows that only the energetically most favorable secondary structure is required, and a modification of the algorithm is also available that selects base pairs so as to lower the total free energy of the secondary structure. For visualization of RNA secondary structure, the NAVIEW program (originally in C) is used, modified in C# for the tool's requirements. RNA-SSPT is built in C# using .NET 2.0 in Microsoft Visual Studio 2005 Professional edition. The accuracy of RNA-SSPT is tested in terms of sensitivity and positive predictive value. The tool serves both secondary structure prediction and secondary structure visualization purposes.
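
    The Nussinov base-pair maximization algorithm mentioned above is a short dynamic program; the sketch below is a generic textbook version in Python (not the RNA-SSPT C# implementation) that returns the maximum number of non-crossing base pairs for a sequence, with a minimum hairpin-loop length of 3.

```python
def nussinov(seq, min_loop=3):
    """Maximum number of non-crossing base pairs (Nussinov DP, no pseudoknots)."""
    n = len(seq)
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]                         # i left unpaired
            if (seq[i], seq[j]) in pairs:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):                   # bifurcation into two subproblems
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov("GGGAAAUCC"))  # 3 base pairs for this toy sequence
```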

  9. Structure-based function prediction of the expanding mollusk tyrosinase family

    Science.gov (United States)

    Huang, Ronglian; Li, Li; Zhang, Guofan

    2017-11-01

    Tyrosinase (Ty) is a common enzyme found in many different animal groups. In our previous study, genome sequencing revealed that the Ty family is expanded in the Pacific oyster (Crassostrea gigas). Here, we examine the larger number of Ty family members in the Pacific oyster by high-level structure prediction to obtain more information about their function and evolution, especially their unknown role in biomineralization. We verified 12 Ty gene sequences from the Crassostrea gigas genome and the Pinctada fucata martensii transcriptome. Using phylogenetic analysis of these Tys together with functionally known Tys from other molluscan species, eight subgroups were identified (CgTy_s1, CgTy_s2, MolTy_s1, MolTy-s2, MolTy-s3, PinTy-s1, PinTy-s2 and PviTy). Structural data and surface pockets of the dinuclear copper center in the eight subgroups of molluscan Ty were obtained using the latest versions of online prediction servers. Structural comparison with other Ty proteins from the Protein Data Bank revealed functionally important residues (HA1, HA2, HA3, HB1, HB2, HB3, Z1-Z9) and their locations within these protein structures. The structural and chemical features of these pockets, which may be related to substrate binding, showed considerable variability among mollusks, which undoubtedly shapes Ty substrate binding. Finally, we discuss the potential driving forces of Ty family evolution in mollusks. Based on these observations, we conclude that the Ty family has rapidly evolved as a consequence of substrate adaptation in mollusks.

  10. Parallel protein secondary structure prediction based on neural networks.

    Science.gov (United States)

    Zhong, Wei; Altun, Gulsah; Tian, Xinmin; Harrison, Robert; Tai, Phang C; Pan, Yi

    2004-01-01

    Protein secondary structure prediction has a fundamental influence on today's bioinformatics research. In this work, binary and tertiary classifiers for protein secondary structure prediction are implemented on the Denoeux belief neural network (DBNN) architecture. A hydrophobicity matrix, an orthogonal matrix, BLOSUM62 and the PSSM (position-specific scoring matrix) are tested separately as encoding schemes for the DBNN. The experimental results contribute to the design of new encoding schemes. The new binary classifier for helix versus not-helix (~H) for the DBNN produces a prediction accuracy of 87% when the PSSM is used for the input profile. The performance of the DBNN binary classifier is comparable to that of other leading prediction methods. The good test results for the binary classifiers open a new approach for protein structure prediction with neural networks. Due to the time-consuming task of training the neural networks, Pthreads and OpenMP are employed to parallelize the DBNN on the hyperthreading-enabled Intel architecture. The speedup for 16 Pthreads is 4.9 and the speedup for 16 OpenMP threads is 4 on the 4-processor shared-memory architecture. The speedup performance of both OpenMP and Pthreads is superior to that reported in other research. With the new parallel training algorithm, thousands of amino acids can be processed in a reasonable amount of time. Our research also shows that hyperthreading technology on the Intel architecture is efficient for parallel biological algorithms.

  11. Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome.

    Directory of Open Access Journals (Sweden)

    Huiying Zhao

    As more and more protein sequences are uncovered by increasingly inexpensive sequencing techniques, an urgent task is to find their functions. This work presents a highly reliable computational technique for predicting DNA-binding function at the level of protein-DNA complex structures, rather than the low-resolution two-state prediction of DNA binding that most existing techniques provide. The method first predicts the protein-DNA complex structure by utilizing the template-based structure prediction technique HHblits, followed by binding affinity prediction based on a knowledge-based energy function (the distance-scaled finite ideal-gas reference state for protein-DNA interactions). A leave-one-out cross validation of the method based on 179 DNA-binding and 3797 non-binding protein domains achieves a Matthews correlation coefficient (MCC) of 0.77 with high precision (94%) and high sensitivity (65%). We further found 51% sensitivity for 82 newly determined structures of DNA-binding proteins and 56% sensitivity for the human proteome. In addition, the method provides a reasonably accurate prediction of DNA-binding residues in proteins based on predicted DNA-binding complex structures. Its application to the human proteome leads to more than 300 novel DNA-binding proteins; some of these predicted structures were validated by known structures of homologous proteins in apo forms. The method [SPOT-Seq (DNA)] is available as an on-line server at http://sparks-lab.org.

  12. Improving the accuracy of protein secondary structure prediction using structural alignment

    Directory of Open Access Journals (Sweden)

    Gallin Warren J

    2006-06-01

    Background The accuracy of protein secondary structure prediction has steadily improved over the past 30 years. Now many secondary structure prediction methods routinely achieve an accuracy (Q3) of about 75%. We believe this accuracy could be further improved by including structure (as opposed to sequence) database comparisons as part of the prediction process. Indeed, given the large size of the Protein Data Bank (>35,000 sequences), the probability of a newly identified sequence having a structural homologue is actually quite high. Results We have developed a method that performs structure-based sequence alignments as part of the secondary structure prediction process. By mapping the structure of a known homologue (sequence ID >25%) onto the query protein's sequence, it is possible to predict at least a portion of that query protein's secondary structure. By integrating this structural alignment approach with conventional (sequence-based) secondary structure methods and then combining it with a "jury-of-experts" system to generate a consensus result, it is possible to attain very high prediction accuracy. Using a sequence-unique test set of 1644 proteins from EVA, this new method achieves an average Q3 score of 81.3%. Extensive testing indicates this is approximately 4–5% better than any other method currently available. Assessments using non-sequence-unique test sets (typical of those used in proteome annotation or structural genomics) indicate that this new method can achieve a Q3 score approaching 88%. Conclusion By using both sequence and structure databases and by exploiting the latest techniques in machine learning it is possible to routinely predict protein secondary structure with an accuracy well above 80%. A program and web server, called PROTEUS, that performs these secondary structure predictions is accessible at http://wishart.biology.ualberta.ca/proteus. For high throughput or batch sequence analyses, the PROTEUS programs

  13. UNRES server for physics-based coarse-grained simulations and prediction of protein structure, dynamics and thermodynamics.

    Science.gov (United States)

    Czaplewski, Cezary; Karczynska, Agnieszka; Sieradzan, Adam K; Liwo, Adam

    2018-04-30

    A server implementation of the UNRES package (http://www.unres.pl) for coarse-grained simulations of protein structures with the physics-based UNRES model, coined the UNRES server, is presented. In contrast to most protein coarse-grained models, owing to its physics-based origin, the UNRES force field can be used in simulations, including those aimed at protein-structure prediction, without ancillary information from structural databases; however, the implementation includes the possibility of using restraints. Local energy minimization, canonical molecular dynamics simulations, replica exchange and multiplexed replica exchange molecular dynamics simulations can be run with the current UNRES server; the latter are suitable for protein-structure prediction. The user-supplied input includes the protein sequence and, optionally, restraints from secondary-structure prediction or small-angle X-ray scattering data, as well as the simulation type and parameters, which are selected or typed in. Oligomeric proteins, as well as those containing D-amino-acid residues and disulfide links, can be treated. The output is displayed graphically (minimized structures, trajectories, final models, analysis of trajectories/ensembles); however, all output files can be downloaded by the user. The UNRES server can be freely accessed at http://unres-server.chem.ug.edu.pl.

  14. Inference for multivariate regression model based on multiply imputed synthetic data generated via posterior predictive sampling

    Science.gov (United States)

    Moura, Ricardo; Sinha, Bimal; Coelho, Carlos A.

    2017-06-01

    The recent popularity of the use of synthetic data as a Statistical Disclosure Control technique has enabled the development of several methods of generating and analyzing such data, but these almost always rely on asymptotic distributions and are consequently not adequate for small-sample datasets. Thus, a likelihood-based exact inference procedure is derived for the matrix of regression coefficients of the multivariate regression model, for multiply imputed synthetic data generated via Posterior Predictive Sampling. Since it is based on exact distributions, this procedure may be used even with small-sample datasets. Simulation studies compare the results obtained from the proposed exact inferential procedure with the results obtained from an adaptation of Reiter's combination rule to multiply imputed synthetic datasets, and an application to the 2000 Current Population Survey is discussed.
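
    A toy sketch of multiply imputed synthetic data generated via posterior predictive sampling, reduced to a univariate normal model with a noninformative prior for clarity (the paper itself treats the multivariate regression case). The simulated "confidential" data and the number of synthetic copies are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
original = rng.normal(loc=50.0, scale=8.0, size=40)   # confidential data (simulated)
n, m = len(original), 5                               # sample size and number of synthetic copies

ybar, s2 = original.mean(), original.var(ddof=1)
synthetic = []
for _ in range(m):
    # draw (mu, sigma^2) from the posterior under a noninformative prior,
    # then draw a full synthetic dataset from the posterior predictive
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1)
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))
    synthetic.append(rng.normal(mu, np.sqrt(sigma2), size=n))

print([round(d.mean(), 2) for d in synthetic])        # means of the synthetic datasets
```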

  15. Prediction of Seismic Damage-Based Degradation in RC Structures

    DEFF Research Database (Denmark)

    Kirkegaard, Poul Henning; Gupta, Vinay K.; Nielsen, Søren R.K.

    Estimation of structural damage from known increase in the fundamental period of a structure after an earthquake or prediction of degradation of stiffness and strength for known damage requires reliable correlations between these response functionals. This study proposes a modified Clough-Johnsto...

  16. Modeling and simulation of adaptive Neuro-fuzzy based intelligent system for predictive stabilization in structured overlay networks

    Directory of Open Access Journals (Sweden)

    Ramanpreet Kaur

    2017-02-01

    Intelligent prediction of neighboring-node (the k well-defined neighbors specified by the DHT protocol) dynamism is helpful to improve resilience and can reduce the overhead associated with topology maintenance of structured overlay networks. The dynamic behavior of overlay nodes depends on many factors such as the underlying user's online behavior, geographical position, time of day, day of the week, etc., as reported in many applications. We can exploit these characteristics for efficient maintenance of structured overlay networks by implementing an intelligent predictive framework for setting stabilization parameters appropriately. Considering the fact that human-driven behavior usually goes beyond intermittent availability patterns, we use a hybrid Neuro-fuzzy based predictor to enhance the accuracy of the predictions. In this paper, we discuss our predictive stabilization approach, implement Neuro-fuzzy based prediction in a MATLAB simulation, and apply this predictive stabilization model to a Chord-based overlay network using OverSim as a simulation tool. The MATLAB simulation results show that the behavior of neighboring nodes is predictable to a large extent, as indicated by the very small RMSE. The OverSim-based simulation results also show significant improvements in the performance of the Chord-based overlay network in terms of lookup success ratio, lookup hop count and maintenance overhead as compared to the periodic stabilization approach.

  17. Predicting cyclohexane/water distribution coefficients for the SAMPL5 challenge using MOSCED and the SMD solvation model

    Science.gov (United States)

    Diaz-Rodriguez, Sebastian; Bozada, Samantha M.; Phifer, Jeremy R.; Paluch, Andrew S.

    2016-11-01

    We present blind predictions using the solubility parameter based method MOSCED submitted for the SAMPL5 challenge on calculating cyclohexane/water distribution coefficients at 298 K. Reference data to parameterize MOSCED was generated with knowledge only of chemical structure by performing solvation free energy calculations using electronic structure calculations in the SMD continuum solvent. To maintain simplicity and use only a single method, we approximate the distribution coefficient with the partition coefficient of the neutral species. Over the final SAMPL5 set of 53 compounds, we achieved an average unsigned error of 2.2 ± 0.2 log units (ranking 15 out of 62 entries), the correlation coefficient (R) was 0.6 ± 0.1 (ranking 35), and 72 ± 6% of the predictions had the correct sign (ranking 30). While used here to predict cyclohexane/water distribution coefficients at 298 K, MOSCED is broadly applicable, allowing one to predict temperature dependent infinite dilution activity coefficients in any solvent for which parameters exist, and provides a means by which an excess Gibbs free energy model may be parameterized to predict composition dependent phase-equilibrium.

  18. Tchebichef image moment approach to the prediction of protein secondary structures based on circular dichroism.

    Science.gov (United States)

    Li, Sha Sha; Li, Bao Qiong; Liu, Jin Jin; Lu, Shao Hua; Zhai, Hong Lin

    2018-04-20

    Circular dichroism (CD) spectroscopy is a widely used technique for the evaluation of protein secondary structures and has a significant impact on the understanding of molecular biology. However, the quantitative analysis of protein secondary structures based on CD spectra is still difficult due to the serious overlap of the spectra corresponding to different structural motifs. Here, the Tchebichef image moment (TM) approach is introduced for the first time; it can effectively extract the chemical features in CD spectra for the quantitative analysis of protein secondary structures. The proposed approach was applied to analyze a reference set, and the obtained results were evaluated by strict statistical parameters such as the correlation coefficient, cross-validation correlation coefficient and root mean squared error. Compared with several specialized prediction methods, the TM approach provided satisfactory results, especially for turns and unordered structures. Our study indicates that the TM approach can be regarded as a feasible tool for the analysis of the secondary structures of proteins based on CD spectra. A TM package is provided and can be used directly for secondary structure prediction.

  19. Genetic programming based quantitative structure-retention relationships for the prediction of Kovats retention indices.

    Science.gov (United States)

    Goel, Purva; Bapat, Sanket; Vyas, Renu; Tambe, Amruta; Tambe, Sanjeev S

    2015-11-13

    The development of quantitative structure-retention relationships (QSRR) aims at constructing an appropriate linear/nonlinear model for the prediction of the retention behavior (such as the Kovats retention index) of a solute on a chromatographic column. Commonly, multi-linear regression and artificial neural networks are used for QSRR development in gas chromatography (GC). In this study, an artificial intelligence based data-driven modeling formalism, namely genetic programming (GP), has been introduced for the development of quantitative structure based models predicting Kovats retention indices (KRI). The novelty of the GP formalism is that, given an example dataset, it searches and optimizes both the form (structure) and the parameters of an appropriate linear/nonlinear data-fitting model. Thus, it is not necessary to pre-specify the form of the data-fitting model in GP-based modeling. These models are also less complex, simple to understand, and easy to deploy. The effectiveness of GP in constructing QSRRs has been demonstrated by developing models predicting KRIs of light hydrocarbons (case study I) and adamantane derivatives (case study II). In each case study, two-, three- and four-descriptor models have been developed using the KRI data available in the literature. The results of these studies clearly indicate that the GP-based models possess excellent KRI prediction accuracy and generalization capability. Specifically, the best performing four-descriptor models in both case studies have yielded high (>0.9) values of the coefficient of determination (R²) and low values of the root mean squared error (RMSE) and mean absolute percent error (MAPE) for the training, test and validation set data. The characteristic feature of this study is that it introduces a practical and effective GP-based method for developing QSRRs in gas chromatography that can be gainfully utilized for developing other types of data-driven models in chromatography science.
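
    One way to experiment with the GP idea described above is symbolic regression via the third-party gplearn package, as sketched below. This is only an assumed stand-in for the authors' GP implementation, and the descriptors and retention values are synthetic placeholders; the evolved expression plays the role of the data-fitting model whose form is not specified in advance.

```python
import numpy as np
from gplearn.genetic import SymbolicRegressor  # assumes gplearn is installed

rng = np.random.default_rng(7)
X = rng.normal(size=(120, 4))                           # four hypothetical molecular descriptors
y = 800 + 150 * X[:, 0] - 60 * X[:, 1] * X[:, 2] + rng.normal(0, 10, 120)  # synthetic KRI values

gp = SymbolicRegressor(population_size=1000, generations=20,
                       function_set=("add", "sub", "mul", "div"),
                       parsimony_coefficient=0.01, random_state=0)
gp.fit(X, y)
print(gp._program)                 # evolved closed-form model for the retention index
print("R^2:", round(gp.score(X, y), 3))
```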

  20. Quantifying predictability through information theory: small sample estimation in a non-Gaussian framework

    International Nuclear Information System (INIS)

    Haven, Kyle; Majda, Andrew; Abramov, Rafail

    2005-01-01

    Many situations in complex systems require quantitative estimates of the lack of information in one probability distribution relative to another. In short term climate and weather prediction, examples of these issues might involve the lack of information in the historical climate record compared with an ensemble prediction, or the lack of information in a particular Gaussian ensemble prediction strategy involving the first and second moments compared with the non-Gaussian ensemble itself. The relative entropy is a natural way to quantify the predictive utility in this information, and recently a systematic computationally feasible hierarchical framework has been developed. In practical systems with many degrees of freedom, computational overhead limits ensemble predictions to relatively small sample sizes. Here the notion of predictive utility, in a relative entropy framework, is extended to small random samples by the definition of a sample utility, a measure of the unlikeliness that a random sample was produced by a given prediction strategy. The sample utility is the minimum predictability, with a statistical level of confidence, which is implied by the data. Two practical algorithms for measuring such a sample utility are developed here. The first technique is based on the statistical method of null-hypothesis testing, while the second is based upon a central limit theorem for the relative entropy of moment-based probability densities. These techniques are tested on known probability densities with parameterized bimodality and skewness, and then applied to the Lorenz '96 model, a recently developed 'toy' climate model with chaotic dynamics mimicking the atmosphere. The results show a detection of non-Gaussian tendencies of prediction densities at small ensemble sizes with between 50 and 100 members, with a 95% confidence level
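
    As a minimal numerical illustration of the quantity at stake above (not the authors' algorithm), the snippet below computes the relative entropy between a small skewed sample's histogram and the Gaussian density built from the sample's first two moments; the distribution, sample size and binning are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm, entropy

rng = np.random.default_rng(42)
sample = rng.gamma(shape=2.0, scale=1.0, size=80)     # small, skewed ensemble

edges = np.linspace(sample.min(), sample.max(), 15)
p_emp, _ = np.histogram(sample, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
p_gauss = norm.pdf(centers, loc=sample.mean(), scale=sample.std())

# scipy's entropy(p, q) returns the Kullback-Leibler divergence D(p || q)
# after normalizing both arguments to probability vectors
print("relative entropy:", round(entropy(p_emp + 1e-12, p_gauss + 1e-12), 3))
```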

  1. Knowledge-based Fragment Binding Prediction

    Science.gov (United States)

    Tang, Grace W.; Altman, Russ B.

    2014-01-01

    Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low-molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the ligand bound with 74% precision and 82% recall on average. For many protein targets, it identifies high scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or serve as refinement criteria for creating target-specific compound libraries for experimental or computational screening. PMID:24762971

  2. Facilitating RNA structure prediction with microarrays.

    Science.gov (United States)

    Kierzek, Elzbieta; Kierzek, Ryszard; Turner, Douglas H; Catrina, Irina E

    2006-01-17

    Determining RNA secondary structure is important for understanding structure-function relationships and identifying potential drug targets. This paper reports the use of microarrays with heptamer 2'-O-methyl oligoribonucleotides to probe the secondary structure of an RNA and thereby improve the prediction of that secondary structure. When experimental constraints from hybridization results are added to a free-energy minimization algorithm, the prediction of the secondary structure of Escherichia coli 5S rRNA improves from 27 to 92% of the known canonical base pairs. Optimization of buffer conditions for hybridization and application of 2'-O-methyl-2-thiouridine to enhance binding and improve discrimination between AU and GU pairs are also described. The results suggest that probing RNA with oligonucleotide microarrays can facilitate determination of secondary structure.

  3. Antibody structural modeling with prediction of immunoglobulin structure (PIGS)

    DEFF Research Database (Denmark)

    Marcatili, Paolo; Olimpieri, Pier Paolo; Chailyan, Anna

    2014-01-01

    Antibodies (or immunoglobulins) are crucial for defending organisms from pathogens, but they are also key players in many medical, diagnostic and biotechnological applications. The ability to predict their structure and the specific residues involved in antigen recognition has several useful...... applications in all of these areas. Over the years, we have developed or collaborated in developing a strategy that enables researchers to predict the 3D structure of antibodies with a very satisfactory accuracy. The strategy is completely automated and extremely fast, requiring only a few minutes (∼10 min...... on average) to build a structural model of an antibody. It is based on the concept of canonical structures of antibody loops and on our understanding of the way light and heavy chains pack together....

  4. Support vector regression to predict porosity and permeability: Effect of sample size

    Science.gov (United States)

    Al-Anazi, A. F.; Gates, I. D.

    2012-02-01

    Porosity and permeability are key petrophysical parameters obtained from laboratory core analysis. Cores, obtained from drilled wells, are often few in number for most oil and gas fields. Porosity and permeability correlations based on conventional techniques such as linear regression or neural networks trained with core and geophysical logs suffer from poor generalization to wells with only geophysical logs. The generalization problem of correlation models often becomes pronounced when the training sample size is small. This is attributed to the underlying assumption that conventional techniques employing the empirical risk minimization (ERM) inductive principle converge asymptotically to the true risk values as the number of samples increases. In small sample size estimation problems, the available training samples must span the complexity of the parameter space so that the model is able both to match the available training samples reasonably well and to generalize to new data. This is achieved using the structural risk minimization (SRM) inductive principle by matching the capability of the model to the available training data. One method that uses SRM is the support vector regression (SVR) network. In this research, the capability of SVR to predict porosity and permeability in a heterogeneous sandstone reservoir under the effect of small sample size is evaluated. In particular, the impact of Vapnik's ɛ-insensitivity loss function and least-modulus loss function on generalization performance was empirically investigated. The results are compared to the multilayer perceptron (MLP) neural network, a widely used regression method, which operates under the ERM principle. The mean square error and correlation coefficients were used to measure the quality of predictions. The results demonstrate that SVR yields consistently better predictions of the porosity and permeability with small sample size than the MLP method. Also, the performance of SVR depends on both kernel function
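
    A hedged sketch of the kind of comparison described above: epsilon-insensitive SVR versus an MLP regressor trained on a deliberately small sample. The synthetic log-derived features and porosity values are placeholders, and the kernel and hyperparameter choices are arbitrary rather than tuned.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 5))                                            # synthetic well-log features
y = 0.2 + 0.05 * X[:, 0] - 0.03 * X[:, 1] + rng.normal(0, 0.01, 60)     # synthetic porosity

# keep the training set deliberately small (20 samples)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=20, random_state=0)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.005).fit(X_tr, y_tr)          # SRM-based learner
mlp = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0).fit(X_tr, y_tr)

print("SVR test R^2:", round(svr.score(X_te, y_te), 3))
print("MLP test R^2:", round(mlp.score(X_te, y_te), 3))
```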

  5. Chemical structure-based predictive model for methanogenic anaerobic biodegradation potential.

    Science.gov (United States)

    Meylan, William; Boethling, Robert; Aronson, Dallas; Howard, Philip; Tunkel, Jay

    2007-09-01

    Many screening-level models exist for predicting aerobic biodegradation potential from chemical structure, but anaerobic biodegradation generally has been ignored by modelers. We used a fragment contribution approach to develop a model for predicting biodegradation potential under methanogenic anaerobic conditions. The new model has 37 fragments (substructures) and classifies a substance as either fast or slow, relative to the potential to be biodegraded in the "serum bottle" anaerobic biodegradation screening test (Organization for Economic Cooperation and Development Guideline 311). The model correctly classified 90, 77, and 91% of the chemicals in the training set (n = 169) and two independent validation sets (n = 35 and 23), respectively. Accuracy of predictions of fast and slow degradation was equal for training-set chemicals, but fast-degradation predictions were less accurate than slow-degradation predictions for the validation sets. Analysis of the signs of the fragment coefficients for this and the other (aerobic) Biowin models suggests that in the context of simple group contribution models, the majority of positive and negative structural influences on ultimate degradation are the same for aerobic and methanogenic anaerobic biodegradation.

  6. Algorithms for Protein Structure Prediction

    DEFF Research Database (Denmark)

    Paluszewski, Martin

    -trace. Here we present three different approaches for reconstruction of Cα-traces from predictable measures. In our first approach [63, 62], the Cα-trace is positioned on a lattice and a tabu-search algorithm is applied to find minimum energy structures. The energy function is based on half-sphere-exposure (HSE......) is more robust than standard Monte Carlo search. In the second approach for reconstruction of Cα-traces, an exact branch and bound algorithm has been developed [67, 65]. The model is discrete and makes use of secondary structure predictions, HSE, CN and radius of gyration. We show how to compute good lower...... bounds for partial structures very fast. Using these lower bounds, we are able to find global minimum structures in a huge conformational space in reasonable time. We show that many of these global minimum structures are of good quality compared to the native structure. Our branch and bound algorithm

  7. A class-based link prediction using Distance Dependent Chinese Restaurant Process

    Science.gov (United States)

    Andalib, Azam; Babamir, Seyed Morteza

    2016-08-01

    One of the important tasks in relational data analysis is link prediction, which has been successfully applied in many areas such as bioinformatics and information retrieval. Link prediction is defined as predicting the existence or absence of edges between nodes of a network. In this paper, we propose a novel method for link prediction based on the Distance Dependent Chinese Restaurant Process (DDCRP) model, which enables us to utilize information about the topological structure of the network such as shortest paths and connectivity of the nodes. We also propose a new Gibbs sampling algorithm for computing the posterior distribution of the hidden variables based on the training data. Experimental results on three real-world datasets show the superiority of the proposed method over other probabilistic models for the link prediction problem.

  8. Research on test of product based on spatial sampling criteria and variable step sampling mechanism

    Science.gov (United States)

    Li, Ruihong; Han, Yueping

    2014-09-01

    This paper presents an effective approach for online testing of the assembly structures inside products using a multiple-views technique and an X-ray digital radiography system, based on spatial sampling criteria and a variable-step sampling mechanism. Although several objects inside one product may need to be tested, for each object there is a maximal rotary step within which the least structural size to be tested is predictable. In the offline learning process, the object is rotated by this step and imaged repeatedly until a complete cycle is covered, yielding an image sequence that includes the full structural information for recognition. The maximal rotary step is restricted by the least structural size and the inherent resolution of the imaging system. During the online inspection process, the program first finds the optimum solutions for all the different target parts in the standard sequence, i.e., finds their exact angles in one cycle. Since most of the other targets in the product are larger than the least structure, the paper adopts a variable step-size sampling mechanism that rotates the product through specific angles, with different steps for the different objects inside the product, and matches the results. Experimental results show that the variable step-size method can greatly save time compared with the traditional fixed-step inspection method while the recognition accuracy is guaranteed.

  9. RNA secondary structure prediction using soft computing.

    Science.gov (United States)

    Ray, Shubhra Sankar; Pal, Sankar K

    2013-01-01

    Prediction of RNA structure is invaluable in creating new drugs and understanding genetic diseases. Several deterministic algorithms and soft computing-based techniques have been developed for more than a decade to determine the structure from a known RNA sequence. Soft computing gained importance with the need to get approximate solutions for RNA sequences by considering the issues related with kinetic effects, cotranscriptional folding, and estimation of certain energy parameters. A brief description of some of the soft computing-based techniques, developed for RNA secondary structure prediction, is presented along with their relevance. The basic concepts of RNA and its different structural elements like helix, bulge, hairpin loop, internal loop, and multiloop are described. These are followed by different methodologies, employing genetic algorithms, artificial neural networks, and fuzzy logic. The role of various metaheuristics, like simulated annealing, particle swarm optimization, ant colony optimization, and tabu search is also discussed. A relative comparison among different techniques, in predicting 12 known RNA secondary structures, is presented, as an example. Future challenging issues are then mentioned.

  10. Vfold: a web server for RNA structure and folding thermodynamics prediction.

    Science.gov (United States)

    Xu, Xiaojun; Zhao, Peinan; Chen, Shi-Jie

    2014-01-01

    The ever increasing discovery of non-coding RNAs leads to unprecedented demand for the accurate modeling of RNA folding, including the predictions of two-dimensional (base pair) and three-dimensional all-atom structures and folding stabilities. Accurate modeling of RNA structure and stability has far-reaching impact on our understanding of RNA functions in human health and our ability to design RNA-based therapeutic strategies. The Vfold server offers a web interface to predict (a) RNA two-dimensional structure from the nucleotide sequence, (b) three-dimensional structure from the two-dimensional structure and the sequence, and (c) folding thermodynamics (heat capacity melting curve) from the sequence. To predict the two-dimensional structure (base pairs), the server generates an ensemble of structures, including loop structures with the different intra-loop mismatches, and evaluates the free energies using the experimental parameters for the base stacks and the loop entropy parameters given by a coarse-grained RNA folding model (the Vfold model) for the loops. To predict the three-dimensional structure, the server assembles the motif scaffolds using structure templates extracted from the known PDB structures and refines the structure using all-atom energy minimization. The Vfold-based web server provides a user friendly tool for the prediction of RNA structure and stability. The web server and the source codes are freely accessible for public use at "http://rna.physics.missouri.edu".

  11. Structured prediction models for RNN based sequence labeling in clinical text.

    Science.gov (United States)

    Jagannatha, Abhyuday N; Yu, Hong

    2016-11-01

    Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experimented with various CRF-based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities.
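
    As a rough illustration of decoding with explicit pairwise potentials on top of RNN emission scores, the sketch below runs Viterbi over a linear chain in NumPy; the tiny clinical tag set and random emissions are stand-ins, and this is not the paper's skip-chain or LSTM-CRF implementation.

```python
# Minimal sketch of linear-chain decoding with explicit pairwise potentials on
# top of RNN emission scores. The random emissions and the tiny clinical tag
# set are illustrative stand-ins; this is not the paper's skip-chain model.
import numpy as np

def viterbi(emissions, transitions):
    """emissions: (T, K) unary scores; transitions: (K, K) pairwise scores,
    transitions[a, b] = score of moving from tag a to tag b.
    Returns the highest-scoring tag index sequence."""
    T, K = emissions.shape
    delta = np.zeros((T, K))                # best score ending in each tag
    backptr = np.zeros((T, K), dtype=int)
    delta[0] = emissions[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + transitions + emissions[t][None, :]
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tags = ["O", "B-Drug", "I-Drug"]             # illustrative tag set
    emissions = rng.normal(size=(6, len(tags)))  # stand-in for RNN outputs
    transitions = np.zeros((3, 3))
    transitions[0, 2] = -5.0                     # discourage O -> I-Drug
    print([tags[k] for k in viterbi(emissions, transitions)])
```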

  12. Data-Based Predictive Control with Multirate Prediction Step

    Science.gov (United States)

    Barlow, Jonathan S.

    2010-01-01

    Data-based predictive control is an emerging control method that stems from Model Predictive Control (MPC). MPC computes current control action based on a prediction of the system output a number of time steps into the future and is generally derived from a known model of the system. Data-based predictive control has the advantage of deriving predictive models and controller gains from input-output data. Thus, a controller can be designed from the outputs of complex simulation code or a physical system where no explicit model exists. If the output data happens to be corrupted by periodic disturbances, the designed controller will also have the built-in ability to reject these disturbances without the need to know them. When data-based predictive control is implemented online, it becomes a version of adaptive control. One challenge of MPC is computational requirements increasing with prediction horizon length. This paper develops a closed-loop dynamic output feedback controller that minimizes a multi-step-ahead receding-horizon cost function with multirate prediction step. One result is a reduced influence of prediction horizon and the number of system outputs on the computational requirements of the controller. Another result is an emphasis on portions of the prediction window that are sampled more frequently. A third result is the ability to include more outputs in the feedback path than in the cost function.
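
    The data-based idea of identifying a predictor directly from input-output records can be sketched under the assumption of a simple ARX model fit by least squares and iterated forward over a short horizon; this toy example illustrates the data-based idea only, not the multirate receding-horizon controller itself.

```python
# Minimal sketch of a data-based predictor: identify a simple ARX model from
# input-output records by least squares and iterate it over a short horizon.
# This illustrates the data-based idea only, not the multirate controller.
import numpy as np

def fit_arx(u, y, na=2, nb=2):
    """Fit y[k] ~ sum_i a_i*y[k-i] + sum_j b_j*u[k-j] by least squares."""
    start = max(na, nb)
    rows = [np.concatenate([y[k - na:k][::-1], u[k - nb:k][::-1]])
            for k in range(start, len(y))]
    theta, *_ = np.linalg.lstsq(np.array(rows), y[start:], rcond=None)
    return theta

def predict_ahead(theta, y_past, u_past, u_future, na=2, nb=2):
    """Multi-step-ahead prediction, feeding predicted outputs back in."""
    y, u = list(y_past), list(u_past)
    preds = []
    for uk in u_future:
        phi = np.concatenate([np.array(y[-na:])[::-1], np.array(u[-nb:])[::-1]])
        preds.append(float(phi @ theta))
        y.append(preds[-1])
        u.append(uk)
    return preds

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    u = rng.normal(size=200)
    y = np.zeros(200)
    for k in range(1, 200):                      # toy first-order plant
        y[k] = 0.8 * y[k - 1] + 0.5 * u[k - 1] + 0.01 * rng.normal()
    theta = fit_arx(u, y)
    print(np.round(predict_ahead(theta, y[-2:], u[-2:], np.zeros(5)), 3))
```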

  13. Prediction of protein–protein interactions: unifying evolution and structure at protein interfaces

    International Nuclear Information System (INIS)

    Tuncbag, Nurcan; Gursoy, Attila; Keskin, Ozlem

    2011-01-01

    The vast majority of the chores in the living cell involve protein–protein interactions. Providing details of protein interactions at the residue level and incorporating them into protein interaction networks are crucial toward the elucidation of a dynamic picture of cells. Despite the rapid increase in the number of structurally known protein complexes, we are still far away from a complete network. Given experimental limitations, computational modeling of protein interactions is a prerequisite to proceed on the way to complete structural networks. In this work, we focus on the question 'how do proteins interact?' rather than 'which proteins interact?' and we review structure-based protein–protein interaction prediction approaches. As a sample approach for modeling protein interactions, PRISM is detailed which combines structural similarity and evolutionary conservation in protein interfaces to infer structures of complexes in the protein interaction network. This will ultimately help us to understand the role of protein interfaces in predicting bound conformations

  14. Antibody structural modeling with prediction of immunoglobulin structure (PIGS)

    KAUST Repository

    Marcatili, Paolo

    2014-11-06

    Antibodies (or immunoglobulins) are crucial for defending organisms from pathogens, but they are also key players in many medical, diagnostic and biotechnological applications. The ability to predict their structure and the specific residues involved in antigen recognition has several useful applications in all of these areas. Over the years, we have developed or collaborated in developing a strategy that enables researchers to predict the 3D structure of antibodies with a very satisfactory accuracy. The strategy is completely automated and extremely fast, requiring only a few minutes (~10 min on average) to build a structural model of an antibody. It is based on the concept of canonical structures of antibody loops and on our understanding of the way light and heavy chains pack together.

  15. Characteristics and Prediction of RNA Structure

    Directory of Open Access Journals (Sweden)

    Hengwu Li

    2014-01-01

    Full Text Available RNA secondary structures with pseudoknots are often predicted by minimizing free energy, which is NP-hard. Most RNAs fold during transcription from DNA into RNA through a hierarchical pathway wherein secondary structures form prior to tertiary structures. Real RNA secondary structures are often locally rather than globally optimal, for kinetic reasons. The performance of RNA structure prediction may therefore be improved by considering dynamic and hierarchical folding mechanisms. This study reports that RNA folding accords with the golden-mean characteristic, based on a statistical analysis of the real RNA secondary structures of all 480 sequences from RNA STRAND that have been validated by NMR or X-ray. The length ratios of domains in these sequences are approximately 0.382L, 0.5L, 0.618L, and L, where L is the sequence length; these points are precisely the important golden sections of the sequence. Using this characteristic, an algorithm is designed to predict RNA hierarchical structures and simulate RNA folding by dynamically folding RNA structures according to the above golden-section points. The sensitivity and number of predicted pseudoknots of our algorithm are better than those of the Mfold, HotKnots, McQfold, ProbKnot, and Lhw-Zhu algorithms. Experimental results reflect the folding rules of RNA from a new angle that is close to natural folding.
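
    The golden-section splitting described above is easy to reproduce; the sketch below simply computes the candidate boundary positions 0.382L, 0.5L, 0.618L and L for a toy sequence (the folding simulation itself is not reproduced).

```python
# Minimal sketch of the golden-section splitting described above: candidate
# domain boundaries for a sequence of length L are placed near 0.382*L, 0.5*L,
# 0.618*L and L. The folding simulation itself is not reproduced here.
def golden_section_points(L):
    """Candidate boundary positions (1-based, rounded)."""
    return [round(r * L) for r in (0.382, 0.5, 0.618, 1.0)]

def hierarchical_segments(L):
    """Nested segments implied by folding the sequence section by section."""
    return [(1, p) for p in golden_section_points(L)]

if __name__ == "__main__":
    seq = "GGGAAAUCCCCGGGAAAUCCCC" * 3   # toy RNA sequence, L = 66
    print(golden_section_points(len(seq)))
    print(hierarchical_segments(len(seq)))
```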

  16. Predictive Models of Primary Tropical Forest Structure from Geomorphometric Variables Based on SRTM in the Tapajós Region, Brazilian Amazon.

    Science.gov (United States)

    Bispo, Polyanna da Conceição; Dos Santos, João Roberto; Valeriano, Márcio de Morisson; Graça, Paulo Maurício Lima de Alencastro; Balzter, Heiko; França, Helena; Bispo, Pitágoras da Conceição

    2016-01-01

    Surveying primary tropical forest over large regions is challenging. Indirect methods of relating terrain information or other external spatial datasets to forest biophysical parameters can provide forest structural maps at large scales, but the inherent uncertainties need to be evaluated fully. The goal of the present study was to evaluate relief characteristics, measured through geomorphometric variables, as predictors of forest structural characteristics such as average tree basal area (BA) and height (H) and average percentage canopy openness (CO). Our hypothesis is that geomorphometric variables are good predictors of the structure of primary tropical forest, even in areas with low altitude variation. The study was performed at the Tapajós National Forest, located in the western part of the state of Pará, Brazil. Forty-three plots were sampled. Predictive models for BA, H and CO were parameterized based on geomorphometric variables using multiple linear regression. Validation of the models with nine independent sample plots revealed a Root Mean Square Error (RMSE) of 3.73 m2/ha (20%) for BA, 1.70 m (12%) for H, and 1.78% (21%) for CO. The coefficients of determination between observed and predicted values were r2 = 0.32 for CO, r2 = 0.26 for H and r2 = 0.52 for BA. The models obtained were able to adequately estimate BA and CO. In summary, it can be concluded that relief variables are good predictors of vegetation structure and enable the creation of forest structure maps in primary tropical rainforest with an acceptable uncertainty.
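
    A minimal sketch of the modelling and validation step, assuming synthetic stand-ins for the geomorphometric predictors and the basal-area response, is shown below using scikit-learn multiple linear regression with nine held-out plots and an RMSE/r2 report; all variable names and values are illustrative, not the study's data.

```python
# Minimal sketch of multiple linear regression from geomorphometric predictors
# to a forest-structure variable, validated on held-out plots with RMSE and
# r^2. Data here are synthetic stand-ins; names are illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(42)
n_plots = 43
# Hypothetical predictors: elevation, slope, curvature (derived from SRTM).
X = rng.normal(size=(n_plots, 3))
# Hypothetical response: basal area (m^2/ha), linearly related plus noise.
y = 18 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 2, n_plots)

train, test = slice(0, 34), slice(34, 43)      # 9 plots held out for validation
model = LinearRegression().fit(X[train], y[train])
pred = model.predict(X[test])

rmse = mean_squared_error(y[test], pred) ** 0.5
print(f"RMSE = {rmse:.2f} m^2/ha, r^2 = {r2_score(y[test], pred):.2f}")
```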

  17. Evolutionary rate variation and RNA secondary structure prediction

    DEFF Research Database (Denmark)

    Knudsen, B.; Andersen, E.S.; Damgaard, C.

    2004-01-01

    Predicting RNA secondary structure using evolutionary history can be carried out by using an alignment of related RNA sequences with conserved structure. Accurately determining evolutionary substitution rates for base pairs and single stranded nucleotides is a concern for methods based on this type...... by applying rates derived from tRNA and rRNA to the prediction of the much more rapidly evolving 5'-region of HIV-1. We find that the HIV-1 prediction is in agreement with experimental data, even though the relative evolutionary rate between A and G is significantly increased, both in stem and loop regions...

  18. Structure-based methods to predict mutational resistance to diarylpyrimidine non-nucleoside reverse transcriptase inhibitors.

    Science.gov (United States)

    Azeem, Syeda Maryam; Muwonge, Alecia N; Thakkar, Nehaben; Lam, Kristina W; Frey, Kathleen M

    2018-01-01

    Resistance to non-nucleoside reverse transcriptase inhibitors (NNRTIs) is a leading cause of HIV treatment failure. Often included in antiviral therapy, NNRTIs are chemically diverse compounds that bind an allosteric pocket of the enzyme target reverse transcriptase (RT). Several new NNRTIs incorporate flexibility in order to compensate for interactions lost to amino acid mutations in RT. Unfortunately, even successful inhibitors such as the diarylpyrimidine (DAPY) inhibitor rilpivirine are affected by mutations in RT that confer resistance. In order to aid drug design efforts, it would be efficient and cost effective to pre-evaluate NNRTI compounds in development using a structure-based computational approach. As proof of concept, we applied a residue scan and molecular dynamics strategy using RT crystal structures to predict mutations that confer resistance to the DAPYs rilpivirine, etravirine, and the investigational microbicide dapivirine. Our predictive values, changes in affinity and stability, are correlative with fold-resistance data for several RT mutants. Consistent with previous studies, mutation K101P is predicted to confer high-level resistance to DAPYs. These findings were further validated using structural analysis, molecular dynamics, and an enzymatic reverse transcription assay. Our results confirm that changes in affinity and stability for mutant complexes are predictive parameters of resistance as validated by experimental and clinical data. In future work, we believe that this computational approach may be useful to predict resistance mutations for inhibitors in development.

  19. Effects prediction guidelines for structures subjected to ground motion

    International Nuclear Information System (INIS)

    1975-07-01

    Part of the planning for an underground nuclear explosion (UNE) is determining the effects of expected ground motion on exposed structures. Because of the many types of structures and the wide variation in ground motion intensity typically encountered, no single prediction method is both adequate and feasible for a complete evaluation. Furthermore, the nature and variability of ground motion and structure damage prescribe effects predictions that are made probabilistically. Initially, prediction for a UNE involves a preliminary assessment of damage to establish overall project feasibility. Subsequent efforts require more detailed damage evaluations, based on structure inventories and analyses of specific structures, so that safety problems can be identified and safety and remedial measures can be recommended. To cover this broad range of effects prediction needs for a typical UNE project, three distinct but interrelated methods have been developed and are described. First, the fundamental practical and theoretical aspects of predicting the effects of dynamic ground motion on structures are summarized. Next, experimentally derived and theoretically determined observations of the behavior of typical structures subjected to ground motion are presented. Then, based on these fundamental considerations and on the observed behavior of structures, the formulation of the three effects prediction procedures is described, along with guidelines regarding their applicability. Example damage predictions for hypothetical UNEs demonstrate these procedures. To aid in identifying the vibration properties of complex structures, one chapter discusses alternatives in vibration testing, instrumentation, and data analysis. Finally, operational guidelines regarding data acquisition procedures, safety criteria, and remedial measures involved in conducting structure effects evaluations are discussed. (U.S.)

  20. A model to predict radon exhalation from walls to indoor air based on the exhalation from building material samples

    International Nuclear Information System (INIS)

    Sahoo, B.K.; Sapra, B.K.; Gaware, J.J.; Kanse, S.D.; Mayya, Y.S.

    2011-01-01

    In recognition of the fact that building materials are an important source of indoor radon, second only to soil, surface radon exhalation fluxes have been extensively measured from the samples of these materials. Based on this flux data, several researchers have attempted to predict the inhalation dose attributable to radon emitted from walls and ceilings made up of these materials. However, an important aspect not considered in this methodology is the enhancement of the radon flux from the wall or the ceiling constructed using the same building material. This enhancement occurs mainly because of the change in the radon diffusion process from the former to the latter configuration. To predict the true radon flux from the wall based on the flux data of building material samples, we now propose a semi-empirical model involving radon diffusion length and the physical dimensions of the samples as well as wall thickness as other input parameters. This model has been established by statistically fitting the ratio of the solution to radon diffusion equations for the cases of three-dimensional cuboidal shaped building materials (such as brick, concrete block) and one dimensional wall system to a simple mathematical function. The model predictions have been validated against the measurements made at a new construction site. This model provides an alternative tool (substitute to conventional 1-D model) to estimate radon flux from a wall without relying on 226 Ra content, radon emanation factor and bulk density of the samples. Moreover, it may be very useful in the context of developing building codes for radon regulation in new buildings. - Research highlights: → A model is proposed to predict radon flux from wall using flux of building material. → It is established based on the diffusion mechanism in building material and wall. → Study showed a large difference in radon flux from building material and wall. → Model has been validated against the measurements made at

  1. Base pair probability estimates improve the prediction accuracy of RNA non-canonical base pairs.

    Directory of Open Access Journals (Sweden)

    Michael F Sloma

    2017-11-01

    Full Text Available Prediction of RNA tertiary structure from sequence is an important problem, but generating accurate structure models for even short sequences remains difficult. Predictions of RNA tertiary structure tend to be least accurate in loop regions, where non-canonical pairs are important for determining the details of structure. Non-canonical pairs can be predicted using a knowledge-based model of structure that scores nucleotide cyclic motifs, or NCMs. In this work, a partition function algorithm is introduced that allows the estimation of base pairing probabilities for both canonical and non-canonical interactions. Pairs that are predicted to be probable are more likely to be found in the true structure than pairs of lower probability. Pair probability estimates can be further improved by predicting the structure conserved across multiple homologous sequences using the TurboFold algorithm. These pairing probabilities, used in concert with prior knowledge of the canonical secondary structure, allow accurate inference of non-canonical pairs, an important step towards accurate prediction of the full tertiary structure. Software to predict non-canonical base pairs and pairing probabilities is now provided as part of the RNAstructure software package.

  2. Base pair probability estimates improve the prediction accuracy of RNA non-canonical base pairs.

    Science.gov (United States)

    Sloma, Michael F; Mathews, David H

    2017-11-01

    Prediction of RNA tertiary structure from sequence is an important problem, but generating accurate structure models for even short sequences remains difficult. Predictions of RNA tertiary structure tend to be least accurate in loop regions, where non-canonical pairs are important for determining the details of structure. Non-canonical pairs can be predicted using a knowledge-based model of structure that scores nucleotide cyclic motifs, or NCMs. In this work, a partition function algorithm is introduced that allows the estimation of base pairing probabilities for both canonical and non-canonical interactions. Pairs that are predicted to be probable are more likely to be found in the true structure than pairs of lower probability. Pair probability estimates can be further improved by predicting the structure conserved across multiple homologous sequences using the TurboFold algorithm. These pairing probabilities, used in concert with prior knowledge of the canonical secondary structure, allow accurate inference of non-canonical pairs, an important step towards accurate prediction of the full tertiary structure. Software to predict non-canonical base pairs and pairing probabilities is now provided as part of the RNAstructure software package.

  3. Feature-Based and String-Based Models for Predicting RNA-Protein Interaction

    Directory of Open Access Journals (Sweden)

    Donald Adjeroh

    2018-03-01

    Full Text Available In this work, we study two approaches for the problem of RNA-Protein Interaction (RPI. In the first approach, we use a feature-based technique by combining extracted features from both sequences and secondary structures. The feature-based approach enhanced the prediction accuracy as it included much more available information about the RNA-protein pairs. In the second approach, we apply search algorithms and data structures to extract effective string patterns for prediction of RPI, using both sequence information (protein and RNA sequences, and structure information (protein and RNA secondary structures. This led to different string-based models for predicting interacting RNA-protein pairs. We show results that demonstrate the effectiveness of the proposed approaches, including comparative results against leading state-of-the-art methods.

  4. A Kernel for Protein Secondary Structure Prediction

    OpenAIRE

    Guermeur , Yann; Lifchitz , Alain; Vert , Régis

    2004-01-01

    http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=10338&mode=toc; International audience; Multi-class support vector machines have already proved efficient in protein secondary structure prediction as ensemble methods, to combine the outputs of sets of classifiers based on different principles. In this chapter, their implementation as basic prediction methods, processing the primary structure or the profile of multiple alignments, is investigated. A kernel devoted to the task is in...

  5. SVM-PB-Pred: SVM based protein block prediction method using sequence profiles and secondary structures.

    Science.gov (United States)

    Suresh, V; Parthasarathy, S

    2014-01-01

    We developed a support vector machine (SVM)-based web server called SVM-PB-Pred to predict the protein block for any given amino acid sequence. The input features of SVM-PB-Pred include i) sequence profiles (PSSM) and ii) actual secondary structures (SS) from the DSSP method or predicted secondary structures from the NPS@ and GOR4 methods. Three combined input features, PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4), were used to test and train the SVM models. Similarly, four datasets RS90, DB433, LI1264 and SP1577 were used to develop the SVM models. These four SVM models were tested using three different benchmarking tests, namely (i) self consistency, (ii) seven-fold cross validation and (iii) independent case test. The maximum possible prediction accuracy of ~70% was observed in the self consistency test for the SVM models of both the LI1264 and SP1577 datasets, where the PSSM+SS(DSSP) input features were used for testing. The prediction accuracies were reduced to ~53% for PSSM+SS(NPS@) and ~43% for PSSM+SS(GOR4) in the independent case test for the SVM models of the same two datasets. Using our method, it is possible to predict the protein block letters for any query protein sequence with ~53% accuracy, when the SP1577 dataset and predicted secondary structure from the NPS@ server were used. The SVM-PB-Pred server can be freely accessed through http://bioinfo.bdu.ac.in/~svmpbpred.

  6. RNAstructure: software for RNA secondary structure prediction and analysis.

    Science.gov (United States)

    Reuter, Jessica S; Mathews, David H

    2010-03-15

    To understand an RNA sequence's mechanism of action, the structure must be known. Furthermore, target RNA structure is an important consideration in the design of small interfering RNAs and antisense DNA oligonucleotides. RNA secondary structure prediction, using thermodynamics, can be used to develop hypotheses about the structure of an RNA sequence. RNAstructure is a software package for RNA secondary structure prediction and analysis. It uses thermodynamics and utilizes the most recent set of nearest neighbor parameters from the Turner group. It includes methods for secondary structure prediction (using several algorithms), prediction of base pair probabilities, bimolecular structure prediction, and prediction of a structure common to two sequences. This contribution describes new extensions to the package, including a library of C++ classes for incorporation into other programs, a user-friendly graphical user interface written in JAVA, and new Unix-style text interfaces. The original graphical user interface for Microsoft Windows is still maintained. The extensions to RNAstructure serve to make RNA secondary structure prediction user-friendly. The package is available for download from the Mathews lab homepage at http://rna.urmc.rochester.edu/RNAstructure.html.

  7. Linguistic Structure Prediction

    CERN Document Server

    Smith, Noah A

    2011-01-01

    A major part of natural language processing now depends on the use of text data to build linguistic analyzers. We consider statistical, computational approaches to modeling linguistic structure. We seek to unify across many approaches and many kinds of linguistic structures. Assuming a basic understanding of natural language processing and/or machine learning, we seek to bridge the gap between the two fields. Approaches to decoding (i.e., carrying out linguistic structure prediction) and supervised and unsupervised learning of models that predict discrete structures as outputs are the focus. W

  8. Real-time prediction of atmospheric Lagrangian coherent structures based on forecast data: An application and error analysis

    Science.gov (United States)

    BozorgMagham, Amir E.; Ross, Shane D.; Schmale, David G.

    2013-09-01

    The language of Lagrangian coherent structures (LCSs) provides a new means for studying transport and mixing of passive particles advected by an atmospheric flow field. Recent observations suggest that LCSs govern the large-scale atmospheric motion of airborne microorganisms, paving the way for more efficient models and management strategies for the spread of infectious diseases affecting plants, domestic animals, and humans. In addition, having reliable predictions of the timing of hyperbolic LCSs may contribute to improved aerobiological sampling of microorganisms with unmanned aerial vehicles and LCS-based early warning systems. Chaotic atmospheric dynamics lead to unavoidable forecasting errors in the wind velocity field, which compounds errors in LCS forecasting. In this study, we reveal the cumulative effects of errors of (short-term) wind field forecasts on the finite-time Lyapunov exponent (FTLE) fields and the associated LCSs when realistic forecast plans impose certain limits on the forecasting parameters. Objectives of this paper are to (a) quantify the accuracy of prediction of FTLE-LCS features and (b) determine the sensitivity of such predictions to forecasting parameters. Results indicate that forecasts of attracting LCSs exhibit less divergence from the archive-based LCSs than the repelling features. This result is important since attracting LCSs are the backbone of long-lived features in moving fluids. We also show under what circumstances one can trust the forecast results if one merely wants to know if an LCS passed over a region and does not need to precisely know the passage time.
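
    The FTLE computation underlying such LCS maps can be sketched on a toy analytic flow; the example below advects a particle grid through the classic double-gyre velocity field, forms the Cauchy-Green tensor from the flow-map gradient, and reports the finite-time Lyapunov exponent field. The Euler integrator, grid and parameters are illustrative choices, and forecast wind data are not used.

```python
# Minimal sketch of a finite-time Lyapunov exponent (FTLE) field on a toy
# analytic flow (the classic double gyre), illustrating the computation behind
# LCS maps. Forecast wind data are replaced by the analytic field here.
import numpy as np

A, EPS, OMEGA = 0.1, 0.25, 2 * np.pi / 10

def velocity(x, y, t):
    a = EPS * np.sin(OMEGA * t)
    b = 1 - 2 * a
    f = a * x**2 + b * x
    dfdx = 2 * a * x + b
    u = -np.pi * A * np.sin(np.pi * f) * np.cos(np.pi * y)
    v = np.pi * A * np.cos(np.pi * f) * np.sin(np.pi * y) * dfdx
    return u, v

def ftle(nx=150, ny=75, t0=0.0, T=10.0, dt=0.05):
    x, y = np.linspace(0, 2, nx), np.linspace(0, 1, ny)
    X, Y = np.meshgrid(x, y)
    Xf, Yf = X.copy(), Y.copy()
    t = t0
    for _ in range(int(T / dt)):                 # advect the particle grid
        u, v = velocity(Xf, Yf, t)
        Xf, Yf = Xf + u * dt, Yf + v * dt
        t += dt
    dXf_dy, dXf_dx = np.gradient(Xf, y, x)       # flow-map gradient
    dYf_dy, dYf_dx = np.gradient(Yf, y, x)
    c11 = dXf_dx**2 + dYf_dx**2                  # Cauchy-Green tensor entries
    c12 = dXf_dx * dXf_dy + dYf_dx * dYf_dy
    c22 = dXf_dy**2 + dYf_dy**2
    tr, det = c11 + c22, c11 * c22 - c12**2
    lam_max = tr / 2 + np.sqrt(np.maximum(tr**2 / 4 - det, 0.0))
    return np.log(np.sqrt(np.maximum(lam_max, 1e-12))) / abs(T)

if __name__ == "__main__":
    field = ftle()
    print("max forward FTLE:", float(field.max()))
```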

  9. Using Data Mining Approaches for Force Prediction of a Dynamically Loaded Flexible Structure

    DEFF Research Database (Denmark)

    Schlechtingen, Meik; Achiche, Sofiane; Lourenco Costa, Tiago

    2014-01-01

    -deterministic excitation forces with different excitation frequencies and amplitudes. Additionally, the influence of the sampling frequency and sensor location on the model performance is investigated. The results obtained in this paper show that most data mining approaches can be used, when a certain degree of inaccuracy...... of freedom and a force transducer for validation and training. The models are trained using data obtained from applying a random excitation force on the flexible structure. The performance of the developed models is evaluated by analyzing the prediction capabilities based on a normalized prediction error...

  10. Ancestry prediction in Singapore population samples using the Illumina ForenSeq kit.

    Science.gov (United States)

    Ramani, Anantharaman; Wong, Yongxun; Tan, Si Zhen; Shue, Bing Hong; Syn, Christopher

    2017-11-01

    The ability to predict bio-geographic ancestry can be valuable to generate investigative leads towards solving crimes. Ancestry informative marker (AIM) sets include large numbers of SNPs to predict an ancestral population. Massively parallel sequencing has enabled forensic laboratories to genotype a large number of such markers in a single assay. Illumina's ForenSeq DNA Signature Kit includes the ancestry informative SNPs reported by Kidd et al. In this study, the ancestry prediction capabilities of the ForenSeq kit through sequencing on the MiSeq FGx were evaluated in 1030 unrelated Singapore population samples of Chinese, Malay and Indian origin. A total of 59 ancestry SNPs and phenotypic SNPs with AIM properties were selected. The bio-geographic ancestry of the 1030 samples, as predicted by Illumina's ForenSeq Universal Analysis Software (UAS), was determined. 712 of the genotyped samples were used as a training sample set for the generation of an ancestry prediction model using STRUCTURE and Snipper. The performance of the prediction model was tested by both methods with the remaining 318 samples. Ancestry prediction in UAS was able to correctly classify the Singapore Chinese as part of the East Asian cluster, while Indians clustered with Admixed Americans and Malays clustered in-between these two reference populations. Principal component analyses showed that the 59 SNPs were only able to account for 26% of the variation between the Singapore sub-populations. Their discriminatory potential was also found to be lower (G_ST = 0.085) than that reported in ALFRED (F_ST = 0.357). The Snipper algorithm was able to correctly predict bio-geographic ancestry in 91% of Chinese and Indian, and 88% of Malay individuals, while the success rates for the STRUCTURE algorithm were 94% in Chinese, 80% in Malay, and 91% in Indian individuals. Both these algorithms were able to provide admixture proportions when present. Ancestry prediction accuracy (in terms of likelihood ratio
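
    The frequency-based classification idea behind tools such as Snipper can be sketched as a naive-Bayes log-likelihood over per-population allele frequencies under Hardy-Weinberg proportions; the three-SNP frequency table below is fabricated for illustration only and is unrelated to the ForenSeq panel.

```python
# Minimal sketch of frequency-based ancestry classification: score a genotype
# against each candidate population with a naive-Bayes log-likelihood assuming
# Hardy-Weinberg proportions and independent SNPs. Frequencies are fabricated
# for illustration and have no relation to the ForenSeq panel.
import numpy as np

# Hypothetical allele-1 frequencies per SNP for three reference populations.
FREQS = {
    "POP_A": np.array([0.10, 0.80, 0.35]),
    "POP_B": np.array([0.55, 0.30, 0.60]),
    "POP_C": np.array([0.85, 0.50, 0.20]),
}

def genotype_loglik(genotype, p):
    """genotype: counts of allele 1 per SNP (0, 1 or 2); p: allele-1 freqs."""
    g = np.asarray(genotype)
    hw = np.where(g == 2, p**2, np.where(g == 1, 2 * p * (1 - p), (1 - p) ** 2))
    return float(np.sum(np.log(hw + 1e-12)))

def classify(genotype):
    scores = {pop: genotype_loglik(genotype, p) for pop, p in FREQS.items()}
    best = max(scores, key=scores.get)
    runner_up = sorted(scores.values())[-2]
    # Likelihood ratio of the best population versus the runner-up.
    return best, np.exp(scores[best] - runner_up)

if __name__ == "__main__":
    print(classify([0, 2, 1]))   # homozygous ref, homozygous alt, heterozygous
```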

  11. A Copula Based Approach for Design of Multivariate Random Forests for Drug Sensitivity Prediction.

    Science.gov (United States)

    Haider, Saad; Rahman, Raziur; Ghosh, Souparno; Pal, Ranadip

    2015-01-01

    Modeling sensitivity to drugs based on genetic characterizations is a significant challenge in the area of systems medicine. Ensemble based approaches such as Random Forests have been shown to perform well in both individual sensitivity prediction studies and team science based prediction challenges. However, Random Forests generate a deterministic predictive model for each drug based on the genetic characterization of the cell lines and ignores the relationship between different drug sensitivities during model generation. This application motivates the need for generation of multivariate ensemble learning techniques that can increase prediction accuracy and improve variable importance ranking by incorporating the relationships between different output responses. In this article, we propose a novel cost criterion that captures the dissimilarity in the output response structure between the training data and node samples as the difference in the two empirical copulas. We illustrate that copulas are suitable for capturing the multivariate structure of output responses independent of the marginal distributions and the copula based multivariate random forest framework can provide higher accuracy prediction and improved variable selection. The proposed framework has been validated on genomics of drug sensitivity for cancer and cancer cell line encyclopedia database.
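
    The notion of comparing the dependence structure of a node's samples with that of the training data via empirical copulas can be sketched as below; the rank transform, grid-based empirical copula, and L2 distance are illustrative choices, not the paper's exact cost criterion.

```python
# Minimal sketch of an empirical-copula dissimilarity between the output
# responses of a candidate node split and the full training set. The grid
# evaluation and L2 distance are illustrative, not the paper's exact cost.
import numpy as np
from scipy.stats import rankdata

def pseudo_observations(Y):
    """Map each column of Y (n samples x d responses) to ranks in (0, 1)."""
    n = Y.shape[0]
    return np.column_stack([rankdata(Y[:, j]) / (n + 1) for j in range(Y.shape[1])])

def empirical_copula(U, grid):
    """Evaluate C(u) = mean(all coordinates <= u) at each grid point."""
    return np.array([np.mean(np.all(U <= u, axis=1)) for u in grid])

def copula_dissimilarity(Y_parent, Y_node, m=10):
    """L2 difference between the two empirical copulas on an m^d grid."""
    d = Y_parent.shape[1]
    axes = [np.linspace(0.05, 0.95, m)] * d
    grid = np.stack(np.meshgrid(*axes), axis=-1).reshape(-1, d)
    c1 = empirical_copula(pseudo_observations(Y_parent), grid)
    c2 = empirical_copula(pseudo_observations(Y_node), grid)
    return float(np.sqrt(np.mean((c1 - c2) ** 2)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=200)
    node = Y[Y[:, 0] > 0.5]          # samples falling in a candidate node
    print(copula_dissimilarity(Y, node))
```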

  12. Prediction of Individual Response to Electroconvulsive Therapy via Machine Learning on Structural Magnetic Resonance Imaging Data.

    Science.gov (United States)

    Redlich, Ronny; Opel, Nils; Grotegerd, Dominik; Dohm, Katharina; Zaremba, Dario; Bürger, Christian; Münker, Sandra; Mühlmann, Lisa; Wahl, Patricia; Heindel, Walter; Arolt, Volker; Alferink, Judith; Zwanzger, Peter; Zavorotnyy, Maxim; Kugel, Harald; Dannlowski, Udo

    2016-06-01

    Electroconvulsive therapy (ECT) is one of the most effective treatments for severe depression. However, biomarkers that accurately predict a response to ECT remain unidentified. To investigate whether certain factors identified by structural magnetic resonance imaging (MRI) techniques are able to predict ECT response. In this nonrandomized prospective study, gray matter structure was assessed twice at approximately 6 weeks apart using 3-T MRI and voxel-based morphometry. Patients were recruited through the inpatient service of the Department of Psychiatry, University of Muenster, from March 11, 2010, to March 27, 2015. Two patient groups with acute major depressive disorder were included. One group received an ECT series in addition to antidepressants (n = 24); a comparison sample was treated solely with antidepressants (n = 23). Both groups were compared with a sample of healthy control participants (n = 21). Binary pattern classification was used to predict ECT response by structural MRI that was performed before treatment. In addition, univariate analysis was conducted to predict reduction of the Hamilton Depression Rating Scale score by pretreatment gray matter volumes and to investigate ECT-related structural changes. One participant in the ECT sample was excluded from the analysis, leaving 67 participants (27 men and 40 women; mean [SD] age, 43.7 [10.6] years). The binary pattern classification yielded a successful prediction of ECT response, with accuracy rates of 78.3% (18 of 23 patients in the ECT sample) and sensitivity rates of 100% (13 of 13 who responded to ECT). Furthermore, a support vector regression yielded a significant prediction of relative reduction in the Hamilton Depression Rating Scale score. The principal findings of the univariate model indicated a positive association between pretreatment subgenual cingulate volume and individual ECT response (Montreal Neurological Institute [MNI] coordinates x = 8, y = 21, z = -18

  13. Sampling Enrichment toward Target Structures Using Hybrid Molecular Dynamics-Monte Carlo Simulations.

    Directory of Open Access Journals (Sweden)

    Kecheng Yang

    Full Text Available Sampling enrichment toward a target state, an analogue of the improvement of sampling efficiency (SE), is critical in both the refinement of protein structures and the generation of near-native structure ensembles for the exploration of structure-function relationships. We developed a hybrid molecular dynamics (MD)-Monte Carlo (MC) approach to enrich the sampling toward the target structures. In this approach, the higher SE is achieved by perturbing the conventional MD simulations with a MC structure-acceptance judgment, which is based on the degree of coincidence between the small-angle X-ray scattering (SAXS) intensity profiles of the simulation structures and the target structure. We found that the hybrid simulations could significantly improve SE by making the top-ranked models much closer to the target structures in both secondary and tertiary structure. Specifically, for the 20 mono-residue peptides, when the initial structures had a root-mean-squared deviation (RMSD) from the target structure smaller than 7 Å, the hybrid MD-MC simulations afforded, on average, structures 0.83 Å and 1.73 Å closer in RMSD to the target than the parallel MD simulations at 310 K and 370 K, respectively. Meanwhile, the average SE values also increased by 13.2% and 15.7%. The enrichment of sampling becomes more significant when the target states are gradually detectable in the MD-MC simulations in comparison with the parallel MD simulations, providing a >200% improvement in SE. We also tested the hybrid MD-MC approach on real protein systems; the results showed that the SE improved for 3 out of 5 real proteins. Overall, this work presents an efficient way of utilizing solution SAXS to improve protein structure prediction and refinement, as well as the generation of near-native structures for function annotation.
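
    A toy version of the MC structure-acceptance judgment, assuming a chi-square agreement score between computed and target scattering profiles and a Metropolis-like rule, is sketched below; the acceptance criterion and the profiles are illustrative stand-ins, not the paper's exact scheme.

```python
# Minimal sketch of a SAXS-guided Monte Carlo acceptance step: a trial
# structure is accepted or rejected according to how well its computed
# scattering profile agrees with the target profile. The chi-square score and
# Metropolis-like rule are illustrative assumptions, not the paper's criterion.
import numpy as np

def profile_chi2(i_model, i_target, sigma):
    """Chi-square discrepancy between a model and a target SAXS profile."""
    return float(np.mean(((i_model - i_target) / sigma) ** 2))

def mc_accept(chi2_new, chi2_old, beta=5.0, rng=None):
    """Accept moves that improve agreement; occasionally accept worse ones."""
    if rng is None:
        rng = np.random.default_rng()
    if chi2_new <= chi2_old:
        return True
    return rng.random() < np.exp(-beta * (chi2_new - chi2_old))

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    q = np.linspace(0.02, 0.3, 50)            # scattering vector grid
    target = np.exp(-(q * 12) ** 2 / 3)       # toy Guinier-like target profile
    old = target * (1 + 0.10 * rng.normal(size=q.size))
    new = target * (1 + 0.05 * rng.normal(size=q.size))
    sigma = 0.05 * target + 1e-6
    c_old = profile_chi2(old, target, sigma)
    c_new = profile_chi2(new, target, sigma)
    print(c_old, c_new, mc_accept(c_new, c_old, rng=rng))
```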

  14. Critical Features of Fragment Libraries for Protein Structure Prediction.

    Science.gov (United States)

    Trevizani, Raphael; Custódio, Fábio Lima; Dos Santos, Karina Baptista; Dardenne, Laurent Emmanuel

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. In particular, we analyze the effect of using secondary structure prediction to guide fragment selection, of different fragment sizes, and of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than longer ones. Libraries composed of multiple fragment lengths generate even better structures, with longer fragments proving more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can serve as critical guidelines for the use of fragment libraries in protein structure prediction.

  15. Prediction of the Secondary Structure of HIV-1 gp120

    DEFF Research Database (Denmark)

    Hansen, Jan; Lund, Ole; Nielsen, Jens O.

    1996-01-01

    Fourier transform infrared spectroscopy. The predicted secondary structure of gp120 compared well with data from NMR analysis of synthetic peptides from the V3 loop and the C4 region. As a first step towards modeling the tertiary structure of gp120, the predicted secondary structure may guide the design......The secondary structure of HIV-1 gp120 was predicted using multiple alignment and a combination of two independent methods based on neural network and nearest-neighbor algorithms. The methods agreed on the secondary structure for 80% of the residues in BH10 gp120. Six helices were predicted in HIV...

  16. [Effects of sampling plot number on tree species distribution prediction under climate change].

    Science.gov (United States)

    Liang, Yu; He, Hong-Shi; Wu, Zhi-Wei; Li, Xiao-Na; Luo, Xu

    2013-05-01

    Based on the neutral landscapes under different degrees of landscape fragmentation, this paper studied the effects of sampling plot number on the prediction of tree species distribution at landscape scale under climate change. The tree species distribution was predicted by the coupled modeling approach which linked an ecosystem process model with a forest landscape model, and three contingent scenarios and one reference scenario of sampling plot numbers were assumed. The differences between the three scenarios and the reference scenario under different degrees of landscape fragmentation were tested. The results indicated that the effects of sampling plot number on the prediction of tree species distribution depended on the tree species life history attributes. For the generalist species, the prediction of their distribution at landscape scale needed more plots. Except for the extreme specialist, landscape fragmentation degree also affected the effects of sampling plot number on the prediction. With the increase of simulation period, the effects of sampling plot number on the prediction of tree species distribution at landscape scale could be changed. For generalist species, more plots are needed for the long-term simulation.

  17. Final Report: Sampling-Based Algorithms for Estimating Structure in Big Data.

    Energy Technology Data Exchange (ETDEWEB)

    Matulef, Kevin Michael [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-02-01

    The purpose of this project was to develop sampling-based algorithms to discover hidden structure in massive data sets. Inferring structure in large data sets is an increasingly common task in many critical national security applications. These data sets come from myriad sources, such as network traffic, sensor data, and data generated by large-scale simulations. They are often so large that traditional data mining techniques are time consuming or even infeasible. To address this problem, we focus on a class of algorithms that do not compute an exact answer, but instead use sampling to compute an approximate answer using fewer resources. The particular class of algorithms that we focus on are streaming algorithms, so called because they are designed to handle high-throughput streams of data. Streaming algorithms have only a small amount of working storage - much less than the size of the full data stream - so they must necessarily use sampling to approximate the correct answer. We present two results: * A streaming algorithm called HyperHeadTail, which estimates the degree distribution of a graph (i.e., the distribution of the number of connections for each node in a network). The degree distribution is a fundamental graph property, but prior work on estimating the degree distribution in a streaming setting was impractical for many real-world applications. We improve upon prior work by developing an algorithm that can handle streams with repeated edges, and graph structures that evolve over time. * An algorithm for the task of maintaining a weighted subsample of items in a stream, when the items must be sampled according to their weight, and the weights are dynamically changing. To our knowledge, this is the first such algorithm designed for dynamically evolving weights. We expect it may be useful as a building block for other streaming algorithms on dynamic data sets.
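
    For context, a standard one-pass weighted reservoir sampler (the Efraimidis-Spirakis A-Res scheme) with static weights is sketched below; the report's contribution additionally handles dynamically changing weights, which this baseline does not attempt.

```python
# Minimal sketch of weighted reservoir sampling over a stream (the classic
# Efraimidis-Spirakis A-Res scheme with *static* weights). The report's
# algorithm additionally handles dynamically changing weights, which this
# baseline does not attempt.
import heapq
import random

def weighted_reservoir(stream, k, seed=None):
    """Keep k items sampled without replacement with probability proportional
    to their weights, in a single pass over (item, weight) pairs."""
    rng = random.Random(seed)
    heap = []  # (key, item); the smallest key is evicted first
    for item, w in stream:
        if w <= 0:
            continue
        key = rng.random() ** (1.0 / w)
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))
    return [item for _, item in heap]

if __name__ == "__main__":
    stream = ((f"node{i}", (i % 10) + 1) for i in range(100000))
    print(weighted_reservoir(stream, k=5, seed=0))
```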

  18. Post processing of protein-compound docking for fragment-based drug discovery (FBDD): in-silico structure-based drug screening and ligand-binding pose prediction.

    Science.gov (United States)

    Fukunishi, Yoshifumi

    2010-01-01

    For fragment-based drug development, both hit (active) compound prediction and docking-pose (protein-ligand complex structure) prediction of the hit compound are important, since chemical modification (fragment linking, fragment evolution) subsequent to the hit discovery must be performed based on the protein-ligand complex structure. However, the naïve protein-compound docking calculation shows poor accuracy in terms of docking-pose prediction. Thus, post-processing of the protein-compound docking is necessary. Recently, several methods for the post-processing of protein-compound docking have been proposed. In FBDD, the compounds are smaller than those for conventional drug screening. This makes it difficult to perform the protein-compound docking calculation. A method to avoid this problem has been reported. Protein-ligand binding free energy estimation is useful to reduce the procedures involved in the chemical modification of the hit fragment. Several prediction methods have been proposed for high-accuracy estimation of protein-ligand binding free energy. This paper summarizes the various computational methods proposed for docking-pose prediction and their usefulness in FBDD.

  19. Copula based prediction models: an application to an aortic regurgitation study

    Directory of Open Access Journals (Sweden)

    Shoukri Mohamed M

    2007-06-01

    Full Text Available Abstract Background: An important issue in prediction modeling of multivariate data is the measure of dependence structure. The use of Pearson's correlation as a dependence measure has several pitfalls and hence application of regression prediction models based on this correlation may not be an appropriate methodology. As an alternative, a copula-based methodology for prediction modeling and an algorithm to simulate data are proposed. Methods: The method consists of introducing copulas as an alternative to the correlation coefficient commonly used as a measure of dependence. An algorithm based on the marginal distributions of random variables is applied to construct the Archimedean copulas. Monte Carlo simulations are carried out to replicate datasets, estimate prediction model parameters and validate them using Lin's concordance measure. Results: We have carried out a correlation-based regression analysis on data from 20 patients aged 17–82 years on pre-operative and post-operative ejection fractions after surgery and estimated the prediction model: Post-operative ejection fraction = -0.0658 + 0.8403 (Pre-operative ejection fraction); p = 0.0008; 95% confidence interval of the slope coefficient (0.3998, 1.2808). From the exploratory data analysis, it is noted that both the pre-operative and post-operative ejection fraction measurements have slight departures from symmetry and are skewed to the left. It is also noted that the measurements tend to be widely spread and have shorter tails compared to the normal distribution. Therefore predictions made from the correlation-based model corresponding to pre-operative ejection fraction measurements in the lower range may not be accurate. Further, it is found that the best approximated marginal distributions of pre-operative and post-operative ejection fractions (using q-q plots) are gamma distributions. The copula-based prediction model is estimated as: Post-operative ejection fraction = -0.0933 + 0
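
    A small sketch of the copula-based alternative, assuming a Clayton (Archimedean) copula with illustrative gamma marginals for the pre- and post-operative ejection fractions, is given below; the conditional-inversion sampler is standard, but all parameter values are hypothetical.

```python
# Minimal sketch of a copula-based alternative to correlation-based regression:
# simulate dependent (pre, post) ejection-fraction pairs from a Clayton copula
# with gamma marginals, then read off a conditional-mean prediction.
# All parameter values are illustrative assumptions.
import numpy as np
from scipy import stats

def sample_clayton(n, theta, rng):
    """Conditional-inversion sampler for the bivariate Clayton copula."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
    return u, v

rng = np.random.default_rng(7)
u, v = sample_clayton(5000, theta=2.0, rng=rng)
# Hypothetical gamma marginals for pre- and post-operative ejection fraction.
pre = stats.gamma.ppf(u, a=20, scale=0.55 / 20)    # mean ~ 0.55
post = stats.gamma.ppf(v, a=18, scale=0.50 / 18)   # mean ~ 0.50

# Copula-based "prediction": conditional mean of post given pre near a value.
x0 = 0.45
band = np.abs(pre - x0) < 0.02
print(f"E[post | pre ~ {x0}] ~= {post[band].mean():.3f}")
```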

  20. Neural Networks for protein Structure Prediction

    DEFF Research Database (Denmark)

    Bohr, Henrik

    1998-01-01

    This is a review of neural network applications in bioinformatics, especially applications to protein structure prediction, e.g. prediction of secondary structures, prediction of surface structure, fold class recognition and prediction of the 3-dimensional structure of protein backbones...

  1. Accurate prediction of stability changes in protein mutants by combining machine learning with structure based computational mutagenesis.

    Science.gov (United States)

    Masso, Majid; Vaisman, Iosif I

    2008-09-15

    Accurate predictive models for the impact of single amino acid substitutions on protein stability provide insight into protein structure and function. Such models are also valuable for the design and engineering of new proteins. Previously described methods have utilized properties of protein sequence or structure to predict the free energy change of mutants due to thermal (ΔΔG) and denaturant (ΔΔG(H2O)) denaturations, as well as mutant thermal stability (ΔT(m)), through the application of either computational energy-based approaches or machine learning techniques. However, accuracy associated with applying these methods separately is frequently far from optimal. We detail a computational mutagenesis technique based on a four-body, knowledge-based, statistical contact potential. For any mutation due to a single amino acid replacement in a protein, the method provides an empirical normalized measure of the ensuing environmental perturbation occurring at every residue position. A feature vector is generated for the mutant by considering perturbations at the mutated position and its six ordered nearest neighbors in the 3-dimensional (3D) protein structure. These predictors of stability change are evaluated by applying machine learning tools to large training sets of mutants derived from diverse proteins that have been experimentally studied and described. Predictive models based on our combined approach are either comparable to, or in many cases significantly outperform, previously published results. A web server with supporting documentation is available at http://proteins.gmu.edu/automute.

  2. LocARNA-P: Accurate boundary prediction and improved detection of structural RNAs

    DEFF Research Database (Denmark)

    Will, Sebastian; Joshi, Tejal; Hofacker, Ivo L.

    2012-01-01

    Current genomic screens for noncoding RNAs (ncRNAs) predict a large number of genomic regions containing potential structural ncRNAs. The analysis of these data requires highly accurate prediction of ncRNA boundaries and discrimination of promising candidate ncRNAs from weak predictions. Existing...... methods struggle with these goals because they rely on sequence-based multiple sequence alignments, which regularly misalign RNA structure and therefore do not support identification of structural similarities. To overcome this limitation, we compute columnwise and global reliabilities of alignments based...... on sequence and structure similarity; we refer to these structure-based alignment reliabilities as STARs. The columnwise STARs of alignments, or STAR profiles, provide a versatile tool for the manual and automatic analysis of ncRNAs. In particular, we improve the boundary prediction of the widely used nc...

  3. Structural MRI-Based Predictions in Patients with Treatment-Refractory Depression (TRD).

    Directory of Open Access Journals (Sweden)

    Blair A Johnston

    Full Text Available The application of machine learning techniques to psychiatric neuroimaging offers the possibility to identify robust, reliable and objective disease biomarkers both within and between contemporary syndromal diagnoses that could guide routine clinical practice. The use of quantitative methods to identify psychiatric biomarkers is consequently important, particularly with a view to making predictions relevant to individual patients, rather than at a group level. Here, we describe predictions of treatment-refractory depression (TRD) diagnosis using structural T1-weighted brain scans obtained from twenty adult participants with TRD and 21 never-depressed controls. We report 85% accuracy of individual subject diagnostic prediction. Using an automated feature selection method, the major brain regions supporting this significant classification were in the caudate, insula, habenula and periventricular grey matter. It was not, however, possible to predict the degree of 'treatment resistance' in individual patients, at least as quantified by the Massachusetts General Hospital (MGH-S) clinical staging method; but the insula was again identified as a region of interest. Structural brain imaging data alone can be used to predict diagnostic status, but not MGH-S staging, with a high degree of accuracy in patients with TRD.

  4. PSPP: a protein structure prediction pipeline for computing clusters.

    Directory of Open Access Journals (Sweden)

    Michael S Lee

    2009-07-01

    Full Text Available Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. Therefore, we present a standalone protein structure prediction software package suitable for high-throughput structural genomic applications that performs all three classes of prediction methodologies: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster. The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. So far, the pipeline has been used to study viral and bacterial proteomes. The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a potentially unlimited number of queries as well as perform

  5. Kernel density estimation-based real-time prediction for respiratory motion

    International Nuclear Information System (INIS)

    Ruan, Dan

    2010-01-01

    Effective delivery of adaptive radiotherapy requires locating the target with high precision in real time. System latency caused by data acquisition, streaming, processing and delivery control necessitates prediction. Prediction is particularly challenging for highly mobile targets such as thoracic and abdominal tumors undergoing respiration-induced motion. The complexity of the respiratory motion makes it difficult to build and justify explicit models. In this study, we honor the intrinsic uncertainties in respiratory motion and propose a statistical treatment of the prediction problem. Instead of asking for a deterministic covariate-response map and a unique estimate value for future target position, we aim to obtain a distribution of the future target position (response variable) conditioned on the observed historical sample values (covariate variable). The key idea is to estimate the joint probability distribution (pdf) of the covariate and response variables using an efficient kernel density estimation method. Then, the problem of identifying the distribution of the future target position reduces to identifying the section in the joint pdf based on the observed covariate. Subsequently, estimators are derived based on this estimated conditional distribution. This probabilistic perspective has some distinctive advantages over existing deterministic schemes: (1) it is compatible with potentially inconsistent training samples, i.e., when close covariate variables correspond to dramatically different response values; (2) it is not restricted by any prior structural assumption on the map between the covariate and the response; (3) the two-stage setup allows much freedom in choosing statistical estimates and provides a full nonparametric description of the uncertainty for the resulting estimate. We evaluated the prediction performance on ten patient RPM traces, using the root mean squared difference between the prediction and the observed value normalized by the
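
    The kernel-density idea can be sketched with a one-dimensional covariate for brevity (the paper conditions on a vector of recent samples): estimate the joint density of (covariate, response) with Gaussian kernels and report the conditional mean, which reduces to a Nadaraya-Watson estimate; the bandwidth and the toy breathing trace below are illustrative assumptions.

```python
# Minimal sketch of kernel-density-based prediction: a product-Gaussian KDE of
# the joint (covariate, response) density induces a conditional distribution,
# whose mean is a Nadaraya-Watson estimate. The bandwidth, latency and toy
# breathing trace are illustrative assumptions, not the paper's settings.
import numpy as np

def kde_conditional_mean(x_hist, y_hist, x_query, bandwidth=0.1):
    """E[y | x = x_query] under a product-Gaussian KDE of the joint density."""
    w = np.exp(-0.5 * ((x_hist - x_query) / bandwidth) ** 2)
    return float(np.sum(w * y_hist) / np.sum(w))

if __name__ == "__main__":
    # Toy quasi-periodic breathing trace sampled at 30 Hz.
    rng = np.random.default_rng(0)
    t = np.arange(0, 60, 1 / 30)
    trace = np.sin(2 * np.pi * 0.25 * t) + 0.05 * rng.normal(size=t.size)
    lag = 12                                    # ~0.4 s latency to compensate
    x_hist, y_hist = trace[:-lag], trace[lag:]  # covariate = now, response = later
    x_now = trace[-1]
    print("predicted position in 0.4 s:", kde_conditional_mean(x_hist, y_hist, x_now))
```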

  6. Predicting Successful Aging in a Population-Based Sample of Georgia Centenarians

    Directory of Open Access Journals (Sweden)

    Jonathan Arnold

    2010-01-01

    Full Text Available Used a population-based sample (Georgia Centenarian Study, GCS) to determine proportions of centenarians reaching 100 years as (1) survivors (43%) of chronic diseases first experienced between 0–80 years of age, (2) delayers (36%) with chronic diseases first experienced between 80–98 years of age, or (3) escapers (17%) with chronic diseases only at 98 years of age or older. Diseases fall into two morbidity profiles of 11 chronic diseases; one including cardiovascular disease, cancer, anemia, and osteoporosis, and another including dementia. Centenarians at risk for cancer in their lifetime tended to be escapers (73%), while those at risk for cardiovascular disease tended to be survivors (24%), delayers (39%), or escapers (32%). Approximately half (43%) of the centenarians did not experience dementia. Psychiatric disorders were positively associated with dementia, but prevalence of depression, anxiety, and psychoses did not differ significantly between centenarians and an octogenarian control group. However, centenarians were higher on the Geriatric Depression Scale (GDS) than octogenarians. Consistent with our model of developmental adaptation in aging, distal life events contribute to predicting survivorship outcome in which health status as survivor, delayer, or escaper appears as adaptation variables late in life.

  7. Predicting Successful Aging in a Population-Based Sample of Georgia Centenarians

    Science.gov (United States)

    Arnold, Jonathan; Dai, Jianliang; Nahapetyan, Lusine; Arte, Ankit; Johnson, Mary Ann; Hausman, Dorothy; Rodgers, Willard L.; Hensley, Robert; Martin, Peter; MacDonald, Maurice; Davey, Adam; Siegler, Ilene C.; Jazwinski, S. Michal; Poon, Leonard W.

    2010-01-01

    Used a population-based sample (Georgia Centenarian Study, GCS), to determine proportions of centenarians reaching 100 years as (1) survivors (43%) of chronic diseases first experienced between 0–80 years of age, (2) delayers (36%) with chronic diseases first experienced between 80–98 years of age, or (3) escapers (17%) with chronic diseases only at 98 years of age or older. Diseases fall into two morbidity profiles of 11 chronic diseases; one including cardiovascular disease, cancer, anemia, and osteoporosis, and another including dementia. Centenarians at risk for cancer in their lifetime tended to be escapers (73%), while those at risk for cardiovascular disease tended to be survivors (24%), delayers (39%), or escapers (32%). Approximately half (43%) of the centenarians did not experience dementia. Psychiatric disorders were positively associated with dementia, but prevalence of depression, anxiety, and psychoses did not differ significantly between centenarians and an octogenarian control group. However, centenarians were higher on the Geriatric Depression Scale (GDS) than octogenarians. Consistent with our model of developmental adaptation in aging, distal life events contribute to predicting survivorship outcome in which health status as survivor, delayer, or escaper appears as adaptation variables late in life. PMID:20885919

  8. Identification of proteomic biomarkers predicting prostate cancer aggressiveness and lethality despite biopsy-sampling error.

    Science.gov (United States)

    Shipitsin, M; Small, C; Choudhury, S; Giladi, E; Friedlander, S; Nardone, J; Hussain, S; Hurley, A D; Ernst, C; Huang, Y E; Chang, H; Nifong, T P; Rimm, D L; Dunyak, J; Loda, M; Berman, D M; Blume-Jensen, P

    2014-09-09

    Key challenges of biopsy-based determination of prostate cancer aggressiveness include tumour heterogeneity, biopsy-sampling error, and variations in biopsy interpretation. The resulting uncertainty in risk assessment leads to significant overtreatment, with associated costs and morbidity. We developed a performance-based strategy to identify protein biomarkers predictive of prostate cancer aggressiveness and lethality regardless of biopsy-sampling variation. Prostatectomy samples from a large patient cohort with long follow-up were blindly assessed by expert pathologists who identified the tissue regions with the highest and lowest Gleason grade from each patient. To simulate biopsy-sampling error, a core from a high- and a low-Gleason area from each patient sample was used to generate a 'high' and a 'low' tumour microarray, respectively. Using a quantitative proteomics approach, we identified from 160 candidates 12 biomarkers that predicted prostate cancer aggressiveness (surgical Gleason and TNM stage) and lethal outcome robustly in both high- and low-Gleason areas. Conversely, a previously reported lethal outcome-predictive marker signature for prostatectomy tissue was unable to perform under circumstances of maximal sampling error. Our results have important implications for cancer biomarker discovery in general and development of a sampling error-resistant clinical biopsy test for prediction of prostate cancer aggressiveness.

  9. Protein structure prediction using bee colony optimization metaheuristic

    DEFF Research Database (Denmark)

    Fonseca, Rasmus; Paluszewski, Martin; Winter, Pawel

    2010-01-01

    of the protein's structure, an energy potential and some optimization algorithm that finds the structure with minimal energy. Bee Colony Optimization (BCO) is a relatively new approach to solving optimization problems based on the foraging behaviour of bees. Several variants of BCO have been suggested......Predicting the native structure of proteins is one of the most challenging problems in molecular biology. The goal is to determine the three-dimensional structure from the one-dimensional amino acid sequence. De novo prediction algorithms seek to do this by developing a representation...... our BCO method to generate good solutions to the protein structure prediction problem. The results show that BCO generally finds better solutions than simulated annealing, which so far has been the metaheuristic of choice for this problem.

  10. Prediction of Global Damage and Reliability Based Upon Sequential Identification and Updating of RC Structures Subject to Earthquakes

    DEFF Research Database (Denmark)

    Nielsen, Søren R.K.; Skjærbæk, P. S.; Köylüoglu, H. U.

    The paper deals with the prediction of global damage and future structural reliability with special emphasis on sensitivity, bias and uncertainty of these predictions dependent on the statistically equivalent realizations of the future earthquake. The predictions are based on a modified Clough-Johnston single-degree-of-freedom (SDOF) oscillator with three parameters which are calibrated to fit the displacement response and the damage development in the past earthquake.

  11. Model-Based Prediction of Pulsed Eddy Current Testing Signals from Stratified Conductive Structures

    International Nuclear Information System (INIS)

    Zhang, Jian Hai; Song, Sung Jin; Kim, Woong Ji; Kim, Hak Joon; Chung, Jong Duk

    2011-01-01

    Excitation and propagation of the electromagnetic field of a cylindrical coil above an arbitrary number of conductive plates for pulsed eddy current testing (PECT) are very complex problems due to their complicated physical properties. In this paper, an analytical model of PECT is established by Fourier series based on the truncated region eigenfunction expansion (TREE) method for a single air-cored coil above stratified conductive structures (SCS) to investigate their integrity. From the presented expression of PECT, the coil impedance due to the SCS is calculated with an analytical approach using the generalized reflection coefficient in series form. Then the multilayered structures manufactured from non-ferromagnetic (STS301L) and ferromagnetic materials (SS400) are investigated with the developed PECT model. Good prediction by the analytical PECT model not only contributes to the development of an efficient solver but can also be applied to optimize the conditions of the experimental setup in PECT.

  12. RNA secondary structure prediction with pseudoknots: Contribution of algorithm versus energy model.

    Science.gov (United States)

    Jabbari, Hosna; Wark, Ian; Montemagno, Carlo

    2018-01-01

    RNA is a biopolymer with various applications inside the cell and in biotechnology. The structure of an RNA molecule mainly determines its function and is essential to guide nanostructure design. Since experimental structure determination is time-consuming and expensive, accurate computational prediction of RNA structure is of great importance. Prediction of RNA secondary structure is relatively simpler than its tertiary structure and provides information about the tertiary structure; therefore, RNA secondary structure prediction has received attention in the past decades. Numerous methods with different folding approaches have been developed for RNA secondary structure prediction. While methods for prediction of RNA pseudoknot-free structure (structures with no crossing base pairs) have greatly improved in terms of their accuracy, methods for prediction of RNA pseudoknotted secondary structure (structures with crossing base pairs) still have room for improvement. A long-standing question for improving the prediction accuracy of RNA pseudoknotted secondary structure is whether to focus on the prediction algorithm or the underlying energy model, as there is a trade-off between the computational cost of the prediction algorithm and the generality of the method. The aim of this work is to argue that, when comparing different methods for RNA pseudoknotted structure prediction, the combination of algorithm and energy model should be considered, and a method should not be judged superior or inferior to others if they do not use the same scoring model. We demonstrate that while the folding approach is important in structure prediction, it is not the only factor determining the prediction accuracy of a given method, as the underlying energy model is also of great value. Therefore we encourage researchers to pay particular attention when comparing methods with different energy models.

  13. MicroRNA prediction using a fixed-order Markov model based on the secondary structure pattern.

    Directory of Open Access Journals (Sweden)

    Wei Shen

    Full Text Available Predicting miRNAs is an arduous task, due to the diversity of the precursors and the complexity of enzyme processes. Although several prediction approaches have reached impressive performances, few of them can achieve full-function recognition of the mature miRNA directly from candidate hairpins across species. Therefore, researchers continue to seek a more powerful model closer to the biological recognition of miRNA structure. In this report, we describe a novel miRNA prediction algorithm, known as FOMmiR, using a fixed-order Markov model based on the secondary structural pattern. For a training dataset containing 809 human pre-miRNAs and 6441 human pseudo-miRNA hairpins, the model's parameters were defined and evaluated. The results showed that FOMmiR reached 91% accuracy on the human dataset through 5-fold cross-validation. Moreover, for the independent test datasets, FOMmiR presented outstanding predictions in human and other species including vertebrates, Drosophila, worms and viruses, and even plants, in contrast to well-known algorithms and models. Notably, FOMmiR was not only able to distinguish the miRNA precursors from the hairpins, but also to locate the position and strand of the mature miRNA. Therefore, this study provides a new generation of miRNA prediction algorithm, which successfully realizes full-function recognition of mature miRNAs directly from hairpin sequences. It also presents a new understanding of the biological recognition based on the location of the strongest signal detected by FOMmiR, which might be closely associated with the enzyme cleavage mechanism during miRNA maturation.

  14. Application of Functional Use Predictions to Aid in Structure ...

    Science.gov (United States)

    Humans are potentially exposed to thousands of anthropogenic chemicals in commerce. Recent work has shown that the bulk of this exposure may occur in near-field indoor environments (e.g., home, school, work, etc.). Advances in suspect screening analyses (SSA) now allow an improved understanding of the chemicals present in these environments. However, due to the nature of suspect screening techniques, investigators are often left with chemical formula predictions, with the possibility of many chemical structures matching to each formula. Here, newly developed quantitative structure-use relationship (QSUR) models are used to identify potential exposure sources for candidate structures. Previously, a suspect screening workflow was introduced and applied to house dust samples collected from the U.S. Department of Housing and Urban Development’s American Healthy Homes Survey (AHHS) [Rager, et al., Env. Int. 88 (2016)]. This workflow utilized the US EPA’s Distributed Structure-Searchable Toxicity (DSSTox) Database to link identified molecular features to molecular formulas, and ultimately chemical structures. Multiple QSUR models were applied to support the evaluation of candidate structures. These QSURs predict the likelihood of a chemical having a functional use commonly associated with consumer products having near-field use. For 3,228 structures identified as possible chemicals in AHHS house dust samples, we were able to obtain the required descriptors to appl

  15. Compound Structure-Independent Activity Prediction in High-Dimensional Target Space.

    Science.gov (United States)

    Balfer, Jenny; Hu, Ye; Bajorath, Jürgen

    2014-08-01

    Profiling of compound libraries against arrays of targets has become an important approach in pharmaceutical research. The prediction of multi-target compound activities also represents an attractive task for machine learning with potential for drug discovery applications. Herein, we have explored activity prediction in high-dimensional target space. Different types of models were derived to predict multi-target activities. The models included naïve Bayesian (NB) and support vector machine (SVM) classifiers based upon compound structure information and NB models derived on the basis of activity profiles, without considering compound structure. Because the latter approach can be applied to incomplete training data and principally depends on the feature independence assumption, SVM modeling was not applicable in this case. Furthermore, iterative hybrid NB models making use of both activity profiles and compound structure information were built. In high-dimensional target space, NB models utilizing activity profile data were found to yield more accurate activity predictions than structure-based NB and SVM models or hybrid models. An in-depth analysis of activity profile-based models revealed the presence of correlation effects across different targets and rationalized prediction accuracy. Taken together, the results indicate that activity profile information can be effectively used to predict the activity of test compounds against novel targets. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
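    A minimal sketch of the structure-independent idea, using scikit-learn, is given below: a naïve Bayes classifier is trained purely on a compound's activity profile over other targets to predict activity against a held-out target. The random activity matrix and the train/test split are illustrative assumptions, not the data or protocol of the study.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical activity matrix: rows = compounds, columns = targets (1 = active)
rng = np.random.default_rng(0)
n_compounds, n_targets = 500, 120
profiles = rng.integers(0, 2, size=(n_compounds, n_targets))

# Predict activity against one held-out target from the profile over the other
# targets, ignoring compound structure entirely (the profile-based NB idea above)
target = 0
X = np.delete(profiles, target, axis=1)
y = profiles[:, target]

clf = BernoulliNB()
clf.fit(X[:400], y[:400])
print("held-out accuracy:", clf.score(X[400:], y[400:]))
```

    With real profiling data, correlations between targets would carry the signal that is absent from this random toy matrix.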

  16. Predicting sample size required for classification performance

    Directory of Open Access Journals (Sweden)

    Figueroa Rosa L

    2012-02-01

    Full Text Available Abstract Background Supervised learning methods need annotated data in order to generate efficient models. Annotated data, however, is a relatively scarce resource and can be expensive to obtain. For both passive and active learning methods, there is a need to estimate the size of the annotated sample required to reach a performance target. Methods We designed and implemented a method that fits an inverse power law model to points of a given learning curve created using a small annotated training set. Fitting is carried out using nonlinear weighted least squares optimization. The fitted model is then used to predict the classifier's performance and confidence interval for larger sample sizes. For evaluation, the nonlinear weighted curve fitting method was applied to a set of learning curves generated using clinical text and waveform classification tasks with active and passive sampling methods, and predictions were validated using standard goodness of fit measures. As control we used an un-weighted fitting method. Results A total of 568 models were fitted and the model predictions were compared with the observed performances. Depending on the data set and sampling method, it took between 80 and 560 annotated samples to achieve mean average and root mean squared error below 0.01. Results also show that our weighted fitting method outperformed the baseline un-weighted method (p Conclusions This paper describes a simple and effective sample size prediction algorithm that conducts weighted fitting of learning curves. The algorithm outperformed an un-weighted algorithm described in previous literature. It can help researchers determine annotation sample size for supervised machine learning.
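    The core fitting step can be sketched as follows, assuming an inverse power law of the form a - b * n^(-c) and per-point weights that grow with sample size; the observed points, the weighting scheme and the starting values are invented for illustration and are not taken from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def inv_power_law(n, a, b, c):
    # Expected classifier performance at training-set size n
    return a - b * n ** (-c)

# Hypothetical points from a small learning curve: (sample size, accuracy)
n_obs = np.array([50, 100, 150, 200, 300, 400])
acc_obs = np.array([0.71, 0.76, 0.79, 0.81, 0.83, 0.845])

# Weighted fit: points from larger samples are assumed less noisy, so they get
# smaller standard deviations via `sigma` and therefore more weight
sigma = 1.0 / np.sqrt(n_obs)
popt, pcov = curve_fit(inv_power_law, n_obs, acc_obs, p0=(0.9, 1.0, 0.5),
                       sigma=sigma, absolute_sigma=False, maxfev=10000)

# Extrapolate the fitted curve to larger annotation budgets
for n in (600, 1000, 2000):
    print(n, round(inv_power_law(n, *popt), 3))
```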

  17. Predicting the required number of training samples. [for remotely sensed image data based on covariance matrix estimate quality criterion of normal distribution

    Science.gov (United States)

    Kalayeh, H. M.; Landgrebe, D. A.

    1983-01-01

    A criterion which measures the quality of the estimate of the covariance matrix of a multivariate normal distribution is developed. Based on this criterion, the necessary number of training samples is predicted. Experimental results which are used as a guide for determining the number of training samples are included. Previously announced in STAR as N82-28109

  18. Prediction of protein structural classes by recurrence quantification analysis based on chaos game representation.

    Science.gov (United States)

    Yang, Jian-Yi; Peng, Zhen-Ling; Yu, Zu-Guo; Zhang, Rui-Jie; Anh, Vo; Wang, Desheng

    2009-04-21

    In this paper, we intend to predict protein structural classes (alpha, beta, alpha+beta, or alpha/beta) for low-homology data sets. Two data sets were used widely, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins) with sequence homology being 40% and 25%, respectively. We propose to decompose the chaos game representation of proteins into two kinds of time series. Then, a novel and powerful nonlinear analysis technique, recurrence quantification analysis (RQA), is applied to analyze these time series. For a given protein sequence, a total of 16 characteristic parameters can be calculated with RQA, which are treated as feature representation of protein sequences. Based on such feature representation, the structural class for each protein is predicted with Fisher's linear discriminant algorithm. The jackknife test is used to test and compare our method with other existing methods. The overall accuracies with step-by-step procedure are 65.8% and 64.2% for 1189 and 25PDB data sets, respectively. With one-against-others procedure used widely, we compare our method with five other existing methods. Especially, the overall accuracies of our method are 6.3% and 4.1% higher for the two data sets, respectively. Furthermore, only 16 parameters are used in our method, which is less than that used by other methods. This suggests that the current method may play a complementary role to the existing methods and is promising to perform the prediction of protein structural classes.
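    The decomposition of a chaos game representation into coordinate time series can be illustrated with the sketch below; the placement of the 20 residues on a unit circle and the contraction ratio are hypothetical choices, not the construction used by the authors, and the recurrence quantification step itself is omitted.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical vertex layout: the 20 residues equally spaced on the unit circle
ANGLES = {aa: 2 * np.pi * i / 20 for i, aa in enumerate(AMINO_ACIDS)}

def cgr_time_series(sequence, ratio=0.5):
    """Chaos-game walk over a protein sequence.

    Each residue pulls the current point a fixed fraction of the way towards
    its vertex; the x- and y-coordinate traces are returned as two time series
    that could then be fed to a recurrence quantification analysis.
    """
    x, y = 0.0, 0.0
    xs, ys = [], []
    for aa in sequence:
        if aa not in ANGLES:          # skip non-standard residues
            continue
        vx, vy = np.cos(ANGLES[aa]), np.sin(ANGLES[aa])
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

xs, ys = cgr_time_series("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(xs[:5], ys[:5])
```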

  19. Sampling from complex networks with high community structures.

    Science.gov (United States)

    Salehi, Mostafa; Rabiee, Hamid R; Rajabi, Arezo

    2012-06-01

    In this paper, we propose a novel link-tracing sampling algorithm, based on concepts from PageRank vectors, to sample from networks with high community structure. Our method has two phases: (1) sampling the nodes closest to the initial nodes by approximating personalized PageRank vectors, and (2) jumping to a new community by using PageRank vectors and unknown neighbors. Empirical studies on several synthetic and real-world networks show that the proposed method improves the performance of network sampling compared to popular link-based sampling methods in terms of accuracy and visited communities.
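    A rough approximation of the two phases, using NetworkX's personalized PageRank, is sketched below; the greedy frontier rule and the fallback jump are simplifications assumed for illustration and are not the authors' exact algorithm.

```python
import networkx as nx

def ppr_link_tracing_sample(G, seed, budget, alpha=0.85):
    """Link-tracing sample guided by personalized PageRank (a sketch):
    repeatedly add the unvisited neighbor that ranks highest in the PageRank
    vector personalized on the current sample; jump when the frontier is empty."""
    sample = {seed}
    while len(sample) < budget:
        # PageRank personalized on the nodes sampled so far
        pr = nx.pagerank(G, alpha=alpha,
                         personalization={v: 1.0 for v in sample})
        # Frontier: unsampled neighbors of the current sample
        frontier = {u for v in sample for u in G.neighbors(v)} - sample
        if not frontier:                      # jump to escape the community
            frontier = set(G.nodes()) - sample
        sample.add(max(frontier, key=lambda u: pr.get(u, 0.0)))
    return sample

G = nx.connected_caveman_graph(5, 10)         # toy graph with strong communities
print(sorted(ppr_link_tracing_sample(G, seed=0, budget=15)))
```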

  20. Exploiting the Past and the Future in Protein Secondary Structure Prediction

    DEFF Research Database (Denmark)

    Baldi, Pierre; Brunak, Søren; Frasconi, P

    1999-01-01

    predictions based on variable ranges of dependencies. These architectures extend recurrent neural networks, introducing non-causal bidirectional dynamics to capture both upstream and downstream information. The prediction algorithm is completed by the use of mixtures of estimators that leverage evolutionary......Motivation: Predicting the secondary structure of a protein (alpha-helix, beta-sheet, coil) is an important step towards elucidating its three-dimensional structure, as well as its function. Presently, the best predictors are based on machine learning approaches, in particular neural network...

  1. Robust Inference of Population Structure for Ancestry Prediction and Correction of Stratification in the Presence of Relatedness

    Science.gov (United States)

    Conomos, Matthew P.; Miller, Mike; Thornton, Timothy

    2016-01-01

    Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multi-dimensional scaling (MDS), and model-based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC-AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC-AiR utilizes genome-screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC-AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC-AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC-AiR provides better prediction of ancestry in a variety of structure settings than using ten (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC-AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness. PMID:25810074
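    The core idea, running PCA only on an unrelated subset and then projecting everyone else onto the resulting axes, can be sketched as follows; the standardization scheme and the random genotype matrix are illustrative assumptions, and the code is not the published PC-AiR implementation.

```python
import numpy as np

def pca_on_subset_then_project(genotypes, unrelated_idx, n_components=2):
    """Sketch of the subset-then-project idea.

    genotypes : (n_individuals, n_snps) matrix of 0/1/2 allele counts
    """
    X = genotypes.astype(float)
    # Standardize SNPs using statistics from the unrelated subset only
    ref = X[unrelated_idx]
    mu, sd = ref.mean(axis=0), ref.std(axis=0) + 1e-8
    Z_ref = (ref - mu) / sd
    # PCA via SVD of the unrelated subset
    _, _, Vt = np.linalg.svd(Z_ref, full_matrices=False)
    loadings = Vt[:n_components].T            # (n_snps, n_components)
    # Project *all* individuals (related ones included) onto the same axes
    return ((X - mu) / sd) @ loadings

rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(100, 500))
pcs = pca_on_subset_then_project(geno, unrelated_idx=np.arange(60))
print(pcs.shape)        # (100, 2)
```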

  2. Effects of sample size on robustness and prediction accuracy of a prognostic gene signature

    Directory of Open Access Journals (Sweden)

    Kim Seon-Young

    2009-05-01

    Full Text Available Abstract Background Poor overlap between independently developed gene signatures and poor inter-study applicability of gene signatures are two of the major concerns raised in the development of microarray-based prognostic gene signatures. One recent study suggested that thousands of samples are needed to generate a robust prognostic gene signature. Results A data set of 1,372 samples was generated by combining eight breast cancer gene expression data sets produced using the same microarray platform, and, using this data set, the effects of varying sample sizes on several performance measures of a prognostic gene signature were investigated. The overlap between independently developed gene signatures increased linearly with more samples, attaining an average overlap of 16.56% with 600 samples. The concordance between outcomes predicted by different gene signatures also increased with more samples, up to 94.61% with 300 samples. The accuracy of outcome prediction also increased with more samples. Finally, analysis using only Estrogen Receptor-positive (ER+) patients attained higher prediction accuracy than using all patients, suggesting that sub-type-specific analysis can lead to the development of better prognostic gene signatures. Conclusion Increasing sample sizes generated a gene signature with better stability, better concordance in outcome prediction, and better prediction accuracy. However, the degree of performance improvement from the increased sample size differed between the degree of overlap and the degree of concordance in outcome prediction, suggesting that the sample size required for a study should be determined according to the specific aims of the study.
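    The overlap experiment can be imitated on synthetic data with the sketch below, which derives top-k gene lists from two disjoint subsets of a given size and reports their average percentage overlap; the correlation-based ranking, the list length and the simulated expression matrix are assumptions made for illustration, not the paper's pipeline.

```python
import numpy as np

def signature_overlap(expr, outcome, n, k=70, n_rep=20, rng=None):
    """Average % overlap between top-k gene lists derived from two disjoint
    subsets of n samples each (illustrative only)."""
    rng = rng or np.random.default_rng(0)
    overlaps = []
    for _ in range(n_rep):
        idx = rng.permutation(expr.shape[0])
        a, b = idx[:n], idx[n:2 * n]
        sig = []
        for rows in (a, b):
            # Rank genes by absolute correlation with the outcome in this subset
            x = expr[rows] - expr[rows].mean(axis=0)
            y = outcome[rows] - outcome[rows].mean()
            r = (x * y[:, None]).sum(axis=0) / (
                np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum()) + 1e-12)
            sig.append(set(np.argsort(-np.abs(r))[:k]))
        overlaps.append(100.0 * len(sig[0] & sig[1]) / k)
    return float(np.mean(overlaps))

rng = np.random.default_rng(1)
expr = rng.normal(size=(1372, 2000))
outcome = rng.normal(size=1372) + expr[:, :50].mean(axis=1)   # 50 informative genes
for n in (100, 300, 600):
    print(n, round(signature_overlap(expr, outcome, n, rng=rng), 1))
```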

  3. Advances in Rosetta structure prediction for difficult molecular-replacement problems

    International Nuclear Information System (INIS)

    DiMaio, Frank

    2013-01-01

    Modeling advances using Rosetta structure prediction to aid in solving difficult molecular-replacement problems are discussed. Recent work has shown the effectiveness of structure-prediction methods in solving difficult molecular-replacement problems. The Rosetta protein structure modeling suite can aid in the solution of difficult molecular-replacement problems using templates from 15 to 25% sequence identity; Rosetta refinement guided by noisy density has consistently led to solved structures where other methods fail. In this paper, an overview of the use of Rosetta for these difficult molecular-replacement problems is provided and new modeling developments that further improve model quality are described. Several variations to the method are introduced that significantly reduce the time needed to generate a model and the sampling required to improve the starting template. The improvements are benchmarked on a set of nine difficult cases and it is shown that this improved method obtains consistently better models in less running time. Finally, strategies for best using Rosetta to solve difficult molecular-replacement problems are presented and future directions for the role of structure-prediction methods in crystallography are discussed

  4. Structural similarity-based predictions of protein interactions between HIV-1 and Homo sapiens

    Directory of Open Access Journals (Sweden)

    Gomez Shawn M

    2010-04-01

    Full Text Available Abstract Background In the course of infection, viruses such as HIV-1 must enter a cell, travel to sites where they can hijack host machinery to transcribe their genes and translate their proteins, assemble, and then leave the cell again, all while evading the host immune system. Thus, successful infection depends on the pathogen's ability to manipulate the biological pathways and processes of the organism it infects. Interactions between HIV-encoded and human proteins provide one means by which HIV-1 can connect into cellular pathways to carry out these survival processes. Results We developed and applied a computational approach to predict interactions between HIV and human proteins based on structural similarity of 9 HIV-1 proteins to human proteins having known interactions. Using functional data from RNAi studies as a filter, we generated over 2000 interaction predictions between HIV proteins and 406 unique human proteins. Additional filtering based on Gene Ontology cellular component annotation reduced the number of predictions to 502 interactions involving 137 human proteins. We find numerous known interactions as well as novel interactions showing significant functional relevance based on supporting Gene Ontology and literature evidence. Conclusions Understanding the interplay between HIV-1 and its human host will help in understanding the viral lifecycle and the ways in which this virus is able to manipulate its host. The results shown here provide a potential set of interactions that are amenable to further experimental manipulation as well as potential targets for therapeutic intervention.

  5. Predicting sample lifetimes in creep fracture of heterogeneous materials

    Science.gov (United States)

    Koivisto, Juha; Ovaska, Markus; Miksic, Amandine; Laurson, Lasse; Alava, Mikko J.

    2016-08-01

    Materials flow—under creep or constant loads—and, finally, fail. The prediction of sample lifetimes is an important and highly challenging problem because of the inherently heterogeneous nature of most materials that results in large sample-to-sample lifetime fluctuations, even under the same conditions. We study creep deformation of paper sheets as one heterogeneous material and thus show how to predict lifetimes of individual samples by exploiting the "universal" features in the sample-inherent creep curves, particularly the passage to an accelerating creep rate. Using simulations of a viscoelastic fiber bundle model, we illustrate how deformation localization controls the shape of the creep curve and thus the degree of lifetime predictability.

  6. G-LoSA for Prediction of Protein-Ligand Binding Sites and Structures.

    Science.gov (United States)

    Lee, Hui Sun; Im, Wonpil

    2017-01-01

    Recent advances in high-throughput structure determination and computational protein structure prediction have significantly enriched the universe of protein structure. However, there is still a large gap between the number of available protein structures and that of proteins with annotated function in high accuracy. Computational structure-based protein function prediction has emerged to reduce this knowledge gap. The identification of a ligand binding site and its structure is critical to the determination of a protein's molecular function. We present a computational methodology for predicting small molecule ligand binding site and ligand structure using G-LoSA, our protein local structure alignment and similarity measurement tool. All the computational procedures described here can be easily implemented using G-LoSA Toolkit, a package of standalone software programs and preprocessed PDB structure libraries. G-LoSA and G-LoSA Toolkit are freely available to academic users at http://compbio.lehigh.edu/GLoSA . We also illustrate a case study to show the potential of our template-based approach harnessing G-LoSA for protein function prediction.

  7. (PS)2: protein structure prediction server version 3.0.

    Science.gov (United States)

    Huang, Tsun-Tsao; Hwang, Jenn-Kang; Chen, Chu-Huang; Chu, Chih-Sheng; Lee, Chi-Wen; Chen, Chih-Chieh

    2015-07-01

    Protein complexes are involved in many biological processes. Examining coupling between subunits of a complex would be useful to understand the molecular basis of protein function. Here, our updated (PS)(2) web server predicts the three-dimensional structures of protein complexes based on comparative modeling; furthermore, this server examines the coupling between subunits of the predicted complex by combining structural and evolutionary considerations. The predicted complex structure could be indicated and visualized by Java-based 3D graphics viewers and the structural and evolutionary profiles are shown and compared chain-by-chain. For each subunit, considerations with or without the packing contribution of other subunits cause the differences in similarities between structural and evolutionary profiles, and these differences imply which form, complex or monomeric, is preferred in the biological condition for the subunit. We believe that the (PS)(2) server would be a useful tool for biologists who are interested not only in the structures of protein complexes but also in the coupling between subunits of the complexes. The (PS)(2) is freely available at http://ps2v3.life.nctu.edu.tw/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains.

    Science.gov (United States)

    Lewis, Tony E; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Chothia, Cyrus; Cuff, Alison; Dana, Jose M; Filippis, Ioannis; Gough, Julian; Hunter, Sarah; Jones, David T; Kelley, Lawrence A; Kleywegt, Gerard J; Minneci, Federico; Mitchell, Alex; Murzin, Alexey G; Ochoa-Montaño, Bernardo; Rackham, Owen J L; Smith, James; Sternberg, Michael J E; Velankar, Sameer; Yeats, Corin; Orengo, Christine

    2013-01-01

    Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).

  9. Building a better fragment library for de novo protein structure prediction.

    Directory of Open Access Journals (Sweden)

    Saulo H P de Oliveira

    Full Text Available Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. The importance of correct testing procedures in assessing the quality of fragment libraries is demonstrated. In particular, the exclusion of homologs to the target from the libraries to correctly simulate a de novo protein structure prediction scenario, something which surprisingly is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries present both higher precision and coverage than two of the state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. "Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources".

  10. Building a Better Fragment Library for De Novo Protein Structure Prediction

    Science.gov (United States)

    de Oliveira, Saulo H. P.; Shi, Jiye; Deane, Charlotte M.

    2015-01-01

    Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. The importance of correct testing procedures in assessing the quality of fragment libraries is demonstrated. In particular, the exclusion of homologs to the target from the libraries to correctly simulate a de novo protein structure prediction scenario, something which surprisingly is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries present both higher precision and coverage than two of the state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. “Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources”. PMID:25901595

  11. PCI-SS: MISO dynamic nonlinear protein secondary structure prediction

    Directory of Open Access Journals (Sweden)

    Aboul-Magd Mohammed O

    2009-07-01

    Full Text Available Abstract Background Since the function of a protein is largely dictated by its three-dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one-dimensional secondary structure of proteins (distinguishing α-helices, β-strands, and non-regular structures) from primary sequence data, which makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification. Results Using PSI-BLAST divergent evolutionary profiles as input data, dynamic nonlinear systems are built through a black-box approach to model the process of protein folding. Genetic algorithms (GAs) are applied in order to optimize the architectural parameters of the PCI models. The three-state prediction problem is broken down into a combination of three binary sub-problems and protein structure classifiers are built using 2 layers of PCI classifiers. Careful construction of the optimization, training, and test datasets ensures that no homology exists between any training and testing data. A detailed comparison between PCI and 9 contemporary methods is provided over a set of 125 new protein chains guaranteed to be dissimilar to all training data. Unlike other secondary structure prediction methods, here a web service is developed to provide both human- and machine-readable interfaces to PCI-based protein secondary structure prediction. This server, called PCI-SS, is available at http://bioinf.sce.carleton.ca/PCISS. In addition to a dynamic PHP-generated web interface for humans, a Simple Object Access Protocol (SOAP) interface is added to permit invocation of the PCI-SS service remotely. This machine-readable interface facilitates incorporation of PCI-SS into multi-faceted systems biology analysis pipelines requiring protein secondary structure information, and greatly simplifies high-throughput analyses. XML is used to represent the input

  12. Prediction of RNA secondary structure using generalized centroid estimators.

    Science.gov (United States)

    Hamada, Michiaki; Kiryu, Hisanori; Sato, Kengo; Mituyama, Toutai; Asai, Kiyoshi

    2009-02-15

    Recent studies have shown that the methods for predicting secondary structures of RNAs on the basis of posterior decoding of the base-pairing probabilities have an advantage with respect to prediction accuracy over the conventionally utilized minimum free energy methods. However, there is room for improvement in the objective functions presented in previous studies, which are maximized in the posterior decoding with respect to the accuracy measures for secondary structures. We propose novel estimators which improve the accuracy of secondary structure prediction of RNAs. The proposed estimators maximize an objective function which is the weighted sum of the expected number of the true positives and that of the true negatives of the base pairs. The proposed estimators are also improved versions of the ones used in previous works, namely CONTRAfold for secondary structure prediction from a single RNA sequence and McCaskill-MEA for common secondary structure prediction from multiple alignments of RNA sequences. We clarify the relations between the proposed estimators and the estimators presented in previous works, and theoretically show that the previous estimators include additional unnecessary terms in the evaluation measures with respect to the accuracy. Furthermore, computational experiments confirm the theoretical analysis by indicating improvement in the empirical accuracy. The proposed estimators represent extensions of the centroid estimators proposed in Ding et al. and Carvalho and Lawrence, and are applicable to a wide variety of problems in bioinformatics. Supporting information and the CentroidFold software are available online at: http://www.ncrna.org/software/centroidfold/.
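    The flavor of these estimators can be conveyed with a simplified gamma-centroid style decoder: given a base-pairing probability matrix, a Nussinov-type dynamic program selects the nested set of pairs maximizing the sum of (gamma + 1) * p(i, j) - 1, so only pairs with probability above 1 / (gamma + 1) can contribute. This is a schematic stand-in for the family of estimators discussed in the record, not their published implementation.

```python
import numpy as np

def gamma_centroid(P, gamma=2.0):
    """Decode a nested secondary structure from base-pairing probabilities P.

    Each pair (i, j) is worth (gamma + 1) * P[i, j] - 1, so it only helps the
    objective when P[i, j] > 1 / (gamma + 1); a Nussinov-style DP picks the
    best nested set of such pairs."""
    n = P.shape[0]
    score = (gamma + 1) * P - 1.0
    M = np.zeros((n, n))
    for span in range(1, n):
        for i in range(0, n - span):
            j = i + span
            best = max(M[i + 1, j], M[i, j - 1])
            if score[i, j] > 0:
                inner = M[i + 1, j - 1] if i + 1 <= j - 1 else 0.0
                best = max(best, inner + score[i, j])
            for k in range(i, j):                       # bifurcations
                best = max(best, M[i, k] + M[k + 1, j])
            M[i, j] = best

    pairs = []
    def trace(i, j):
        if i >= j:
            return
        if M[i, j] == M[i, j - 1]:
            trace(i, j - 1); return
        if score[i, j] > 0:
            inner = M[i + 1, j - 1] if i + 1 <= j - 1 else 0.0
            if np.isclose(M[i, j], inner + score[i, j]):
                pairs.append((i, j)); trace(i + 1, j - 1); return
        for k in range(i, j):
            if np.isclose(M[i, j], M[i, k] + M[k + 1, j]):
                trace(i, k); trace(k + 1, j); return
    trace(0, n - 1)
    return sorted(pairs)

# Toy probability matrix containing one confident helix
n = 10
P = np.zeros((n, n))
for i, j in [(0, 9), (1, 8), (2, 7)]:
    P[i, j] = 0.9
print(gamma_centroid(P, gamma=2.0))   # -> [(0, 9), (1, 8), (2, 7)]
```

    Raising gamma trades false negatives for false positives, which is exactly the weighting of expected true positives versus true negatives described above.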

  13. Ordering decision-making methods on spare parts for a new aircraft fleet based on a two-sample prediction

    International Nuclear Information System (INIS)

    Yongquan, Sun; Xi, Chen; He, Ren; Yingchao, Jin; Quanwu, Liu

    2016-01-01

    Ordering decision-making on spare parts is crucial in maximizing aircraft utilization and minimizing total operating cost. Extensive research on spare parts inventory management and optimal allocation exists, based on large amounts of historical operation data or condition-monitoring data. However, it is challenging to make an ordering decision on spare parts when a fleet is established by introducing new aircraft with little historical data. In this paper, the spare parts supporting policy and ordering decision-making policy for a new aircraft fleet are analyzed first. Then two-sample predictions for a Weibull distribution and a Weibull process are incorporated into forecasts of the first failure time and the number of failures during a certain time period, using Bayesian and classical methods respectively, according to which the ordering time and ordering quantity for spare parts are identified. Finally, a case study is presented to illustrate the methods of identifying the ordering time and ordering number of engine-driven pumps through forecasting the failure time and failure number, followed by a discussion on the impact of various fleet sizes on prediction results. This method has the potential to decide the ordering time and quantity of spare parts when a new aircraft fleet is established. - Highlights: • A modeling framework of ordering spare parts for a new fleet is proposed. • Models for ordering time and number are established based on two-sample prediction. • The computation of future failure time is simplified using Newtonian binomial law. • Comparison of the first failure time PDFs is used to identify process parameters. • Identification methods for spare parts are validated by an Engine Driven Pump case study.
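    As a much-simplified numerical companion to this record, the sketch below sizes a spare-parts order from a two-parameter Weibull life model by combining the per-unit failure probability over the planning horizon with a binomial stock calculation; the parameter values, fleet size and service level are invented, and the calculation is not the two-sample Bayesian scheme of the paper.

```python
import numpy as np
from scipy import stats

def spare_parts_plan(shape, scale, fleet_size, horizon_hours, service_level=0.95):
    """Rough spare-parts sizing from a Weibull life model (illustrative only).

    shape, scale   : Weibull parameters (e.g. estimated from a small sample)
    fleet_size     : number of identical units in service
    horizon_hours  : planning window
    service_level  : target probability that stock covers all failures
    """
    # Probability a single new unit fails within the horizon
    p_fail = stats.weibull_min.cdf(horizon_hours, c=shape, scale=scale)
    # Failures across the fleet ~ Binomial(fleet_size, p_fail)
    expected = fleet_size * p_fail
    # Smallest stock s with P(failures <= s) >= service_level
    stock = int(stats.binom.ppf(service_level, fleet_size, p_fail))
    return p_fail, expected, stock

p, mean_failures, order_qty = spare_parts_plan(shape=1.8, scale=12000.0,
                                               fleet_size=40, horizon_hours=3000)
print(f"P(fail per unit)={p:.3f}, expected failures={mean_failures:.1f}, "
      f"order quantity={order_qty}")
```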

  14. Size-based predictions of food web patterns

    DEFF Research Database (Denmark)

    Zhang, Lai; Hartvig, Martin; Knudsen, Kim

    2014-01-01

    We employ size-based theoretical arguments to derive simple analytic predictions of ecological patterns and properties of natural communities: size-spectrum exponent, maximum trophic level, and susceptibility to invasive species. The predictions are brought about by assuming that an infinite number of species are continuously distributed on a size-trait axis. It is, however, an open question whether such predictions are valid for a food web with a finite number of species embedded in a network structure. We address this question by comparing the size-based predictions to results from dynamic food web simulations with varying species richness. To this end, we develop a new size- and trait-based food web model that can be simplified into an analytically solvable size-based model. We confirm existing solutions for the size distribution and derive novel predictions for maximum trophic level and invasion...

  15. Child Support Payment: A Structural Model of Predictive Variables.

    Science.gov (United States)

    Wright, David W.; Price, Sharon J.

    A major area of concern in divorced families is compliance with child support payments. Aspects of the former spouse relationship that are predictive of compliance with court-ordered payment of child support were investigated in a sample of 58 divorced persons all of whom either paid or received child support. Structured interviews and…

  16. The Prediction of Botulinum Toxin Structure Based on in Silico and in Vitro Analysis

    Science.gov (United States)

    Suzuki, Tomonori; Miyazaki, Satoru

    2011-01-01

    Many biological systems are mediated through protein-protein interactions. Knowledge of protein-protein complex structure is required for understanding function. The determination of large, flexible protein-protein complex structures by experimental studies remains difficult, costly and time-consuming; therefore, computational prediction of protein structures by homology modeling and docking studies is a valuable method. In addition, MD simulation is one of the most powerful methods, allowing one to see the real dynamics of proteins. Here, we predict the protein-protein complex structure of botulinum toxin to analyze its properties. These bioinformatics methods are useful for reporting the relation between the flexibility of the backbone structure and the activity.

  17. Ab-initio conformational epitope structure prediction using genetic algorithm and SVM for vaccine design.

    Science.gov (United States)

    Moghram, Basem Ameen; Nabil, Emad; Badr, Amr

    2018-01-01

    T-cell epitope structure identification is a significant and challenging immunoinformatic problem within epitope-based vaccine design. Epitopes or antigenic peptides are sets of amino acids that bind with Major Histocompatibility Complex (MHC) molecules; the aim of this binding is for the peptides to be presented by Antigen Presenting Cells and inspected by T-cells. MHC-molecule-binding epitopes are responsible for triggering the immune response to antigens. The epitope's three-dimensional (3D) molecular structure (i.e., tertiary structure) reflects its proper function. Therefore, the identification of MHC class-II epitope structure is a significant step towards epitope-based vaccine design and understanding of the immune system. In this paper, we propose a new technique using a Genetic Algorithm for Predicting the Epitope Structure (GAPES), to predict the structure of MHC class-II epitopes based on their sequence. The proposed elitist-based genetic algorithm for predicting the epitope's tertiary structure is based on the Ab-Initio Empirical Conformational Energy Program for Peptides (ECEPP) force field model. The developed secondary structure prediction technique relies on the Ramachandran plot. We used two alignment algorithms: ROSS alignment and TM-Score alignment. We applied four different alignment approaches to calculate the similarity scores of the dataset under test. We utilized the support vector machine (SVM) classifier as an evaluation of the prediction performance. The prediction accuracy and the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) were calculated as measures of performance. The calculations are performed on twelve similarity-reduced datasets of the Immune Epitope Data Base (IEDB) and a large dataset of peptide-binding affinities to HLA-DRB1*0101. The results showed that GAPES was reliable and very accurate. We achieved an average prediction accuracy of 93.50% and an average AUC of 0.974 in the IEDB dataset. Also, we achieved an accuracy of 95

  18. Impact of preferential sampling on exposure prediction and health effect inference in the context of air pollution epidemiology.

    Science.gov (United States)

    Lee, A; Szpiro, A; Kim, S Y; Sheppard, L

    2015-06-01

    Preferential sampling has been defined in the context of geostatistical modeling as the dependence between the sampling locations and the process that describes the spatial structure of the data. It can occur when networks are designed to find high values. For example, in networks based on the U.S. Clean Air Act monitors are sited to determine whether air quality standards are exceeded. We study the impact of the design of monitor networks in the context of air pollution epidemiology studies. The effect of preferential sampling has been illustrated in the literature by highlighting its impact on spatial predictions. In this paper, we use these predictions as input in a second stage analysis, and we assess how they affect health effect inference. Our work is motivated by data from two United States regulatory networks and health data from the Multi-Ethnic Study of Atherosclerosis and Air Pollution. The two networks were designed to monitor air pollution in urban and rural areas respectively, and we found that the health analysis results based on the two networks can lead to different scientific conclusions. We use preferential sampling to gain insight into these differences. We designed a simulation study, and found that the validity and reliability of the health effect estimate can be greatly affected by how we sample the monitor locations. To better understand its effect on second stage inference, we identify two components of preferential sampling that shed light on how preferential sampling alters the properties of the health effect estimate.

  19. Impact of preferential sampling on exposure prediction and health effect inference in the context of air pollution epidemiology

    Science.gov (United States)

    Lee, A.; Szpiro, A.; Kim, S.Y.; Sheppard, L.

    2018-01-01

    Summary Preferential sampling has been defined in the context of geostatistical modeling as the dependence between the sampling locations and the process that describes the spatial structure of the data. It can occur when networks are designed to find high values. For example, in networks based on the U.S. Clean Air Act monitors are sited to determine whether air quality standards are exceeded. We study the impact of the design of monitor networks in the context of air pollution epidemiology studies. The effect of preferential sampling has been illustrated in the literature by highlighting its impact on spatial predictions. In this paper, we use these predictions as input in a second stage analysis, and we assess how they affect health effect inference. Our work is motivated by data from two United States regulatory networks and health data from the Multi-Ethnic Study of Atherosclerosis and Air Pollution. The two networks were designed to monitor air pollution in urban and rural areas respectively, and we found that the health analysis results based on the two networks can lead to different scientific conclusions. We use preferential sampling to gain insight into these differences. We designed a simulation study, and found that the validity and reliability of the health effect estimate can be greatly affected by how we sample the monitor locations. To better understand its effect on second stage inference, we identify two components of preferential sampling that shed light on how preferential sampling alters the properties of the health effect estimate. PMID:29576734

  20. Prediction of retention in micellar electrokinetic chromatography based on molecular structural descriptors by using the heuristic method

    International Nuclear Information System (INIS)

    Liu Huanxiang; Yao Xiaojun; Liu Mancang; Hu Zhide; Fan Botao

    2006-01-01

    Based on molecular descriptors calculated from the solutes' structures alone, the micelle-water partition coefficients of 103 solutes in micellar electrokinetic chromatography (MEKC) were predicted using the heuristic method (HM). At the same time, in order to show the influence of different molecular descriptors on the micelle-water partitioning of solutes and to better understand the retention mechanism in MEKC, HM was used to build several multivariable linear models using different numbers of molecular descriptors. The best 6-parameter model gave the following results: the squared correlation coefficient R² was 0.958 and the mean relative error was 3.98%, which proved that the predicted values were in good agreement with the experimental results. From the built model, it can be concluded that the hydrophobic, H-bond, and polar interactions of solutes with the micellar and aqueous phases are the main factors that determine their partitioning behavior. In addition, this paper provides a simple, fast and effective method for predicting the retention of solutes in MEKC from their structures and gives some insight into structural features related to the retention of the solutes.
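    A small stand-in for the heuristic method's descriptor selection is sketched below as greedy forward selection into an ordinary least-squares model; the synthetic descriptor matrix and the choice of six descriptors mirror the record only loosely, and the code is not the HM software itself.

```python
import numpy as np

def forward_select_mlr(X, y, n_select=6):
    """Greedily add the descriptor that most improves R^2 of an OLS model."""
    n, p = X.shape
    chosen = []
    for _ in range(n_select):
        best_r2, best_j = -np.inf, None
        for j in range(p):
            if j in chosen:
                continue
            cols = chosen + [j]
            A = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ beta
            r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
            if r2 > best_r2:
                best_r2, best_j = r2, j
        chosen.append(best_j)
    return chosen, best_r2

rng = np.random.default_rng(0)
X = rng.normal(size=(103, 30))                  # 103 solutes, 30 descriptors
y = 1.2 * X[:, 3] - 0.8 * X[:, 7] + 0.3 * X[:, 11] + 0.1 * rng.normal(size=103)
descriptors, r2 = forward_select_mlr(X, y, n_select=6)
print(descriptors, round(r2, 3))
```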

  1. Excavating past population structures by surname-based sampling: the genetic legacy of the Vikings in northwest England.

    Science.gov (United States)

    Bowden, Georgina R; Balaresque, Patricia; King, Turi E; Hansen, Ziff; Lee, Andrew C; Pergl-Wilson, Giles; Hurley, Emma; Roberts, Stephen J; Waite, Patrick; Jesch, Judith; Jones, Abigail L; Thomas, Mark G; Harding, Stephen E; Jobling, Mark A

    2008-02-01

    The genetic structures of past human populations are obscured by recent migrations and expansions and have been observed only indirectly by inference from modern samples. However, the unique link between a heritable cultural marker, the patrilineal surname, and a genetic marker, the Y chromosome, provides a means to target sets of modern individuals that might resemble populations at the time of surname establishment. As a test case, we studied samples from the Wirral Peninsula and West Lancashire, in northwest England. Place-names and archaeology show clear evidence of a past Viking presence, but heavy immigration and population growth since the industrial revolution are likely to have weakened the genetic signal of a 1,000-year-old Scandinavian contribution. Samples ascertained on the basis of 2 generations of residence were compared with independent samples based on known ancestry in the region plus the possession of a surname known from historical records to have been present there in medieval times. The Y-chromosomal haplotypes of these 2 sets of samples are significantly different, and in admixture analyses, the surname-ascertained samples show markedly greater Scandinavian ancestry proportions, supporting the idea that northwest England was once heavily populated by Scandinavian settlers. The method of historical surname-based ascertainment promises to allow investigation of the influence of migration and drift over the last few centuries in changing the population structure of Britain and will have general utility in other regions where surnames are patrilineal and suitable historical records survive.

  2. Sparse RNA folding revisited: space-efficient minimum free energy structure prediction.

    Science.gov (United States)

    Will, Sebastian; Jabbari, Hosna

    2016-01-01

    RNA secondary structure prediction by energy minimization is the central computational tool for the analysis of structural non-coding RNAs and their interactions. Sparsification has been successfully applied to improve the time efficiency of various structure prediction algorithms while guaranteeing the same result; however, for many such folding problems, space efficiency is of even greater concern, particularly for long RNA sequences. So far, space-efficient sparsified RNA folding with fold reconstruction was solved only for simple base-pair-based pseudo-energy models. Here, we revisit the problem of space-efficient free energy minimization. Whereas the space-efficient minimization of the free energy has been sketched before, the reconstruction of the optimum structure has not even been discussed. We show that this reconstruction is not possible in trivial extension of the method for simple energy models. Then, we present the time- and space-efficient sparsified free energy minimization algorithm SparseMFEFold that guarantees MFE structure prediction. In particular, this novel algorithm provides efficient fold reconstruction based on dynamically garbage-collected trace arrows. The complexity of our algorithm depends on two parameters, the number of candidates Z and the number of trace arrows T; both are bounded by [Formula: see text], but are typically much smaller. The time complexity of RNA folding is reduced from [Formula: see text] to [Formula: see text]; the space complexity, from [Formula: see text] to [Formula: see text]. Our empirical results show more than 80 % space savings over RNAfold [Vienna RNA package] on the long RNAs from the RNA STRAND database (≥2500 bases). The presented technique is intentionally generalizable to complex prediction algorithms; due to their high space demands, algorithms like pseudoknot prediction and RNA-RNA-interaction prediction are expected to profit even stronger than "standard" MFE folding. SparseMFEFold is free

  3. A semi-supervised learning approach for RNA secondary structure prediction.

    Science.gov (United States)

    Yonemoto, Haruka; Asai, Kiyoshi; Hamada, Michiaki

    2015-08-01

    RNA secondary structure prediction is a key technology in RNA bioinformatics. Most algorithms for RNA secondary structure prediction use probabilistic models, in which the model parameters are trained with reliable RNA secondary structures. Because of the difficulty of determining RNA secondary structures by experimental procedures, such as NMR or X-ray crystal structural analyses, there are still many RNA sequences that could be useful for training whose secondary structures have not been experimentally determined. In this paper, we introduce a novel semi-supervised learning approach for training parameters in a probabilistic model of RNA secondary structures in which we employ not only RNA sequences with annotated secondary structures but also ones with unknown secondary structures. Our model is based on a hybrid of generative (stochastic context-free grammars) and discriminative models (conditional random fields) that has been successfully applied to natural language processing. Computational experiments indicate that the accuracy of secondary structure prediction is improved by incorporating RNA sequences with unknown secondary structures into training. To our knowledge, this is the first study of a semi-supervised learning approach for RNA secondary structure prediction. This technique will be useful when the number of reliable structures is limited. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling.

    Science.gov (United States)

    Habershon, Scott

    2016-04-12

    In a recent article [J. Chem. Phys. 2015, 143, 094106], we introduced a novel graph-based sampling scheme which can be used to generate chemical reaction paths in many-atom systems in an efficient and highly automated manner. The main goal of this work is to demonstrate how this approach, when combined with direct kinetic modeling, can be used to determine the mechanism and phenomenological rate law of a complex catalytic cycle, namely cobalt-catalyzed hydroformylation of ethene. Our graph-based sampling scheme generates 31 unique chemical products and 32 unique chemical reaction pathways; these sampled structures and reaction paths enable automated construction of a kinetic network model of the catalytic system when combined with density functional theory (DFT) calculations of free energies and resultant transition-state theory rate constants. Direct simulations of this kinetic network across a range of initial reactant concentrations enable determination of both the reaction mechanism and the associated rate law in an automated fashion, without the need for either presupposing a mechanism or making steady-state approximations in kinetic analysis. Most importantly, we find that the reaction mechanism which emerges from these simulations is exactly that originally proposed by Heck and Breslow; furthermore, the simulated rate law is also consistent with previous experimental and computational studies, exhibiting a complex dependence on carbon monoxide pressure. While the inherent errors of using DFT simulations to model chemical reactivity limit the quantitative accuracy of our calculated rates, this work confirms that our automated simulation strategy enables direct analysis of catalytic mechanisms from first principles.
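
A minimal sketch of the "direct simulation of a kinetic network" step, not the paper's actual network: a toy two-step catalytic cycle (S + Cat <-> I, I -> P + Cat) is integrated numerically and the apparent rate of product formation is read off across initial substrate concentrations. The rate constants are invented for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

k1, k_1, k2 = 5.0, 1.0, 2.0                    # assumed rate constants (arbitrary units)

def rhs(t, y):
    s, cat, i, p = y
    v1 = k1 * s * cat - k_1 * i                # S + Cat <-> I
    v2 = k2 * i                                # I -> P + Cat
    return [-v1, -v1 + v2, v1 - v2, v2]

for s0 in (0.5, 1.0, 2.0):                     # scan initial substrate concentration
    sol = solve_ivp(rhs, (0.0, 1.0), [s0, 0.1, 0.0, 0.0], max_step=0.01)
    idx = np.searchsorted(sol.t, 0.5)          # once the cycle is running
    rate = k2 * sol.y[2][idx]                  # d[P]/dt = k2 [I]
    print(f"[S]0 = {s0}: d[P]/dt at t=0.5 ~ {rate:.3f}")
```

Scanning initial concentrations in this way is how an effective rate law (reaction orders) can be extracted without assuming a mechanism or a steady state.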

  5. Bessel beam CARS of axially structured samples

    Science.gov (United States)

    Heuke, Sandro; Zheng, Juanjuan; Akimov, Denis; Heintzmann, Rainer; Schmitt, Michael; Popp, Jürgen

    2015-06-01

    We report on a Bessel beam CARS approach for axial profiling of multi-layer structures. This study presents an experimental implementation for the generation of CARS by Bessel beam excitation using only passive optical elements. Furthermore, an analytical expression is provided describing the anti-Stokes field generated by a homogeneous sample. Based on the concept of coherent transfer functions, the underlying resolving power for axially structured geometries is investigated. It is found that, through the non-linearity of the CARS process in combination with the folded illumination geometry, continuous phase-matching is achieved starting from homogeneous samples up to spatial sample frequencies at twice that of the pumping electric field wave. The experimental and analytical findings are modeled by the implementation of the Debye integral and a scalar Green function approach. Finally, the goal of reconstructing an axially layered sample is demonstrated on the basis of the numerically simulated modulus and phase of the anti-Stokes far-field radiation pattern.

  6. Crack Growth-Based Predictive Methodology for the Maintenance of the Structural Integrity of Repaired and Nonrepaired Aging Engine Stationary Components

    National Research Council Canada - National Science Library

    Barron, Michael

    1999-01-01

    .... Specifically, the FAA's goal was to develop "Crack Growth-Based Predictive Methodologies for the Maintenance of the Structural Integrity of Repaired and Nonrepaired Aging Engine Stationary Components...

  7. Identification of proteomic biomarkers predicting prostate cancer aggressiveness and lethality despite biopsy-sampling error

    OpenAIRE

    Shipitsin, M; Small, C; Choudhury, S; Giladi, E; Friedlander, S; Nardone, J; Hussain, S; Hurley, A D; Ernst, C; Huang, Y E; Chang, H; Nifong, T P; Rimm, D L; Dunyak, J; Loda, M

    2014-01-01

    Background: Key challenges of biopsy-based determination of prostate cancer aggressiveness include tumour heterogeneity, biopsy-sampling error, and variations in biopsy interpretation. The resulting uncertainty in risk assessment leads to significant overtreatment, with associated costs and morbidity. We developed a performance-based strategy to identify protein biomarkers predictive of prostate cancer aggressiveness and lethality regardless of biopsy-sampling variation. Methods: Prostatectom...

  8. Detecting determinism with improved sensitivity in time series: rank-based nonlinear predictability score.

    Science.gov (United States)

    Naro, Daniel; Rummel, Christian; Schindler, Kaspar; Andrzejak, Ralph G

    2014-09-01

    The rank-based nonlinear predictability score was recently introduced as a test for determinism in point processes. We here adapt this measure to time series sampled from time-continuous flows. We use noisy Lorenz signals to compare this approach against a classical amplitude-based nonlinear prediction error. Both measures show an almost identical robustness against Gaussian white noise. In contrast, when the amplitude distribution of the noise has a narrower central peak and heavier tails than the normal distribution, the rank-based nonlinear predictability score outperforms the amplitude-based nonlinear prediction error. For this type of noise, the nonlinear predictability score has a higher sensitivity for deterministic structure in noisy signals. It also yields a higher statistical power in a surrogate test of the null hypothesis of linear stochastic correlated signals. We show the high relevance of this improved performance in an application to electroencephalographic (EEG) recordings from epilepsy patients. Here the nonlinear predictability score again shows a higher sensitivity to nonrandomness. Importantly, it yields an improved contrast between signals recorded from brain areas where the first ictal EEG signal changes were detected (focal EEG signals) versus signals recorded from brain areas that were not involved at seizure onset (nonfocal EEG signals).
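
A minimal sketch of the classical amplitude-based nonlinear prediction error that the abstract uses as its baseline (not the rank-based score itself): delay-embed the series, predict each point's future from its nearest neighbour, and report the root-mean-square error. Embedding dimension, delay, horizon, Theiler window, and the test signal are illustrative choices.

```python
import numpy as np

def nonlinear_prediction_error(x, dim=3, delay=2, horizon=1, theiler=10):
    m = len(x) - (dim - 1) * delay - horizon
    emb = np.column_stack([x[i * delay : i * delay + m] for i in range(dim)])   # delay embedding
    future = x[(dim - 1) * delay + horizon : (dim - 1) * delay + horizon + m]
    errs = []
    for t in range(m):
        d = np.linalg.norm(emb - emb[t], axis=1)
        d[max(0, t - theiler) : t + theiler + 1] = np.inf    # exclude temporal neighbours
        nn = np.argmin(d)
        errs.append((future[t] - future[nn]) ** 2)
    return np.sqrt(np.mean(errs))

rng = np.random.default_rng(0)
x = np.sin(0.3 * np.arange(2000)) + 0.1 * rng.standard_normal(2000)
print(nonlinear_prediction_error(x))
```

The rank-based variant studied in the paper replaces these raw amplitude errors with ranks, which is what makes it robust to heavy-tailed noise.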

  9. New tips for structure prediction by comparative modeling

    Science.gov (United States)

    Rayan, Anwar

    2009-01-01

    Comparative modelling is utilized to predict the 3-dimensional conformation of a given protein (target) based on its sequence alignment to an experimentally determined protein structure (template). The use of such techniques is already rewarding and increasingly widespread in biological research and drug development. It is commonly accepted that the accuracy of the predictions depends on the sequence identity between the target protein and the template. To assess the relationship between sequence identity and model quality, we carried out an analysis of a set of 4753 sequence and structure alignments. Throughout this research, model accuracy was measured by the root mean square deviation of Cα atoms of the target-template structures. Surprisingly, the results show that the sequence identity of the target protein to the template is not a good descriptor for predicting the accuracy of the 3-D structure model. However, in a large number of cases, comparative modelling with lower target-template sequence identity led to more accurate 3-D structure models. As a consequence of this study, we suggest new tips for improving the quality of comparative models, particularly for models whose target-template sequence identity is below 50%. PMID:19255646
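
A minimal sketch of the Cα RMSD used above as the model-quality measure: optimal superposition by the Kabsch algorithm followed by the root-mean-square deviation. The coordinates are random stand-ins for matched Cα atoms of a model and its reference structure.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    # optimally superpose P onto Q (rows are matched Ca coordinates), then compute RMSD
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(U @ Vt))           # guard against an improper rotation
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return np.sqrt(np.mean(np.sum((P @ R - Q) ** 2, axis=1)))

rng = np.random.default_rng(1)
model = rng.normal(size=(120, 3))                # stand-in model coordinates
reference = model + 0.5 * rng.normal(size=(120, 3))
print(f"Ca RMSD = {kabsch_rmsd(model, reference):.2f}")
```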

  10. Tooth width predictions in a sample of Black South Africans.

    Science.gov (United States)

    Khan, M I; Seedat, A K; Hlongwa, P

    2007-07-01

    Space analysis during the mixed dentition requires prediction of the mesiodistal widths of the unerupted permanent canines and premolars, and prediction tables and equations may be used for this purpose. The Tanaka and Johnston prediction equations, which were derived from a North American White sample, are one widely used example. These prediction equations may be inapplicable to other race groups due to racial tooth size variability. Therefore the purpose of this study was to derive prediction equations that would be applicable to Black South African subjects. One hundred and ten pre-treatment study casts of Black South African subjects were analysed from the Department of Orthodontics' records at the University of Limpopo. The sample was equally divided by gender, with all subjects having a Class I molar relationship and relatively well aligned teeth. The mesiodistal widths of the maxillary and mandibular canines and premolars were measured with a digital vernier calliper and compared with the measurements predicted with the Tanaka and Johnston equations. The relationship between the measured and predicted values was analysed by correlation and regression analyses. The results indicated that the Tanaka and Johnston prediction equations were not fully applicable to the Black South African sample. The equations tended to underpredict tooth widths in the male sample, while slight overprediction was observed in the female sample. Therefore, new equations were formulated and proposed that would be accurate for Black subjects.
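
A minimal sketch of how a population-specific prediction equation of this kind can be derived: regress the summed canine-premolar width on the summed width of the incisors and report the fitted slope, intercept, and standard error of the estimate. The measurements below are synthetic stand-ins, not the study's data, and the coefficients used to generate them are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
incisor_sum = rng.normal(23.0, 1.5, size=110)                          # mm, per subject
canine_premolar_sum = 0.5 * incisor_sum + 10.0 + rng.normal(0, 0.6, 110)

slope, intercept = np.polyfit(incisor_sum, canine_premolar_sum, 1)
print(f"predicted sum = {slope:.2f} x incisor sum + {intercept:.2f} mm")

# standard error of the estimate, a common way to report prediction accuracy
pred = slope * incisor_sum + intercept
see = np.sqrt(np.sum((canine_premolar_sum - pred) ** 2) / (len(pred) - 2))
print(f"standard error of the estimate: {see:.2f} mm")
```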

  11. Nanostructured and nanolayer coatings based on nitrides of the metals structure study and structure and composition standard samples set development

    Directory of Open Access Journals (Sweden)

    E. B. Chabina

    2014-01-01

    Full Text Available Investigations by analytical microscopy and X-ray analysis have enabled the development of a set of standard samples of the composition and structure of strengthening nanostructured and nanolayer coatings based on metal nitrides. These standard samples are intended for the quality control of such coatings, which are used to protect critical parts of the compressor of the gas turbine engine from dust erosion, corrosion and oxidation.

  12. Conformational sampling in template-free protein loop structure modeling: an overview.

    Science.gov (United States)

    Li, Yaohang

    2013-01-01

    Accurately modeling protein loops is an important step to predict three-dimensional structures as well as to understand functions of many proteins. Because of their high flexibility, modeling the three-dimensional structures of loops is difficult and is usually treated as a "mini protein folding problem" under geometric constraints. In the past decade, there has been remarkable progress in template-free loop structure modeling due to advances in computational methods as well as the steadily increasing number of known structures available in the PDB. This mini review provides an overview of the recent computational approaches for loop structure modeling. In particular, we focus on the approaches of sampling loop conformation space, which is a critical step to obtain high resolution models in template-free methods. We review the potential energy functions for loop modeling, loop buildup mechanisms to satisfy geometric constraints, and loop conformation sampling algorithms. The recent loop modeling results are also summarized.

  13. CONFORMATIONAL SAMPLING IN TEMPLATE-FREE PROTEIN LOOP STRUCTURE MODELING: AN OVERVIEW

    Directory of Open Access Journals (Sweden)

    Yaohang Li

    2013-02-01

    Full Text Available Accurately modeling protein loops is an important step to predict three-dimensional structures as well as to understand functions of many proteins. Because of their high flexibility, modeling the three-dimensional structures of loops is difficult and is usually treated as a “mini protein folding problem” under geometric constraints. In the past decade, there has been remarkable progress in template-free loop structure modeling due to advances in computational methods as well as the steadily increasing number of known structures available in the PDB. This mini review provides an overview of the recent computational approaches for loop structure modeling. In particular, we focus on the approaches of sampling loop conformation space, which is a critical step to obtain high resolution models in template-free methods. We review the potential energy functions for loop modeling, loop buildup mechanisms to satisfy geometric constraints, and loop conformation sampling algorithms. The recent loop modeling results are also summarized.

  14. Drug-target interaction prediction from PSSM based evolutionary information.

    Science.gov (United States)

    Mousavian, Zaynab; Khakabimamaghani, Sahand; Kavousi, Kaveh; Masoudi-Nejad, Ali

    2016-01-01

    The labor-intensive and expensive experimental process of determining drug-target interactions has motivated many researchers to focus on in silico prediction, which provides helpful information to support the experimental interaction data. Therefore, they have proposed several computational approaches for discovering new drug-target interactions. A growing number of learning-based methods have been developed, which can be categorized into two main groups: similarity-based and feature-based. In this paper, we first use the bi-gram features extracted from the Position Specific Scoring Matrix (PSSM) of proteins in predicting drug-target interactions. Our results demonstrate the high-confidence prediction ability of the Bigram-PSSM model in terms of several performance indicators, specifically for enzymes and ion channels. Moreover, we investigate the impact of the negative selection strategy on the performance of the prediction, which is not widely taken into account in other relevant studies. This is important, as the number of non-interacting drug-target pairs is usually extremely large in comparison with the number of interacting ones in existing drug-target interaction data. An interesting observation is that different levels of performance reduction were attained for the four datasets when we changed the sampling method from random sampling to balanced sampling. Copyright © 2015 Elsevier Inc. All rights reserved.
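
A minimal sketch of the bi-gram features from a PSSM (an L x 20 profile): feature (a, b) sums the product of the score for amino acid a at position k and amino acid b at position k+1, giving one fixed 400-dimensional vector per protein. The random matrix below is a stand-in for a real PSSM.

```python
import numpy as np

def bigram_pssm(pssm):
    pssm = np.asarray(pssm, dtype=float)        # shape (L, 20)
    B = pssm[:-1].T @ pssm[1:]                  # B[a, b] = sum_k pssm[k, a] * pssm[k+1, b]
    return (B / (len(pssm) - 1)).ravel()        # 400-dimensional feature vector

rng = np.random.default_rng(0)
features = bigram_pssm(rng.normal(size=(150, 20)))   # stand-in PSSM for a 150-residue protein
print(features.shape)
```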

  15. Assessing protein conformational sampling methods based on bivariate lag-distributions of backbone angles

    KAUST Repository

    Maadooliat, Mehdi; Gao, Xin; Huang, Jianhua Z.

    2012-01-01

    Despite considerable progress in the past decades, protein structure prediction remains one of the major unsolved problems in computational biology. Angular-sampling-based methods have been extensively studied recently due to their ability to capture the continuous conformational space of protein structures. The literature has focused on using a variety of parametric models of the sequential dependencies between angle pairs along the protein chains. In this article, we present a thorough review of angular-sampling-based methods by assessing three main questions: What is the best distribution type to model the protein angles? What is a reasonable number of components in a mixture model that should be considered to accurately parameterize the joint distribution of the angles? And what is the order of the local sequence-structure dependency that should be considered by a prediction method? We assess the model fits for different methods using bivariate lag-distributions of the dihedral/planar angles. Moreover, the main information across the lags can be extracted using a technique called Lag singular value decomposition (LagSVD), which considers the joint distribution of the dihedral/planar angles over different lags using a nonparametric approach and monitors the behavior of the lag-distribution of the angles using singular value decomposition. As a result, we developed graphical tools and numerical measurements to compare and evaluate the performance of different model fits. Furthermore, we developed a web-tool (http://www.stat.tamu.edu/~madoliat/LagSVD) that can be used to produce informative animations. © The Author 2012. Published by Oxford University Press.
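
A minimal sketch of a bivariate lag-distribution: the joint histogram of a backbone angle at residue i and at residue i+lag, pooled over a chain. The angles here are synthetic stand-ins for dihedral values in degrees, and the bin count is illustrative.

```python
import numpy as np

def lag_distribution(angles, lag=1, bins=36):
    a, b = angles[:-lag], angles[lag:]
    hist, xedges, yedges = np.histogram2d(a, b, bins=bins,
                                          range=[[-180, 180], [-180, 180]],
                                          density=True)
    return hist

rng = np.random.default_rng(0)
phi = rng.uniform(-180, 180, size=5000)       # stand-in dihedral angles
print(lag_distribution(phi, lag=2).shape)     # (36, 36) joint density over the chosen lag
```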

  16. Assessing protein conformational sampling methods based on bivariate lag-distributions of backbone angles

    KAUST Repository

    Maadooliat, Mehdi

    2012-08-27

    Despite considerable progress in the past decades, protein structure prediction remains one of the major unsolved problems in computational biology. Angular-sampling-based methods have been extensively studied recently due to their ability to capture the continuous conformational space of protein structures. The literature has focused on using a variety of parametric models of the sequential dependencies between angle pairs along the protein chains. In this article, we present a thorough review of angular-sampling-based methods by assessing three main questions: What is the best distribution type to model the protein angles? What is a reasonable number of components in a mixture model that should be considered to accurately parameterize the joint distribution of the angles? And what is the order of the local sequence-structure dependency that should be considered by a prediction method? We assess the model fits for different methods using bivariate lag-distributions of the dihedral/planar angles. Moreover, the main information across the lags can be extracted using a technique called Lag singular value decomposition (LagSVD), which considers the joint distribution of the dihedral/planar angles over different lags using a nonparametric approach and monitors the behavior of the lag-distribution of the angles using singular value decomposition. As a result, we developed graphical tools and numerical measurements to compare and evaluate the performance of different model fits. Furthermore, we developed a web-tool (http://www.stat.tamu.edu/~madoliat/LagSVD) that can be used to produce informative animations. © The Author 2012. Published by Oxford University Press.

  17. Predicting RNA Structure Using Mutual Information

    DEFF Research Database (Denmark)

    Freyhult, E.; Moulton, V.; Gardner, P. P.

    2005-01-01

    Background: With the ever-increasing number of sequenced RNAs and the establishment of new RNA databases, such as the Comparative RNA Web Site and Rfam, there is a growing need for accurately and automatically predicting RNA structures from multiple alignments. Since RNA secondary structure ... to display and predict conserved RNA secondary structure (including pseudoknots) from an alignment. Results: We show that MIfold can be used to predict simple pseudoknots, and that the performance can be adjusted to make it either more sensitive or more selective. We also demonstrate that the overall ... package. Conclusion: MIfold provides a useful supplementary tool to programs such as RNA Structure Logo, RNAalifold and COVE, and should be useful for automatically generating structural predictions for databases such as Rfam. Availability: MIfold is freely available from http ...
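
A minimal sketch of the column-pair mutual information that MIfold-style methods use to detect covarying (base-pairing) alignment positions. The two toy columns below show compensatory changes (G-C versus A-U), so their mutual information is high.

```python
import numpy as np
from collections import Counter

def column_mi(col_i, col_j):
    n = len(col_i)
    pi, pj = Counter(col_i), Counter(col_j)
    pij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / n
        mi += p_ab * np.log2(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

col_i = list("GGGAAAGGAA")   # toy alignment column i
col_j = list("CCCUUUCCUU")   # toy alignment column j, covarying with i
print(f"MI = {column_mi(col_i, col_j):.2f} bits")
```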

  18. Validation of Molecular Dynamics Simulations for Prediction of Three-Dimensional Structures of Small Proteins.

    Science.gov (United States)

    Kato, Koichi; Nakayoshi, Tomoki; Fukuyoshi, Shuichi; Kurimoto, Eiji; Oda, Akifumi

    2017-10-12

    Although various higher-order protein structure prediction methods have been developed, almost all of them were developed based on the three-dimensional (3D) structure information of known proteins. Here we predicted short protein structures by molecular dynamics (MD) simulations in which only Newton's equations of motion were used and 3D structural information of known proteins was not required. To evaluate the ability of MD simulation to predict protein structures, we calculated seven short test proteins (10-46 residues) starting from the denatured state and compared their predicted and experimental structures. The predicted structure for Trp-cage (20 residues) was close to the experimental structure after a 200-ns MD simulation. For proteins shorter or longer than Trp-cage, root-mean-square deviation values were larger than those for Trp-cage. However, secondary structures could be reproduced by MD simulations for proteins with 10-34 residues. Simulations by replica exchange MD were performed, but the results were similar to those from normal MD simulations. These results suggest that normal MD simulations can roughly predict short protein structures and that 200-ns simulations are frequently sufficient for estimating the secondary structures of proteins of approximately 20 residues. Structure prediction methods using only fundamental physical laws are useful for investigating non-natural proteins, such as primitive proteins and artificial proteins for peptide-based drug delivery systems.

  19. Intelligent sampling for the measurement of structured surfaces

    International Nuclear Information System (INIS)

    Wang, J; Jiang, X; Blunt, L A; Scott, P J; Leach, R K

    2012-01-01

    Uniform sampling in metrology has known drawbacks such as coherent spectral aliasing and a lack of efficiency in terms of measuring time and data storage. The requirement for intelligent sampling strategies has been outlined over recent years, particularly where the measurement of structured surfaces is concerned. Most of the present research on intelligent sampling has focused on dimensional metrology using coordinate-measuring machines with little reported on the area of surface metrology. In the research reported here, potential intelligent sampling strategies for surface topography measurement of structured surfaces are investigated by using numerical simulation and experimental verification. The methods include the jittered uniform method, low-discrepancy pattern sampling and several adaptive methods which originate from computer graphics, coordinate metrology and previous research by the authors. By combining the use of advanced reconstruction methods and feature-based characterization techniques, the measurement performance of the sampling methods is studied using case studies. The advantages, stability and feasibility of these techniques for practical measurements are discussed. (paper)
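
A minimal sketch of the jittered-uniform sampling mentioned above: each sample point is perturbed randomly within its own grid cell, which breaks the coherent aliasing of a strictly regular grid while keeping near-uniform coverage. Grid size and field dimensions are illustrative.

```python
import numpy as np

def jittered_uniform(nx, ny, lx=1.0, ly=1.0, rng=None):
    rng = rng or np.random.default_rng()
    ix, iy = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
    x = (ix + rng.uniform(0, 1, ix.shape)) * lx / nx     # one random offset per cell
    y = (iy + rng.uniform(0, 1, iy.shape)) * ly / ny
    return np.column_stack([x.ravel(), y.ravel()])

pts = jittered_uniform(32, 32, rng=np.random.default_rng(0))
print(pts.shape)      # (1024, 2) sample positions over a 1 x 1 measurement field
```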

  20. Prediction of protein structure with the coarse-grained UNRES force field assisted by small X-ray scattering data and knowledge-based information.

    Science.gov (United States)

    Karczyńska, Agnieszka S; Mozolewska, Magdalena A; Krupa, Paweł; Giełdoń, Artur; Liwo, Adam; Czaplewski, Cezary

    2018-03-01

    A new approach to assisted protein-structure prediction has been proposed, which is based on running multiplexed replica exchange molecular dynamics simulations with the coarse-grained UNRES force field with restraints derived from knowledge-based models and distance distribution from small angle X-ray scattering (SAXS) measurements. The latter restraints are incorporated into the target function as a maximum-likelihood term that guides the shape of the simulated structures towards that defined by SAXS. The approach was first verified with the 1KOY protein, for which the distance distribution was calculated from the experimental structure, and subsequently used to predict the structures of 11 data-assisted targets in the CASP12 experiment. Major improvement of the GDT_TS was obtained for 2 targets, minor improvement for other 2 while, for 6 target GDT_TS deteriorated compared with that calculated for predictions without the SAXS data, partly because of assuming a wrong multimeric state (for Ts866) or because the crystal conformation was more compact than the solution conformation (for Ts942). Particularly good results were obtained for Ts909, in which use of SAXS data resulted in the selection of a correctly packed trimer and, subsequently, increased the GDT_TS of monomer prediction. It was found that running simulations with correct oligomeric state is essential for the success in SAXS-data-assisted prediction. © 2017 Wiley Periodicals, Inc.
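
A minimal sketch of a distance-distribution restraint in the spirit described above, not the UNRES energy term itself: a model is scored by the likelihood of its pairwise-distance histogram P(r) under a target distribution (here derived from a reference point set standing in for SAXS data). Bin width, r_max, and the synthetic coordinates are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist

def pr_histogram(coords, r_max=80.0, bins=40):
    d = pdist(coords)
    hist, _ = np.histogram(d, bins=bins, range=(0.0, r_max))
    return hist / hist.sum()

def distance_restraint_penalty(coords, target_pr, r_max=80.0):
    model_pr = pr_histogram(coords, r_max, len(target_pr))
    eps = 1e-9
    return -np.sum(model_pr * np.log(target_pr + eps))     # negative log-likelihood style term

rng = np.random.default_rng(0)
reference = rng.normal(scale=15.0, size=(200, 3))          # stand-in "experimental" shape
target_pr = pr_histogram(reference)
decoy = rng.normal(scale=25.0, size=(200, 3))              # over-extended decoy
print(distance_restraint_penalty(reference, target_pr), distance_restraint_penalty(decoy, target_pr))
```

The over-extended decoy places probability mass where the target P(r) is nearly zero, so its penalty is much larger, which is the behaviour such a restraint is meant to enforce.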

  1. Predicting Protein Secondary Structure with Markov Models

    DEFF Research Database (Denmark)

    Fischer, Paul; Larsen, Simon; Thomsen, Claus

    2004-01-01

    The problem we are considering here is to predict the secondary structure from the primary one. To this end we train a Markov model on training data and then use it to classify parts of unknown protein sequences as sheets, helices or coils. We show how to exploit the directional information contained in the Markov model for this task. Classifications that are purely based on statistical models might not always be biologically meaningful. We present combinatorial methods to incorporate biological background knowledge to enhance the prediction performance.
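
A minimal sketch of the basic idea (not the authors' method): train one first-order Markov chain per secondary structure class on labelled fragments, then assign a new fragment to the class whose chain gives it the highest smoothed log-likelihood. The training fragments below are tiny invented stand-ins, not real data.

```python
import numpy as np
from collections import defaultdict

def train_markov(fragments):
    counts = defaultdict(lambda: defaultdict(float))
    for frag in fragments:
        for a, b in zip(frag, frag[1:]):
            counts[a][b] += 1                      # transition counts a -> b
    return counts

def log_likelihood(frag, counts, alpha=1.0, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    ll = 0.0
    for a, b in zip(frag, frag[1:]):
        row = counts[a]
        total = sum(row.values()) + alpha * len(alphabet)    # Laplace smoothing
        ll += np.log((row[b] + alpha) / total)
    return ll

models = {"H": train_markov(["ALAKLAEEAL", "LEEKLKALEE"]),   # helix-like toy fragments
          "E": train_markov(["VTVTIVSVKV", "IVKVEVTVTV"]),   # sheet-like toy fragments
          "C": train_markov(["GPSGNPGSDG", "PGGSNGGPDG"])}   # coil-like toy fragments
query = "KLAELKALEA"
print(max(models, key=lambda c: log_likelihood(query, models[c])))
```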

  2. The influence of sampling unit size and spatial arrangement patterns on neighborhood-based spatial structure analyses of forest stands

    Energy Technology Data Exchange (ETDEWEB)

    Wang, H.; Zhang, G.; Hui, G.; Li, Y.; Hu, Y.; Zhao, Z.

    2016-07-01

    Aim of study: Neighborhood-based stand spatial structure parameters can quantify and characterize forest spatial structure effectively. How these neighborhood-based structure parameters are influenced by the selection of different numbers of nearest-neighbor trees is unclear, and there is some disagreement in the literature regarding the appropriate number of nearest-neighbor trees to sample around reference trees. Understanding how to efficiently characterize forest structure is critical for forest management. Area of study: Multi-species uneven-aged forests of Northern China. Material and methods: We simulated stands with different spatial structural characteristics and systematically compared their structure parameters when two to eight neighboring trees were selected. Main results: The results showed that values of the uniform angle index calculated in the same stand differed with different sizes of the structure unit. When tree species and sizes were completely randomly interspersed, different numbers of neighbors had little influence on the mingling and dominance indices. Changes in the mingling or dominance indices caused by different numbers of neighbors occurred when the tree species or size classes were not randomly interspersed, and their changing characteristics can be detected according to the spatial arrangement patterns of tree species and sizes. Research highlights: The number of neighboring trees selected for analyzing stand spatial structure parameters should be fixed. We propose that the four-tree structure unit is the best compromise between sampling accuracy and cost for practical forest management.
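
A minimal sketch of one neighborhood-based parameter discussed above, the species mingling index: for each reference tree, the fraction of its k nearest neighbours that belong to a different species, compared across structure-unit sizes k. Tree positions and species labels are synthetic.

```python
import numpy as np
from scipy.spatial import cKDTree

def mingling(positions, species, k=4):
    tree = cKDTree(positions)
    _, idx = tree.query(positions, k=k + 1)         # first neighbour is the tree itself
    neigh = idx[:, 1:]
    return np.mean(species[neigh] != species[:, None], axis=1)

rng = np.random.default_rng(0)
pos = rng.uniform(0, 100, size=(300, 2))            # stand-in stem map (m)
spp = rng.integers(0, 4, size=300)                  # four randomly interspersed species
for k in (2, 4, 6, 8):                              # compare structure-unit sizes
    print(k, round(mingling(pos, spp, k).mean(), 3))
```

With randomly interspersed species the stand mean barely changes with k, which mirrors the paper's observation that the choice of structure-unit size matters mainly when species or sizes are not randomly arranged.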

  3. TMDIM: an improved algorithm for the structure prediction of transmembrane domains of bitopic dimers

    Science.gov (United States)

    Cao, Han; Ng, Marcus C. K.; Jusoh, Siti Azma; Tai, Hio Kuan; Siu, Shirley W. I.

    2017-09-01

    α-Helical transmembrane proteins are the most important drug targets in rational drug development. However, solving the experimental structures of these proteins remains difficult, and computational methods to accurately and efficiently predict the structures are therefore in great demand. We present an improved structure prediction method, TMDIM, based on Park et al. (Proteins 57:577-585, 2004) for predicting bitopic transmembrane protein dimers. Three major algorithmic improvements are the introduction of packing-type classification, multiple-condition decoy filtering, and cluster-based candidate selection. In a test of predicting nine known bitopic dimers, approximately 78% of our predictions achieved a successful fit (RMSD below the chosen cut-off). The accompanying web application is built on PHP, MySQL and Apache, with all major browsers supported.

  4. Large-scale structural and textual similarity-based mining of knowledge graph to predict drug-drug interactions

    KAUST Repository

    Abdelaziz, Ibrahim; Fokoue, Achille; Hassanzadeh, Oktie; Zhang, Ping; Sadoghi, Mohammad

    2017-01-01

    Drug-Drug Interactions (DDIs) are a major cause of preventable Adverse Drug Reactions (ADRs), causing a significant burden on the patients’ health and the healthcare system. It is widely known that clinical studies cannot sufficiently and accurately identify DDIs for new drugs before they are made available on the market. In addition, existing public and proprietary sources of DDI information are known to be incomplete and/or inaccurate and so not reliable. As a result, there is an emerging body of research on in-silico prediction of drug-drug interactions. In this paper, we present Tiresias, a large-scale similarity-based framework that predicts DDIs through link prediction. Tiresias takes in various sources of drug-related data and knowledge as inputs, and provides DDI predictions as outputs. The process starts with semantic integration of the input data that results in a knowledge graph describing drug attributes and relationships with various related entities such as enzymes, chemical structures, and pathways. The knowledge graph is then used to compute several similarity measures between all the drugs in a scalable and distributed framework. In particular, Tiresias utilizes two classes of features in a knowledge graph: local and global features. Local features are derived from the information directly associated to each drug (i.e., one hop away) while global features are learnt by minimizing a global loss function that considers the complete structure of the knowledge graph. The resulting similarity metrics are used to build features for a large-scale logistic regression model to predict potential DDIs. We highlight the novelty of our proposed Tiresias and perform thorough evaluation of the quality of the predictions. The results show the effectiveness of Tiresias in both predicting new interactions among existing drugs as well as newly developed drugs.
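
A minimal sketch of similarity-feature link prediction in the spirit of the framework described above, not the framework itself: each candidate drug pair is represented by similarities computed from (here synthetic) drug attribute profiles, and a logistic regression model scores whether the pair interacts. Attribute profiles and interaction labels are random stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
n_drugs, n_attr = 60, 30
drug_attr = (rng.random((n_drugs, n_attr)) < 0.2).astype(float)   # binary attribute profiles
sim = cosine_similarity(drug_attr)                                # drug-drug similarity matrix

pairs = [(i, j) for i in range(n_drugs) for j in range(i + 1, n_drugs)]
X = np.array([[sim[i, j], drug_attr[i].sum(), drug_attr[j].sum()] for i, j in pairs])
y = rng.integers(0, 2, size=len(pairs))                           # stand-in DDI labels

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba(X[:5])[:, 1])                             # interaction probabilities
```

In a real pipeline the similarity features would come from multiple knowledge-graph views (enzymes, structures, pathways) rather than a single synthetic attribute matrix.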

  5. Large-scale structural and textual similarity-based mining of knowledge graph to predict drug-drug interactions

    KAUST Repository

    Abdelaziz, Ibrahim

    2017-06-12

    Drug-Drug Interactions (DDIs) are a major cause of preventable Adverse Drug Reactions (ADRs), causing a significant burden on the patients’ health and the healthcare system. It is widely known that clinical studies cannot sufficiently and accurately identify DDIs for new drugs before they are made available on the market. In addition, existing public and proprietary sources of DDI information are known to be incomplete and/or inaccurate and so not reliable. As a result, there is an emerging body of research on in-silico prediction of drug-drug interactions. In this paper, we present Tiresias, a large-scale similarity-based framework that predicts DDIs through link prediction. Tiresias takes in various sources of drug-related data and knowledge as inputs, and provides DDI predictions as outputs. The process starts with semantic integration of the input data that results in a knowledge graph describing drug attributes and relationships with various related entities such as enzymes, chemical structures, and pathways. The knowledge graph is then used to compute several similarity measures between all the drugs in a scalable and distributed framework. In particular, Tiresias utilizes two classes of features in a knowledge graph: local and global features. Local features are derived from the information directly associated to each drug (i.e., one hop away) while global features are learnt by minimizing a global loss function that considers the complete structure of the knowledge graph. The resulting similarity metrics are used to build features for a large-scale logistic regression model to predict potential DDIs. We highlight the novelty of our proposed Tiresias and perform thorough evaluation of the quality of the predictions. The results show the effectiveness of Tiresias in both predicting new interactions among existing drugs as well as newly developed drugs.

  6. New tips for structure prediction by comparative modeling

    OpenAIRE

    Rayan, Anwar

    2009-01-01

    Comparative modelling is utilized to predict the 3-dimensional conformation of a given protein (target) based on its sequence alignment to experimentally determined protein structure (template). The use of such technique is already rewarding and increasingly widespread in biological research and drug development. The accuracy of the predictions as commonly accepted depends on the score of sequence identity of the target protein to the template. To assess the relationship between sequence iden...

  7. Improving protein fold recognition and structural class prediction accuracies using physicochemical properties of amino acids.

    Science.gov (United States)

    Raicar, Gaurav; Saini, Harsh; Dehzangi, Abdollah; Lal, Sunil; Sharma, Alok

    2016-08-07

    Predicting the three-dimensional (3-D) structure of a protein is an important task in the field of bioinformatics and the biological sciences. However, directly predicting the 3-D structure from the primary structure is hard to achieve. Therefore, predicting the fold or structural class of a protein sequence is generally used as an intermediate step in determining the protein's 3-D structure. For protein fold recognition (PFR) and structural class prediction (SCP), two steps are required: a feature extraction step and a classification step. Feature extraction techniques generally utilize syntactical-based information, evolutionary-based information and physicochemical-based information to extract features. In this study, we explore the importance of utilizing the physicochemical properties of amino acids for improving PFR and SCP accuracies. For this, we propose a Forward Consecutive Search (FCS) scheme which aims to strategically select physicochemical attributes that will supplement the existing feature extraction techniques for PFR and SCP. An exhaustive search is conducted on all the existing 544 physicochemical attributes using the proposed FCS scheme and a subset of physicochemical attributes is identified. Features extracted from these selected attributes are then combined with existing syntactical-based and evolutionary-based features to show an improvement in the recognition and prediction performance on benchmark datasets. Copyright © 2016 Elsevier Ltd. All rights reserved.
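
A minimal sketch of a forward selection loop over candidate attributes, offered as a generic illustration rather than the paper's FCS scheme: at each step the attribute that most improves cross-validated accuracy is added, and the loop stops when no attribute helps. Data, labels, and the classifier choice are synthetic placeholders.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))            # 200 proteins x 15 candidate attributes (synthetic)
y = (X[:, 2] + 0.5 * X[:, 7] + 0.1 * rng.normal(size=200) > 0).astype(int)

selected, remaining, best_score = [], list(range(X.shape[1])), 0.0
while remaining:
    scores = {f: cross_val_score(SVC(), X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    f_best = max(scores, key=scores.get)
    if scores[f_best] <= best_score:      # stop when no attribute improves accuracy
        break
    selected.append(f_best)
    remaining.remove(f_best)
    best_score = scores[f_best]
print("selected attributes:", selected, "cv accuracy:", round(best_score, 3))
```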

  8. TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors.

    Directory of Open Access Journals (Sweden)

    Johannes Eichner

    Full Text Available One of the key mechanisms of transcriptional control is the specific connection between transcription factors (TFs) and cis-regulatory elements in gene promoters. The elucidation of these specific protein-DNA interactions is crucial to gain insights into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and mostly performed for selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow which, for a given protein sequence, (1) discriminates TFs from other proteins, (2) determines the structural superclass of TFs, (3) identifies the DNA-binding domains of TFs and (4) predicts their cis-acting DNA motif. While existing tools were extended and adapted for performing the latter two prediction steps, the first two steps are based on a novel numeric sequence representation which allows for combining existing knowledge from a BLAST scan with robust machine learning-based classification. By evaluation on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation facilitates more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/ and http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/.

  9. Towards a unified fatigue life prediction method for marine structures

    CERN Document Server

    Cui, Weicheng; Wang, Fang

    2014-01-01

    In order to apply the damage tolerance design philosophy to the design of marine structures, accurate prediction of fatigue crack growth under service conditions is required. More and more researchers have realized that only a fatigue life prediction method based on fatigue crack propagation (FCP) theory has the potential to explain the various fatigue phenomena observed. In this book, the issues leading towards the development of a unified fatigue life prediction (UFLP) method based on FCP theory are addressed. Based on the philosophy of the UFLP method, the current inconsistency between fatigue design and inspection of marine structures could be resolved. This book presents the state-of-the-art and recent advances, including those by the authors, in fatigue studies. It is designed to lead future directions and to provide a useful tool in many practical applications. It is addressed to engineers, naval architects, research staff, professionals and graduates engaged in fatigue prevention design and survey ...

  10. Impact of sampling interval in training data acquisition on intrafractional predictive accuracy of indirect dynamic tumor-tracking radiotherapy.

    Science.gov (United States)

    Mukumoto, Nobutaka; Nakamura, Mitsuhiro; Akimoto, Mami; Miyabe, Yuki; Yokota, Kenji; Matsuo, Yukinori; Mizowaki, Takashi; Hiraoka, Masahiro

    2017-08-01

    To explore the effect of the sampling interval of training data acquisition on the intrafractional prediction error of surrogate signal-based dynamic tumor-tracking using a gimbal-mounted linac. Twenty pairs of respiratory motions were acquired from 20 patients (ten lung, five liver, and five pancreatic cancer patients) who underwent dynamic tumor-tracking with the Vero4DRT. First, respiratory motions were acquired as training data for an initial construction of the prediction model before the irradiation. Next, additional respiratory motions were acquired for an update of the prediction model due to the change of the respiratory pattern during the irradiation. The time elapsed prior to the second acquisition of the respiratory motion was 12.6 ± 3.1 min. A four-axis moving phantom reproduced patients' three-dimensional (3D) target motions and one-dimensional surrogate motions. To predict the future internal target motion from the external surrogate motion, prediction models were constructed by minimizing residual prediction errors for training data acquired at 80 and 320 ms sampling intervals for 20 s, and at 500, 1,000, and 2,000 ms sampling intervals for 60 s, using orthogonal kV x-ray imaging systems. The accuracies of prediction models trained with various sampling intervals were estimated based on training data with each sampling interval during the training process. The intrafractional prediction errors for the various prediction models were then calculated on intrafractional monitoring images taken for 30 s at a constant sampling interval of 500 ms to fairly evaluate the prediction accuracy for the same motion pattern. In addition, the first respiratory motion was used for the training and the second respiratory motion was used for the evaluation of the intrafractional prediction errors for the changed respiratory motion, to evaluate the robustness of the prediction models. The training error of the prediction model was 1.7 ± 0.7 mm in 3D for all sampling

  11. Model-free and model-based reward prediction errors in EEG.

    Science.gov (United States)

    Sambrook, Thomas D; Hardwick, Ben; Wills, Andy J; Goslin, Jeremy

    2018-05-24

    Learning theorists posit two reinforcement learning systems: model-free and model-based. Model-based learning incorporates knowledge about structure and contingencies in the world to assign candidate actions with an expected value. Model-free learning is ignorant of the world's structure; instead, actions hold a value based on prior reinforcement, with this value updated by expectancy violation in the form of a reward prediction error. Because they use such different learning mechanisms, it has been previously assumed that model-based and model-free learning are computationally dissociated in the brain. However, recent fMRI evidence suggests that the brain may compute reward prediction errors to both model-free and model-based estimates of value, signalling the possibility that these systems interact. Because of its poor temporal resolution, fMRI risks confounding reward prediction errors with other feedback-related neural activity. In the present study, EEG was used to show the presence of both model-based and model-free reward prediction errors and their place in a temporal sequence of events including state prediction errors and action value updates. This demonstration of model-based prediction errors questions a long-held assumption that model-free and model-based learning are dissociated in the brain. Copyright © 2018 Elsevier Inc. All rights reserved.
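
A minimal sketch of the model-free reward prediction error described above, in its simplest Rescorla-Wagner / temporal-difference form: the value of the chosen action is updated by the difference between received and expected reward. Reward probabilities and the learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
p_reward = [0.8, 0.2]          # true reward probability of each action
Q = np.zeros(2)                # model-free action values
alpha = 0.1                    # learning rate

for trial in range(500):
    a = rng.integers(0, 2)                     # random action selection for simplicity
    r = float(rng.random() < p_reward[a])      # binary reward
    rpe = r - Q[a]                             # model-free reward prediction error
    Q[a] += alpha * rpe

print(np.round(Q, 2))          # values approach the true reward probabilities
```

A model-based learner would instead compute expected value from an explicit model of state transitions and outcomes, and its prediction error would reflect violations of that model rather than of cached values.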

  12. 3dRPC: a web server for 3D RNA-protein structure prediction.

    Science.gov (United States)

    Huang, Yangyu; Li, Haotian; Xiao, Yi

    2018-04-01

    RNA-protein interactions occur in many biological processes. To understand the mechanism of these interactions one needs to know three-dimensional (3D) structures of RNA-protein complexes. 3dRPC is an algorithm for prediction of 3D RNA-protein complex structures and consists of a docking algorithm RPDOCK and a scoring function 3dRPC-Score. RPDOCK is used to sample possible complex conformations of an RNA and a protein by calculating the geometric and electrostatic complementarities and stacking interactions at the RNA-protein interface according to the features of atom packing of the interface. 3dRPC-Score is a knowledge-based potential that uses the conformations of nucleotide-amino-acid pairs as statistical variables and that is used to choose the near-native complex-conformations obtained from the docking method above. Recently, we built a web server for 3dRPC. The users can easily use 3dRPC without installing it locally. RNA and protein structures in PDB (Protein Data Bank) format are the only needed input files. It can also incorporate the information of interface residues or residue-pairs obtained from experiments or theoretical predictions to improve the prediction. The address of 3dRPC web server is http://biophy.hust.edu.cn/3dRPC. yxiao@hust.edu.cn.

  13. Predicting the future of sports organizations

    Directory of Open Access Journals (Sweden)

    Jugoslav Vojinovic

    2015-06-01

    Full Text Available The current crisis of sport in Serbia justifies an examination of the realistically attainable future of sports organizations. The sample of respondents (N=277) was divided into two subsamples: 113 professionals involved in the management of sports clubs (the "experimental" sample) and 164 other individuals (the "control" sample). The results of the structural analysis showed that the experimental sample based its vision on staff as a determinant of the system, which provides creativity as a characteristic of the organizational culture of the club. The control subsample of respondents could indicate some characteristic variables for predicting the future of clubs, but could not state a clear prediction system based on a long sequence of reasoning. We can conclude that the two subsamples differ in terms of their ability to predict the future of their clubs on the basis of an assessment of the key variables that shape future scenarios.

  14. Bi-objective integer programming for RNA secondary structure prediction with pseudoknots.

    Science.gov (United States)

    Legendre, Audrey; Angel, Eric; Tahi, Fariza

    2018-01-15

    RNA structure prediction is an important field in bioinformatics, and numerous methods and tools have been proposed. Pseudoknots are specific motifs of RNA secondary structures that are difficult to predict. Almost all existing methods are based on a single model and return one solution, often missing the real structure. An alternative approach would be to combine different models and return a (small) set of solutions, maximizing its quality and diversity in order to increase the probability that it contains the real structure. We propose here an original method for predicting RNA secondary structures with pseudoknots, based on integer programming. We developed a generic bi-objective integer programming algorithm that returns optimal and sub-optimal solutions while simultaneously optimizing two models. This algorithm was then applied to the combination of two known models of RNA secondary structure prediction, namely MEA and MFE. The resulting tool, called BiokoP, is compared with the other methods in the literature. The results show that the best solution (the structure with the highest F1-score) is, in most cases, given by BiokoP. Moreover, the results of BiokoP are homogeneous, regardless of the pseudoknot type or the presence or absence of pseudoknots. Indeed, the F1-scores are always higher than 70% for any number of solutions returned. The results obtained by BiokoP show that combining the MEA and the MFE models, as well as returning several optimal and several sub-optimal solutions, improves the prediction of secondary structures. One perspective of our work is to combine better mono-criterion models, in particular a model based on the comparative approach with the MEA and the MFE models. This leads us to develop, in the future, a new multi-objective algorithm to combine more than two models. BiokoP is available on the EvryRNA platform: https://EvryRNA.ibisc.univ-evry.fr.

  15. Contingency Table Browser - prediction of early stage protein structure.

    Science.gov (United States)

    Kalinowska, Barbara; Krzykalski, Artur; Roterman, Irena

    2015-01-01

    The Early Stage (ES) intermediate represents the starting structure in protein folding simulations based on the Fuzzy Oil Drop (FOD) model. The accuracy of FOD predictions is greatly dependent on the accuracy of the chosen intermediate. A suitable intermediate can be constructed using the sequence-structure relationship information contained in the so-called contingency table - this table expresses the likelihood of encountering various structural motifs for each tetrapeptide fragment in the amino acid sequence. The limited accuracy with which such structures could previously be predicted provided the motivation for a more in-depth study of the contingency table itself. The Contingency Table Browser is a tool which can visualize, search and analyze the table. Our work presents possible applications of the Contingency Table Browser, among them the analysis of specific protein sequences from the point of view of their structural ambiguity.

  16. Applications of contact predictions to structural biology

    Directory of Open Access Journals (Sweden)

    Felix Simkovic

    2017-05-01

    Full Text Available Evolutionary pressure on residue interactions, intramolecular or intermolecular, that are important for protein structure or function can lead to covariance between the two positions. Recent methodological advances allow much more accurate contact predictions to be derived from this evolutionary covariance signal. The practical application of contact predictions has largely been confined to structural bioinformatics, yet, as this work seeks to demonstrate, the data can be of enormous value to the structural biologist working in X-ray crystallography, cryo-EM or NMR. Integrative structural bioinformatics packages such as Rosetta can already exploit contact predictions in a variety of ways. The contribution of contact predictions begins at construct design, where structural domains may need to be expressed separately and contact predictions can help to predict domain limits. Structure solution by molecular replacement (MR) benefits from contact predictions in diverse ways: in difficult cases, more accurate search models can be constructed using ab initio modelling when predictions are available, while intermolecular contact predictions can allow the construction of larger, oligomeric search models. Furthermore, MR using supersecondary motifs or large-scale screens against the PDB can exploit information, such as the parallel or antiparallel nature of any β-strand pairing in the target, that can be inferred from contact predictions. Contact information will be particularly valuable in the determination of lower resolution structures by helping to assign sequence register. In large complexes, contact information may allow the identity of a protein responsible for a certain region of density to be determined and then assist in the orientation of an available model within that density. In NMR, predicted contacts can provide long-range information to extend the upper size limit of the technique in a manner analogous but complementary to experimental

  17. Psychometric Validation of the Parental Bonding Instrument in a U.K. Population-Based Sample: Role of Gender and Association With Mental Health in Mid-Late Life.

    Science.gov (United States)

    Xu, Man K; Morin, Alexandre J S; Marsh, Herbert W; Richards, Marcus; Jones, Peter B

    2016-08-01

    The factorial structure of the Parental Bonding Instrument (PBI) has been frequently studied in diverse samples but no study has examined its psychometric properties from large, population-based samples. In particular, important questions have not been addressed such as the measurement invariance properties across parental and offspring gender. We evaluated the PBI based on responses from a large, representative population-based sample, using an exploratory structural equation modeling method appropriate for categorical data. Analysis revealed a three-factor structure representing "care," "overprotection," and "autonomy" parenting styles. In terms of psychometric measurement validity, our results supported the complete invariance of the PBI ratings across sons and daughters for their mothers and fathers. The PBI ratings were also robust in relation to personality and mental health status. In terms of predictive value, paternal care showed a protective effect on mental health at age 43 in sons. The PBI is a sound instrument for capturing perceived parenting styles, and is predictive of mental health in middle adulthood. © The Author(s) 2016.

  18. Structure Based Thermostability Prediction Models for Protein Single Point Mutations with Machine Learning Tools.

    Directory of Open Access Journals (Sweden)

    Lei Jia

    Full Text Available The thermostability of protein point mutations is a common concern in protein engineering. An application which predicts the thermostability of mutants can be helpful for guiding the decision-making process in protein design via mutagenesis. An in silico point mutation scanning method is frequently used to find "hot spots" in proteins for focused mutagenesis. ProTherm (http://gibk26.bio.kyutech.ac.jp/jouhou/Protherm/protherm.html) is a public database that consists of thousands of protein mutants' experimentally measured thermostability. Two data sets based on two differently measured thermostability properties of protein single point mutations, namely the unfolding free energy change (ddG) and the melting temperature change (dTm), were obtained from this database. Folding free energy change calculations from Rosetta, structural information of the point mutations, as well as amino acid physical properties were obtained for building thermostability prediction models with informatics modeling tools. Five supervised machine learning methods (support vector machine, random forests, artificial neural network, naïve Bayes classifier, K nearest neighbor and partial least squares regression) are used for building the prediction models. Binary and ternary classifications as well as regression models were built and evaluated. Data set redundancy and balancing, the reverse mutations technique, feature selection, and comparison to other published methods were discussed. The Rosetta-calculated folding free energy change ranked as the most influential feature in all prediction models. Other descriptors also made significant contributions to increasing the accuracy of the prediction models.
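
A minimal sketch of the modeling setup described above: features for each point mutation (a calculated folding free-energy change plus a few generic descriptors, all synthetic here) feed a random forest that predicts the measured stability change. Real pipelines would use ProTherm measurements and Rosetta calculations instead of the random stand-ins below.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
calc_ddg = rng.normal(size=n)                       # stand-in for a calculated ddG
other = rng.normal(size=(n, 4))                     # stand-ins for structural descriptors
X = np.column_stack([calc_ddg, other])
y = 0.8 * calc_ddg + 0.2 * other[:, 0] + 0.3 * rng.normal(size=n)   # synthetic "experimental" ddG

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"R^2 on held-out mutations: {model.score(X_te, y_te):.2f}")
print("feature importances:", np.round(model.feature_importances_, 2))
```

Inspecting the feature importances is how one would see the calculated free-energy change dominating, as reported in the abstract.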

  19. Crystal structure prediction of flexible molecules using parallel genetic algorithms with a standard force field.

    Science.gov (United States)

    Kim, Seonah; Orendt, Anita M; Ferraro, Marta B; Facelli, Julio C

    2009-10-01

    This article describes the application of our distributed computing framework for crystal structure prediction (CSP), the modified genetic algorithms for crystal and cluster prediction (MGAC), to predict the crystal structure of flexible molecules using the general Amber force field (GAFF) and the CHARMM program. The MGAC distributed computing framework includes a series of tightly integrated computer programs for generating the molecule's force field, sampling crystal structures using a distributed parallel genetic algorithm, and locally energy-minimizing the structures, followed by classifying, sorting, and archiving the most relevant structures. Our results indicate that the method can consistently find the experimentally known crystal structures of flexible molecules, but the number of missing structures and the poor ranking observed in some crystals show the need for further improvement of the potential. Copyright 2009 Wiley Periodicals, Inc.

  20. I-TASSER server for protein 3D structure prediction

    Directory of Open Access Journals (Sweden)

    Zhang Yang

    2008-01-01

    Full Text Available Abstract Background Prediction of 3-dimensional protein structures from amino acid sequences represents one of the most important problems in computational structural biology. The community-wide Critical Assessment of Structure Prediction (CASP) experiments have been designed to obtain an objective assessment of the state-of-the-art of the field, where I-TASSER was ranked as the best method in the server section of the recent 7th CASP experiment. Our laboratory has since then received numerous requests about the public availability of the I-TASSER algorithm and the usage of the I-TASSER predictions. Results An on-line version of I-TASSER is developed at the KU Center for Bioinformatics which has generated protein structure predictions for thousands of modeling requests from more than 35 countries. A scoring function (C-score) based on the relative clustering structural density and the consensus significance score of multiple threading templates is introduced to estimate the accuracy of the I-TASSER predictions. A large-scale benchmark test demonstrates a strong correlation between the C-score and the TM-score (a structural similarity measurement with values in [0, 1]) of the first models, with a correlation coefficient of 0.91. Using a C-score cutoff > -1.5 for the models of correct topology, both false positive and false negative rates are below 0.1. Combining C-score and protein length, the accuracy of the I-TASSER models can be predicted with an average error of 0.08 for TM-score and 2 Å for RMSD. Conclusion The I-TASSER server has been developed to generate automated full-length 3D protein structural predictions where the benchmarked scoring system helps users to obtain quantitative assessments of the I-TASSER models. The output of the I-TASSER server for each query includes up to five full-length models, the confidence score, the estimated TM-score and RMSD, and the standard deviation of the estimations. The I-TASSER server is freely available

  1. Evaluation and use of in-silico structure-based epitope prediction with foot-and-mouth disease virus.

    Directory of Open Access Journals (Sweden)

    Daryl W Borley

    Full Text Available Understanding virus antigenicity is of fundamental importance for the development of better, more cross-reactive vaccines. However, as far as we are aware, no systematic work has yet been conducted using the 3D structure of a virus to identify novel epitopes. Therefore we have extended several existing structural prediction algorithms to build a method for identifying epitopes on the appropriate outer surface of intact virus capsids (which are structurally different from globular proteins in both shape and arrangement of multiple repeated elements) and applied it here as a proof of principle concept to the capsid of foot-and-mouth disease virus (FMDV). We have analysed how reliably several freely available structure-based B cell epitope prediction programs can identify already known viral epitopes of FMDV in the context of the viral capsid. To do this we constructed a simple objective metric to measure the sensitivity and discrimination of such algorithms. After optimising the parameters for five methods using an independent training set we used this measure to evaluate the methods. Individually, any one algorithm performed rather poorly (three performing better than the other two), suggesting that there may be value in developing virus-specific software. Taking a very conservative approach requiring a consensus between all three top methods predicts a number of previously described antigenic residues as potential epitopes on more than one serotype of FMDV, consistent with experimental results. The consensus results identified novel residues as potential epitopes on more than one serotype. These include residues 190-192 of VP2 (not previously determined to be antigenic), residues 69-71 and 193-197 of VP3 spanning the pentamer-pentamer interface, and another region incorporating residues 83, 84 and 169-174 of VP1 (all only previously experimentally defined on serotype A). The computer programs needed to create a semi-automated procedure for carrying out

  2. Slope Deformation Prediction Based on Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Lei JIA

    2013-07-01

    Full Text Available This paper principally studies the prediction of slope deformation based on the Support Vector Machine (SVM). In the prediction process, we explore how to reconstruct the phase space. The geological body’s displacement data, obtained from a chaotic time series, are used as the SVM’s training samples. Slope displacement caused by multivariable coupling is predicted by means of a single variable. Results show that this model has high fitting accuracy and generalization, and provides a reference for deformation prediction in slope engineering.
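
    As a rough illustration of the workflow the abstract outlines, the sketch below reconstructs a phase space from a displacement time series by time-delay embedding and fits a support vector regressor to predict the next displacement value. It assumes scikit-learn; the embedding dimension, delay, kernel settings and the synthetic data are illustrative, not the settings used in the paper.

```python
# Minimal sketch: phase-space reconstruction of a displacement series by
# time-delay embedding, followed by support-vector regression.
import numpy as np
from sklearn.svm import SVR

def embed(series, dim=4, tau=2):
    """Time-delay embedding: row t is [x(t), x(t+tau), ..., x(t+(dim-1)*tau)]."""
    n = len(series) - (dim - 1) * tau
    return np.column_stack([series[i * tau : i * tau + n] for i in range(dim)])

# Synthetic monthly displacement record (mm); a real study would use monitoring data.
t = np.arange(120)
displacement = 0.05 * t + 0.5 * np.sin(0.3 * t) + 0.1 * np.random.randn(120)

X = embed(displacement, dim=4, tau=2)
y = displacement[(4 - 1) * 2 + 1 :]      # one-step-ahead target
X, y = X[: len(y)], y

model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X[:-12], y[:-12])
print("last 12 predicted displacements:", model.predict(X[-12:]))
```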

  3. Prediction of RNA secondary structures: from theory to models and real molecules

    International Nuclear Information System (INIS)

    Schuster, Peter

    2006-01-01

    RNA secondary structures are derived from RNA sequences, which are strings built from the natural four letter nucleotide alphabet, {AUGC}. These coarse-grained structures, in turn, are tantamount to constrained strings over a three letter alphabet. Hence, the secondary structures are discrete objects and the number of sequences always exceeds the number of structures. The sequences built from two letter alphabets form perfect structures when the nucleotides can form a base pair, as is the case with {GC} or {AU}, but the relation between the sequences and structures differs strongly from that of the four letter alphabet. A comprehensive theory of RNA structure is presented, which is based on the concepts of sequence space and shape space, the latter being a space of structures. It sets the stage for modelling processes in ensembles of RNA molecules like evolutionary optimization or kinetic folding as dynamical phenomena guided by mappings between the two spaces. The number of minimum free energy (mfe) structures is always smaller than the number of sequences, even for two letter alphabets. Folding of RNA molecules into mfe structures constitutes a non-invertible mapping from sequence space onto shape space. The preimage of a structure in sequence space is defined as its neutral network. Similarly, the set of suboptimal structures is the preimage of a sequence in shape space. This set represents the conformation space of a given sequence. The evolutionary optimization of structures in populations is a process taking place in sequence space, whereas kinetic folding occurs in molecular ensembles that optimize free energy in conformation space. Efficient folding algorithms based on dynamic programming are available for the prediction of secondary structures for given sequences. The inverse problem, the computation of sequences for predefined structures, is an important tool for the design of RNA molecules with tailored properties. Simultaneous folding or cofolding of two or more RNA

  4. Free energy minimization to predict RNA secondary structures and computational RNA design.

    Science.gov (United States)

    Churkin, Alexander; Weinbrand, Lina; Barash, Danny

    2015-01-01

    Determining the RNA secondary structure from sequence data by computational predictions is a long-standing problem. Its solution has been approached in two distinctive ways. If a multiple sequence alignment of a collection of homologous sequences is available, the comparative method uses phylogeny to determine conserved base pairs that are more likely to form as a result of billions of years of evolution than by chance. In the case of single sequences, recursive algorithms that compute free energy structures by using empirically derived energy parameters have been developed. This latter approach of RNA folding prediction by energy minimization is widely used to predict RNA secondary structure from sequence. For a significant number of RNA molecules, the secondary structure of the RNA molecule is indicative of its function and its computational prediction by minimizing its free energy is important for its functional analysis. A general method for free energy minimization to predict RNA secondary structures is dynamic programming, although other optimization methods have been developed as well along with empirically derived energy parameters. In this chapter, we introduce and illustrate by examples the approach of free energy minimization to predict RNA secondary structures.
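
    The dynamic-programming idea can be illustrated with the classic Nussinov recursion, which maximizes the number of canonical base pairs; this is a simplified stand-in for free energy minimization, since the methods discussed here use empirically derived nearest-neighbour energy parameters rather than a simple pair count.

```python
# Nussinov-style dynamic programming: maximize canonical base pairs as a
# simplified stand-in for free energy minimization (real predictors minimize
# a nearest-neighbour energy model instead of counting pairs).
def nussinov_pairs(seq, min_loop=3):
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                          # j left unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:            # pair k with j
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov_pairs("GGGAAAUCC"))   # small toy hairpin
```

    In practice a traceback over the same matrix recovers the predicted base pairs themselves rather than just their count.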

  5. Sampling design optimisation for rainfall prediction using a non-stationary geostatistical model

    Science.gov (United States)

    Wadoux, Alexandre M. J.-C.; Brus, Dick J.; Rico-Ramirez, Miguel A.; Heuvelink, Gerard B. M.

    2017-09-01

    The accuracy of spatial predictions of rainfall by merging rain-gauge and radar data is partly determined by the sampling design of the rain-gauge network. Optimising the locations of the rain-gauges may increase the accuracy of the predictions. Existing spatial sampling design optimisation methods are based on minimisation of the spatially averaged prediction error variance under the assumption of intrinsic stationarity. Over the past years, substantial progress has been made to deal with non-stationary spatial processes in kriging. Various well-documented geostatistical models relax the assumption of stationarity in the mean, while recent studies show the importance of considering non-stationarity in the variance for environmental processes occurring in complex landscapes. We optimised the sampling locations of rain-gauges using an extension of the Kriging with External Drift (KED) model for prediction of rainfall fields. The model incorporates both non-stationarity in the mean and in the variance, which are modelled as functions of external covariates such as radar imagery, distance to radar station and radar beam blockage. Spatial predictions are made repeatedly over time, each time recalibrating the model. The space-time averaged KED variance was minimised by Spatial Simulated Annealing (SSA). The methodology was tested using a case study predicting daily rainfall in the north of England for a one-year period. Results show that (i) the proposed non-stationary variance model outperforms the stationary variance model, and (ii) a small but significant decrease of the rainfall prediction error variance is obtained with the optimised rain-gauge network. In particular, it pays off to place rain-gauges at locations where the radar imagery is inaccurate, while keeping the distribution over the study area sufficiently uniform.
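
    A minimal sketch of the spatial simulated annealing loop referred to above: one gauge location is perturbed at a time and the move is accepted with the Metropolis criterion. The objective function here is a crude placeholder (mean squared distance to the nearest gauge), not the space-time averaged KED variance used in the study, and all settings are illustrative.

```python
# Minimal spatial simulated annealing (SSA) loop with a placeholder criterion.
import numpy as np

rng = np.random.default_rng(0)

def mean_prediction_variance(gauges, grid):
    # Placeholder: average squared distance from each grid cell to its nearest
    # gauge, a crude surrogate for the averaged kriging variance.
    d = np.linalg.norm(grid[:, None, :] - gauges[None, :, :], axis=2)
    return np.mean(d.min(axis=1) ** 2)

grid = np.array([(x, y) for x in np.linspace(0, 1, 20) for y in np.linspace(0, 1, 20)])
gauges = rng.random((15, 2))                  # initial rain-gauge locations
crit = mean_prediction_variance(gauges, grid)

temperature = 1.0
for it in range(2000):
    cand = gauges.copy()
    i = rng.integers(len(cand))
    cand[i] = np.clip(cand[i] + rng.normal(0, 0.05, 2), 0, 1)   # perturb one gauge
    new_crit = mean_prediction_variance(cand, grid)
    if new_crit < crit or rng.random() < np.exp((crit - new_crit) / temperature):
        gauges, crit = cand, new_crit         # Metropolis acceptance
    temperature *= 0.998                      # cooling schedule

print("optimised criterion:", round(crit, 4))
```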

  6. Predicted crystal structures of molybdenum under high pressure

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Bing; Zhang, Guang Biao [Institute for Computational Materials Science, School of Physics and Electronics, Henan University, Kaifeng 475004 (China); Wang, Yuan Xu, E-mail: wangyx@henu.edu.cn [Institute for Computational Materials Science, School of Physics and Electronics, Henan University, Kaifeng 475004 (China); Guizhou Provincial Key Laboratory of Computational Nano-Material Science, Institute of Applied Physics, Guizhou Normal College, Guiyang 550018 (China)

    2013-04-15

    Highlights: ► A double-hexagonal close-packed (dhcp) structure of molybdenum is predicted. ► Calculated acoustic velocity confirms the bcc–dhcp phase transition at 660 GPa. ► The valence electrons of dhcp Mo are mostly localized in the interstitial sites. -- Abstract: The high-pressure structures of molybdenum (Mo) at zero temperature have been extensively explored through the newly developed particle swarm optimization (PSO) algorithm for crystal structure prediction. All the experimental and earlier theoretical structures were successfully reproduced in certain pressure ranges, validating our methodology in application to Mo. A double-hexagonal close-packed (dhcp) structure found by Mikhaylushkin et al. (2008) [12] is confirmed by the present PSO calculations. The lattice parameters and physical properties of the dhcp phase were investigated based on first-principles calculations. The phase transition occurs only from the bcc phase to the dhcp phase, at 660 GPa and zero temperature. The calculated acoustic velocities also indicate a transition from the bcc to dhcp phases for Mo. More intriguingly, the calculated density of states (DOS) shows that the dhcp structure remains metallic. The calculated electron density difference (EDD) reveals that its valence electrons are localized in the interstitial regions.

  7. Achilles tendons from decorin- and biglycan-null mouse models have inferior mechanical and structural properties predicted by an image-based empirical damage model.

    Science.gov (United States)

    Gordon, J A; Freedman, B R; Zuskov, A; Iozzo, R V; Birk, D E; Soslowsky, L J

    2015-07-16

    Achilles tendons are a common source of pain and injury, and their pathology may originate from aberrant structure function relationships. Small leucine rich proteoglycans (SLRPs) influence mechanical and structural properties in a tendon-specific manner. However, their roles in the Achilles tendon have not been defined. The objective of this study was to evaluate the mechanical and structural differences observed in mouse Achilles tendons lacking class I SLRPs; either decorin or biglycan. In addition, empirical modeling techniques based on mechanical and image-based measures were employed. Achilles tendons from decorin-null (Dcn(-/-)) and biglycan-null (Bgn(-/-)) C57BL/6 female mice (N=102) were used. Each tendon underwent a dynamic mechanical testing protocol including simultaneous polarized light image capture to evaluate both structural and mechanical properties of each Achilles tendon. An empirical damage model was adapted for application to genetic variation and for use with image based structural properties to predict tendon dynamic mechanical properties. We found that Achilles tendons lacking decorin and biglycan had inferior mechanical and structural properties that were age dependent; and that simple empirical models, based on previously described damage models, were predictive of Achilles tendon dynamic modulus in both decorin- and biglycan-null mice. Copyright © 2015 Elsevier Ltd. All rights reserved.

  8. Predicting nucleic acid binding interfaces from structural models of proteins.

    Science.gov (United States)

    Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael

    2012-02-01

    The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acid binding interface on structural models of proteins. Employing a combined patch approach, we show that patches extracted from an ensemble of models better predict the real nucleic acid binding interfaces compared with patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. Copyright © 2011 Wiley Periodicals, Inc.

  9. PROCARB: A Database of Known and Modelled Carbohydrate-Binding Protein Structures with Sequence-Based Prediction Tools

    Directory of Open Access Journals (Sweden)

    Adeel Malik

    2010-01-01

    Full Text Available Understanding of the three-dimensional structures of proteins that interact with carbohydrates covalently (glycoproteins) as well as noncovalently (protein-carbohydrate complexes) is essential to many biological processes and plays a significant role in normal and disease-associated functions. It is important to have a central repository of knowledge available about these protein-carbohydrate complexes as well as preprocessed data of predicted structures. This can be significantly enhanced by de novo tools that can predict carbohydrate-binding sites for proteins in the absence of an experimentally determined binding site structure. PROCARB is an open-access database comprising three independently working components, namely, (i) the Core PROCARB module, consisting of three-dimensional structures of protein-carbohydrate complexes taken from the Protein Data Bank (PDB), (ii) the Homology Models module, consisting of manually developed three-dimensional models of N-linked and O-linked glycoproteins of unknown three-dimensional structure, and (iii) the CBS-Pred prediction module, consisting of web servers to predict carbohydrate-binding sites using a single sequence or a server-generated PSSM. Several precomputed structural and functional properties of complexes are also included in the database for quick analysis. In particular, information about function, secondary structure, solvent accessibility, hydrogen bonds and literature references, and so forth, is included. In addition, each protein in the database is mapped to Uniprot, Pfam, PDB, and so forth.

  10. Using NDVI and guided sampling to develop yield prediction maps of processing tomato crop

    Energy Technology Data Exchange (ETDEWEB)

    Fortes, A.; Henar Prieto, M. del; García-Martín, A.; Córdoba, A.; Martínez, L.; Campillo, C.

    2015-07-01

    The use of yield prediction maps is an important tool for the delineation of within-field management zones. Vegetation indices based on crop reflectance are of potential use in the attainment of this objective. There are different types of vegetation indices based on crop reflectance, the most commonly used of which is the NDVI (normalized difference vegetation index). NDVI values are reported to have good correlation with several vegetation parameters including the ability to predict yield. The field research was conducted in two commercial farms of processing tomato crop, Cantillana and Enviciados. An NDVI prediction map developed through the ordinary kriging technique was used for guided sampling of processing tomato yield. Yield was studied and related to NDVI, and finally a prediction map of crop yield for the entire plot was generated using two geostatistical methodologies (ordinary and regression kriging). Finally, a comparison was made between the yield obtained at validation points and the yield values according to the prediction maps. The most precise yield maps were obtained with the regression kriging methodology, with RRMSE values of 14% and 17% in Cantillana and Enviciados, respectively, using the NDVI as predictor. The coefficient of correlation between NDVI and yield was calculated from the point samples taken at the two locations, with values of 0.71 and 0.67 in Cantillana and Enviciados, respectively. The results suggest that the use of a massive sampling parameter such as NDVI is a good indicator of the distribution of within-field yield variation. (Author)
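
    A crude sketch of the regression-kriging idea follows: a linear trend of yield on NDVI plus spatial interpolation of the residuals, with inverse-distance weighting standing in for the kriging step, and RRMSE computed on held-out validation points. Data, locations and coefficients are synthetic placeholders, not the Cantillana or Enviciados measurements.

```python
# Regression-kriging-style sketch: linear trend on NDVI + spatial interpolation
# of residuals (IDW stands in for the kriging step); RRMSE on validation points.
import numpy as np

rng = np.random.default_rng(1)
xy = rng.random((60, 2)) * 100                    # sample locations (m)
ndvi = rng.uniform(0.3, 0.8, 60)
yield_t = 40 + 60 * ndvi + rng.normal(0, 5, 60)   # synthetic yield (t/ha)

train, valid = np.arange(45), np.arange(45, 60)

# Trend: ordinary least squares of yield on NDVI
A = np.column_stack([np.ones(len(train)), ndvi[train]])
beta, *_ = np.linalg.lstsq(A, yield_t[train], rcond=None)
resid = yield_t[train] - A @ beta

def idw(targets, sources, values, power=2.0):
    d = np.linalg.norm(targets[:, None, :] - sources[None, :, :], axis=2) + 1e-9
    w = 1.0 / d ** power
    return (w * values).sum(axis=1) / w.sum(axis=1)

pred = beta[0] + beta[1] * ndvi[valid] + idw(xy[valid], xy[train], resid)
rrmse = np.sqrt(np.mean((pred - yield_t[valid]) ** 2)) / yield_t[valid].mean()
print(f"RRMSE: {100 * rrmse:.1f}%")
```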

  11. From structure prediction to genomic screens for novel non-coding RNAs.

    Science.gov (United States)

    Gorodkin, Jan; Hofacker, Ivo L

    2011-08-01

    Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure is comprised of regular Watson-Crick and GU wobble base pairs. This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.

  12. Sample size calculation to externally validate scoring systems based on logistic regression models.

    Directory of Open Access Journals (Sweden)

    Antonio Palazón-Bru

    Full Text Available A sample size containing at least 100 events and 100 non-events has been suggested to validate a predictive model, regardless of the model being validated, even though certain factors (discrimination, parameterization and incidence) can influence calibration of the predictive model. Scoring systems based on binary logistic regression models are a specific type of predictive model. The aim of this study was to develop an algorithm to determine the sample size for validating a scoring system based on a binary logistic regression model and to apply it to a case study. The algorithm was based on bootstrap samples in which the area under the ROC curve, the observed event probabilities through smooth curves, and a measure of the lack of calibration (the estimated calibration index) were calculated. To illustrate its use for interested researchers, the algorithm was applied to a scoring system, based on a binary logistic regression model, to determine mortality in intensive care units. In the case study provided, the algorithm obtained a sample size with 69 events, which is lower than the value suggested in the literature. An algorithm is provided for finding the appropriate sample size to validate scoring systems based on binary logistic regression models. This could be applied to determine the sample size in other similar cases.
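
    The bootstrap idea can be sketched as follows: for each candidate validation sample size, resample repeatedly, compute the AUC and a crude calibration error, and inspect how stable they are. The scoring model, thresholds and calibration-in-the-large measure below are placeholders, not the estimated calibration index or the algorithm of the paper.

```python
# Hedged sketch: bootstrap stability of AUC and a crude calibration error for
# several candidate validation sample sizes, using a simulated population.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
N = 20000
score = rng.normal(0, 1, N)                      # linear predictor of a scoring system
p = 1 / (1 + np.exp(-(-2.0 + 1.2 * score)))      # predicted event probability
event = rng.binomial(1, p)

for n in (100, 200, 400, 800):
    aucs, cal_err = [], []
    for _ in range(200):                         # bootstrap replicates
        idx = rng.integers(0, N, n)
        if event[idx].sum() in (0, n):           # degenerate sample, skip
            continue
        aucs.append(roc_auc_score(event[idx], p[idx]))
        cal_err.append(abs(event[idx].mean() - p[idx].mean()))  # calibration-in-the-large
    print(f"n={n}: AUC sd={np.std(aucs):.3f}, mean |cal error|={np.mean(cal_err):.3f}")
```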

  13. Evolutionary Structure Prediction of Stoichiometric Compounds

    Science.gov (United States)

    Zhu, Qiang; Oganov, Artem

    2014-03-01

    In general, for a given ionic compound AmBn at ambient pressure, its stoichiometry reflects the ratio of the valence states of its chemical species (i.e., the charges of each anion and cation). However, compounds under high pressure exhibit significantly different behavior compared to their analogs at ambient conditions. Here we developed a method to solve the crystal structure prediction problem based on evolutionary algorithms, which can predict both the stable compounds and their crystal structures at arbitrary P,T-conditions, given just the set of chemical elements. By applying this method to a wide range of binary ionic systems (Na-Cl, Mg-O, Xe-O, Cs-F, etc.), we discovered many compounds with new stoichiometries that can become thermodynamically stable. Further electronic structure analysis of these novel compounds indicates that several factors can contribute to this extraordinary phenomenon: (1) polyatomic anions; (2) free electron localization; (3) emergence of new valence states; (4) metallization. In particular, part of the results has been confirmed by experiment, which indicates that this approach can play a crucial role in new materials design under extreme pressure conditions. This work is funded by DARPA (Grants No. W31P4Q1210008 and W31P4Q1310005) and NSF (EAR-1114313 and DMR-1231586).

  14. Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier

    Science.gov (United States)

    Wang, Leilei; Cheng, Jinyong

    2018-03-01

    Protein secondary structure prediction belongs to bioinformatics and is an important research topic. In this paper, we propose a new method for protein secondary structure prediction using a Bayes classifier and an autoencoder network. Our experiments cover the construction of the model, the selection of parameters, and related algorithmic choices. The data set is the typical CB513 protein data set. Accuracy is assessed by 3-fold cross validation, from which the Q3 accuracy is obtained. The results illustrate that the autoencoder network improved the prediction accuracy of protein secondary structure.
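
    A minimal sketch of the two-stage idea, assuming scikit-learn: an autoencoder (here an MLP trained to reproduce its input) learns a compressed code for window-encoded residue features, and a naive Bayes classifier predicts the three secondary-structure states from that code. The window size, feature dimensions and data are random stand-ins, not CB513 features, so the printed score is only a smoke test.

```python
# Two-stage sketch: autoencoder for representation learning + Bayes classifier.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.random((2000, 17 * 20))        # 17-residue window, 20-dim profile per residue
y = rng.integers(0, 3, 2000)           # states: helix / strand / coil

# Autoencoder: train an MLP to reproduce its input; the hidden layer is the code.
ae = MLPRegressor(hidden_layer_sizes=(64,), activation="relu",
                  max_iter=200, random_state=0).fit(X, X)

def encode(ae, X):
    # Forward pass through the first (encoding) layer only.
    return np.maximum(0, X @ ae.coefs_[0] + ae.intercepts_[0])

codes = encode(ae, X)
clf = GaussianNB().fit(codes[:1500], y[:1500])
print("held-out Q3 on random stand-in data:", clf.score(codes[1500:], y[1500:]))
```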

  15. Structure-based feeding strategies: A key component of child nutrition.

    Science.gov (United States)

    Taylor, Maija B; Emley, Elizabeth; Pratt, Mercedes; Musher-Eizenman, Dara R

    2017-07-01

    This study examined the relationship between structure, autonomy promotion, and control feeding strategies and parent-reported child diet. Participants (N = 497) were parents of children ages 2.5 to 7.5 recruited from Amazon Mechanical Turk. This was a Caucasian (79%), educated sample (61% college graduates) with most reports from mothers (76%). An online survey included measures of parent feeding strategies and child dietary intake. Use of structure-based feeding strategies explained 21% of the variance in child consumption of added sugar, 12% of the variance in child intake of added sugar from sugar-sweetened beverages, and 16% of the variance in child consumption of fruits and vegetables. Higher unhealthy food availability and permissive feeding uniquely predicted higher child added sugar intake and child consumption of added sugar from sugar-sweetened beverages. Greater healthy food availability uniquely predicted higher child fruit and vegetable intake. Conclusions and Future Directions: In Caucasian educated families, structure-based feeding strategies appear to be a relatively stronger correlate of parent-reported child intake of added sugar and fruits and vegetables as compared to autonomy promotion and control feeding strategies. Longitudinal research may be needed in order to reveal the relationships between autonomy promotion and control feeding strategies and child diet. If future studies have similar findings to this study's results, researchers may want to focus more heavily on investigating the impact of teaching parents stimulus-control techniques and feeding-related assertiveness skills on child dietary intake. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. Estimating cross-validatory predictive p-values with integrated importance sampling for disease mapping models.

    Science.gov (United States)

    Li, Longhai; Feng, Cindy X; Qiu, Shi

    2017-06-30

    An important statistical task in disease mapping problems is to identify divergent regions with unusually high or low risk of disease. Leave-one-out cross-validatory (LOOCV) model assessment is the gold standard for estimating predictive p-values that can flag such divergent regions. However, actual LOOCV is time-consuming because one needs to rerun a Markov chain Monte Carlo analysis for each posterior distribution in which an observation is held out as a test case. This paper introduces a new method, called integrated importance sampling (iIS), for estimating LOOCV predictive p-values with only Markov chain samples drawn from the posterior based on the full data set. The key step in iIS is that we integrate away the latent variables associated with the test observation with respect to their conditional distribution without reference to the actual observation. By following the general theory for importance sampling, the formula used by iIS can be proved to be equivalent to the LOOCV predictive p-value. We compare iIS and three other existing methods from the literature on two disease mapping datasets. Our empirical results show that the predictive p-values estimated with iIS are almost identical to the predictive p-values estimated with actual LOOCV and outperform those given by the three existing methods, namely posterior predictive checking, ordinary importance sampling, and the ghosting method by Marshall and Spiegelhalter (2003). Copyright © 2017 John Wiley & Sons, Ltd.
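
    For intuition, the sketch below implements the ordinary importance sampling estimator (one of the comparison methods mentioned above) for a leave-one-out predictive p-value in a toy Poisson model, pretending the rate samples came from an MCMC fit to the full data set. The integration over unit-level latent variables that distinguishes iIS is omitted, and all model choices are illustrative.

```python
# Ordinary importance sampling estimate of a LOO predictive p-value in a toy
# Poisson model; theta_samples stands in for posterior MCMC draws.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
y = rng.poisson(5.0, 50)                        # observed counts for 50 regions
theta_samples = rng.gamma(y.sum() + 1, 1.0 / (len(y) + 1), size=4000)  # stand-in posterior

def loo_pvalue_is(y_i, theta_samples):
    lik = poisson.pmf(y_i, theta_samples)       # p(y_i | theta_s)
    w = 1.0 / np.clip(lik, 1e-300, None)        # importance weights proportional to 1/likelihood
    tail = poisson.sf(y_i - 1, theta_samples)   # P(y_rep >= y_i | theta_s)
    return np.sum(w * tail) / np.sum(w)

print("LOO predictive p-value for region 0:", round(loo_pvalue_is(y[0], theta_samples), 3))
```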

  17. Model Predictive Vibration Control Efficient Constrained MPC Vibration Control for Lightly Damped Mechanical Structures

    CERN Document Server

    Takács, Gergely

    2012-01-01

    Real-time model predictive controller (MPC) implementation in active vibration control (AVC) is often rendered difficult by fast sampling speeds and extensive actuator-deformation asymmetry. If the control of lightly damped mechanical structures is assumed, the region of attraction containing the set of allowable initial conditions requires a large prediction horizon, making the already computationally demanding on-line process even more complex. Model Predictive Vibration Control provides insight into the predictive control of lightly damped vibrating structures by exploring computationally efficient algorithms which are capable of low frequency vibration control with guaranteed stability and constraint feasibility. In addition to a theoretical primer on active vibration damping and model predictive control, Model Predictive Vibration Control provides a guide through the necessary steps in understanding the founding ideas of predictive control applied in AVC, such as the implementation of ...

  18. Inference of expanded Lrp-like feast/famine transcription factor targets in a non-model organism using protein structure-based prediction.

    Science.gov (United States)

    Ashworth, Justin; Plaisier, Christopher L; Lo, Fang Yin; Reiss, David J; Baliga, Nitin S

    2014-01-01

    Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of non-model organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer.

  19. Optimal sample size for predicting viability of cabbage and radish seeds based on near infrared spectra of single seeds

    DEFF Research Database (Denmark)

    Shetty, Nisha; Min, Tai-Gi; Gislum, René

    2011-01-01

    The effects of the number of seeds in a training sample set on the ability to predict the viability of cabbage or radish seeds are presented and discussed. The supervised classification method extended canonical variates analysis (ECVA) was used to develop a classification model. Calibration sub-sets of different sizes were chosen randomly with several iterations and using the spectral-based sample selection algorithms DUPLEX and CADEX. An independent test set was used to validate the developed classification models. The results showed that 200 seeds were optimal in a calibration set for both cabbage … using all 600 seeds in the calibration set. Thus, the number of seeds in the calibration set can be reduced by up to 67% without significant loss of classification accuracy, which will effectively enhance the cost-effectiveness of NIR spectral analysis. Wavelength regions important

  20. A fast and robust iterative algorithm for prediction of RNA pseudoknotted secondary structures

    Science.gov (United States)

    2014-01-01

    Background Improving accuracy and efficiency of computational methods that predict pseudoknotted RNA secondary structures is an ongoing challenge. Existing methods based on free energy minimization tend to be very slow and are limited in the types of pseudoknots that they can predict. Incorporating known structural information can improve prediction accuracy; however, there are not many methods for prediction of pseudoknotted structures that can incorporate structural information as input. There is even less understanding of the relative robustness of these methods with respect to partial information. Results We present a new method, Iterative HFold, for pseudoknotted RNA secondary structure prediction. Iterative HFold takes as input a pseudoknot-free structure, and produces a possibly pseudoknotted structure whose energy is at least as low as that of any (density-2) pseudoknotted structure containing the input structure. Iterative HFold leverages strengths of earlier methods, namely the fast running time of HFold, a method that is based on the hierarchical folding hypothesis, and the energy parameters of HotKnots V2.0. Our experimental evaluation on a large data set shows that Iterative HFold is robust with respect to partial information, with average accuracy on pseudoknotted structures steadily increasing from roughly 54% to 79% as the user provides up to 40% of the input structure. Iterative HFold is much faster than HotKnots V2.0, while having comparable accuracy. Iterative HFold also has significantly better accuracy than IPknot on our HK-PK and IP-pk168 data sets. Conclusions Iterative HFold is a robust method for prediction of pseudoknotted RNA secondary structures, whose accuracy with more than 5% information about true pseudoknot-free structures is better than that of IPknot, and with about 35% information about true pseudoknot-free structures compares well with that of HotKnots V2.0 while being significantly faster. Iterative HFold and all data used in

  1. Blood gas sample spiking with total parenteral nutrition, lipid emulsion, and concentrated dextrose solutions as a model for predicting sample contamination based on glucose result.

    Science.gov (United States)

    Jara-Aguirre, Jose C; Smeets, Steven W; Wockenfus, Amy M; Karon, Brad S

    2018-05-01

    Evaluate the effects of blood gas sample contamination with total parenteral nutrition (TPN)/lipid emulsion and dextrose 50% (D50) solutions on blood gas and electrolyte measurement; and determine whether glucose concentration can predict blood gas sample contamination with TPN/lipid emulsion or D50. Residual lithium heparin arterial blood gas samples were spiked with TPN/lipid emulsion (0 to 15%) and D50 solutions (0 to 2.5%). Blood gas (pH, pCO2, pO2), electrolytes (Na+, K+, ionized calcium) and hemoglobin were measured with a Radiometer ABL90. Glucose concentration was measured in separated plasma by Roche Cobas c501. Chart review of neonatal blood gas results with glucose >300 mg/dL (>16.65 mmol/L) over a seven month period was performed to determine whether repeat (within 4 h) blood gas results suggested pre-analytical errors in blood gas results. Results were used to determine whether a glucose threshold could predict contamination resulting in blood gas and electrolyte results with greater than laboratory-defined allowable error. Samples spiked with 5% or more TPN/lipid emulsion solution or 1% D50 showed glucose concentration >500 mg/dL (>27.75 mmol/L) and produced blood gas (pH, pO2, pCO2) results with greater than laboratory-defined allowable error. TPN/lipid emulsion, but not D50, produced greater than allowable error in electrolyte (Na+, K+, Ca++, Hb) results at these concentrations. Based on chart review of 144 neonatal blood gas results with glucose >250 mg/dL received over seven months, four of ten neonatal intensive care unit (NICU) patients with glucose results >500 mg/dL and repeat blood gas results within 4 h had results highly suggestive of pre-analytical error. Only 3 of 36 NICU patients with glucose results 300-500 mg/dL and repeat blood gas results within 4 h had clear pre-analytical errors in blood gas results. Glucose concentration can be used as an indicator of significant blood sample contamination with either TPN

  2. Neuro-fuzzy GMDH based particle swarm optimization for prediction of scour depth at downstream of grade control structures

    Directory of Open Access Journals (Sweden)

    Mohammad Najafzadeh

    2015-03-01

    Full Text Available In the present study, the neuro-fuzzy-based group method of data handling (NF-GMDH), an adaptive learning network, was utilized to predict the maximum scour depth downstream of grade-control structures. The NF-GMDH network was developed using particle swarm optimization (PSO). Effective parameters on the scour depth include sediment size, weir geometry, and flow characteristics upstream and downstream of the structure. Training and testing performances were carried out using non-dimensional variables. Datasets were divided into three series of datasets (DS). The testing performances were compared with the gene-expression programming (GEP) and evolutionary polynomial regression (EPR) models and with conventional techniques. The NF-GMDH-PSO network produced lower error in the scour depth prediction than those obtained using the other models. Also, the most effective input parameter on the maximum scour depth was determined through a sensitivity analysis.

  3. From structure prediction to genomic screens for novel non-coding RNAs.

    Directory of Open Access Journals (Sweden)

    Jan Gorodkin

    2011-08-01

    Full Text Available Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure is comprised of regular Watson-Crick and GU wobble base pairs. This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.

  4. Protein Structure Prediction by Protein Threading

    Science.gov (United States)

    Xu, Ying; Liu, Zhijie; Cai, Liming; Xu, Dong

    The seminal work of Bowie, Lüthy, and Eisenberg (Bowie et al., 1991) on "the inverse protein folding problem" laid the foundation of protein structure prediction by protein threading. By using simple measures for fitness of different amino acid types to local structural environments defined in terms of solvent accessibility and protein secondary structure, the authors derived a simple and yet profoundly novel approach to assessing if a protein sequence fits well with a given protein structural fold. Their follow-up work (Elofsson et al., 1996; Fischer and Eisenberg, 1996; Fischer et al., 1996a,b) and the work by Jones, Taylor, and Thornton (Jones et al., 1992) on protein fold recognition led to the development of a new brand of powerful tools for protein structure prediction, which we now term "protein threading." These computational tools have played a key role in extending the utility of all the experimentally solved structures by X-ray crystallography and nuclear magnetic resonance (NMR), providing structural models and functional predictions for many of the proteins encoded in the hundreds of genomes that have been sequenced up to now.

  5. CentroidFold: a web server for RNA secondary structure prediction

    OpenAIRE

    Sato, Kengo; Hamada, Michiaki; Asai, Kiyoshi; Mituyama, Toutai

    2009-01-01

    The CentroidFold web server (http://www.ncrna.org/centroidfold/) is a web application for RNA secondary structure prediction powered by one of the most accurate prediction engines. The server accepts two kinds of sequence data: a single RNA sequence and a multiple alignment of RNA sequences. It responds with a prediction result shown in a popular base-pair notation and as a graph representation. A PDF version of the graph representation is also available. For a multiple alignment sequence, the ser...

  6. Link Prediction in Evolving Networks Based on Popularity of Nodes.

    Science.gov (United States)

    Wang, Tong; He, Xing-Sheng; Zhou, Ming-Yang; Fu, Zhong-Qian

    2017-08-02

    Link prediction aims to uncover the underlying relationships behind networks, which can be utilized to predict missing edges or identify spurious edges. The key issue of link prediction is to estimate the likelihood of potential links in networks. Most classical static-structure-based methods ignore the temporal aspects of networks; limited by such time-varying features, these approaches perform poorly in evolving networks. In this paper, we propose the hypothesis that the ability of each node to attract links depends not only on its structural importance, but also on its current popularity (activeness), since active nodes have a much higher probability of attracting future links. A novel approach named the popularity-based structural perturbation method (PBSPM) and its fast algorithm are then proposed to characterize the likelihood of an edge from both the existing connectivity structure and the current popularity of its two endpoints. Experiments on six evolving networks show that the proposed methods outperform state-of-the-art methods in accuracy and robustness. Moreover, visual results and statistical analysis reveal that the proposed methods are inclined to predict future edges between active nodes, rather than edges between inactive nodes.
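
    The intuition of combining structure with popularity can be illustrated with a much simpler score than PBSPM: a common-neighbours count multiplied by an activity factor for the two endpoints. This is not the structural perturbation method of the paper, only a toy version of the underlying hypothesis; it assumes networkx, and node degree stands in for recent activity.

```python
# Toy popularity-weighted link prediction: common neighbours (structure) times
# endpoint activity (popularity). Not PBSPM, only the underlying intuition.
import itertools
import networkx as nx

G = nx.karate_club_graph()
activity = {n: G.degree(n) for n in G.nodes}          # degree stands in for activeness

def score(u, v):
    cn = len(list(nx.common_neighbors(G, u, v)))       # structural part
    pop = activity[u] + activity[v]                     # popularity part
    return cn * pop

candidates = [(u, v) for u, v in itertools.combinations(G.nodes, 2) if not G.has_edge(u, v)]
top = sorted(candidates, key=lambda e: score(*e), reverse=True)[:5]
print("top-5 predicted links:", top)
```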

  7. Predicting cognitive function of the Malaysian elderly: a structural equation modelling approach.

    Science.gov (United States)

    Foong, Hui Foh; Hamid, Tengku Aizan; Ibrahim, Rahimah; Haron, Sharifah Azizah; Shahar, Suzana

    2018-01-01

    The aim of this study was to identify the predictors of the elderly's cognitive function based on biopsychosocial and cognitive reserve perspectives. The study included 2322 community-dwelling elderly in Malaysia, randomly selected through a multi-stage proportional cluster random sampling from Peninsular Malaysia. The elderly were surveyed on socio-demographic information, biomarkers, psychosocial status, disability, and cognitive function. A biopsychosocial model of cognitive function was developed to test the variables' predictive power on cognitive function. Statistical analyses were performed using SPSS (version 15.0) in conjunction with Analysis of Moment Structures Graphics (AMOS 7.0). The estimated theoretical model fitted the data well. Psychosocial stress and metabolic syndrome (MetS) negatively predicted cognitive function, and psychosocial stress appeared as a main predictor. Socio-demographic characteristics, except gender, also had significant effects on cognitive function. However, disability failed to predict cognitive function. Several factors together may predict cognitive function in the Malaysian elderly population, and the variance accounted for is large enough to be considered substantial. The key factor associated with the elderly's cognitive function seems to be psychosocial well-being. Thus, psychosocial well-being should be included in the elderly assessment, apart from medical conditions, in both clinical and community settings.

  8. A nucleobase-centered coarse-grained representation for structure prediction of RNA motifs.

    Science.gov (United States)

    Poblete, Simón; Bottaro, Sandro; Bussi, Giovanni

    2018-02-28

    We introduce the SPlit-and-conQueR (SPQR) model, a coarse-grained (CG) representation of RNA designed for structure prediction and refinement. In our approach, the representation of a nucleotide consists of a point particle for the phosphate group and an anisotropic particle for the nucleoside. The interactions are, in principle, knowledge-based potentials inspired by the ℰSCORE function, a base-centered scoring function. However, a special treatment is given to base-pairing interactions and certain geometrical conformations which are lost in a raw knowledge-based model. This results in a representation able to describe planar canonical and non-canonical base pairs and base-phosphate interactions and to distinguish sugar puckers and glycosidic torsion conformations. The model is applied to the folding of several structures, including duplexes with internal loops of non-canonical base pairs, tetraloops, junctions and a pseudoknot. For the majority of these systems, experimental structures are correctly predicted at the level of individual contacts. We also propose a method for efficiently reintroducing atomistic detail from the CG representation.

  9. Predictive factors for somatization in a trauma sample

    DEFF Research Database (Denmark)

    Elklit, Ask; Christiansen, Dorte M

    2009-01-01

    ABSTRACT: BACKGROUND: Unexplained somatic symptoms are common among trauma survivors. The relationship between trauma and somatization appears to be mediated by posttraumatic stress disorder (PTSD). However, only a few studies have focused on what other psychological risk factors may predispose … a trauma victim towards developing somatoform symptoms. METHODS: The present paper examines the predictive value of PTSD severity, dissociation, negative affectivity, depression, anxiety, and feeling incompetent on somatization in a Danish sample of 169 adult men and women who were affected by a series … of incompetence significantly predicted somatization in the trauma sample, whereas dissociation, depression, and anxiety were not associated with degree of somatization. PTSD as a risk factor was mediated by negative affectivity…

  10. Cascaded bidirectional recurrent neural networks for protein secondary structure prediction.

    Science.gov (United States)

    Chen, Jinmiao; Chaudhari, Narendra

    2007-01-01

    Protein secondary structure (PSS) prediction is an important topic in bioinformatics. Our study on a large set of non-homologous proteins shows that long-range interactions commonly exist and negatively affect PSS prediction. Besides, we also reveal strong correlations between secondary structure (SS) elements. In order to take into account the long-range interactions and SS-SS correlations, we propose a novel prediction system based on a cascaded bidirectional recurrent neural network (BRNN). We compare the cascaded BRNN against two other BRNN architectures, namely the original BRNN architecture used for speech recognition as well as Pollastri's BRNN that was proposed for PSS prediction. Our cascaded BRNN achieves an overall three-state accuracy Q3 of 74.38%, and reaches a high Segment OVerlap (SOV) of 66.0455. It outperforms the original BRNN and Pollastri's BRNN in both Q3 and SOV. Specifically, it improves the SOV score by 4-6%.

  11. Theoretical prediction of low-density hexagonal ZnO hollow structures

    Energy Technology Data Exchange (ETDEWEB)

    Tuoc, Vu Ngoc, E-mail: tuoc.vungoc@hust.edu.vn [Institute of Engineering Physics, Hanoi University of Science and Technology, 1 Dai Co Viet Road, Hanoi (Viet Nam); Huan, Tran Doan [Institute of Materials Science, University of Connecticut, Storrs, Connecticut 06269-3136 (United States); Thao, Nguyen Thi [Institute of Engineering Physics, Hanoi University of Science and Technology, 1 Dai Co Viet Road, Hanoi (Viet Nam); Hong Duc University, 307 Le Lai, Thanh Hoa City (Viet Nam); Tuan, Le Manh [Hong Duc University, 307 Le Lai, Thanh Hoa City (Viet Nam)

    2016-10-14

    Along with wurtzite and zinc blende, zinc oxide (ZnO) has been found in a large number of polymorphs with substantially different properties and, hence, applications. Therefore, predicting and synthesizing new classes of ZnO polymorphs are of great significance and have been gaining considerable interest. Herein, we perform a density functional theory based tight-binding study, predicting several new series of ZnO hollow structures using the bottom-up approach. The geometry of the building blocks allows for obtaining a variety of hexagonal, low-density nanoporous, and flexible ZnO hollow structures. Their stability is discussed by means of the free energy computed within the lattice-dynamics approach. Our calculations also indicate that all the reported hollow structures are wide band gap semiconductors in the same fashion as bulk ZnO. The electronic band structures of the ZnO hollow structures are finally examined in detail.

  12. Exploring high-pressure FeB₂: Structural and electronic properties predictions

    Energy Technology Data Exchange (ETDEWEB)

    Harran, Ismail [School of Physical Science and Technology, Key Laboratory of Advanced Technologies of Materials, Ministry of Education of China, Southwest Jiaotong University, Chengdu, 610031 (China); Al Fashir University (Sudan); Wang, Hongyan [School of Physical Science and Technology, Key Laboratory of Advanced Technologies of Materials, Ministry of Education of China, Southwest Jiaotong University, Chengdu, 610031 (China); Chen, Yuanzheng, E-mail: cyz@calypso.org.cn [School of Physical Science and Technology, Key Laboratory of Advanced Technologies of Materials, Ministry of Education of China, Southwest Jiaotong University, Chengdu, 610031 (China); Jia, Mingzhen [School of Physical Science and Technology, Key Laboratory of Advanced Technologies of Materials, Ministry of Education of China, Southwest Jiaotong University, Chengdu, 610031 (China); Wu, Nannan [School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science & Technology, Baotou, 014010 (China)

    2016-09-05

    The high pressure (HP) structural phase of the FeB₂ compound is investigated using first-principles crystal structure prediction based on the CALYPSO technique. A thermodynamically stable phase of FeB₂ with space group Imma is predicted at pressures above 225 GPa, characterized by a layered orthorhombic structure containing puckered graphite-like boron layers. Its electronic and mechanical properties are identified and analyzed. The features of the band structure favor the occurrence of superconductivity, whereas the calculated Pugh's ratio reveals that the HP Imma structure exhibits ductile mechanical behavior. - Highlights: • The high-pressure structural phase of the FeB₂ compound is investigated for the first time by the CALYPSO technique. • A thermodynamically stable Imma phase of FeB₂ is predicted at pressures above 225 GPa. • The Imma structure is characterized by a 2D boron network containing puckered graphite-like boron layers. • The band features of the Imma structure favor the occurrence of superconductivity. • The calculated Pugh's ratio suggests that the Imma structure exhibits ductile mechanical behavior.

  13. A Method to Predict the Structure and Stability of RNA/RNA Complexes.

    Science.gov (United States)

    Xu, Xiaojun; Chen, Shi-Jie

    2016-01-01

    RNA/RNA interactions are essential for genomic RNA dimerization and regulation of gene expression. Intermolecular loop-loop base pairing is a widespread and functionally important tertiary structure motif in RNA machinery. However, computational prediction of intermolecular loop-loop base pairing is challenged by the entropy and free energy calculation, due to the conformational constraints and the intermolecular interactions. In this chapter, we describe a recently developed statistical mechanics-based method for the prediction of RNA/RNA complex structures and stabilities. The method is based on the virtual bond RNA folding model (Vfold). The main emphasis in the method is placed on the evaluation of the entropy and free energy for the loops, especially tertiary kissing loops. The method also uses recursive partition function calculations and a two-step screening algorithm for large, complicated structures of RNA/RNA complexes. As case studies, we use the HIV-1 Mal dimer and the siRNA/HIV-1 mutant (T4) to illustrate the method.

  14. Support vector machine incremental learning triggered by wrongly predicted samples

    Science.gov (United States)

    Tang, Ting-long; Guan, Qiu; Wu, Yi-rong

    2018-05-01

    According to the classic Karush-Kuhn-Tucker (KKT) theorem, at every step of incremental support vector machine (SVM) learning, a newly added sample that violates the KKT conditions will become a new support vector (SV) and cause old samples to migrate between the SV set and the non-support vector (NSV) set, and at the same time the learning model should be updated based on the SVs. However, it is not clear in advance which of the old samples will move between the SV and NSV sets. Additionally, the learning model may be updated unnecessarily, which does not greatly increase its accuracy but does decrease the training speed. Therefore, how to choose the new SVs from the old sets during the incremental stages, and when to process incremental steps, will greatly influence the accuracy and efficiency of incremental SVM learning. In this work, a new algorithm is proposed to select candidate SVs and, simultaneously, to use wrongly predicted samples to trigger the incremental processing. Experimental results show that the proposed algorithm can achieve good performance with high efficiency, high speed and good accuracy.
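
    A simplified sketch of the triggering idea, assuming scikit-learn: the model is retrained only when a newly arriving sample is predicted wrongly, and the working set is reduced to the current support vectors plus the new sample. The full KKT-based bookkeeping of the proposed algorithm is not reproduced here, and the data are synthetic.

```python
# Error-triggered incremental SVM sketch: retrain only on wrong predictions,
# keeping the current support vectors plus the new sample as the working set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X0, y0, stream = X[:100], y[:100], range(100, 600)

model = SVC(kernel="rbf", C=1.0).fit(X0, y0)
work_X, work_y, retrains = X0, y0, 0

for i in stream:
    if model.predict(X[i : i + 1])[0] != y[i]:           # wrong prediction triggers update
        sv = model.support_                                # indices into the working set
        work_X = np.vstack([work_X[sv], X[i : i + 1]])
        work_y = np.concatenate([work_y[sv], y[i : i + 1]])
        model = SVC(kernel="rbf", C=1.0).fit(work_X, work_y)
        retrains += 1

print(f"retrained {retrains} times; final working set size: {len(work_y)}")
```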

  15. Bitter or not? BitterPredict, a tool for predicting taste from chemical structure.

    Science.gov (United States)

    Dagan-Wiener, Ayana; Nissim, Ido; Ben Abu, Natalie; Borgonovo, Gigliola; Bassoli, Angela; Niv, Masha Y

    2017-09-21

    Bitter taste is an innately aversive taste modality that is considered to protect animals from consuming toxic compounds. Yet, bitterness is not always noxious and some bitter compounds have beneficial effects on health. Hundreds of bitter compounds have been reported (and are accessible via the BitterDB http://bitterdb.agri.huji.ac.il/dbbitter.php ), but numerous additional bitter molecules are still unknown. The dramatic chemical diversity of bitterants makes bitterness prediction a difficult task. Here we present a machine learning classifier, BitterPredict, which predicts whether a compound is bitter or not, based on its chemical structure. BitterDB was used as the positive set, and non-bitter molecules were gathered from the literature to create the negative set. Adaptive Boosting (AdaBoost), a decision-tree-based machine-learning algorithm, was applied to molecules that were represented using physicochemical and ADME/Tox descriptors. BitterPredict correctly classifies over 80% of the compounds in the hold-out test set, and 70-90% of the compounds in three independent external sets and in sensory test validation, providing a quick and reliable tool for classifying large sets of compounds into bitter and non-bitter groups. BitterPredict suggests that about 40% of random molecules, as well as a large portion of clinical and experimental drugs (66%) and of natural products (77%), are bitter.
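
    A minimal sketch of the classification setup, assuming scikit-learn: AdaBoost over decision stumps applied to descriptor vectors with bitter/non-bitter labels. The descriptor matrix below is a random placeholder, not BitterDB data, so the printed accuracy is only a smoke test of the pipeline.

```python
# AdaBoost (decision-tree-based) bitter/non-bitter classification sketch on
# placeholder physicochemical descriptors.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1500, 40))                  # physicochemical / ADME-Tox descriptors
y = rng.integers(0, 2, 1500)                # 1 = bitter, 0 = non-bitter

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("hold-out accuracy:", clf.score(X_te, y_te))
```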

  16. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-01-01

    operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching

  17. Predicting protein-ATP binding sites from primary sequence through fusing bi-profile sampling of multi-view features

    Directory of Open Access Journals (Sweden)

    Zhang Ya-Nan

    2012-05-01

    Full Text Available Abstract Background Adenosine-5′-triphosphate (ATP) is a multifunctional nucleotide that plays an important role in cell biology as a coenzyme interacting with proteins. Revealing the binding sites between proteins and ATP is important for understanding the functionality of the proteins and the mechanisms of protein-ATP complexes. Results In this paper, we propose a novel framework for predicting the functional residues through which proteins bind ATP molecules. The new prediction protocol combines sequence evolutionary information with bi-profile sampling of multi-view sequential features and sequence-derived structural features. The hypothesis behind this strategy is that a single-view feature can represent only partial knowledge of the target, and that multiple sources of descriptors can be complementary. Conclusions Prediction performances evaluated by both 5-fold and leave-one-out jackknife cross-validation tests on two benchmark datasets consisting of 168 and 227 non-homologous ATP binding proteins, respectively, demonstrate the efficacy of the proposed protocol. Our experimental results also reveal that the residue structural characteristics of real protein-ATP binding sites are significantly different from those of normal residues; for example, the binding residues do not show high solvent accessibility propensities, and binding prefers to occur at the junctions between different secondary structure segments. Furthermore, the results show that performance is affected by imbalanced training datasets, as demonstrated by testing multiple ratios between positive and negative samples in the experiments. Increasing the dataset scale is also shown to be useful for improving the prediction performance.
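
    A sketch of the multi-view fusion idea under stated assumptions: per-residue feature blocks from different views (e.g. a sliding-window PSSM profile, predicted secondary structure, and solvent accessibility) are concatenated and fed to one binary classifier, with class weighting as a simple way to handle the strong ATP/non-ATP imbalance. Shapes and data are placeholders, and the class weighting is a generic substitute for the paper's treatment of imbalance rather than its bi-profile sampling scheme.

```python
# Multi-view feature fusion sketch for residue-level binary classification,
# with class weighting for the ATP/non-ATP imbalance. Data are placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
pssm_view = rng.random((n, 17 * 20))        # sliding-window PSSM profile
ss_view = rng.random((n, 17 * 3))           # predicted secondary structure
acc_view = rng.random((n, 17))              # predicted solvent accessibility
X = np.hstack([pssm_view, ss_view, acc_view])
y = (rng.random(n) < 0.05).astype(int)      # ~5% positive (ATP-binding) residues

clf = SVC(kernel="rbf", class_weight="balanced")
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```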

  18. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction

    KAUST Repository

    Cui, Xuefeng

    2016-06-15

    Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. Method: We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence–structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. Results: We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM–HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods.

  19. A support vector density-based importance sampling for reliability assessment

    International Nuclear Information System (INIS)

    Dai, Hongzhe; Zhang, Hao; Wang, Wei

    2012-01-01

    An importance sampling method based on the adaptive Markov chain simulation and support vector density estimation is developed in this paper for efficient structural reliability assessment. The methodology involves the generation of samples that can adaptively populate the important region by the adaptive Metropolis algorithm, and the construction of the importance sampling density by support vector density estimation. The use of the adaptive Metropolis algorithm may effectively improve the convergence and stability of the classical Markov chain simulation. The support vector density can approximate the sampling density with fewer samples in comparison to the conventional kernel density estimation. The proposed importance sampling method can effectively reduce the number of structural analyses required for achieving a given accuracy. Examples involving both numerical and practical structural problems are given to illustrate the application and efficiency of the proposed methodology.
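
    As an illustration of the kind of estimator described above, the following Python sketch combines a short Metropolis chain over the failure region with a Gaussian importance density fitted to the chain (a simple stand-in for the support vector density estimate); the limit-state function g and all numerical settings are hypothetical.

        import numpy as np

        rng = np.random.default_rng(0)

        def g(x):
            # hypothetical limit-state function: failure when g(x) <= 0
            return 6.0 - x[0] - x[1]

        # 1) short Metropolis chain that populates the failure region
        #    (standard-normal density restricted to g <= 0)
        def weight(x):
            return np.exp(-0.5 * x @ x) if g(x) <= 0 else 0.0

        x, chain = np.array([3.0, 3.0]), []
        for _ in range(5000):
            prop = x + rng.normal(scale=0.5, size=2)
            if weight(prop) >= weight(x) * rng.uniform():
                x = prop
            chain.append(x)
        chain = np.array(chain[1000:])

        # 2) fit a Gaussian importance density to the chain
        #    (a simple stand-in for the support vector density estimate)
        mu, cov = chain.mean(axis=0), np.cov(chain.T)
        inv, det = np.linalg.inv(cov), np.linalg.det(cov)

        # 3) importance-sampling estimate of the failure probability
        s = rng.multivariate_normal(mu, cov, size=20000)
        phi = np.exp(-0.5 * np.sum(s ** 2, axis=1)) / (2 * np.pi)   # true density
        d = s - mu
        q = np.exp(-0.5 * np.einsum('ij,jk,ik->i', d, inv, d)) / (2 * np.pi * np.sqrt(det))
        fail = np.array([g(v) <= 0 for v in s])
        print("estimated failure probability:", np.mean(fail * phi / q))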

  20. Integrating chemical footprinting data into RNA secondary structure prediction.

    Directory of Open Access Journals (Sweden)

    Kourosh Zarringhalam

    Full Text Available Chemical and enzymatic footprinting experiments, such as SHAPE (selective 2′-hydroxyl acylation analyzed by primer extension), yield important information about RNA secondary structure. Indeed, since the 2′-hydroxyl is reactive at flexible (loop) regions, but unreactive at base-paired regions, SHAPE yields quantitative data about which RNA nucleotides are base-paired. Recently, low error rates in secondary structure prediction have been reported for three RNAs of moderate size, by including base-stacking pseudo-energy terms derived from SHAPE data into the computation of minimum free energy secondary structure. Here, we describe a novel method, RNAsc (RNA soft constraints), which includes pseudo-energy terms for each nucleotide position, rather than only for base-stacking positions. We prove that RNAsc is self-consistent, in the sense that the nucleotide-specific probabilities of being unpaired in the low-energy Boltzmann ensemble always become more closely correlated with the input SHAPE data after application of RNAsc. From this mathematical perspective, the secondary structure predicted by RNAsc should be 'correct', in as much as the SHAPE data are 'correct'. We benchmark RNAsc against the previously mentioned method for eight RNAs, for which both SHAPE data and native structures are known, and find the same accuracy in 7 out of 8 cases, and an improvement of 25% in one case. Furthermore, we present what appears to be the first direct comparison of SHAPE data and in-line probing data, by comparing yeast asp-tRNA SHAPE data from the literature with data from in-line probing experiments we have recently performed. With respect to several criteria, we find that SHAPE data appear to be more robust than in-line probing data, at least in the case of asp-tRNA.

  1. MCTBI: a web server for predicting metal ion effects in RNA structures.

    Science.gov (United States)

    Sun, Li-Zhen; Zhang, Jing-Xiang; Chen, Shi-Jie

    2017-08-01

    Metal ions play critical roles in RNA structure and function. However, web servers and software packages for predicting ion effects in RNA structures are notably scarce. Furthermore, the existing web servers and software packages mainly neglect ion correlation and fluctuation effects, which are potentially important for RNAs. We here report a new web server, the MCTBI server (http://rna.physics.missouri.edu/MCTBI), for the prediction of ion effects for RNA structures. This server is based on the recently developed MCTBI, a model that can account for ion correlation and fluctuation effects for nucleic acid structures and can provide improved predictions for the effects of metal ions, especially for multivalent ions such as Mg²⁺, as shown by extensive theory-experiment test results. The MCTBI web server predicts metal ion binding fractions, the most probable bound ion distribution, the electrostatic free energy of the system, and the free energy components. The results provide mechanistic insights into the role of metal ions in RNA structure formation and folding stability, which is important for understanding RNA functions and the rational design of RNA structures. © 2017 Sun et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  2. Composition-Based Prediction of Temperature-Dependent Thermophysical Food Properties: Reevaluating Component Groups and Prediction Models.

    Science.gov (United States)

    Phinney, David Martin; Frelka, John C; Heldman, Dennis Ray

    2017-01-01

    Prediction of temperature-dependent thermophysical properties (thermal conductivity, density, specific heat, and thermal diffusivity) is an important component of process design for food manufacturing. Current models for prediction of thermophysical properties of foods are based on the composition, specifically, fat, carbohydrate, protein, fiber, water, and ash contents, all of which change with temperature. The objectives of this investigation were to reevaluate and improve the prediction expressions for thermophysical properties. Previously published data were analyzed over the temperature range from 10 to 150 °C. These data were analyzed to create a series of relationships between the thermophysical properties and temperature for each food component, as well as to identify the dependence of the thermophysical properties on more specific structural properties of the fats, carbohydrates, and proteins. Results from this investigation revealed that the relationships between the thermophysical properties of the major constituents of foods and temperature can be statistically described by linear expressions, in contrast to the current polynomial models. Links between variability in thermophysical properties and structural properties were observed. Relationships for several thermophysical properties based on more specific constituents have been identified. Distinctions between simple sugars (fructose, glucose, and lactose) and complex carbohydrates (starch, pectin, and cellulose) have been proposed. The relationships between the thermophysical properties and proteins revealed a potential correlation with the molecular weight of the protein. The significance of relating variability in constituent thermophysical properties with structural properties--such as molecular mass--could significantly improve composition-based prediction models and, consequently, the effectiveness of process design. © 2016 Institute of Food Technologists®.
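
    A minimal Python sketch of a composition-based property model of this kind is given below; the linear temperature coefficients and the mass-fraction mixing rule are illustrative placeholders, not the fitted values reported in the study.

        # each constituent's thermal conductivity is modeled as k(T) = a + b*T [W/(m*K)],
        # and the food's value is a mass-fraction-weighted combination of constituents
        def constituent_conductivity(component, T_celsius):
            # hypothetical linear coefficients, not the study's fitted values
            coeffs = {
                "water":        (0.57, 0.0018),
                "protein":      (0.18, 0.0012),
                "fat":          (0.18, -0.0003),
                "carbohydrate": (0.20, 0.0014),
                "fiber":        (0.18, 0.0013),
                "ash":          (0.33, 0.0014),
            }
            a, b = coeffs[component]
            return a + b * T_celsius

        def food_conductivity(mass_fractions, T_celsius):
            # simple parallel (mass-fraction weighted) mixing rule
            return sum(x * constituent_conductivity(c, T_celsius)
                       for c, x in mass_fractions.items())

        # example: skim-milk-like composition at 20 degrees C
        print(food_conductivity({"water": 0.90, "protein": 0.035, "fat": 0.002,
                                 "carbohydrate": 0.053, "ash": 0.01}, 20.0))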

  3. Ligand and structure-based classification models for Prediction of P-glycoprotein inhibitors

    DEFF Research Database (Denmark)

    Klepsch, Freya; Poongavanam, Vasanthanathan; Ecker, Gerhard Franz

    2014-01-01

    an algorithm based on Euclidean distance. Results show that random forest and SVM performed best for classification of P-gp inhibitors and non-inhibitors, correctly predicting 73/75 % of the external test set compounds. Classification based on the docking experiments using the scoring function Chem...

  4. Electronic structure prediction via data-mining the empirical pseudopotential method

    Energy Technology Data Exchange (ETDEWEB)

    Zenasni, H; Aourag, H [LEPM, URMER, Departement of Physics, University Abou Bakr Belkaid, Tlemcen 13000 (Algeria); Broderick, S R; Rajan, K [Department of Materials Science and Engineering, Iowa State University, Ames, Iowa 50011-2230 (United States)

    2010-01-15

    We introduce a new approach for accelerating the calculation of the electronic structure of new materials by utilizing the empirical pseudopotential method combined with data mining tools. Combining data mining with the empirical pseudopotential method allows us to convert an empirical approach to a predictive approach. Here we consider tetrahedrally bonded III-V Bi semiconductors, and through the prediction of form factors based on basic elemental properties we can model the band structure and charge density for these semiconductors, for which limited results exist. This work represents a unique approach to modeling the electronic structure of a material which may be used to identify new promising semiconductors and is one of the few efforts utilizing data mining at an electronic level. (Abstract Copyright [2010], Wiley Periodicals, Inc.)

  5. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    Science.gov (United States)

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate DNA sequences, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications for sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information (total number of nucleotide bases and ATGC base contents along with their respective percentages) and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available at http://www.cemb.edu.pk/sw.html. RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.
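
    For reference, a textbook version of the Nussinov dynamic programming recursion that RDNAnalyzer extends can be sketched in a few lines of Python; this is a generic maximum-base-pairing implementation, not the tool's actual code.

        def nussinov(seq, min_loop=3):
            # maximize the number of complementary base pairs (DNA alphabet)
            pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
            n = len(seq)
            dp = [[0] * n for _ in range(n)]
            for span in range(min_loop + 1, n):
                for i in range(n - span):
                    j = i + span
                    best = dp[i][j - 1]                     # j left unpaired
                    for k in range(i, j - min_loop):
                        if (seq[k], seq[j]) in pairs:       # pair k with j
                            left = dp[i][k - 1] if k > i else 0
                            best = max(best, left + dp[k + 1][j - 1] + 1)
                    dp[i][j] = best
            return dp[0][n - 1]

        print(nussinov("GCGCTTATAGCGC"))  # maximum number of base pairs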

  6. Sampled-data-based vibration control for structural systems with finite-time state constraint and sensor outage.

    Science.gov (United States)

    Weng, Falu; Liu, Mingxin; Mao, Weijie; Ding, Yuanchun; Liu, Feifei

    2018-05-10

    The problem of sampled-data-based vibration control for structural systems with finite-time state constraint and sensor outage is investigated in this paper. The objective of the controller design is to guarantee the stability and anti-disturbance performance of the closed-loop system even when some sensor outages occur. Firstly, based on matrix transformation, the state-space model of structural systems with sensor outages and uncertainties appearing in the mass, damping and stiffness matrices is established. Secondly, considering that most earthquakes and strong winds act within a very short time, and that it is often the peak values that damage the structures, the finite-time stability analysis method is introduced to constrain the state responses within a given time interval, and H-infinity stability is adopted in the controller design to ensure that the closed-loop system has a prescribed level of disturbance-attenuation performance during the whole control process. Furthermore, all stabilization conditions are expressed in the forms of linear matrix inequalities (LMIs), whose feasibility can be easily checked by using the LMI Toolbox. Finally, numerical examples are given to demonstrate the effectiveness of the proposed theorems. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.

  7. Toward Structure Prediction for Short Peptides Using the Improved SAAP Force Field Parameters

    Directory of Open Access Journals (Sweden)

    Kenichi Dedachi

    2013-01-01

    Full Text Available Based on the observation that Ramachandran-type potential energy surfaces of single amino acid units in water are in good agreement with statistical structures of the corresponding amino acid residues in proteins, we recently developed a new all-atom force field called SAAP, in which the total energy function for a polypeptide is expressed basically as a sum of single amino acid potentials and electrostatic and Lennard-Jones potentials between the amino acid units. In this study, the SAAP force field (SAAPFF) parameters were improved, and classical canonical Monte Carlo (MC) simulation was carried out for short peptide models, that is, Met-enkephalin and chignolin, at 300 K in an implicit water model. Diverse structures were reasonably obtained for Met-enkephalin, while three folded structures, one of which corresponds to a native-like structure with three native hydrogen bonds, were obtained for chignolin. The results suggested that the SAAP-MC method is useful for conformational sampling of short peptides. A protocol of SAAP-MC simulation followed by structural clustering and examination of the obtained structures by ab initio calculation or simply by the number of hydrogen bonds (or the hardness) was demonstrated to be an effective strategy toward structure prediction for short peptide molecules.
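
    The canonical Monte Carlo step underlying an SAAP-MC-like protocol can be sketched in Python as below; the energy function and the dihedral perturbation move are toy stand-ins rather than the SAAP force field itself.

        import math
        import random

        KB = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

        def metropolis_mc(energy, perturb, x0, n_steps=10000, T=300.0):
            x, e = x0, energy(x0)
            samples = []
            for _ in range(n_steps):
                x_new = perturb(x)
                e_new = energy(x_new)
                # accept downhill moves always, uphill moves with Boltzmann probability
                if e_new <= e or random.random() < math.exp(-(e_new - e) / (KB * T)):
                    x, e = x_new, e_new
                samples.append((x, e))
            return samples

        # toy usage: sample a single dihedral angle on a double-well "potential"
        energy = lambda phi: math.cos(2 * phi) + 0.2 * math.cos(phi)
        perturb = lambda phi: phi + random.gauss(0.0, 0.3)
        traj = metropolis_mc(energy, perturb, x0=0.0, n_steps=5000)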

  8. A comprehensive comparison of comparative RNA structure prediction approaches

    DEFF Research Database (Denmark)

    Gardner, P. P.; Giegerich, R.

    2004-01-01

    Background An increasing number of researchers have released novel RNA structure analysis and prediction algorithms for comparative approaches to structure prediction. Yet, independent benchmarking of these algorithms is rarely performed as is now common practice for protein-folding, gene-finding and multiple-sequence-alignment algorithms. Results Here we evaluate a number of RNA folding algorithms using reliable RNA data-sets and compare their relative performance. Conclusions We conclude that comparative data can enhance structure prediction but structure-prediction algorithms vary widely in terms ...

  9. A copula-based sampling method for data-driven prognostics

    International Nuclear Information System (INIS)

    Xi, Zhimin; Jing, Rong; Wang, Pingfeng; Hu, Chao

    2014-01-01

    This paper develops a Copula-based sampling method for data-driven prognostics. The method essentially consists of an offline training process and an online prediction process: (i) the offline training process builds a statistical relationship between the failure time and the time realizations at specified degradation levels on the basis of off-line training data sets; and (ii) the online prediction process identifies probable failure times for online testing units based on the statistical model constructed in the offline process and the online testing data. Our contributions in this paper are three-fold, namely the definition of a generic health index system to quantify the health degradation of an engineering system, the construction of a Copula-based statistical model to learn the statistical relationship between the failure time and the time realizations at specified degradation levels, and the development of a simulation-based approach for the prediction of remaining useful life (RUL). Two engineering case studies, namely the electric cooling fan health prognostics and the 2008 IEEE PHM challenge problem, are employed to demonstrate the effectiveness of the proposed methodology. - Highlights: • We develop a novel mechanism for data-driven prognostics. • A generic health index system quantifies health degradation of engineering systems. • Off-line training model is constructed based on the Bayesian Copula model. • Remaining useful life is predicted from a simulation-based approach
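
    A hedged Python sketch of the offline/online idea is shown below, with a Gaussian copula linking the time to reach a degradation level and the failure time; the data, margins and copula family are synthetic placeholders rather than the paper's exact construction.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)

        # synthetic training data: time to reach a degradation threshold vs failure time
        t_deg  = rng.normal(100.0, 10.0, size=200)
        t_fail = 1.8 * t_deg + rng.normal(0.0, 8.0, size=200)

        # 1) offline: transform margins to uniforms via empirical CDFs, then to normal scores
        def to_normal_scores(x):
            ranks = stats.rankdata(x) / (len(x) + 1)
            return stats.norm.ppf(ranks)
        z = np.column_stack([to_normal_scores(t_deg), to_normal_scores(t_fail)])
        rho = np.corrcoef(z.T)[0, 1]          # Gaussian copula parameter

        # 2) online: given a new unit's degradation time, sample probable failure times
        def sample_failure_times(t_deg_new, n=1000):
            u_obs = np.searchsorted(np.sort(t_deg), t_deg_new) / (len(t_deg) + 1)
            u_obs = np.clip(u_obs, 1e-3, 1 - 1e-3)
            z_obs = stats.norm.ppf(u_obs)
            # conditional Gaussian copula: Z_fail | Z_deg ~ N(rho*z_obs, 1 - rho^2)
            z_f = rng.normal(rho * z_obs, np.sqrt(1 - rho ** 2), size=n)
            return np.quantile(t_fail, stats.norm.cdf(z_f))   # back through empirical margin

        rul_samples = sample_failure_times(t_deg_new=105.0) - 105.0
        print("median RUL estimate:", np.median(rul_samples))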

  10. From structure prediction to genomic screens for novel non-coding RNAs

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Hofacker, Ivo L.

    2011-01-01

    Abstract: Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction ... This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early ... upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.

  11. Learning-based prediction of gestational age from ultrasound images of the fetal brain.

    Science.gov (United States)

    Namburete, Ana I L; Stebbing, Richard V; Kemp, Bryn; Yaqub, Mohammad; Papageorghiou, Aris T; Alison Noble, J

    2015-04-01

    We propose an automated framework for predicting gestational age (GA) and neurodevelopmental maturation of a fetus based on 3D ultrasound (US) brain image appearance. Our method capitalizes on age-related sonographic image patterns in conjunction with clinical measurements to develop, for the first time, a predictive age model which improves on the GA-prediction potential of US images. The framework benefits from a manifold surface representation of the fetal head which delineates the inner skull boundary and serves as a common coordinate system based on cranial position. This allows for fast and efficient sampling of anatomically-corresponding brain regions to achieve like-for-like structural comparison of different developmental stages. We develop bespoke features which capture neurosonographic patterns in 3D images, and using a regression forest classifier, we characterize structural brain development both spatially and temporally to capture the natural variation existing in a healthy population (N=447) over an age range of active brain maturation (18-34 weeks). On a routine clinical dataset (N=187) our age prediction results strongly correlate with true GA (r = 0.98, accurate within ±6.10 days), confirming the link between maturational progression and neurosonographic activity observable across gestation. Our model also outperforms current clinical methods by ±4.57 days in the third trimester, a period complicated by biological variations in the fetal population. Through feature selection, the model successfully identified the most age-discriminating anatomies over this age range as being the Sylvian fissure, cingulate, and callosal sulci. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.

  12. Structure-aided prediction of mammalian transcription factor complexes in conserved non-coding elements

    KAUST Repository

    Guturu, H.

    2013-11-11

    Mapping the DNA-binding preferences of transcription factor (TF) complexes is critical for deciphering the functions of cis-regulatory elements. Here, we developed a computational method that compares co-occurring motif spacings in conserved versus unconserved regions of the human genome to detect evolutionarily constrained binding sites of rigid TF complexes. Structural data were used to estimate TF complex physical plausibility, explore overlapping motif arrangements seldom tackled by non-structure-aware methods, and generate and analyse three-dimensional models of the predicted complexes bound to DNA. Using this approach, we predicted 422 physically realistic TF complex motifs at 18% false discovery rate, the majority of which (326, 77%) contain some sequence overlap between binding sites. The set of mostly novel complexes is enriched in known composite motifs, predictive of binding site configurations in TF-TF-DNA crystal structures, and supported by ChIP-seq datasets. Structural modelling revealed three cooperativity mechanisms: direct protein-protein interactions, potentially indirect interactions and 'through-DNA' interactions. Indeed, 38% of the predicted complexes were found to contain four or more bases in which TF pairs appear to synergize through overlapping binding to the same DNA base pairs in opposite grooves or strands. Our TF complex and associated binding site predictions are available as a web resource at http://bejerano.stanford.edu/complex.

  13. Structure-aided prediction of mammalian transcription factor complexes in conserved non-coding elements

    KAUST Repository

    Guturu, H.; Doxey, A. C.; Wenger, A. M.; Bejerano, G.

    2013-01-01

    Mapping the DNA-binding preferences of transcription factor (TF) complexes is critical for deciphering the functions of cis-regulatory elements. Here, we developed a computational method that compares co-occurring motif spacings in conserved versus unconserved regions of the human genome to detect evolutionarily constrained binding sites of rigid TF complexes. Structural data were used to estimate TF complex physical plausibility, explore overlapping motif arrangements seldom tackled by non-structure-aware methods, and generate and analyse three-dimensional models of the predicted complexes bound to DNA. Using this approach, we predicted 422 physically realistic TF complex motifs at 18% false discovery rate, the majority of which (326, 77%) contain some sequence overlap between binding sites. The set of mostly novel complexes is enriched in known composite motifs, predictive of binding site configurations in TF-TF-DNA crystal structures, and supported by ChIP-seq datasets. Structural modelling revealed three cooperativity mechanisms: direct protein-protein interactions, potentially indirect interactions and 'through-DNA' interactions. Indeed, 38% of the predicted complexes were found to contain four or more bases in which TF pairs appear to synergize through overlapping binding to the same DNA base pairs in opposite grooves or strands. Our TF complex and associated binding site predictions are available as a web resource at http://bejerano.stanford.edu/complex.

  14. Limited Sampling Strategy for Accurate Prediction of Pharmacokinetics of Saroglitazar: A 3-point Linear Regression Model Development and Successful Prediction of Human Exposure.

    Science.gov (United States)

    Joshi, Shuchi N; Srinivas, Nuggehally R; Parmar, Deven V

    2018-03-01

    Our aim was to develop and validate the extrapolative performance of a regression model using a limited sampling strategy for accurate estimation of the area under the plasma concentration versus time curve for saroglitazar. Healthy subject pharmacokinetic data from a well-powered food-effect study (fasted vs fed treatments; n = 50) was used in this work. The first 25 subjects' serial plasma concentration data up to 72 hours and corresponding AUC0-t (ie, 72 hours) from the fasting group comprised a training dataset to develop the limited sampling model. The internal datasets for prediction included the remaining 25 subjects from the fasting group and all 50 subjects from the fed condition of the same study. The external datasets included pharmacokinetic data for saroglitazar from previous single-dose clinical studies. Limited sampling models were composed of 1-, 2-, and 3-concentration-time points' correlation with AUC0-t of saroglitazar. Only models with regression coefficients (R²) >0.90 were screened for further evaluation. The best R² model was validated for its utility based on mean prediction error, mean absolute prediction error, and root mean square error. Both correlations between predicted and observed AUC0-t of saroglitazar and verification of precision and bias using Bland-Altman plot were carried out. None of the evaluated 1- and 2-concentration-time points models achieved R² > 0.90. Among the various 3-concentration-time points models, only 4 equations passed the predefined criterion of R² > 0.90. Limited sampling models with time points 0.5, 2, and 8 hours (R² = 0.9323) and 0.75, 2, and 8 hours (R² = 0.9375) were validated. Mean prediction error, mean absolute prediction error, and root mean square error were ... prediction of saroglitazar. The same models, when applied to the AUC0-t prediction of saroglitazar sulfoxide, showed mean prediction error, mean absolute prediction error, and root mean square error ... model predicts the exposure of ...
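
    The structure of such a 3-concentration-time-point model can be illustrated with the short Python sketch below; the concentrations, AUC values and resulting coefficients are synthetic placeholders, not the study's data or validated equations.

        import numpy as np

        # training data: rows = subjects, columns = C(0.5 h), C(2 h), C(8 h); y = AUC0-t
        C_train = np.array([[1.2, 4.8, 1.1],
                            [0.9, 3.9, 0.8],
                            [1.5, 5.6, 1.4],
                            [1.1, 4.2, 1.0],
                            [1.3, 5.1, 1.2]])
        auc_train = np.array([38.0, 30.5, 45.2, 33.8, 41.0])

        # least-squares fit of AUC = b0 + b1*C(0.5) + b2*C(2) + b3*C(8)
        X = np.column_stack([np.ones(len(C_train)), C_train])
        beta, *_ = np.linalg.lstsq(X, auc_train, rcond=None)

        def predict_auc(c_05, c_2, c_8):
            return float(beta @ np.array([1.0, c_05, c_2, c_8]))

        print("predicted AUC0-t:", round(predict_auc(1.0, 4.0, 0.9), 1))
        # validation would compare predicted vs observed AUC0-t via mean prediction
        # error, mean absolute prediction error and root mean square error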

  15. Nucleic acid secondary structure prediction and display.

    OpenAIRE

    Stüber, K

    1986-01-01

    A set of programs has been developed for the prediction and display of nucleic acid secondary structures. Information from experimental data can be used to restrict or enforce secondary structural elements. The predictions can be displayed either on normal line printers or on graphic devices like plotters or graphic terminals.

  16. Polymer physics predicts the effects of structural variants on chromatin architecture.

    Science.gov (United States)

    Bianco, Simona; Lupiáñez, Darío G; Chiariello, Andrea M; Annunziatella, Carlo; Kraft, Katerina; Schöpflin, Robert; Wittler, Lars; Andrey, Guillaume; Vingron, Martin; Pombo, Ana; Mundlos, Stefan; Nicodemi, Mario

    2018-05-01

    Structural variants (SVs) can result in changes in gene expression due to abnormal chromatin folding and cause disease. However, the prediction of such effects remains a challenge. Here we present a polymer-physics-based approach (PRISMR) to model 3D chromatin folding and to predict enhancer-promoter contacts. PRISMR predicts higher-order chromatin structure from genome-wide chromosome conformation capture (Hi-C) data. Using the EPHA4 locus as a model, the effects of pathogenic SVs are predicted in silico and compared to Hi-C data generated from mouse limb buds and patient-derived fibroblasts. PRISMR deconvolves the folding complexity of the EPHA4 locus and identifies SV-induced ectopic contacts and alterations of 3D genome organization in homozygous or heterozygous states. We show that SVs can reconfigure topologically associating domains, thereby producing extensive rewiring of regulatory interactions and causing disease by gene misexpression. PRISMR can be used to predict interactions in silico, thereby providing a tool for analyzing the disease-causing potential of SVs.

  17. A new configurational bias scheme for sampling supramolecular structures

    Energy Technology Data Exchange (ETDEWEB)

    De Gernier, Robin; Mognetti, Bortolo M., E-mail: bmognett@ulb.ac.be [Center for Nonlinear Phenomena and Complex Systems, Université Libre de Bruxelles, Code Postal 231, Campus Plaine, B-1050 Brussels (Belgium); Curk, Tine [Department of Chemistry, University of Cambridge, Cambridge CB2 1EW (United Kingdom); Dubacheva, Galina V. [Biosurfaces Unit, CIC biomaGUNE, Paseo Miramon 182, 20009 Donostia - San Sebastian (Spain); Richter, Ralf P. [Biosurfaces Unit, CIC biomaGUNE, Paseo Miramon 182, 20009 Donostia - San Sebastian (Spain); Université Grenoble Alpes, DCM, 38000 Grenoble (France); CNRS, DCM, 38000 Grenoble (France); Max Planck Institute for Intelligent Systems, 70569 Stuttgart (Germany)

    2014-12-28

    We present a new simulation scheme which allows an efficient sampling of reconfigurable supramolecular structures made of polymeric constructs functionalized by reactive binding sites. The algorithm is based on the configurational bias scheme of Siepmann and Frenkel and is powered by the possibility of changing the topology of the supramolecular network by a non-local Monte Carlo algorithm. Such a plan is accomplished by multi-scale modelling that merges coarse-grained simulations, describing the typical polymer conformations, with experimental results accounting for free energy terms involved in the reactions of the active sites. We test the new algorithm for a system of DNA coated colloids for which we compute the hybridisation free energy cost associated with the binding of tethered single stranded DNAs terminated by short sequences of complementary nucleotides. In order to demonstrate the versatility of our method, we also consider polymers functionalized by receptors that bind a surface decorated by ligands. In particular, we compute the density of states of adsorbed polymers as a function of the number of ligand–receptor complexes formed. Such a quantity can be used to study the conformational properties of adsorbed polymers, which is useful when engineering adsorption with tailored properties. We successfully compare the results with the predictions of a mean field theory. We believe that the proposed method will be a useful tool to investigate supramolecular structures resulting from direct interactions between functionalized polymers for which efficient numerical methodologies of investigation are still lacking.

  18. Structure Elucidation of Unknown Metabolites in Metabolomics by Combined NMR and MS/MS Prediction.

    Science.gov (United States)

    Boiteau, Rene M; Hoyt, David W; Nicora, Carrie D; Kinmonth-Schultz, Hannah A; Ward, Joy K; Bingol, Kerem

    2018-01-17

    We introduce a cheminformatics approach that combines highly selective and orthogonal structure elucidation parameters; accurate mass, MS/MS (MS²), and NMR into a single analysis platform to accurately identify unknown metabolites in untargeted studies. The approach starts with an unknown LC-MS feature, and then combines the experimental MS/MS and NMR information of the unknown to effectively filter out the false positive candidate structures based on their predicted MS/MS and NMR spectra. We demonstrate the approach on a model mixture, and then we identify an uncatalogued secondary metabolite in Arabidopsis thaliana . The NMR/MS² approach is well suited to the discovery of new metabolites in plant extracts, microbes, soils, dissolved organic matter, food extracts, biofuels, and biomedical samples, facilitating the identification of metabolites that are not present in experimental NMR and MS metabolomics databases.

  19. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction.

    Science.gov (United States)

    Chira, Camelia; Horvath, Dragos; Dumitrescu, D

    2011-07-30

    Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic - polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optima. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact on the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.
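
    A minimal Python sketch of hill climbing in the 2D HP lattice model is given below to illustrate the search space involved; it uses a simple first-improvement move acceptance and a toy sequence, not the paper's evolutionary operators.

        import random

        MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

        def fold(moves):
            # map a move string to lattice coordinates; None if self-intersecting
            pos, x, y = [(0, 0)], 0, 0
            for m in moves:
                dx, dy = MOVES[m]
                x, y = x + dx, y + dy
                if (x, y) in pos:
                    return None
                pos.append((x, y))
            return pos

        def energy(seq, pos):
            # minus the number of non-consecutive H-H lattice contacts
            occupied = {p: aa for p, aa in zip(pos, seq)}
            contacts = 0
            for i, p in enumerate(pos):
                if seq[i] != "H":
                    continue
                for dx, dy in MOVES.values():
                    q = (p[0] + dx, p[1] + dy)
                    j = pos.index(q) if q in occupied else -1
                    if j > i + 1 and seq[j] == "H":
                        contacts += 1
            return -contacts

        def hill_climb(seq, steps=2000):
            n = len(seq) - 1
            moves = list("R" * n)                  # extended chain as starting point
            best_e = energy(seq, fold(moves))
            for _ in range(steps):
                cand = moves[:]
                cand[random.randrange(n)] = random.choice("UDLR")
                p = fold(cand)
                if p is not None and energy(seq, p) <= best_e:
                    moves, best_e = cand, energy(seq, p)
            return "".join(moves), best_e

        print(hill_climb("HPHPPHHPHPPHPHHPPHPH"))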

  20. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction

    Directory of Open Access Journals (Sweden)

    Chira Camelia

    2011-07-01

    Full Text Available Abstract Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic - polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optima. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact on the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.

  1. Predicting protein folding pathways at the mesoscopic level based on native interactions between secondary structure elements

    Directory of Open Access Journals (Sweden)

    Sze Sing-Hoi

    2008-07-01

    Full Text Available Abstract Background Since experimental determination of protein folding pathways remains difficult, computational techniques are often used to simulate protein folding. Most current techniques to predict protein folding pathways are computationally intensive and are suitable only for small proteins. Results By assuming that the native structure of a protein is known and representing each intermediate conformation as a collection of fully folded structures, each of which contains a set of interacting secondary structure elements, we show that it is possible to significantly reduce the conformation space while still being able to predict the most energetically favorable folding pathway of large proteins with hundreds of residues at the mesoscopic level, including the pig muscle phosphoglycerate kinase with 416 residues. The model is detailed enough to distinguish between different folding pathways of structurally very similar proteins, including the streptococcal protein G and the peptostreptococcal protein L. The model is also able to recognize the differences between the folding pathways of protein G and its two structurally similar variants NuG1 and NuG2, which are even harder to distinguish. We show that this strategy can produce accurate predictions on many other proteins with experimentally determined intermediate folding states. Conclusion Our technique is efficient enough to predict folding pathways for both large and small proteins at the mesoscopic level. Such a strategy is often the only feasible choice for large proteins. A software program implementing this strategy (SSFold) is available at http://faculty.cs.tamu.edu/shsze/ssfold.

  2. Prediction uncertainty assessment of a systems biology model requires a sample of the full probability distribution of its parameters

    Directory of Open Access Journals (Sweden)

    Simon van Mourik

    2014-06-01

    Full Text Available Multi-parameter models in systems biology are typically ‘sloppy’: some parameters or combinations of parameters may be hard to estimate from data, whereas others are not. One might expect that parameter uncertainty automatically leads to uncertain predictions, but this is not the case. We illustrate this by showing that the prediction uncertainty of each of six sloppy models varies enormously among different predictions. Statistical approximations of parameter uncertainty may lead to dramatic errors in prediction uncertainty estimation. We argue that prediction uncertainty assessment must therefore be performed on a per-prediction basis using a full computational uncertainty analysis. In practice this is feasible by providing a model with a sample or ensemble representing the distribution of its parameters. Within a Bayesian framework, such a sample may be generated by a Markov Chain Monte Carlo (MCMC) algorithm that infers the parameter distribution based on experimental data. Matlab code for generating the sample (with the Differential Evolution Markov Chain sampler) and the subsequent uncertainty analysis using such a sample is supplied as Supplemental Information.
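
    The per-prediction uncertainty analysis advocated above can be illustrated with the following Python sketch, in which a random-walk Metropolis sampler (a simple stand-in for the Differential Evolution Markov Chain sampler) produces a parameter ensemble that is then propagated through a toy model; the model, data and noise level are synthetic.

        import numpy as np

        rng = np.random.default_rng(2)

        def model(t, k1, k2):
            # toy two-parameter model (stand-in for a systems-biology model output)
            return k1 * np.exp(-k2 * t)

        # synthetic data
        t_obs = np.linspace(0, 10, 8)
        y_obs = model(t_obs, 2.0, 0.3) + rng.normal(0, 0.05, size=t_obs.size)

        def log_post(theta, sigma=0.05):
            k1, k2 = theta
            if k1 <= 0 or k2 <= 0:
                return -np.inf
            resid = y_obs - model(t_obs, k1, k2)
            return -0.5 * np.sum((resid / sigma) ** 2)

        # simple random-walk Metropolis sampling of the parameter distribution
        theta, lp = np.array([1.0, 0.1]), log_post([1.0, 0.1])
        chain = []
        for _ in range(20000):
            prop = theta + rng.normal(0, 0.02, size=2)
            lp_prop = log_post(prop)
            if np.log(rng.uniform()) < lp_prop - lp:
                theta, lp = prop, lp_prop
            chain.append(theta)
        chain = np.array(chain[5000:])

        # per-prediction uncertainty: propagate the whole ensemble, not a point estimate
        preds = np.array([model(15.0, k1, k2) for k1, k2 in chain])
        print("prediction at t=15 (2.5/50/97.5 percentiles):",
              np.percentile(preds, [2.5, 50, 97.5]))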

  3. Conditioning and Robustness of RNA Boltzmann Sampling under Thermodynamic Parameter Perturbations.

    Science.gov (United States)

    Rogers, Emily; Murrugarra, David; Heitsch, Christine

    2017-07-25

    Understanding how RNA secondary structure prediction methods depend on the underlying nearest-neighbor thermodynamic model remains a fundamental challenge in the field. Minimum free energy (MFE) predictions are known to be "ill conditioned" in that small changes to the thermodynamic model can result in significantly different optimal structures. Hence, the best practice is now to sample from the Boltzmann distribution, which generates a set of suboptimal structures. Although the structural signal of this Boltzmann sample is known to be robust to stochastic noise, the conditioning and robustness under thermodynamic perturbations have yet to be addressed. We present here a mathematically rigorous model for conditioning inspired by numerical analysis, and also a biologically inspired definition for robustness under thermodynamic perturbation. We demonstrate the strong correlation between conditioning and robustness and use its tight relationship to define quantitative thresholds for well versus ill conditioning. These resulting thresholds demonstrate that the majority of the sequences are at least sample robust, which verifies the assumption of sampling's improved conditioning over the MFE prediction. Furthermore, because we find no correlation between conditioning and MFE accuracy, the presence of both well- and ill-conditioned sequences indicates the continued need for both thermodynamic model refinements and alternate RNA structure prediction methods beyond the physics-based ones. Copyright © 2017. Published by Elsevier Inc.

  4. Viral IRES prediction system - a web server for prediction of the IRES secondary structure in silico.

    Directory of Open Access Journals (Sweden)

    Jun-Jie Hong

    Full Text Available The internal ribosomal entry site (IRES) functions as a cap-independent translation initiation site in eukaryotic cells. IRES elements have been applied as useful tools for bi-cistronic expression vectors. Current RNA structure prediction programs are unable to predict the potential IRES element precisely. We have designed a viral IRES prediction system (VIPS) to perform IRES secondary structure prediction. In order to obtain better results for the IRES prediction, VIPS can evaluate and predict all four different groups of IRESs with higher accuracy. RNA secondary structure prediction, comparison, and pseudoknot prediction programs were implemented to form the three-stage procedure of VIPS. The backbone of VIPS includes the RNALfold program, which predicts local RNA secondary structures by the minimum free energy method; the RNA Align program, which compares predicted structures; and the pknotsRG program, which calculates the pseudoknot structure. VIPS was evaluated using the UTR database, IRES database and Virus database, and its accuracy was assessed as 98.53%, 90.80%, 82.36% and 80.41% for IRES groups 1, 2, 3, and 4, respectively. This advanced search approach for IRES structures will facilitate IRES-related studies. The VIPS on-line website service is available at http://140.135.61.250/vips/.

  5. PNN-based Rockburst Prediction Model and Its Applications

    Directory of Open Access Journals (Sweden)

    Yu Zhou

    2017-07-01

    Full Text Available Rock burst is one of the main engineering geological problems that significantly threaten the safety of construction. Prediction of rock burst is always an important issue concerning the safety of workers and equipment in tunnels. In this paper, a novel PNN-based rock burst prediction model is proposed to determine whether rock burst will happen in underground rock projects and how intense it will be. The probabilistic neural network (PNN) is developed based on Bayesian criteria of multivariate pattern classification. Because PNN has the advantages of low training complexity, high stability, quick convergence, and simple construction, it is well suited to the prediction of rock burst. Several main control factors, such as the rock's maximum tangential stress, uniaxial compressive strength, uniaxial tensile strength, and elastic energy index, are chosen as the characteristic vector of the PNN. The PNN model is obtained by training on data sets of rock burst samples from underground rock projects at home and abroad, and further samples are used to test the model. The testing results agree with the practical records. In addition, two real-world applications are used to verify the proposed method. The predictions match the results of existing methods as well as what actually happened on site, which verifies the effectiveness and applicability of the proposed work.
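
    A compact Python sketch of a probabilistic neural network (essentially a Parzen-window classifier with a Bayesian decision rule) is given below; the four input features mirror the control factors listed above, but all training samples, class labels and the smoothing parameter are synthetic placeholders.

        import numpy as np

        class PNN:
            def __init__(self, sigma=0.1):
                self.sigma = sigma

            def fit(self, X, y):
                self.X, self.y = np.asarray(X, float), np.asarray(y)
                self.classes = np.unique(self.y)
                return self

            def predict(self, X):
                out = []
                for x in np.asarray(X, float):
                    # Gaussian kernel density per class, highest density wins
                    d2 = np.sum((self.X - x) ** 2, axis=1)
                    k = np.exp(-d2 / (2 * self.sigma ** 2))
                    scores = [k[self.y == c].mean() for c in self.classes]
                    out.append(self.classes[int(np.argmax(scores))])
                return np.array(out)

        # synthetic, normalized samples: [stress ratio, UCS ratio, UTS ratio, energy index]
        X_train = [[0.2, 0.3, 0.25, 0.2], [0.5, 0.5, 0.45, 0.5],
                   [0.8, 0.7, 0.75, 0.8], [0.9, 0.85, 0.8, 0.9]]
        y_train = ["none", "light", "moderate", "strong"]
        print(PNN(sigma=0.15).fit(X_train, y_train).predict([[0.75, 0.7, 0.7, 0.78]]))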

  6. Protein structure refinement using a quantum mechanics-based chemical shielding predictor

    DEFF Research Database (Denmark)

    Bratholm, Lars Andersen; Jensen, Jan Halborg

    2017-01-01

    The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor of a protein backbone and CB chemical shifts (ProCS15, PeerJ, 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic ...

  7. Methods of developing core collections based on the predicted genotypic value of rice ( Oryza sativa L.).

    Science.gov (United States)

    Li, C T; Shi, C H; Wu, J G; Xu, H M; Zhang, H Z; Ren, Y L

    2004-04-01

    The selection of an appropriate sampling strategy and a clustering method is important in the construction of core collections based on predicted genotypic values in order to retain the greatest degree of genetic diversity of the initial collection. In this study, methods of developing rice core collections were evaluated based on the predicted genotypic values for 992 rice varieties with 13 quantitative traits. The genotypic values of the traits were predicted by the adjusted unbiased prediction (AUP) method. Based on the predicted genotypic values, Mahalanobis distances were calculated and employed to measure the genetic similarities among the rice varieties. Six hierarchical clustering methods, including the single linkage, median linkage, centroid, unweighted pair-group average, weighted pair-group average and flexible-beta methods, were combined with random, preferred and deviation sampling to develop 18 core collections of rice germplasm. The results show that the deviation sampling strategy in combination with the unweighted pair-group average method of hierarchical clustering retains the greatest degree of genetic diversity of the initial collection. The core collections sampled using predicted genotypic values had more genetic diversity than those based on phenotypic values.
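
    The clustering-plus-sampling construction can be sketched in Python as below, using Mahalanobis distances and UPGMA (unweighted pair-group average) clustering; the predicted genotypic values are simulated, and the "deviation sampling" step is approximated here by picking the accession farthest from its cluster mean.

        import numpy as np
        from scipy.cluster.hierarchy import average, fcluster
        from scipy.spatial.distance import pdist

        rng = np.random.default_rng(3)
        G = rng.normal(size=(100, 13))          # predicted genotypic values, 13 traits

        # Mahalanobis distances between all pairs of accessions
        VI = np.linalg.inv(np.cov(G.T))
        D = pdist(G, metric="mahalanobis", VI=VI)

        # UPGMA clustering, then cut the tree into the desired core size
        Z = average(D)
        core_size = 20
        labels = fcluster(Z, t=core_size, criterion="maxclust")

        # one representative per cluster: the accession deviating most from the cluster mean
        core = []
        for c in np.unique(labels):
            idx = np.where(labels == c)[0]
            dev = np.linalg.norm(G[idx] - G[idx].mean(axis=0), axis=1)
            core.append(idx[int(np.argmax(dev))])
        print(sorted(core))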

  8. Computational predictions of zinc oxide hollow structures

    Science.gov (United States)

    Tuoc, Vu Ngoc; Huan, Tran Doan; Thao, Nguyen Thi

    2018-03-01

    Nanoporous materials are emerging as potential candidates for a wide range of technological applications in environment, electronic, and optoelectronics, to name just a few. Within this active research area, experimental works are predominant while theoretical/computational prediction and study of these materials face some intrinsic challenges, one of them is how to predict porous structures. We propose a computationally and technically feasible approach for predicting zinc oxide structures with hollows at the nano scale. The designed zinc oxide hollow structures are studied with computations using the density functional tight binding and conventional density functional theory methods, revealing a variety of promising mechanical and electronic properties, which can potentially find future realistic applications.

  9. Refinement of NMR structures using implicit solvent and advanced sampling techniques.

    Science.gov (United States)

    Chen, Jianhan; Im, Wonpil; Brooks, Charles L

    2004-12-15

    NMR biomolecular structure calculations exploit simulated annealing methods for conformational sampling and require a relatively high level of redundancy in the experimental restraints to determine quality three-dimensional structures. Recent advances in generalized Born (GB) implicit solvent models should make it possible to combine information from both experimental measurements and accurate empirical force fields to improve the quality of NMR-derived structures. In this paper, we study the influence of implicit solvent on the refinement of protein NMR structures and identify an optimal protocol of utilizing these improved force fields. To do so, we carry out structure refinement experiments for model proteins with published NMR structures using full NMR restraints and subsets of them. We also investigate the application of advanced sampling techniques to NMR structure refinement. Similar to the observations of Xia et al. (J. Biomol. NMR 2002, 22, 317-331), we find that the impact of implicit solvent is rather small when there is a sufficient number of experimental restraints (such as in the final stage of NMR structure determination), whether implicit solvent is used throughout the calculation or only in the final refinement step. The application of advanced sampling techniques also seems to have minimal impact in this case. However, when the experimental data are limited, we demonstrate that refinement with implicit solvent can substantially improve the quality of the structures. In particular, when combined with an advanced sampling technique, the replica exchange (REX) method, near-native structures can be rapidly moved toward the native basin. The REX method provides both enhanced sampling and automatic selection of the most native-like (lowest energy) structures. An optimal protocol based on our studies first generates an ensemble of initial structures that maximally satisfy the available experimental data with conventional NMR software using a simplified ...

  10. Capturing alternative secondary structures of RNA by decomposition of base-pairing probabilities.

    Science.gov (United States)

    Hagio, Taichi; Sakuraba, Shun; Iwakiri, Junichi; Mori, Ryota; Asai, Kiyoshi

    2018-02-19

    It is known that functional RNAs often switch their functions by forming different secondary structures. Popular tools for RNA secondary structure prediction, however, predict only a single 'best' structure and do not produce alternative structures. There are bioinformatics tools to predict suboptimal structures, but it is difficult to detect which alternative secondary structures are essential. We propose a new computational method to detect essential alternative secondary structures from RNA sequences by decomposing the base-pairing probability matrix. The decomposition is calculated by a newly implemented software tool, RintW, which efficiently computes the base-pairing probability distributions over the Hamming distance from arbitrary reference secondary structures. The proposed approach has been demonstrated on the ROSE element RNA thermometer sequence and the Lysine RNA riboswitch, showing that it captures conformational changes in secondary structures. We have shown that alternative secondary structures are captured by decomposing base-pairing probabilities over the Hamming distance. Source code is available from http://www.ncRNA.org/RintW.

  11. Prediction of residues in discontinuous B-cell epitopes using protein 3D structures

    DEFF Research Database (Denmark)

    Andersen, P.H.; Nielsen, Morten; Lund, Ole

    2006-01-01

    We show that the new structure-based method has a better performance for predicting residues of discontinuous epitopes than methods based solely on sequence information, and that it can successfully predict epitope residues that have been identified by different techniques. DiscoTope detects 15 ... experimental epitope mapping in both rational vaccine design and development of diagnostic tools, and may lead to more efficient epitope identification.

  12. Stochastic Extreme Load Predictions for Marine Structures

    DEFF Research Database (Denmark)

    Jensen, Jørgen Juncher

    1999-01-01

    Development of rational design criteria for marine structures requires reliable estimates for the maximum wave-induced loads the structure may encounter during its operational lifetime. The paper discusses various methods for extreme value predictions taking into account the non-linearity of the waves and the response. As an example, the wave-induced bending moment in the ship hull girder is considered.

  13. Predictive modeling of neuroanatomic structures for brain atrophy detection

    Science.gov (United States)

    Hu, Xintao; Guo, Lei; Nie, Jingxin; Li, Kaiming; Liu, Tianming

    2010-03-01

    In this paper, we present an approach to predictive modeling of neuroanatomic structures for the detection of brain atrophy based on cross-sectional MRI images. The underlying premise of applying predictive modeling to atrophy detection is that brain atrophy is defined as a significant deviation of part of the anatomy from what the remaining normal anatomy predicts for that part. The steps of predictive modeling are as follows. The central cortical surface under consideration is reconstructed from the brain tissue map, and Regions of Interest (ROI) on it are predicted from other reliable anatomies. The pair-wise distance between a predicted vertex and the true one within an abnormal region is expected to be larger than that for a vertex in a normal brain region. The change of the white matter/gray matter ratio within a spherical region is used to identify the direction of vertex displacement. In this way, the severity of brain atrophy can be defined quantitatively by the displacements of those vertices. The proposed predictive modeling method has been evaluated using both simulated atrophies and MRI images of Alzheimer's disease.

  14. Prediction of residential radon exposure of the whole Swiss population: comparison of model-based predictions with measurement-based predictions.

    Science.gov (United States)

    Hauri, D D; Huss, A; Zimmermann, F; Kuehni, C E; Röösli, M

    2013-10-01

    Radon plays an important role for human exposure to natural sources of ionizing radiation. The aim of this article is to compare two approaches to estimate mean radon exposure in the Swiss population: model-based predictions at individual level and measurement-based predictions based on measurements aggregated at municipality level. A nationwide model was used to predict radon levels in each household and for each individual based on the corresponding tectonic unit, building age, building type, soil texture, degree of urbanization, and floor. Measurement-based predictions were carried out within a health impact assessment on residential radon and lung cancer. Mean measured radon levels were corrected for the average floor distribution and weighted with population size of each municipality. Model-based predictions yielded a mean radon exposure of the Swiss population of 84.1 Bq/m³. Measurement-based predictions yielded an average exposure of 78 Bq/m³. This study demonstrates that the model- and the measurement-based predictions provided similar results. The advantage of the measurement-based approach is its simplicity, which is sufficient for assessing exposure distribution in a population. The model-based approach allows predicting radon levels at specific sites, which is needed in an epidemiological study, and the results do not depend on how the measurement sites have been selected. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  15. Antibody modeling using the prediction of immunoglobulin structure (PIGS) web server [corrected].

    Science.gov (United States)

    Marcatili, Paolo; Olimpieri, Pier Paolo; Chailyan, Anna; Tramontano, Anna

    2014-12-01

    Antibodies (or immunoglobulins) are crucial for defending organisms from pathogens, but they are also key players in many medical, diagnostic and biotechnological applications. The ability to predict their structure and the specific residues involved in antigen recognition has several useful applications in all of these areas. Over the years, we have developed or collaborated in developing a strategy that enables researchers to predict the 3D structure of antibodies with a very satisfactory accuracy. The strategy is completely automated and extremely fast, requiring only a few minutes (∼10 min on average) to build a structural model of an antibody. It is based on the concept of canonical structures of antibody loops and on our understanding of the way light and heavy chains pack together.

  16. Prediction of material damage in orthotropic metals for virtual structural testing

    OpenAIRE

    Ravindran, S.

    2010-01-01

    Models based on the Continuum Damage Mechanics principle are increasingly used for predicting the initiation and growth of damage in materials. The growing reliance on 3-D finite element (FE) virtual structural testing demands implementation and validation of robust material models that can predict the material behaviour accurately. The use of these models within numerical analyses requires suitable material data. EU aerospace companies along with Cranfield University and other similar resear...

  17. Screw Remaining Life Prediction Based on Quantum Genetic Algorithm and Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Xiaochen Zhang

    2017-01-01

    Full Text Available To predict the remaining life of a ball screw, a screw remaining life prediction method based on a quantum genetic algorithm (QGA) and a support vector machine (SVM) is proposed. A screw accelerated test bench is introduced, and accelerometers are installed to monitor the performance degradation of the ball screw. Combined with wavelet packet decomposition and isometric mapping (Isomap), the sensitive feature vectors are obtained and stored in a database. The sensitive feature vectors are then randomly chosen from the database to constitute training and testing samples. The optimal kernel function parameter and penalty factor of the SVM are searched with the QGA. Finally, the training samples are used to train the optimized SVM while the testing samples are adopted to test its prediction accuracy, yielding the screw remaining life prediction model. The experimental results show that the model can effectively predict the remaining life of the screw.

  18. Prediction of protein-protein interaction sites in sequences and 3D structures by random forests.

    Directory of Open Access Journals (Sweden)

    Mile Sikić

    2009-01-01

    Full Text Available Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras-Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information.
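
    A minimal Python sketch of the sequence-only variant is shown below: residues are encoded over a nine-residue sliding window (the window size the authors found most suitable) and fed to a random forest; the example sequence, labels and one-hot encoding are illustrative placeholders rather than the published feature set.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        AA = "ACDEFGHIKLMNPQRSTVWY"
        IDX = {a: i for i, a in enumerate(AA)}

        def windows(seq, w=9):
            # one-hot encode each residue together with its w-residue neighbourhood
            half = w // 2
            padded = "X" * half + seq + "X" * half
            feats = []
            for i in range(len(seq)):
                win = padded[i:i + w]
                vec = np.zeros(w * len(AA))
                for j, aa in enumerate(win):
                    if aa in IDX:
                        vec[j * len(AA) + IDX[aa]] = 1.0
                feats.append(vec)
            return np.array(feats)

        # toy training data: sequence plus per-residue interaction labels (1 = site)
        seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
        labels = np.array([0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0,
                           0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0])

        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        clf.fit(windows(seq), labels)
        print(clf.predict_proba(windows("MKTAYIAKQ"))[:, 1])  # per-residue site probability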

  19. A quantitative prediction model of SCC rate for nuclear structure materials in high temperature water based on crack tip creep strain rate

    International Nuclear Information System (INIS)

    Yang, F.Q.; Xue, H.; Zhao, L.Y.; Fang, X.R.

    2014-01-01

    Highlights: • Creep is considered to be the primary mechanical factor in crack tip film degradation. • The prediction model of SCC rate is based on the crack tip creep strain rate. • The SCC rate calculated at the secondary stage of creep is recommended. • The effect of stress intensity factor on SCC growth rate is discussed.
    Abstract: The quantitative prediction of stress corrosion cracking (SCC) in structural materials is essential to the safety assessment of nuclear power plants. A new quantitative prediction model is proposed by combining the Ford–Andresen model, a crack tip creep model and an elastic–plastic finite element method. Creep at the crack tip is considered to be the primary mechanical factor in protective film degradation, and the creep strain rate at the crack tip is suggested as the primary mechanical factor in predicting the SCC rate. The SCC rates at the secondary stage of creep are recommended when using the approach introduced in this study to predict SCC rates of materials in high temperature water. The proposed approach can be used to understand SCC crack growth in structural materials of light water reactors.

  20. State-based Communication on Time-predictable Multicore Processors

    DEFF Research Database (Denmark)

    Sørensen, Rasmus Bo; Schoeberl, Martin; Sparsø, Jens

    2016-01-01

    Some real-time systems use a form of task-to-task communication called state-based or sample-based communication that does not impose any flow control among the communicating tasks. The concept is similar to a shared variable, where a reader may read the same value multiple times or may not read...... a given value at all. This paper explores time-predictable implementations of state-based communication in network-on-chip based multicore platforms through five algorithms. With the presented analysis of the implemented algorithms, the communicating tasks of one core can be scheduled independently...... of tasks on other cores. Assuming a specific time-predictable multicore processor, we evaluate how the read and write primitives of the five algorithms contribute to the worst-case execution time of the communicating tasks. Each of the five algorithms has specific capabilities that make them suitable...

  1. Predicting and validating protein interactions using network structure.

    Directory of Open Access Journals (Sweden)

    Pao-Yang Chen

    2008-07-01

    Protein interactions play a vital part in the function of a cell. As experimental techniques for detection and validation of protein interactions are time consuming, there is a need for computational methods for this task. Protein interactions appear to form a network with a relatively high degree of local clustering. In this paper we exploit this clustering by suggesting a score based on triplets of observed protein interactions. The score utilises both protein characteristics and network properties. Our score based on triplets is shown to complement existing techniques for predicting protein interactions, outperforming them on data sets which display a high degree of clustering. The predicted interactions score highly against test measures for accuracy. Compared to a similar score derived from pairwise interactions only, the triplet score displays higher sensitivity and specificity. By looking at specific examples, we show how an experimental set of interactions can be enriched and validated. As part of this work we also examine the effect of different prior databases upon the accuracy of prediction and find that the interactions from the same kingdom give better results than those from across kingdoms, suggesting that there may be fundamental differences between the networks. These results all emphasize that network structure is important and helps in the accurate prediction of protein interactions. The protein interaction data set and the program used in our analysis, and a list of predictions and validations, are available at http://www.stats.ox.ac.uk/bioinfo/resources/PredictingInteractions.
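
    A toy version of a triplet-based score is sketched below; the adjacency list and the candidate edge are invented, and the weighting by protein characteristics used in the paper is omitted.

    ```python
    # Sketch: rank candidate interactions by how many observed triangles they
    # would close, exploiting local clustering of the interaction network.
    from collections import defaultdict

    edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D")]
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def triplet_score(u, v):
        """Count observed triplets (common neighbors) supporting edge (u, v)."""
        return len(adj[u] & adj[v])

    candidates = [("A", "D")]
    print(sorted(candidates, key=lambda e: triplet_score(*e), reverse=True))
    ```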

  2. Predicting Reactive Transport Dynamics in Carbonates using Initial Pore Structure

    Science.gov (United States)

    Menke, H. P.; Nunes, J. P. P.; Blunt, M. J.

    2017-12-01

    Understanding rock-fluid interaction at the pore scale is imperative for accurate predictive modelling of carbon storage permanence. However, coupled reactive transport models are computationally expensive, requiring either a sacrifice of resolution or high-performance computing to solve relatively simple geometries. Many recent studies indicate that the initial pore structure may be the dominant factor in determining the dissolution regime. Here we investigate how well the initial pore structure predicts the distribution and amount of dissolution during reactive flow, using particle tracking on the initial image. Two samples of carbonate rock with differing initial pore space heterogeneity were reacted with reservoir-condition CO2-saturated brine and scanned dynamically during reactive flow at 4-μm resolution, between 4 and 40 times over the course of 1.5 hours, using 4D X-ray micro-tomography (μ-CT). Flow was modelled on the initial binarized image using a Navier-Stokes solver. Particle tracking was then run on the velocity fields, the streamlines were traced, and the streamline density was calculated both on a voxel-by-voxel and a channel-by-channel basis. The density of streamlines was then compared to the amount of dissolution in subsequent time steps during reaction. It was found that, for the flow and transport regimes studied, the streamline density distribution in the initial image accurately predicted the dominant pathways of dissolution and gave good indicators of the type of dissolution regime that would later develop. This work suggests that the eventual reaction-induced changes in pore structure are deterministic rather than stochastic and can be predicted from high resolution imaging of unreacted rock.

  3. Evolving stochastic context-free grammars for RNA secondary structure prediction

    DEFF Research Database (Denmark)

    Anderson, James WJ; Tataru, Paula Cristina; Stains, Joe

    2012-01-01

    Background Stochastic Context-Free Grammars (SCFGs) were applied successfully to RNA secondary structure prediction in the early 90s, and used in combination with comparative methods in the late 90s. The set of SCFGs potentially useful for RNA secondary structure prediction is very large, but a few...... to structure prediction as has been previously suggested. Results These search techniques were applied to predict RNA secondary structure on a maximal data set and revealed new and interesting grammars, though none are dramatically better than classic grammars. In general, results showed that many grammars...... with quite different structure could have very similar predictive ability. Many ambiguous grammars were found which were at least as effective as the best current unambiguous grammars. Conclusions Overall the method of evolving SCFGs for RNA secondary structure prediction proved effective in finding many...

  4. Protein secondary structure prediction using modular reciprocal bidirectional recurrent neural networks.

    Science.gov (United States)

    Babaei, Sepideh; Geranmayeh, Amir; Seyyedsalehi, Seyyed Ali

    2010-12-01

    The supervised learning of recurrent neural networks well suited to predicting protein secondary structures from the underlying amino acid sequence is studied. Modular reciprocal recurrent neural networks (MRR-NN) are proposed to model the strong correlations between adjacent secondary structure elements. In addition, a multilayer bidirectional recurrent neural network (MBR-NN) is introduced to capture the long-range intramolecular interactions between amino acids in the formation of secondary structure. The final modular prediction system is devised based on the interactive integration of the MRR-NN and MBR-NN structures to arbitrarily engage the neighboring effects of the secondary structure types concurrently with memorizing the sequential dependencies of amino acids along the protein chain. The combined network raises the percentage accuracy (Q₃) to 79.36% and boosts the segment overlap (SOV) up to 70.09% when tested on the PSIPRED dataset in three-fold cross-validation. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

  5. Interplay of I-TASSER and QUARK for template-based and ab initio protein structure prediction in CASP10.

    Science.gov (United States)

    Zhang, Yang

    2014-02-01

    We develop and test a new pipeline in CASP10 to predict protein structures based on an interplay of I-TASSER and QUARK for both free-modeling (FM) and template-based modeling (TBM) targets. The most noteworthy observation is that sorting through the threading template pool using the QUARK-based ab initio models as probes allows the detection of distant-homology templates which might be ignored by the traditional sequence profile-based threading alignment algorithms. Further template assembly refinement by I-TASSER resulted in successful folding of two medium-sized FM targets with >150 residues. For TBM, the multiple threading alignments from LOMETS are, for the first time, incorporated into the ab initio QUARK simulations, which were further refined by I-TASSER assembly refinement. Compared with the traditional threading assembly refinement procedures, the inclusion of the threading-constrained ab initio folding models can consistently improve the quality of the full-length models as assessed by the GDT-HA and hydrogen-bonding scores. Despite the success, significant challenges still exist in domain boundary prediction and consistent folding of medium-size proteins (especially beta-proteins) for nonhomologous targets. Further developments of sensitive fold-recognition and ab initio folding methods are critical for solving these problems. Copyright © 2013 Wiley Periodicals, Inc.

  6. Hierarchical prediction of industrial water demand based on refined Laspeyres decomposition analysis.

    Science.gov (United States)

    Shang, Yizi; Lu, Shibao; Gong, Jiaguo; Shang, Ling; Li, Xiaofei; Wei, Yongping; Shi, Hongwang

    2017-12-01

    A recent study decomposed the changes in industrial water use into three hierarchies (output, technology, and structure) using a refined Laspeyres decomposition model, and found monotonous and exclusive trends in the output and technology hierarchies. Based on that research, this study proposes a hierarchical prediction approach to forecast future industrial water demand. Three water demand scenarios (high, medium, and low) were then established based on potential future industrial structural adjustments, and used to predict water demand for the structural hierarchy. The predictive results of this approach were compared with results from a grey prediction model (GPM (1, 1)). The comparison shows that the results of the two approaches were basically identical, differing by less than 10%. Taking Tianjin, China, as a case, and using data from 2003-2012, this study predicts that industrial water demand will continuously increase, reaching 580 million m³, 776.4 million m³, and approximately 1.09 billion m³ by the years 2015, 2020 and 2025 respectively. It is concluded that Tianjin will soon face another water crisis if no immediate measures are taken. This study recommends that Tianjin adjust its industrial structure with water savings as the main objective, and actively seek new sources of water to increase its supply.

  7. Protein Loop Structure Prediction Using Conformational Space Annealing.

    Science.gov (United States)

    Heo, Seungryong; Lee, Juyong; Joo, Keehyoung; Shin, Hang-Cheol; Lee, Jooyoung

    2017-05-22

    We have developed a protein loop structure prediction method by combining a new energy function, which we call E_PLM (energy for protein loop modeling), with the conformational space annealing (CSA) global optimization algorithm. The energy function includes stereochemistry, dynamic fragment assembly, distance-scaled finite ideal gas reference (DFIRE), and generalized orientation- and distance-dependent terms. For the conformational search of loop structures, we used the CSA algorithm, which has been quite successful in dealing with various hard global optimization problems. We assessed the performance of E_PLM with two widely used loop-decoy sets, Jacobson and RAPPER, and compared the results against the DFIRE potential. The accuracy of model selection from a pool of loop decoys as well as de novo loop modeling starting from randomly generated structures was examined separately. For the selection of a nativelike structure from a decoy set, E_PLM was more accurate than DFIRE in the case of the Jacobson set and had similar accuracy in the case of the RAPPER set. In terms of sampling more nativelike loop structures, E_PLM outperformed E_DFIRE for both decoy sets. This new approach equipped with E_PLM and CSA can serve as the state-of-the-art de novo loop modeling method.

  8. HemeBIND: a novel method for heme binding residue prediction by combining structural and sequence information

    Directory of Open Access Journals (Sweden)

    Hu Jianjun

    2011-05-01

    Background: Accurate prediction of the binding residues involved in interactions between proteins and small ligands is one of the major challenges in structural bioinformatics. Heme is an essential and commonly used ligand that plays critical roles in electron transfer, catalysis, signal transduction and gene expression. Although much effort has been devoted to the development of generic algorithms for ligand binding site prediction over the last decade, no algorithm has been specifically designed to complement experimental techniques for the identification of heme binding residues. There is therefore an urgent need for a computational method for recognizing these important residues. Results: Here we introduce HemeBIND, an efficient algorithm for predicting heme binding residues by integrating structural and sequence information. We systematically investigated the characteristics of binding interfaces based on a non-redundant dataset of heme-protein complexes. Several sequence and structural attributes, such as evolutionary conservation, solvent accessibility, depth and protrusion, clearly distinguish heme binding from non-binding residues. These features can be used separately or combined to build structure-based classifiers using support vector machines (SVM). The results showed that the information contained in these features is largely complementary and that their combination achieved the best performance. To further improve performance, a post-processing procedure was developed to reduce the number of false positives. In addition, we built a sequence-based classifier based on SVM and sequence profiles as an alternative when only sequence information is available. Finally, we employed a voting method to combine the outputs of the structure-based and sequence-based classifiers, which demonstrated remarkably better performance than either individual classifier alone.

  9. SNP-based heritability estimates of the personality dimensions and polygenic prediction of both neuroticism and major depression: findings from CONVERGE.

    Science.gov (United States)

    Docherty, A R; Moscati, A; Peterson, R; Edwards, A C; Adkins, D E; Bacanu, S A; Bigdeli, T B; Webb, B T; Flint, J; Kendler, K S

    2016-10-25

    Biometrical genetic studies suggest that the personality dimensions, including neuroticism, are moderately heritable (~0.4 to 0.6). Quantitative analyses that aggregate the effects of many common variants have recently further informed genetic research on European samples. However, there has been limited research to date on non-European populations. This study examined the personality dimensions in a large sample of Han Chinese descent (N=10 064) from the China, Oxford, and VCU Experimental Research on Genetic Epidemiology study, aimed at identifying genetic risk factors for recurrent major depression among a rigorously ascertained cohort. The heritability of neuroticism as measured by the Eysenck Personality Questionnaire (EPQ) was estimated to be low but statistically significant at 10% (s.e.=0.03, P=0.0001). In addition to EPQ neuroticism (based on a three-factor model), data for the Big Five (BF) personality dimensions (neuroticism, openness, conscientiousness, extraversion and agreeableness) measured by the Big Five Inventory were available for controls (n=5596). Heritability estimates of the BF were not statistically significant despite high power (>0.85) to detect heritabilities of 0.10. Polygenic risk scores constructed by best linear unbiased prediction weights applied to split-half samples failed to significantly predict any of the personality traits, but polygenic risk for neuroticism, calculated with LDpred and based on predictive variants previously identified in European populations (N=171 911), significantly predicted major depressive disorder case-control status (P=0.0004) after false discovery rate correction. The scores also significantly predicted EPQ neuroticism (P=6.3 × 10⁻⁶). Factor analytic results of the measures indicated that any differences in heritabilities across samples may be due to genetic variation or variation in haplotype structure between samples, rather than measurement non-invariance. Findings demonstrate that neuroticism
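
    The polygenic risk scoring step can be illustrated roughly as below; the dosage matrix and per-variant weights are synthetic stand-ins for LDpred-adjusted effect sizes from a discovery sample.

    ```python
    # Sketch: a polygenic risk score as the weighted sum of allele dosages.
    import numpy as np

    rng = np.random.default_rng(2)
    n_subjects, n_snps = 1000, 500
    dosages = rng.integers(0, 3, size=(n_subjects, n_snps)).astype(float)  # 0/1/2 coded genotypes
    weights = rng.normal(scale=0.05, size=n_snps)                          # hypothetical effect sizes

    # Weighted sum per subject, standardized before being tested against
    # case-control status or a quantitative trait such as EPQ neuroticism.
    prs = dosages @ weights
    prs = (prs - prs.mean()) / prs.std()
    print(prs[:5])
    ```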

  10. GalaxyHomomer: a web server for protein homo-oligomer structure prediction from a monomer sequence or structure.

    Science.gov (United States)

    Baek, Minkyung; Park, Taeyong; Heo, Lim; Park, Chiwook; Seok, Chaok

    2017-07-03

    Homo-oligomerization of proteins is abundant in nature, and is often intimately related with the physiological functions of proteins, such as in metabolism, signal transduction or immunity. Information on the homo-oligomer structure is therefore important to obtain a molecular-level understanding of protein functions and their regulation. Currently available web servers predict protein homo-oligomer structures either by template-based modeling using homo-oligomer templates selected from the protein structure database or by ab initio docking of monomer structures resolved by experiment or predicted by computation. The GalaxyHomomer server, freely accessible at http://galaxy.seoklab.org/homomer, carries out template-based modeling, ab initio docking or both depending on the availability of proper oligomer templates. It also incorporates recently developed model refinement methods that can consistently improve model quality. Moreover, the server provides additional options that can be chosen by the user depending on the availability of information on the monomer structure, oligomeric state and locations of unreliable/flexible loops or termini. The performance of the server was better than or comparable to that of other available methods when tested on benchmark sets and in a recent CASP performed in a blind fashion. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Magnetophoresis of flexible DNA-based dumbbell structures

    Science.gov (United States)

    Babić, B.; Ghai, R.; Dimitrov, K.

    2008-02-01

    Controlled movement and manipulation of magnetic micro- and nanostructures using magnetic forces can give rise to important applications in biomedicine, diagnostics, and immunology. We report controlled magnetophoresis and stretching, in aqueous solution, of a DNA-based dumbbell structure containing magnetic and diamagnetic microspheres. The velocity and stretching of the dumbbell were experimentally measured and correlated with a theoretical model based on the forces acting on individual magnetic beads or on the entire dumbbell structure. The results show that precise and predictable manipulation of dumbbell structures is achievable and can potentially be applied to immunomagnetic cell separators.

  12. A novel knowledge-based potential for RNA 3D structure evaluation

    Science.gov (United States)

    Yang, Yi; Gu, Qi; Zhang, Ben-Gong; Shi, Ya-Zhou; Shao, Zhi-Gang

    2018-03-01

    Ribonucleic acids (RNAs) play a vital role in biology, and knowledge of their three-dimensional (3D) structure is required to understand their biological functions. Recently structural prediction methods have been developed to address this issue, but a series of RNA 3D structures are generally predicted by most existing methods. Therefore, the evaluation of the predicted structures is generally indispensable. Although several methods have been proposed to assess RNA 3D structures, the existing methods are not precise enough. In this work, a new all-atom knowledge-based potential is developed for more accurately evaluating RNA 3D structures. The potential not only includes local and nonlocal interactions but also fully considers the specificity of each RNA by introducing a retraining mechanism. Based on extensive test sets generated from independent methods, the proposed potential correctly distinguished the native state and ranked near-native conformations to effectively select the best. Furthermore, the proposed potential precisely captured RNA structural features such as base-stacking and base-pairing. Comparisons with existing potential methods show that the proposed potential is very reliable and accurate in RNA 3D structure evaluation. Project supported by the National Science Foundation of China (Grants Nos. 11605125, 11105054, 11274124, and 11401448).
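
    The inverse-Boltzmann construction underlying knowledge-based potentials of this kind can be sketched as follows; the distance samples and the reference state are synthetic, and the retraining mechanism described above is not modeled.

    ```python
    # Sketch: a knowledge-based distance potential, u(r) = -kT ln(P_obs(r)/P_ref(r)).
    import numpy as np

    rng = np.random.default_rng(3)
    observed = rng.normal(loc=6.0, scale=1.0, size=10000)   # e.g. stacking distances (Å) from native structures
    reference = rng.uniform(2.0, 15.0, size=10000)          # reference-state distances (Å)

    bins = np.linspace(2.0, 15.0, 40)
    p_obs, _ = np.histogram(observed, bins=bins, density=True)
    p_ref, _ = np.histogram(reference, bins=bins, density=True)

    kT = 0.6  # kcal/mol near 300 K
    with np.errstate(divide="ignore", invalid="ignore"):
        u = -kT * np.log(np.where((p_obs > 0) & (p_ref > 0), p_obs / p_ref, 1.0))

    centers = 0.5 * (bins[:-1] + bins[1:])
    print(centers[np.argmin(u)])   # distance with the most favorable (lowest) energy
    ```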

  13. Advancing viral RNA structure prediction: measuring the thermodynamics of pyrimidine-rich internal loops.

    Science.gov (United States)

    Phan, Andy; Mailey, Katherine; Saeki, Jessica; Gu, Xiaobo; Schroeder, Susan J

    2017-05-01

    Accurate thermodynamic parameters improve RNA structure predictions and thus accelerate understanding of RNA function and the identification of RNA drug binding sites. Many viral RNA structures, such as internal ribosome entry sites, have internal loops and bulges that are potential drug target sites. Current models used to predict internal loops are biased toward small, symmetric purine loops, and thus poorly predict asymmetric, pyrimidine-rich loops with >6 nucleotides (nt) that occur frequently in viral RNA. This article presents new thermodynamic data for 40 pyrimidine loops, many of which can form UU or protonated CC base pairs. Uracil and protonated cytosine base pairs stabilize asymmetric internal loops. Accurate prediction rules are presented that account for all thermodynamic measurements of RNA asymmetric internal loops. New loop initiation terms for loops with >6 nt are presented that do not follow previous assumptions that increasing asymmetry destabilizes loops. Since the last 2004 update, 126 new loops with asymmetry or sizes greater than 2 × 2 have been measured. These new measurements significantly deepen and diversify the thermodynamic database for RNA. These results will help better predict internal loops that are larger, pyrimidine-rich, and occur within viral structures such as internal ribosome entry sites. © 2017 Phan et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  14. Churn prediction based on text mining and CRM data analysis

    OpenAIRE

    Schatzmann, Anders; Heitz, Christoph; Münch, Thomas

    2014-01-01

    Within quantitative marketing, churn prediction on a single customer level has become a major issue. An extensive body of literature shows that, today, churn prediction is mainly based on structured CRM data. However, in the past years, more and more digitized customer text data has become available, originating from emails, surveys or scripts of phone calls. To date, this data source remains vastly untapped for churn prediction, and corresponding methods are rarely described in literature. ...
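
    One plausible way to combine text and structured CRM features for churn prediction is sketched below; the emails, CRM columns, and labels are invented and the classifier choice (logistic regression) is an assumption, not the method of the paper.

    ```python
    # Sketch: TF-IDF text features concatenated with structured CRM features.
    import numpy as np
    from scipy.sparse import hstack, csr_matrix
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    emails = ["please cancel my subscription", "thanks great service",
              "very unhappy with billing", "renewal question"]
    crm = np.array([[24, 3], [6, 0], [36, 5], [12, 1]], dtype=float)   # tenure months, complaints
    churned = np.array([1, 0, 1, 0])

    text_features = TfidfVectorizer().fit_transform(emails)
    X = hstack([text_features, csr_matrix(crm)])
    model = LogisticRegression().fit(X, churned)
    print(model.predict_proba(X)[:, 1])
    ```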

  15. Protein Tertiary Structure Prediction Based on Main Chain Angle Using a Hybrid Bees Colony Optimization Algorithm

    Science.gov (United States)

    Mahmood, Zakaria N.; Mahmuddin, Massudi; Mahmood, Mohammed Nooraldeen

    Encoding proteins from their amino acid sequences so that they can be classified into their respective families and subfamilies is an important research area. However, for a given protein, knowing its exact action, whether hormonal, enzymatic, transmembranal or as a nuclear receptor, does not depend solely on the amino acid sequence but also on the way the amino acid thread folds. This study provides a prototype system that is able to predict a protein tertiary structure. Several methods are used to develop and evaluate the system to produce better accuracy in protein 3D structure prediction. The Bees Optimization algorithm, which is inspired by the food-foraging behaviour of honey bees, is used in the searching phase. In this study, the experiment is conducted on short-sequence proteins that have been used in previous research with well-known tools. The proposed approach shows promising results.

  16. Prediction Model of Collapse Risk Based on Information Entropy and Distance Discriminant Analysis Method

    Directory of Open Access Journals (Sweden)

    Hujun He

    2017-01-01

    The prediction and risk classification of collapse is an important issue in the process of highway construction in mountainous regions. Based on the principles of information entropy and Mahalanobis distance discriminant analysis, we have produced a collapse hazard prediction model. We used the entropy measure method to reduce the influence indexes of collapse activity and extracted the nine main indexes affecting collapse activity as the discriminant factors of the distance discriminant analysis model (i.e., slope shape, aspect, gradient, and height, along with exposure of the structural face, stratum lithology, the relationship between weakness face and free face, vegetation cover rate, and degree of rock weathering). We employ post-earthquake collapse data relating to the construction of the Yingxiu-Wolong highway, Hanchuan County, China, as training samples for the analysis. The results were analyzed using the back substitution estimation method, showing high accuracy and no errors, and were the same as the prediction results of the uncertainty measure method. The results show that the classification model based on information entropy and distance discriminant analysis achieves the purpose of index optimization and has excellent performance, high prediction accuracy, and a zero false-positive rate. The model can be used as a tool for future evaluation of collapse risk.
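
    The two ingredients named in the title, entropy-based index weighting and Mahalanobis distance discriminant analysis, can be sketched roughly as follows; the nine indexes, the class labels, and the specific entropy normalization are illustrative.

    ```python
    # Sketch: entropy weighting of indexes, then nearest-centroid classification
    # in Mahalanobis distance.
    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(40, 9))                     # 40 training samples, 9 indexes
    y = rng.integers(0, 3, size=40)                  # three collapse-risk classes

    # Entropy weighting: indexes with more dispersed (informative) values get larger weights.
    P = np.abs(X) / np.abs(X).sum(axis=0)
    entropy = -(P * np.log(P + 1e-12)).sum(axis=0) / np.log(len(X))
    weights = (1 - entropy) / (1 - entropy).sum()
    Xw = X * weights

    def mahalanobis_class(sample):
        """Assign the class whose centroid is nearest in Mahalanobis distance."""
        dists = []
        for c in np.unique(y):
            group = Xw[y == c]
            cov_inv = np.linalg.pinv(np.cov(group, rowvar=False))
            diff = sample - group.mean(axis=0)
            dists.append(diff @ cov_inv @ diff)
        return np.unique(y)[int(np.argmin(dists))]

    print(mahalanobis_class(Xw[0]), y[0])
    ```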

  17. Mapping monomeric threading to protein-protein structure prediction.

    Science.gov (United States)

    Guerler, Aysam; Govindarajoo, Brandon; Zhang, Yang

    2013-03-25

    The key step of template-based protein-protein structure prediction is the recognition of complexes from experimental structure libraries that have similar quaternary fold. Maintaining two monomer and dimer structure libraries is however laborious, and inappropriate library construction can degrade template recognition coverage. We propose a novel strategy SPRING to identify complexes by mapping monomeric threading alignments to protein-protein interactions based on the original oligomer entries in the PDB, which does not rely on library construction and increases the efficiency and quality of complex template recognitions. SPRING is tested on 1838 nonhomologous protein complexes which can recognize correct quaternary template structures with a TM score >0.5 in 1115 cases after excluding homologous proteins. The average TM score of the first model is 60% and 17% higher than that by HHsearch and COTH, respectively, while the number of targets with an interface RMSD benchmark proteins. Although the relative performance of SPRING and ZDOCK depends on the level of homology filters, a combination of the two methods can result in a significantly higher model quality than ZDOCK at all homology thresholds. These data demonstrate a new efficient approach to quaternary structure recognition that is ready to use for genome-scale modeling of protein-protein interactions due to the high speed and accuracy.

  18. Ab initio and template-based prediction of multi-class distance maps by two-dimensional recursive neural networks

    Directory of Open Access Journals (Sweden)

    Martin Alberto JM

    2009-01-01

    Background: Prediction of protein structures from their sequences is still one of the open grand challenges of computational biology. Some approaches to protein structure prediction, especially ab initio ones, rely to some extent on the prediction of residue contact maps. Residue contact map predictions have been assessed at the CASP competition for several years now. Although it has been shown that exact contact maps generally yield correct three-dimensional structures, this is true only at a relatively low resolution (3–4 Å from the native structure). Another known weakness of contact maps is that they are generally predicted ab initio, that is, without exploiting information about potential homologues of known structure. Results: We introduce a new class of distance restraints for protein structures: multi-class distance maps. We show that Cα trace reconstructions based on 4-class native maps are significantly better than those from residue contact maps. We then build two predictors of 4-class maps based on recursive neural networks: one ab initio, relying on the sequence and on evolutionary information; one template-based, in which homology information to known structures is provided as a further input. We show that virtually any level of sequence similarity to structural templates (down to less than 10%) yields more accurate 4-class maps than the ab initio predictor. We show that template-based predictions by recursive neural networks are consistently better than the best template and than a number of combinations of the best available templates. We also extract binary residue contact maps at an 8 Å threshold (as per CASP assessment) from the 4-class predictors and show that the template-based version is also more accurate than the best template and consistently better than the ab initio one, down to very low levels of sequence identity to structural templates. Furthermore, we test both ab-initio and template-based 8

  19. EVA: continuous automatic evaluation of protein structure prediction servers.

    Science.gov (United States)

    Eyrich, V A; Martí-Renom, M A; Przybylski, D; Madhusudhan, M S; Fiser, A; Pazos, F; Valencia, A; Sali, A; Rost, B

    2001-12-01

    Evaluation of protein structure prediction methods is difficult and time-consuming. Here, we describe EVA, a web server for assessing protein structure prediction methods, in an automated, continuous and large-scale fashion. Currently, EVA evaluates the performance of a variety of prediction methods available through the internet. Every week, the sequences of the latest experimentally determined protein structures are sent to prediction servers, results are collected, performance is evaluated, and a summary is published on the web. EVA has so far collected data for more than 3000 protein chains. These results may provide valuable insight to both developers and users of prediction methods. http://cubic.bioc.columbia.edu/eva. eva@cubic.bioc.columbia.edu

  20. Genomic prediction in families of perennial ryegrass based on genotyping-by-sequencing

    DEFF Research Database (Denmark)

    Ashraf, Bilal

    In this thesis we investigate the potential for genomic prediction in perennial ryegrass using genotyping-by-sequencing (GBS) data. An association method based on family-based breeding systems was developed, and genomic heritabilities, genomic prediction accuracies and the effects of some key factors were...... explored. Results show that low sequencing depth caused underestimation of allele substitution effects in GWAS and overestimation of genomic heritability in prediction studies. Other factors such as SNP marker density, population structure and size of the training population influenced the accuracy of genomic...... prediction. Overall, GBS allows for genomic prediction in breeding families of perennial ryegrass and holds good potential to expedite genetic gain and encourage the application of genomic prediction...
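
    A rough stand-in for marker-based genomic prediction is shown below using ridge regression (an rrBLUP-style model); GBS genotypes, sequencing depth, and family structure are not modeled, and all data are simulated.

    ```python
    # Sketch: genomic prediction of a phenotype from a marker matrix with ridge regression.
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(5)
    n_plants, n_markers = 300, 2000
    genotypes = rng.integers(0, 3, size=(n_plants, n_markers)).astype(float)
    true_effects = rng.normal(scale=0.05, size=n_markers)
    phenotype = genotypes @ true_effects + rng.normal(scale=1.0, size=n_plants)

    model = Ridge(alpha=100.0)
    accuracy = cross_val_score(model, genotypes, phenotype, cv=5, scoring="r2")
    print(accuracy.mean())   # proxy for genomic prediction accuracy
    ```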

  1. QuaBingo: A Prediction System for Protein Quaternary Structure Attributes Using Block Composition

    Directory of Open Access Journals (Sweden)

    Chi-Hua Tung

    2016-01-01

    Background. Quaternary structures of proteins are closely relevant to gene regulation, signal transduction, and many other biological functions of proteins. In the current study, a new method for feature extraction based on protein-conserved motif composition in block format, termed block composition, is proposed. Results. The protein quaternary assembly state prediction system, which combines blocks with functional domain composition and is called QuaBingo, is constructed from three layers of classifiers that can categorize the quaternary structural attributes of monomers, homooligomers, and heterooligomers. The first layer classifier uses support vector machines (SVM) based on the blocks and functional domains of proteins, and the second layer SVM processes the outputs of the first layer. Finally, the result is determined by the Random Forest of the third layer. We compared the effectiveness of combining block composition, functional domain composition, and pseudo-amino acid composition in the model. Across the 11 functional protein families, QuaBingo's Matthews Correlation Coefficient (MCC) is 23% higher than that of the existing prediction system. The results also revealed the biological characterization of the top five block compositions. Conclusions. QuaBingo provides better predictive ability for the quaternary structural attributes of proteins.

  2. Predicting deleterious nsSNPs: an analysis of sequence and structural attributes

    Directory of Open Access Journals (Sweden)

    Saqi Mansoor AS

    2006-04-01

    Background: There has been an explosion in the number of single nucleotide polymorphisms (SNPs) within public databases. In this study we focused on non-synonymous protein coding single nucleotide polymorphisms (nsSNPs), some associated with disease and others thought to be neutral. We describe the distribution of both types of nsSNPs using structural and sequence based features and assess the relative value of these attributes as predictors of function using machine learning methods. We also address the common problem of balance within machine learning methods and show the effect of imbalance on nsSNP function prediction. We show that nsSNP function prediction can be significantly improved by 100% undersampling of the majority class. The learnt rules were then applied to make predictions of function for all nsSNPs within Ensembl. Results: The measure of prediction success is greatly affected by the level of imbalance in the training dataset. We found that the balanced dataset that included all attributes produced the best prediction. The performance, as measured by the Matthews correlation coefficient (MCC), varied between 0.49 and 0.25 depending on the imbalance. As previously observed, the degree of sequence conservation at the nsSNP position is the single most useful attribute. In addition to conservation, structural predictions made using a balanced dataset can be of value. Conclusion: The predictions for all nsSNPs within Ensembl, based on a balanced dataset using all attributes, are available as a DAS annotation. Instructions for adding the track to Ensembl are at http://www.brightstudy.ac.uk/das_help.html
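
    The undersampling step highlighted above can be sketched as follows; the features, the 10% minority fraction, and the random forest classifier are illustrative assumptions.

    ```python
    # Sketch: 100% undersampling of the majority class before training a classifier.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import matthews_corrcoef

    rng = np.random.default_rng(6)
    X = rng.normal(size=(1000, 10))
    y = (rng.random(1000) < 0.1).astype(int)          # ~10% "deleterious" nsSNPs

    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    keep = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([minority, keep])            # balanced training set

    clf = RandomForestClassifier(random_state=0).fit(X[idx], y[idx])
    print(matthews_corrcoef(y, clf.predict(X)))       # MCC, as reported in the study
    ```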

  3. Seminal Quality Prediction Using Clustering-Based Decision Forests

    Directory of Open Access Journals (Sweden)

    Hong Wang

    2014-08-01

    Prediction of seminal quality with statistical learning tools is an emerging methodology in decision support systems in biomedical engineering and is very useful for early diagnosis of seminal disorders and the selection of semen donor candidates. However, as is common in medical diagnosis, seminal quality prediction faces the class imbalance problem. In this paper, we propose a novel supervised ensemble learning approach, namely Clustering-Based Decision Forests, to tackle the unbalanced class learning problem in seminal quality prediction. Experimental results on a real fertility diagnosis dataset show that Clustering-Based Decision Forests outperforms decision trees, Support Vector Machines, random forests, multilayer perceptron neural networks and logistic regression by a noticeable margin. Clustering-Based Decision Forests can also be used to evaluate variable importance, and the top five factors that may affect semen concentration obtained in this study are age, serious trauma, sitting time, the season when the semen sample is produced, and high fevers in the last year. The findings could be helpful in explaining seminal concentration problems in infertile males or pre-screening semen donor candidates.

  4. Bioinformatical approaches to RNA structure prediction & Sequencing of an ancient human genome

    DEFF Research Database (Denmark)

    Lindgreen, Stinus

    Stinus Lindgreen has been working in two different fields during his Ph.D. The first part has been focused on computational approaches to predict the structure of non-coding RNA molecules at the base pairing level. This has resulted in the analysis of various measures of the base pairing potentia...

  5. Predicting Consensus Structures for RNA Alignments Via Pseudo-Energy Minimization

    Directory of Open Access Journals (Sweden)

    Junilda Spirollari

    2009-01-01

    Thermodynamic processes with free energy parameters are often used in algorithms that solve the free energy minimization problem to predict secondary structures of single RNA sequences. While results from these algorithms are promising, an observation is that single sequence-based methods have moderate accuracy and more information is needed to improve on RNA secondary structure prediction, such as covariance scores obtained from multiple sequence alignments. We present in this paper a new approach to predicting the consensus secondary structure of a set of aligned RNA sequences via pseudo-energy minimization. Our tool, called RSpredict, takes into account sequence covariation and employs effective heuristics for accuracy improvement. RSpredict accepts, as input data, a multiple sequence alignment in FASTA or ClustalW format and outputs the consensus secondary structure of the input sequences in both the Vienna style Dot Bracket format and the Connectivity Table format. Our method was compared with some widely used tools including KNetFold, Pfold and RNAalifold. A comprehensive test on different datasets including Rfam sequence alignments and a multiple sequence alignment obtained from our study on the Drosophila X chromosome reveals that RSpredict is competitive with the existing tools on the tested datasets. RSpredict is freely available online as a web server and also as a jar file for download at http://datalab.njit.edu/biology/RSpredict.

  6. Prediction of the Fundamental Period of Infilled RC Frame Structures Using Artificial Neural Networks

    Directory of Open Access Journals (Sweden)

    Panagiotis G. Asteris

    2016-01-01

    The fundamental period is one of the most critical parameters for the seismic design of structures. There are several approaches in the literature for its estimation, which often conflict with each other, making their use questionable. Furthermore, the majority of these approaches do not take into account the presence of infill walls in the structure, despite the fact that infill walls increase the stiffness and mass of the structure, leading to significant changes in the fundamental period. In the present paper, artificial neural networks (ANNs) are used to predict the fundamental period of infilled reinforced concrete (RC) structures. For the training and validation of the ANN, a large data set is used based on a detailed investigation of the parameters that affect the fundamental period of RC structures. The comparison of the predicted values with analytical ones indicates the potential of using ANNs for the prediction of the fundamental period of infilled RC frame structures, taking into account the crucial parameters that influence its value.
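
    A minimal sketch of an ANN mapping structural parameters to the fundamental period is given below; the chosen inputs (storeys, spans, infill ratio) and the synthetic target formula are placeholders rather than the dataset used in the paper.

    ```python
    # Sketch: a small feed-forward ANN regressing the fundamental period from
    # structural parameters of infilled RC frames.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(7)
    n = 500
    storeys = rng.integers(1, 21, size=n)
    spans = rng.integers(1, 7, size=n)
    infill_ratio = rng.random(n)
    X = np.column_stack([storeys, spans, infill_ratio])
    # Synthetic period: grows with height, shrinks as infill stiffens the frame.
    period = 0.075 * storeys**0.9 * (1.0 - 0.3 * infill_ratio) + rng.normal(scale=0.02, size=n)

    ann = make_pipeline(StandardScaler(),
                        MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0))
    ann.fit(X[:400], period[:400])
    print(ann.score(X[400:], period[400:]))
    ```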

  7. NAPR: a Cloud-Based Framework for Neuroanatomical Age Prediction.

    Science.gov (United States)

    Pardoe, Heath R; Kuzniecky, Ruben

    2018-01-01

    The availability of cloud computing services has enabled the widespread adoption of the "software as a service" (SaaS) approach for software distribution, which utilizes network-based access to applications running on centralized servers. In this paper we apply the SaaS approach to neuroimaging-based age prediction. Our system, named "NAPR" (Neuroanatomical Age Prediction using R), provides access to predictive modeling software running on a persistent cloud-based Amazon Web Services (AWS) compute instance. The NAPR framework allows external users to estimate the age of individual subjects using cortical thickness maps derived from their own locally processed T1-weighted whole brain MRI scans. As a demonstration of the NAPR approach, we have developed two age prediction models that were trained using healthy control data from the ABIDE, CoRR, DLBS and NKI Rockland neuroimaging datasets (total N = 2367, age range 6-89 years). The provided age prediction models were trained using (i) relevance vector machines and (ii) Gaussian processes machine learning methods applied to cortical thickness surfaces obtained using Freesurfer v5.3. We believe that this transparent approach to out-of-sample evaluation and comparison of neuroimaging age prediction models will facilitate the development of improved age prediction models and allow for robust evaluation of the clinical utility of these methods.

  8. Validation of Quantitative Structure-Activity Relationship (QSAR Model for Photosensitizer Activity Prediction

    Directory of Open Access Journals (Sweden)

    Sharifuddin M. Zain

    2011-11-01

    Photodynamic therapy is a relatively new treatment method for cancer which utilizes a combination of oxygen, a photosensitizer and light to generate reactive singlet oxygen that eradicates tumors via direct cell-killing, vasculature damage and engagement of the immune system. Most photosensitizers that are in clinical and pre-clinical assessment, or that are already approved for clinical use, are mainly based on cyclic tetrapyrroles. In an attempt to discover new effective photosensitizers, we report the use of the quantitative structure-activity relationship (QSAR) method to develop a model that correlates the structural features of cyclic tetrapyrrole-based compounds with their photodynamic therapy (PDT) activity. In this study, a set of 36 porphyrin derivatives was used in the model development, where 24 of these compounds were in the training set and the remaining 12 compounds were in the test set. The development of the QSAR model involved the use of the multiple linear regression analysis (MLRA) method. Based on the method, an r² value, r²(CV) value and r² prediction value of 0.87, 0.71 and 0.70 were obtained. The QSAR model was also employed to predict the experimental compounds in an external test set. This external test set comprises 20 porphyrin-based compounds with experimental IC50 values ranging from 0.39 µM to 7.04 µM. Thus the model showed good correlative and predictive ability, with a predictive correlation coefficient (r² prediction) of 0.52 for the external test set. The developed QSAR model was used to identify some compounds from this external test set as new lead photosensitizers.

  9. Protein structure refinement using a quantum mechanics-based chemical shielding predictor.

    Science.gov (United States)

    Bratholm, Lars A; Jensen, Jan H

    2017-03-01

    The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years, but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor of protein backbone and CB chemical shifts (ProCS15, PeerJ, 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic chemical shielding values (ProCS15) can be used to refine protein structures using Markov Chain Monte Carlo (MCMC) simulations, relating the chemical shielding values to the experimental chemical shifts probabilistically. Two kinds of MCMC structural refinement simulations were performed using force field geometry optimized X-ray structures as starting points: simulated annealing of the starting structure and constant temperature MCMC simulation followed by simulated annealing of a representative ensemble structure. Annealing of the CHARMM structure changes the CA-RMSD by an average of 0.4 Å but lowers the chemical shift RMSD by 1.0 and 0.7 ppm for CA and N. Conformational averaging has a relatively small effect (0.1-0.2 ppm) on the overall agreement with carbon chemical shifts but lowers the error for nitrogen chemical shifts by 0.4 ppm. If an amino acid specific offset is included, the ProCS15 predicted chemical shifts have RMSD values relative to experiment that are comparable to popular empirical chemical shift predictors. The annealed representative ensemble structures differ in CA-RMSD relative to the initial structures by an average of 2.0 Å, with >2.0 Å difference for six proteins. In four of the cases, the largest structural differences arise in structurally flexible regions of the protein as determined by NMR, and in the remaining two cases, the large structural
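
    The idea of driving Monte Carlo refinement with chemical shift agreement can be reduced to the following toy sketch, in which the "structure" is a single torsion-like variable and the shift predictor is an invented function standing in for ProCS15.

    ```python
    # Sketch: Metropolis Monte Carlo guided by agreement between predicted and
    # experimental chemical shifts.
    import numpy as np

    rng = np.random.default_rng(8)
    exp_shift = 118.0                      # hypothetical experimental 15N shift (ppm)
    sigma = 2.0                            # assumed predictor uncertainty (ppm)

    def predicted_shift(x):
        """Toy stand-in for a QM-based shift predictor."""
        return 115.0 + 5.0 * np.cos(x)

    def neg_log_prob(x):
        return 0.5 * ((predicted_shift(x) - exp_shift) / sigma) ** 2

    x = 0.0
    for step in range(5000):
        trial = x + rng.normal(scale=0.2)
        # Metropolis acceptance: always accept downhill, sometimes accept uphill.
        if rng.random() < np.exp(neg_log_prob(x) - neg_log_prob(trial)):
            x = trial

    print(x, predicted_shift(x))           # sampled state reproduces the target shift
    ```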

  10. Predicted versus observed cosmic-ray-produced noble gases in lunar samples: improved Kr production ratios

    International Nuclear Information System (INIS)

    Regnier, S.; Hohenberg, C.M.; Marti, K.; Reedy, R.C.

    1979-01-01

    New sets of cross sections for the production of krypton isotopes from targets of Rb, Sr, Y, and Zr were constructed, primarily on the basis of experimental excitation functions for Kr production from Y. These cross sections were used to calculate galactic-cosmic-ray and solar-proton production rates for Kr isotopes in the moon. Spallation Kr data obtained from ilmenite separates of rocks 10017 and 10047 are reported. Production rates and isotopic ratios for cosmogenic Kr observed in ten well-documented lunar samples and in ilmenite separates and bulk samples from several lunar rocks with long but unknown irradiation histories were compared with predicted rates and ratios. The agreements were generally quite good. Erosion of rock surfaces affected rates or ratios only for near-surface samples, where solar-proton production is important. There were considerable spreads in predicted-to-observed production rates of 83Kr, due at least in part to uncertainties in chemical abundances. The 78Kr/83Kr ratios were predicted quite well for samples with a wide range of Zr/Sr abundance ratios. The calculated 80Kr/83Kr ratios were greater than the observed ratios when production by the 79Br(n,γ) reaction was included, but were slightly undercalculated if the Br reaction was omitted; these results suggest that Br(n,γ)-produced Kr is not retained well by lunar rocks. The production of 81Kr and 82Kr was overcalculated by approximately 10% relative to 83Kr. Predicted-to-observed 84Kr/83Kr ratios scattered considerably, possibly because of uncertainties in corrections for trapped and fission components and in cross sections for 84Kr production. Most predicted 84Kr and 86Kr production rates were lower than observed. Shielding depths of several Apollo 11 rocks were determined from the measured 78Kr/83Kr ratios of ilmenite separates. 4 figures, 5 tables

  11. A Particle Swarm Optimization-Based Approach with Local Search for Predicting Protein Folding.

    Science.gov (United States)

    Yang, Cheng-Hong; Lin, Yu-Shiun; Chuang, Li-Yeh; Chang, Hsueh-Wei

    2017-10-01

    The hydrophobic-polar (HP) model is commonly used for predicting protein folding structures and hydrophobic interactions. This study developed a particle swarm optimization (PSO)-based algorithm combined with local search algorithms; specifically, the high exploration PSO (HEPSO) algorithm (which can execute global search processes) was combined with three local search algorithms (hill-climbing algorithm, greedy algorithm, and Tabu table), yielding the proposed HE-L-PSO algorithm. By using 20 known protein structures, we evaluated the performance of the HE-L-PSO algorithm in predicting protein folding in the HP model. The proposed HE-L-PSO algorithm exhibited favorable performance in predicting both short and long amino acid sequences with high reproducibility and stability, compared with seven reported algorithms. The HE-L-PSO algorithm yielded optimal solutions for all predicted protein folding structures. All HE-L-PSO-predicted protein folding structures possessed a hydrophobic core that is similar to normal protein folding.

  12. Prediction degradation trend of nuclear equipment based on GM (1, 1)-Markov chain

    International Nuclear Information System (INIS)

    Zhang Liming; Zhao Xinwen; Cai Qi; Wu Guangjiang

    2010-01-01

    The degradation trend prediction results are important references for nuclear equipment in-service inspection and maintenance planning. However, it is difficult to predict the degradation trend of nuclear equipment accurately with traditional statistical probability methods because of small sample sizes, a lack of degradation data and the wavy degradation locus. Therefore, a method of equipment degradation trend prediction based on a GM (1, 1)-Markov chain is proposed in this paper. The method, which combines the advantages of the GM (1, 1) method and the Markov chain, can improve the prediction precision of the nuclear equipment degradation trend. The paper collected degradation data as samples and accurately predicted the degradation trend of a canned motor pump. Compared with the prediction results of the GM (1, 1) method alone, the predictions of the GM (1, 1)-Markov chain are more accurate. (authors)
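
    The GM (1, 1) component can be sketched as below; the degradation series is invented and the Markov-chain correction of residual states is omitted.

    ```python
    # Sketch: GM(1,1) grey forecast of a degradation indicator.
    import numpy as np

    x0 = np.array([2.1, 2.3, 2.6, 3.0, 3.5, 4.1])   # hypothetical degradation indicator
    x1 = np.cumsum(x0)                              # accumulated generating operation
    z1 = 0.5 * (x1[1:] + x1[:-1])                   # background values

    # Least-squares estimate of the development coefficient a and grey input b.
    B = np.column_stack([-z1, np.ones_like(z1)])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]

    def forecast(k):
        """Restored GM(1,1) prediction of x0 at time index k (k = 0 is the first point)."""
        if k == 0:
            return x0[0]
        x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
        x1_prev = (x0[0] - b / a) * np.exp(-a * (k - 1)) + b / a
        return x1_hat - x1_prev

    print([round(forecast(k), 2) for k in range(len(x0) + 2)])   # fit plus two-step-ahead trend
    ```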

  13. Comparing Structural Identification Methodologies for Fatigue Life Prediction of a Highway Bridge

    Directory of Open Access Journals (Sweden)

    Sai G. S. Pai

    2018-01-01

    Accurate measurement-data interpretation leads to increased understanding of structural behavior and enhanced asset-management decision making. In this paper, four data-interpretation methodologies are compared: residual minimization, traditional Bayesian model updating, modified Bayesian model updating (with an L∞-norm-based Gaussian likelihood function), and error-domain model falsification (EDMF), a method that rejects models that have unlikely differences between predictions and measurements. In the modified Bayesian model updating methodology, a correction is used in the likelihood function to account for the effect of a finite number of measurements on posterior probability-density functions. The application of these data-interpretation methodologies for condition assessment and fatigue life prediction is illustrated on a highway steel–concrete composite bridge having four spans with a total length of 219 m. A detailed 3D finite-element plate and beam model of the bridge and weigh-in-motion data are used to obtain the time–stress response at a fatigue-critical location along the bridge span. The time–stress response, presented as a histogram, is compared to measured strain responses either to update prior knowledge of model parameters, using residual minimization and the Bayesian methodologies, or to obtain candidate model instances, using the EDMF methodology. It is concluded that the EDMF and modified Bayesian model updating methodologies provide robust prediction of fatigue life compared with residual minimization and traditional Bayesian model updating in the presence of correlated non-Gaussian uncertainty. EDMF has additional advantages owing to its ease of understanding and applicability for practicing engineers, thus enabling incremental asset-management decision making over long service lives. Finally, parallel implementations of EDMF using grid sampling have lower computation times than implementations using adaptive sampling.
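
    The falsification step of EDMF can be illustrated with a toy model; the one-parameter structural model, the measurements, and the uncertainty threshold are all invented.

    ```python
    # Sketch: error-domain model falsification keeps only parameter values whose
    # predicted-minus-measured residuals stay within uncertainty bounds at every sensor.
    import numpy as np

    rng = np.random.default_rng(9)
    measurements = np.array([1.02, 1.95, 3.10])        # strains at three sensors

    def model(stiffness):
        """Toy structural model: predicted strains scale inversely with stiffness."""
        return np.array([1.0, 2.0, 3.0]) * (1.0 / stiffness)

    threshold = 0.15                                   # combined uncertainty bound per sensor
    candidates = np.linspace(0.7, 1.3, 200)            # initial stiffness instances (grid sampling)
    residuals = np.array([model(s) - measurements for s in candidates])
    kept = candidates[np.all(np.abs(residuals) <= threshold, axis=1)]

    # Fatigue life would then be predicted with every non-falsified instance,
    # giving an interval rather than a single value.
    print(kept.min(), kept.max())
    ```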

  14. Detecting spatial structures in throughfall data: The effect of extent, sample size, sampling design, and variogram estimation method

    Science.gov (United States)

    Voss, Sebastian; Zimmermann, Beate; Zimmermann, Alexander

    2016-09-01

    In the last decades, an increasing number of studies analyzed spatial patterns in throughfall by means of variograms. The estimation of the variogram from sample data requires an appropriate sampling scheme: most importantly, a large sample and a layout of sampling locations that often has to serve both variogram estimation and geostatistical prediction. While some recommendations on these aspects exist, they focus on Gaussian data and high ratios of the variogram range to the extent of the study area. However, many hydrological data, and throughfall data in particular, do not follow a Gaussian distribution. In this study, we examined the effect of extent, sample size, sampling design, and calculation method on variogram estimation of throughfall data. For our investigation, we first generated non-Gaussian random fields based on throughfall data with large outliers. Subsequently, we sampled the fields with three extents (plots with edge lengths of 25 m, 50 m, and 100 m), four common sampling designs (two grid-based layouts, transect and random sampling) and five sample sizes (50, 100, 150, 200, 400). We then estimated the variogram parameters by method-of-moments (non-robust and robust estimators) and residual maximum likelihood. Our key findings are threefold. First, the choice of the extent has a substantial influence on the estimation of the variogram. A comparatively small ratio of the extent to the correlation length is beneficial for variogram estimation. Second, a combination of a minimum sample size of 150, a design that ensures the sampling of small distances and variogram estimation by residual maximum likelihood offers a good compromise between accuracy and efficiency. Third, studies relying on method-of-moments based variogram estimation may have to employ at least 200 sampling points for reliable variogram estimates. These suggested sample sizes exceed the number recommended by studies dealing with Gaussian data by up to 100 %. Given that most previous
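
    A method-of-moments (Matheron) empirical variogram, the non-robust estimator referred to above, can be computed as follows on a synthetic throughfall field.

    ```python
    # Sketch: method-of-moments empirical semivariogram from scattered samples.
    import numpy as np
    from scipy.spatial.distance import pdist

    rng = np.random.default_rng(10)
    coords = rng.uniform(0, 50, size=(150, 2))               # 150 collectors on a 50 m plot
    values = np.sin(coords[:, 0] / 10.0) + rng.normal(scale=0.3, size=150)

    dists = pdist(coords)
    sq_diffs = pdist(values[:, None], metric="sqeuclidean")  # (z_i - z_j)^2 for each pair

    bins = np.linspace(0, 25, 11)
    gamma = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (dists >= lo) & (dists < hi)
        gamma.append(0.5 * sq_diffs[mask].mean())            # semivariance per lag bin
    print(np.round(gamma, 3))
    ```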

  15. Domain Adaptation for Pedestrian Detection Based on Prediction Consistency

    Directory of Open Access Journals (Sweden)

    Yu Li-ping

    2014-01-01

    Pedestrian detection is an active area of research in computer vision. It remains a quite challenging problem in many applications where many factors cause a mismatch between the source dataset used to train the pedestrian detector and the samples in the target scene. In this paper, we propose a novel domain adaptation model for merging plentiful source domain samples with scarce target domain samples to create a scene-specific pedestrian detector that performs as well as if rich target domain samples were present. Our approach combines a boosting-based learning algorithm with an entropy-based transferability measure, derived from the prediction consistency with the source classifications, to selectively choose the source domain samples showing positive transferability to the target domain. Experimental results show that our approach can improve the detection rate, especially with insufficient labeled data in the target scene.

  16. Continuous Automated Model EvaluatiOn (CAMEO) complementing the critical assessment of structure prediction in CASP12.

    Science.gov (United States)

    Haas, Jürgen; Barbato, Alessandro; Behringer, Dario; Studer, Gabriel; Roth, Steven; Bertoni, Martino; Mostaguir, Khaled; Gumienny, Rafal; Schwede, Torsten

    2018-03-01

    Every second year, the community experiment "Critical Assessment of Techniques for Structure Prediction" (CASP) conducts an independent blind assessment of structure prediction methods, providing a framework for comparing the performance of different approaches and discussing the latest developments in the field. Yet, developers of automated computational modeling methods clearly benefit from more frequent evaluations based on larger sets of data. The "Continuous Automated Model EvaluatiOn (CAMEO)" platform complements the CASP experiment by conducting fully automated blind prediction assessments based on the weekly pre-release of sequences of structures that are going to be published in the next release of the Protein Data Bank (PDB). CAMEO publishes weekly benchmarking results based on models collected during a 4-day prediction window, on average assessing ca. 100 targets during a time frame of 5 weeks. CAMEO benchmarking data is generated consistently for all participating methods at the same point in time, enabling developers to benchmark and cross-validate their method's performance, and directly refer to the benchmarking results in publications. In order to facilitate server development and promote shorter release cycles, CAMEO sends weekly emails with submission statistics and low-performance warnings. Many participants of CASP have successfully employed CAMEO when preparing their methods for upcoming community experiments. CAMEO offers a variety of scores to allow benchmarking diverse aspects of structure prediction methods. By introducing new scoring schemes, CAMEO facilitates new development in areas of active research, for example, modeling quaternary structure, complexes, or ligand binding sites. © 2017 Wiley Periodicals, Inc.

  17. Structural syntactic prediction measured with ELAN: evidence from ERPs.

    Science.gov (United States)

    Fonteneau, Elisabeth

    2013-02-08

    The current study used event-related potentials (ERPs) to investigate how and when argument structure information is used during the processing of sentences with a filler-gap dependency. We hypothesize that one specific property - animacy (living vs. non-living) - is used by the parser during the building of the syntactic structure. Participants heard sentences that were rated off-line as having an expected noun (Who did the Lion King chase the caravan with?) or an unexpected noun (Who did Lion King chase the animal with?). This prediction is based on the relation between the animacy properties of the wh-word and the noun in the object position. ERPs from the noun in the unexpected condition (animal) elicited a typical Early Left Anterior Negativity (ELAN)/P600 complex compared to the noun in the expected condition (caravan). Firstly, these results demonstrate that the ELAN reflects not only grammatical category violations but also animacy property expectations in filler-gap dependencies. Secondly, our data suggest that the language comprehension system is able to make detailed predictions about aspects of the upcoming words to build up the syntactic structure. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  18. Prediction of the Formulation Dependence of the Glass Transition Temperature for Amine-Epoxy Copolymers Using a Quantitative Structure-Property Relationship Based on the AM1 Method

    National Research Council Canada - National Science Library

    Morrill, Jason

    2004-01-01

    A designer Quantitative Structure-Property Relationship (QSPR) based upon molecular properties calculated using the AM1 semi-empirical quantum mechanical method was developed to predict the glass transition temperature (Tg...

  19. Structure of nonevaporating sprays - Measurements and predictions

    Science.gov (United States)

    Solomon, A. S. P.; Shuen, J.-S.; Zhang, Q.-F.; Faeth, G. M.

    1984-01-01

    Structure measurements were completed within the dilute portion of axisymmetric nonevaporating sprays (SMD of 30 and 87 microns) injected into a still air environment, including: mean and fluctuating gas velocities and Reynolds stress using laser-Doppler anemometry; mean liquid fluxes using isokinetic sampling; drop sizes using slide impaction; and drop sizes and velocities using multiflash photography. The new measurements were used to evaluate three representative models of sprays: (1) a locally homogeneous flow (LHF) model, where slip between the phases was neglected; (2) a deterministic separated flow (DSF) model, where slip was considered but effects of drop interaction with turbulent fluctuations were ignored; and (3) a stochastic separated flow (SSF) model, where effects of both interphase slip and turbulent fluctuations were considered using random sampling for turbulence properties in conjunction with random-walk computations for drop motion. The LHF and DSF models were unsatisfactory for present test conditions-both underestimating flow widths and the rate of spread of drops. In contrast, the SSF model provided reasonably accurate predictions, including effects of enhanced spreading rates of sprays due to drop dispersion by turbulence, with all empirical parameters fixed from earlier work.

  20. The sequential structure of brain activation predicts skill.

    Science.gov (United States)

    Anderson, John R; Bothell, Daniel; Fincham, Jon M; Moon, Jungaa

    2016-01-29

    In an fMRI study, participants were trained to play a complex video game. They were scanned early and then again after substantial practice. While better players showed greater activation in one region (right dorsal striatum) their relative skill was better diagnosed by considering the sequential structure of whole brain activation. Using a cognitive model that played this game, we extracted a characterization of the mental states that are involved in playing a game and the statistical structure of the transitions among these states. There was a strong correspondence between this measure of sequential structure and the skill of different players. Using multi-voxel pattern analysis, it was possible to recognize, with relatively high accuracy, the cognitive states participants were in during particular scans. We used the sequential structure of these activation-recognized states to predict the skill of individual players. These findings indicate that important features about information-processing strategies can be identified from a model-based analysis of the sequential structure of brain activation. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. Aromatic claw: A new fold with high aromatic content that evades structural prediction: Aromatic Claw

    Energy Technology Data Exchange (ETDEWEB)

    Sachleben, Joseph R. [Biomolecular NMR Core Facility, University of Chicago, Chicago Illinois; Adhikari, Aashish N. [Department of Chemistry, University of Chicago, Chicago Illinois; Gawlak, Grzegorz [Department of Biochemistry and Molecular Biology, University of Chicago, Chicago Illinois; Hoey, Robert J. [Department of Biochemistry and Molecular Biology, University of Chicago, Chicago Illinois; Liu, Gaohua [Northeast Structural Genomics Consortium (NESG), Department of Molecular Biology and Biochemistry, School of Arts and Sciences, and Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, and Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway New Jersey; Joachimiak, Andrzej [Department of Biochemistry and Molecular Biology, University of Chicago, Chicago Illinois; Biological Sciences Division, Argonne National Laboratory, Argonne Illinois; Montelione, Gaetano T. [Northeast Structural Genomics Consortium (NESG), Department of Molecular Biology and Biochemistry, School of Arts and Sciences, and Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, and Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway New Jersey; Sosnick, Tobin R. [Department of Biochemistry and Molecular Biology, University of Chicago, Chicago Illinois; Koide, Shohei [Department of Biochemistry and Molecular Biology, University of Chicago, Chicago Illinois; Department of Biochemistry and Molecular Pharmacology and the Perlmutter Cancer Center, New York University School of Medicine, New York New York

    2016-11-10

    We determined the NMR structure of a highly aromatic (13%) protein of unknown function, Aq1974 from Aquifex aeolicus (PDB ID: 5SYQ). The unusual sequence of this protein has a tryptophan content five times the normal (six tryptophan residues out of 114, or 5.2%, while the average tryptophan content is 1.0%), with the tryptophans occurring in a WXW motif. It has no detectable sequence homology with known protein structures. Although its NMR spectrum suggested that the protein was rich in β-sheet, upon resonance assignment and solution structure determination, the protein was found to be primarily α-helical with a small two-stranded β-sheet, with a novel fold that we have termed an Aromatic Claw. As this fold was previously unknown and the sequence unique, we submitted the sequence to CASP10 as a target for blind structural prediction. At the end of the competition, the sequence was classified as a hard template-based modeling target; the structural relationship between the template and the experimental structure was small, and the predictions all failed to predict the structure. CSRosetta was found to predict the secondary structure and its packing; however, there was little correlation between the CSRosetta score and the RMSD between the CSRosetta structure and the NMR-determined one. This work demonstrates that even in relatively small proteins, we do not yet have the capacity to accurately predict the fold for all primary sequences. The experimental discovery of new folds helps guide the improvement of structural prediction methods.

  2. A burnout prediction model based around char morphology

    Energy Technology Data Exchange (ETDEWEB)

    Tao Wu; Edward Lester; Michael Cloke [University of Nottingham, Nottingham (United Kingdom). School of Chemical, Environmental and Mining Engineering

    2006-05-15

    Several combustion models have been developed that can make predictions about coal burnout and burnout potential. Most of these kinetic models require standard parameters such as volatile content and particle size to make a burnout prediction. This article presents a new model called the char burnout (ChB) model, which also uses detailed information about char morphology in its prediction. The input data to the model is based on information derived from two different image analysis techniques. One technique generates characterization data from real char samples, and the other predicts char types based on characterization data from image analysis of coal particles. The pyrolyzed chars in this study were created in a drop tube furnace operating at 1300°C, 200 ms, and 1% oxygen. Modeling results were compared with a different carbon burnout kinetic model as well as the actual burnout data from refiring the same chars in a drop tube furnace operating at 1300°C, 5% oxygen, and residence times of 200, 400, and 600 ms. Good agreement between the ChB model and the experimental data indicates that the inclusion of char morphology in combustion models could well improve model predictions. 38 refs., 5 figs., 6 tabs.

  3. Inorganic Nitrogen Application Affects Both Taxonomical and Predicted Functional Structure of Wheat Rhizosphere Bacterial Communities

    Directory of Open Access Journals (Sweden)

    Vanessa N. Kavamura

    2018-05-01

    Full Text Available The effects of fertilizer regime on bulk soil microbial communities have been well studied, but this is not the case for the rhizosphere microbiome. The aim of this work was to assess the impact of fertilization regime on wheat rhizosphere microbiome assembly and 16S rRNA gene-predicted functions with soil from the long-term Broadbalk experiment at Rothamsted Research. Soil from four N fertilization regimes (organic N, zero N, medium inorganic N and high inorganic N) was sown with seeds of Triticum aestivum cv. Cadenza. 16S rRNA gene amplicon sequencing was performed with the Illumina platform on bulk soil and rhizosphere samples of 4-week-old and flowering plants (10 weeks). Phylogenetic and 16S rRNA gene-predicted functional analyses were performed. Fertilization regime affected the structure and composition of wheat rhizosphere bacterial communities. Acidobacteria and Planctomycetes were significantly depleted in treatments receiving inorganic N, whereas the addition of high levels of inorganic N enriched members of the phylum Bacteroidetes, especially after 10 weeks. Bacterial richness and diversity decreased with inorganic nitrogen inputs and were highest after organic treatment (FYM). In general, high levels of inorganic nitrogen fertilizers negatively affect bacterial richness and diversity, leading to a less stable bacterial community structure over time, whereas more stable bacterial communities are provided by organic amendments. The 16S rRNA gene-predicted functional structure was more affected by growth stage than by fertilizer treatment, although some functions related to energy metabolism and metabolism of terpenoids and polyketides were enriched in samples not receiving any inorganic N, whereas inorganic N addition enriched predicted functions related to metabolism of other amino acids and carbohydrates. Understanding the impact of different fertilizers on the structure and dynamics of the rhizosphere microbiome is an important step

  4. RNA 3D modules in genome-wide predictions of RNA 2D structure

    DEFF Research Database (Denmark)

    Theis, Corinna; Zirbel, Craig L; Zu Siederdissen, Christian Höner

    2015-01-01

    Recent experimental and computational progress has revealed a large potential for RNA structure in the genome. This has been driven by computational strategies that exploit multiple genomes of related organisms to identify common sequences and secondary structures. However, these computational approaches have two main challenges: they are computationally expensive and they have a relatively high false discovery rate (FDR). Simultaneously, RNA 3D structure analysis has revealed modules composed of non-canonical base pairs which occur in non-homologous positions, apparently by independent evolution. These modules can, for example, occur inside structural elements which in RNA 2D predictions appear as internal loops. Hence one question is if the use of such RNA 3D information can improve the prediction accuracy of RNA secondary structure at a genome-wide level. Here, we use RNAz in combination with 3D…

  5. Longitudinal connectome-based predictive modeling for REM sleep behavior disorder from structural brain connectivity

    Science.gov (United States)

    Giancardo, Luca; Ellmore, Timothy M.; Suescun, Jessika; Ocasio, Laura; Kamali, Arash; Riascos-Castaneda, Roy; Schiess, Mya C.

    2018-02-01

    Methods to identify neuroplasticity patterns in human brains are of the utmost importance in understanding and potentially treating neurodegenerative diseases. Parkinson disease (PD) research will greatly benefit and advance from the discovery of biomarkers to quantify brain changes in the early stages of the disease, a prodromal period when subjects show no obvious clinical symptoms. Diffusion tensor imaging (DTI) allows for an in-vivo estimation of the structural connectome inside the brain and may serve to quantify the degenerative process before the appearance of clinical symptoms. In this work, we introduce a novel strategy to compute longitudinal structural connectomes in the context of a whole-brain data-driven pipeline. In these initial tests, we show that our predictive models are able to distinguish controls from asymptomatic subjects at high risk of developing PD (REM sleep behavior disorder, RBD) with an area under the receiver operating characteristic curve of 0.90 (p …), using data from the Parkinson's Progression Markers Initiative. By analyzing the brain connections most relevant for the predictive ability of the best performing model, we find connections that are biologically relevant to the disease.

  6. Protein secondary structure: category assignment and predictability

    DEFF Research Database (Denmark)

    Andersen, Claus A.; Bohr, Henrik; Brunak, Søren

    2001-01-01

    In the last decade, the prediction of protein secondary structure has been optimized using essentially one and the same assignment scheme known as DSSP. We present here a different scheme, which is more predictable. This scheme predicts directly the hydrogen bonds, which stabilize the secondary structure. … a feed-forward neural network with one hidden layer on a data set identical to the one used in earlier work.

  7. Thermodynamic heuristics with case-based reasoning: combined insights for RNA pseudoknot secondary structure.

    Science.gov (United States)

    Al-Khatib, Ra'ed M; Rashid, Nur'Aini Abdul; Abdullah, Rosni

    2011-08-01

    The secondary structure of RNA pseudoknots has been extensively inferred and scrutinized by computational approaches. Experimental methods for determining RNA structure are time consuming and tedious; therefore, predictive computational approaches are required. Predicting the most accurate and energy-stable pseudoknot RNA secondary structure has been proven to be an NP-hard problem. In this paper, a new RNA folding approach, termed MSeeker, is presented; it includes KnotSeeker (a heuristic method) and Mfold (a thermodynamic algorithm). The global optimization of this thermodynamic heuristic approach was further enhanced by using a case-based reasoning technique as a local optimization method. MSeeker is a proposed algorithm for predicting RNA pseudoknot structure from individual sequences, especially long ones. This research demonstrates that MSeeker improves the sensitivity and specificity of existing RNA pseudoknot structure predictions. The performance and structural results from this proposed method were evaluated against seven other state-of-the-art pseudoknot prediction methods. The MSeeker method had better sensitivity than the DotKnot, FlexStem, HotKnots, pknotsRG, ILM, NUPACK and pknotsRE methods, with 79% of the predicted pseudoknot base-pairs being correct.

  8. Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction

    Directory of Open Access Journals (Sweden)

    Cobaugh Christian W

    2004-08-01

    Full Text Available Abstract Background A detailed understanding of an RNA's correct secondary and tertiary structure is crucial to understanding its function and mechanism in the cell. Free energy minimization with energy parameters based on the nearest-neighbor model and comparative analysis are the primary methods for predicting an RNA's secondary structure from its sequence. Version 3.1 of Mfold has been available since 1999. This version contains an expanded sequence dependence of energy parameters and the ability to incorporate coaxial stacking into free energy calculations. We test Mfold 3.1 by performing the largest and most phylogenetically diverse comparison of rRNA and tRNA structures predicted by comparative analysis and Mfold, and we use the results of our tests on 16S and 23S rRNA sequences to assess the improvement between Mfold 2.3 and Mfold 3.1. Results The average prediction accuracy for a 16S or 23S rRNA sequence with Mfold 3.1 is 41%, while the prediction accuracies for the majority of 16S and 23S rRNA structures tested are between 20% and 60%, with some having less than 20% prediction accuracy. The average prediction accuracy was 71% for 5S rRNA and 69% for tRNA. The majority of the 5S rRNA and tRNA sequences have prediction accuracies greater than 60%. The prediction accuracy of 16S rRNA base-pairs decreases exponentially as the number of nucleotides intervening between the 5' and 3' halves of the base-pair increases. Conclusion Our analysis indicates that the current set of nearest-neighbor energy parameters in conjunction with the Mfold folding algorithm are unable to consistently and reliably predict an RNA's correct secondary structure. For 16S or 23S rRNA structure prediction, Mfold 3.1 offers little improvement over Mfold 2.3. However, the nearest-neighbor energy parameters do work well for shorter RNA sequences such as tRNA or 5S rRNA, or for larger rRNAs when the contact distance between the base-pairs is less than 100 nucleotides.

  9. Structural properties of MHC class II ligands, implications for the prediction of MHC class II epitopes.

    Directory of Open Access Journals (Sweden)

    Kasper Winther Jørgensen

    2010-12-01

    Full Text Available Major Histocompatibility class II (MHC-II molecules sample peptides from the extracellular space allowing the immune system to detect the presence of foreign microbes from this compartment. Prediction of MHC class II ligands is complicated by the open binding cleft of the MHC class II molecule, allowing binding of peptides extending out of the binding groove. Furthermore, only a few HLA-DR alleles have been characterized with a sufficient number of peptides (100-200 peptides per allele to derive accurate description of their binding motif. Little work has been performed characterizing structural properties of MHC class II ligands. Here, we perform one such large-scale analysis. A large set of SYFPEITHI MHC class II ligands covering more than 20 different HLA-DR molecules was analyzed in terms of their secondary structure and surface exposure characteristics in the context of the native structure of the corresponding source protein. We demonstrated that MHC class II ligands are significantly more exposed and have significantly more coil content than other peptides in the same protein with similar predicted binding affinity. We next exploited this observation to derive an improved prediction method for MHC class II ligands by integrating prediction of MHC- peptide binding with prediction of surface exposure and protein secondary structure. This combined prediction method was shown to significantly outperform the state-of-the-art MHC class II peptide binding prediction method when used to identify MHC class II ligands. We also tried to integrate N- and O-glycosylation in our prediction methods but this additional information was found not to improve prediction performance. In summary, these findings strongly suggest that local structural properties influence antigen processing and/or the accessibility of peptides to the MHC class II molecule.

  10. Predictive Modeling of Mechanical Properties of Welded Joints Based on Dynamic Fuzzy RBF Neural Network

    Directory of Open Access Journals (Sweden)

    ZHANG Yongzhi

    2016-10-01

    Full Text Available A dynamic fuzzy RBF neural network model was built to predict the mechanical properties of welded joints, with the purpose of overcoming the shortcomings of static neural networks in structural identification, dynamic sample training and learning algorithms. The structure and parameters of the model are no longer fixed in advance but are adaptively adjusted during training, making the model suitable for learning from dynamic sample data. The learning algorithm introduces hierarchical learning and a fuzzy-rule pruning strategy to accelerate training and make the model more compact. The model was validated in simulation using TIG welding test data for TC4 titanium alloy of three different thicknesses and process conditions. The results show that the model has high prediction accuracy, is suitable for predicting the mechanical properties of welded joints, and opens up a new way for on-line control of the welding process.
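    The dynamic fuzzy variant itself is not specified in the abstract, so the sketch below only shows the kind of mapping such a model learns: a plain, static RBF regression from welding parameters to a mechanical property. The centers, width, input features and data are all assumed placeholders.

      import numpy as np

      # Baseline (static) RBF network regression on synthetic "welding" data,
      # illustrating the parameters-in, property-out mapping described above.
      rng = np.random.default_rng(3)
      X = rng.uniform(0, 1, size=(120, 3))          # e.g. current, speed, thickness (scaled)
      y = 900 + 80 * X[:, 0] - 60 * X[:, 1] ** 2 + 5 * rng.standard_normal(120)  # MPa, synthetic

      centers = X[rng.choice(len(X), 15, replace=False)]   # RBF centers (assumed: random subset)
      width = 0.3                                          # shared Gaussian width (assumed)

      def design_matrix(points):
          d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
          return np.exp(-d2 / (2 * width ** 2))

      Phi = design_matrix(X)
      w, *_ = np.linalg.lstsq(Phi, y, rcond=None)          # output weights by least squares

      X_new = np.array([[0.5, 0.4, 0.6]])
      print("predicted property:", design_matrix(X_new) @ w)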

  11. GARN2: coarse-grained prediction of 3D structure of large RNA molecules by regret minimization.

    Science.gov (United States)

    Boudard, Mélanie; Barth, Dominique; Bernauer, Julie; Denise, Alain; Cohen, Johanne

    2017-08-15

    Predicting the 3D structure of RNA molecules is a key feature towards predicting their functions. Methods which work at atomic or nucleotide level are not suitable for large molecules. In these cases, coarse-grained prediction methods aim to predict a shape which could be refined later by using more precise methods on smaller parts of the molecule. We developed a complete method for sampling 3D RNA structure at a coarse-grained model, taking a secondary structure as input. One of the novelties of our method is that a second step extracts two best possible structures close to the native, from a set of possible structures. Although our method benefits from the first version of GARN, some of the main features on GARN2 are very different. GARN2 is much faster than the previous version and than the well-known methods of the state-of-art. Our experiments show that GARN2 can also provide better structures than the other state-of-the-art methods. GARN2 is written in Java. It is freely distributed and available at http://garn.lri.fr/. melanie.boudard@lri.fr or johanne.cohen@lri.fr. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  12. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    KAUST Repository

    Chen, Peng

    2015-12-03

    Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most of the successful protein-ligand binding predictions were based on known structures. However, structural information is not largely available in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures

  13. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    KAUST Repository

    Chen, Peng; Hu, ShanShan; Zhang, Jun; Gao, Xin; Li, Jinyan; Xia, Junfeng; Wang, Bing

    2015-01-01

    Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most of the successful protein-ligand binding predictions were based on known structures. However, structural information is not largely available in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures

  14. Towards Prediction Of Crystal Structure Of Al-Rich Intermetallides Formed In Al-T-A Systems

    International Nuclear Information System (INIS)

    Bram, Avraham I.; Meshic, Louisa; Ilse Katz institute for nanotechnology, Ben Gurion University of the Negev; Venkert, Arie

    2014-01-01

    The crystal structure of a material contributes significantly to its properties. However, there is no universal model that can precisely predict the crystallographic structure of a stable material at a specific composition and temperature. Since the 1950s, various prediction approaches have been developed, yielding many different computer-simulation methods and innovative theories, which are summarized in the review of Woodley et al. These methods are based on complicated calculations of quantum sizes

  15. Structural Properties as Proxy for Semantic Relevance in RDF Graph Sampling

    NARCIS (Netherlands)

    Rietveld, L.; Hoekstra, R.; Schlobach, S.; Guéret, C.; Mika, P.; Tudorache, T.; Bernstein, A.; Welty, C.; Knoblock, C.; Vrandečić, D.; Groth, P.; Noy, N.; Janowicz, K.; Goble, C.

    2014-01-01

    The Linked Data cloud has grown to become the largest knowledge base ever constructed. Its size is now turning into a major bottleneck for many applications. In order to facilitate access to this structured information, this paper proposes an automatic sampling method targeted at maximizing answer

  16. Comparison of Simple Versus Performance-Based Fall Prediction Models

    Directory of Open Access Journals (Sweden)

    Shekhar K. Gadkaree BS

    2015-05-01

    Full Text Available Objective: To compare the predictive ability of standard falls prediction models based on physical performance assessments with more parsimonious prediction models based on self-reported data. Design: We developed a series of fall prediction models progressing in complexity and compared area under the receiver operating characteristic curve (AUC) across models. Setting: National Health and Aging Trends Study (NHATS), which surveyed a nationally representative sample of Medicare enrollees (age ≥65) at baseline (Round 1: 2011-2012) and 1-year follow-up (Round 2: 2012-2013). Participants: In all, 6,056 community-dwelling individuals participated in Rounds 1 and 2 of NHATS. Measurements: Primary outcomes were 1-year incidence of “any fall” and “recurrent falls.” Prediction models were compared and validated in development and validation sets, respectively. Results: A prediction model that included demographic information, self-reported problems with balance and coordination, and previous fall history was the most parsimonious model that optimized AUC for both any fall (AUC = 0.69, 95% confidence interval [CI] = [0.67, 0.71]) and recurrent falls (AUC = 0.77, 95% CI = [0.74, 0.79]) in the development set. Physical performance testing provided a marginal additional predictive value. Conclusion: A simple clinical prediction model that does not include physical performance testing could facilitate routine, widespread falls risk screening in the ambulatory care setting.
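    The comparison logic can be sketched as follows: fit a parsimonious self-report model and the same model plus a performance score, then compare AUC on held-out data. The variables and data below are synthetic placeholders, not NHATS variables, and the scikit-learn setup is only one way to do this.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import train_test_split

      # Synthetic fall-risk data: self-report features plus a placeholder
      # physical-performance score.
      rng = np.random.default_rng(4)
      n = 4000
      age = rng.normal(75, 6, n)
      balance_problem = rng.integers(0, 2, n)        # self-reported balance/coordination problem
      prior_fall = rng.integers(0, 2, n)             # fall in the previous year
      perf = rng.normal(8, 2, n)                     # performance score (placeholder)

      logit = -6 + 0.05 * age + 0.8 * balance_problem + 1.0 * prior_fall - 0.05 * perf
      fall = rng.binomial(1, 1 / (1 + np.exp(-logit)))

      X_simple = np.column_stack([age, balance_problem, prior_fall])
      X_full = np.column_stack([X_simple, perf])

      for name, X in [("self-report only", X_simple), ("plus performance", X_full)]:
          Xtr, Xte, ytr, yte = train_test_split(X, fall, test_size=0.3, random_state=0)
          model = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
          auc = roc_auc_score(yte, model.predict_proba(Xte)[:, 1])
          print(f"{name}: AUC = {auc:.2f}")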

  17. IRSS: a web-based tool for automatic layout and analysis of IRES secondary structure prediction and searching system in silico

    Directory of Open Access Journals (Sweden)

    Hong Jun-Jie

    2009-05-01

    Full Text Available Abstract Background Internal ribosomal entry sites (IRESs) provide alternative, cap-independent translation initiation sites in eukaryotic cells. IRES elements are important factors in viral genomes and are also useful tools for bi-cistronic expression vectors. Most existing RNA structure prediction programs are unable to deal with IRES elements. Results We designed an IRES search system, named IRSS, to obtain better results for IRES prediction. RNA secondary structure prediction and comparison software programs were implemented to construct our two-stage strategy for IRSS. Two software programs formed the backbone of IRSS: the RNAL fold program, used to predict local RNA secondary structures by the minimum free energy method; and the RNA Align program, used to compare predicted structures. After a complete viral genome database search, IRSS has a low error rate and up to 72.3% sensitivity with appropriate parameters. Conclusion IRSS is freely available at this website http://140.135.61.9/ires/. In addition, all source code, precompiled binaries, examples and documentation are downloadable for local execution. This new search approach for IRES elements will provide a useful research tool for IRES-related studies.

  18. SimRNA: a coarse-grained method for RNA folding simulations and 3D structure prediction.

    Science.gov (United States)

    Boniecki, Michal J; Lach, Grzegorz; Dawson, Wayne K; Tomala, Konrad; Lukasz, Pawel; Soltysinski, Tomasz; Rother, Kristian M; Bujnicki, Janusz M

    2016-04-20

    RNA molecules play fundamental roles in cellular processes. Their function and interactions with other biomolecules are dependent on the ability to form complex three-dimensional (3D) structures. However, experimental determination of RNA 3D structures is laborious and challenging, and therefore, the majority of known RNAs remain structurally uncharacterized. Here, we present SimRNA: a new method for computational RNA 3D structure prediction, which uses a coarse-grained representation, relies on the Monte Carlo method for sampling the conformational space, and employs a statistical potential to approximate the energy and identify conformations that correspond to biologically relevant structures. SimRNA can fold RNA molecules using only sequence information, and, on established test sequences, it recapitulates secondary structure with high accuracy, including correct prediction of pseudoknots. For modeling of complex 3D structures, it can use additional restraints, derived from experimental or computational analyses, including information about secondary structure and/or long-range contacts. SimRNA also can be used to analyze conformational landscapes and identify potential alternative structures. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
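    The sampling engine described above is Monte Carlo with a statistical potential; the skeleton below shows only the generic Metropolis move-and-accept loop on a toy harmonic chain energy, not SimRNA's coarse-grained representation or knowledge-based potential.

      import numpy as np

      # Generic Metropolis Monte Carlo loop for coarse-grained conformational
      # sampling; the "energy" here is a toy harmonic chain (illustrative).
      rng = np.random.default_rng(5)

      def energy(coords, bond_len=1.0, k=10.0):
          d = np.linalg.norm(np.diff(coords, axis=0), axis=1)
          return (k * (d - bond_len) ** 2).sum()

      n_beads, n_steps, kT, step = 20, 5000, 1.0, 0.1
      coords = np.cumsum(rng.normal(0, 0.5, size=(n_beads, 3)), axis=0)
      E = energy(coords)

      for _ in range(n_steps):
          trial = coords.copy()
          i = rng.integers(n_beads)
          trial[i] += rng.normal(0, step, 3)          # perturb one bead
          E_trial = energy(trial)
          if E_trial < E or rng.random() < np.exp(-(E_trial - E) / kT):
              coords, E = trial, E_trial              # Metropolis acceptance

      print(f"final energy after sampling: {E:.2f}")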

  19. Predicting risk of trace element pollution from municipal roads using site-specific soil samples and remotely sensed data.

    Science.gov (United States)

    Reeves, Mari Kathryn; Perdue, Margaret; Munk, Lee Ann; Hagedorn, Birgit

    2018-07-15

    Studies of environmental processes exhibit spatial variation within data sets. The ability to derive predictions of risk from field data is a critical path forward in understanding the data and applying the information to land and resource management. Thanks to recent advances in predictive modeling, open source software, and computing, the power to do this is within grasp. This article provides an example of how we predicted relative trace element pollution risk from roads across a region by combining site-specific trace element data in soils with regional land cover and planning information in a predictive model framework. In the Kenai Peninsula of Alaska, we sampled 36 sites (191 soil samples) adjacent to roads for trace elements. We then combined this site-specific data with freely-available land cover and urban planning data to derive a predictive model of landscape-scale environmental risk. We used six different model algorithms to analyze the dataset, comparing these in terms of their predictive abilities and the variables identified as important. Based on comparable predictive abilities (mean R² from 30 to 35% and mean root-mean-square error from 65 to 68%), we averaged all six model outputs to predict relative levels of trace element deposition in soils, given the road surface, traffic volume, sample distance from the road, land cover category, and impervious surface percentage. Mapped predictions of environmental risk from toxic trace element pollution can show land managers and transportation planners where to prioritize road renewal or maintenance by each road segment's relative environmental and human health risk. Published by Elsevier B.V.
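    The model-averaging step can be sketched generically: fit several different regressors to the same features and average their predictions. The three algorithms, the features and the data below are placeholders standing in for the six models and the road, traffic, distance, land cover and imperviousness variables used in the study.

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
      from sklearn.linear_model import Ridge
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import r2_score

      # Simple average of several model algorithms' outputs (synthetic data).
      rng = np.random.default_rng(6)
      n = 800
      X = rng.uniform(0, 1, size=(n, 5))
      y = 2 * X[:, 1] - 3 * X[:, 2] + 0.5 * X[:, 4] + 0.3 * rng.standard_normal(n)

      Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
      models = [Ridge(),
                RandomForestRegressor(n_estimators=200, random_state=0),
                GradientBoostingRegressor(random_state=0)]

      preds = np.column_stack([m.fit(Xtr, ytr).predict(Xte) for m in models])
      ensemble = preds.mean(axis=1)                    # average of model outputs
      print(f"averaged-model R^2 = {r2_score(yte, ensemble):.2f}")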

  20. On the relevance of sophisticated structural annotations for disulfide connectivity pattern prediction.

    Directory of Open Access Journals (Sweden)

    Julien Becker

    Full Text Available Disulfide bridges strongly constrain the native structure of many proteins and predicting their formation is therefore a key sub-problem of protein structure and function inference. Most recently proposed approaches for this prediction problem adopt the following pipeline: first they enrich the primary sequence with structural annotations, second they apply a binary classifier to each candidate pair of cysteines to predict disulfide bonding probabilities and finally, they use a maximum weight graph matching algorithm to derive the predicted disulfide connectivity pattern of a protein. In this paper, we adopt this three step pipeline and propose an extensive study of the relevance of various structural annotations and feature encodings. In particular, we consider five kinds of structural annotations, among which three are novel in the context of disulfide bridge prediction. So as to be usable by machine learning algorithms, these annotations must be encoded into features. For this purpose, we propose four different feature encodings based on local windows and on different kinds of histograms. The combination of structural annotations with these possible encodings leads to a large number of possible feature functions. In order to identify a minimal subset of relevant feature functions among those, we propose an efficient and interpretable feature function selection scheme, designed so as to avoid any form of overfitting. We apply this scheme on top of three supervised learning algorithms: k-nearest neighbors, support vector machines and extremely randomized trees. Our results indicate that the use of only the PSSM (position-specific scoring matrix together with the CSP (cysteine separation profile are sufficient to construct a high performance disulfide pattern predictor and that extremely randomized trees reach a disulfide pattern prediction accuracy of [Formula: see text] on the benchmark dataset SPX[Formula: see text], which corresponds to
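    The third step of the pipeline described above (deriving a connectivity pattern from pairwise bonding probabilities via maximum-weight matching) can be sketched with networkx; the per-pair probabilities below are made-up classifier outputs for four cysteines, not values from the paper.

      import networkx as nx

      # Maximum-weight matching over hypothetical per-pair bonding probabilities.
      pair_prob = {                      # hypothetical classifier outputs
          (1, 2): 0.10, (1, 3): 0.85, (1, 4): 0.20,
          (2, 3): 0.15, (2, 4): 0.90, (3, 4): 0.05,
      }

      G = nx.Graph()
      for (i, j), p in pair_prob.items():
          G.add_edge(i, j, weight=p)

      matching = nx.max_weight_matching(G, maxcardinality=True)
      print("predicted disulfide pattern:", sorted(tuple(sorted(e)) for e in matching))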

  1. QCD dipole predictions for DIS and diffractive structure functions

    International Nuclear Information System (INIS)

    Royon, C.

    1997-01-01

    The proton structure function F2, the gluon density FG, and the longitudinal structure function FL are derived in the QCD dipole picture of BFKL dynamics. We use a three-parameter fit to describe the 1994 H1 proton structure function F2 data in the low-x, moderate-Q2 range. Without any additional parameter, the gluon density and the longitudinal structure functions are predicted. The diffractive dissociation processes are also discussed within the same framework, and a new prediction for the proton diffractive structure function is obtained.

  2. Integration of QUARK and I-TASSER for Ab Initio Protein Structure Prediction in CASP11.

    Science.gov (United States)

    Zhang, Wenxuan; Yang, Jianyi; He, Baoji; Walker, Sara Elizabeth; Zhang, Hongjiu; Govindarajoo, Brandon; Virtanen, Jouko; Xue, Zhidong; Shen, Hong-Bin; Zhang, Yang

    2016-09-01

    We tested two pipelines developed for template-free protein structure prediction in the CASP11 experiment. First, the QUARK pipeline constructs structure models by reassembling fragments of continuously distributed lengths excised from unrelated proteins. Five free-modeling (FM) targets have the model successfully constructed by QUARK with a TM-score above 0.4, including the first model of T0837-D1, which has a TM-score = 0.736 and RMSD = 2.9 Å to the native. Detailed analysis showed that the success is partly attributed to the high-resolution contact map prediction derived from fragment-based distance-profiles, which are mainly located between regular secondary structure elements and loops/turns and help guide the orientation of secondary structure assembly. In the Zhang-Server pipeline, weakly scoring threading templates are re-ordered by the structural similarity to the ab initio folding models, which are then reassembled by I-TASSER based structure assembly simulations; 60% more domains with length up to 204 residues, compared to the QUARK pipeline, were successfully modeled by the I-TASSER pipeline with a TM-score above 0.4. The robustness of the I-TASSER pipeline can stem from the composite fragment-assembly simulations that combine structures from both ab initio folding and threading template refinements. Despite the promising cases, challenges still exist in long-range beta-strand folding, domain parsing, and the uncertainty of secondary structure prediction; the latter of which was found to affect nearly all aspects of FM structure predictions, from fragment identification, target classification, structure assembly, to final model selection. Significant efforts are needed to solve these problems before real progress on FM could be made. Proteins 2016; 84(Suppl 1):76-86. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.

  3. A novel one-class SVM based negative data sampling method for reconstructing proteome-wide HTLV-human protein interaction networks.

    Science.gov (United States)

    Mei, Suyu; Zhu, Hao

    2015-01-26

    Protein-protein interaction (PPI) prediction is generally treated as a problem of binary classification wherein negative data sampling is still an open problem to be addressed. The commonly used random sampling is prone to yield less representative negative data with considerable false negatives. Meanwhile rational constraints are seldom exerted on model selection to reduce the risk of false positive predictions for most of the existing computational methods. In this work, we propose a novel negative data sampling method based on one-class SVM (support vector machine, SVM) to predict proteome-wide protein interactions between HTLV retrovirus and Homo sapiens, wherein one-class SVM is used to choose reliable and representative negative data, and two-class SVM is used to yield proteome-wide outcomes as predictive feedback for rational model selection. Computational results suggest that one-class SVM is more suited to be used as negative data sampling method than two-class PPI predictor, and the predictive feedback constrained model selection helps to yield a rational predictive model that reduces the risk of false positive predictions. Some predictions have been validated by the recent literature. Lastly, gene ontology based clustering of the predicted PPI networks is conducted to provide valuable cues for the pathogenesis of HTLV retrovirus.
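    The negative-sampling idea described above can be sketched as follows: train a one-class SVM on known positives, keep as reliable negatives only the unlabeled pairs it scores as clear outliers, then train a two-class SVM. The feature vectors and the 25%-quantile cut-off are placeholders for real protein-pair features and the paper's actual selection rule.

      import numpy as np
      from sklearn.svm import OneClassSVM, SVC

      # One-class-SVM-based negative sampling, minimal sketch on random features.
      rng = np.random.default_rng(7)
      pos = rng.normal(1.0, 1.0, size=(200, 20))        # known interacting pairs
      unlabeled = rng.normal(0.0, 1.5, size=(2000, 20)) # candidate non-interacting pairs

      ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(pos)
      scores = ocsvm.decision_function(unlabeled)        # lower = less like positives
      reliable_neg = unlabeled[scores < np.quantile(scores, 0.25)]   # assumed cut-off

      X = np.vstack([pos, reliable_neg])
      y = np.concatenate([np.ones(len(pos)), np.zeros(len(reliable_neg))])
      clf = SVC(kernel="rbf", gamma="scale").fit(X, y)   # two-class predictor
      print("selected negatives:", len(reliable_neg), "/ training accuracy:", clf.score(X, y))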

  4. Delivery Mode and the Transition of Pioneering Gut-Microbiota Structure, Composition and Predicted Metabolic Function

    Directory of Open Access Journals (Sweden)

    Noel T. Mueller

    2017-12-01

    Full Text Available Cesarean (C-section) delivery, recently shown to cause excess weight gain in mice, perturbs human neonatal gut microbiota development due to the lack of natural mother-to-newborn transfer of microbes. Neonates excrete first the in-utero intestinal content (referred to as meconium) hours after birth, followed by intestinal contents reflective of extra-uterine exposure (referred to as transition stool) 2 to 3 days after birth. It is not clear when the effect of C-section on the neonatal gut microbiota emerges. We examined bacterial DNA in carefully-collected meconium, and the subsequent transitional stool, from 59 neonates [13 born by scheduled C-section and 46 born by vaginal delivery] in a private hospital in Brazil. Bacterial DNA was extracted, and the V4 region of the 16S rRNA gene was sequenced using the Illumina MiSeq (San Diego, CA, USA) platform. We found evidence of bacterial DNA in the majority of meconium samples in our study. The bacterial DNA structure (i.e., beta diversity) of meconium differed significantly from that of the transitional stool microbiota. There was a significant reduction in bacterial alpha diversity (e.g., number of observed bacterial species) and change in bacterial composition (e.g., reduced Proteobacteria) in the transition from meconium to stool. However, changes in predicted microbiota metabolic function from meconium to transitional stool were only observed in vaginally-delivered neonates. Within-sample comparisons showed that delivery mode was significantly associated with bacterial structure, composition and predicted microbiota metabolic function in transitional-stool samples, but not in meconium samples. Specifically, compared to vaginally delivered neonates, the transitional stool of C-section delivered neonates had lower proportions of the genera Bacteroides, Parabacteroides and Clostridium. These differences led to C-section neonates having lower predicted abundance of microbial genes related to metabolism of

  5. Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction.

    Science.gov (United States)

    Marks, Claire; Nowak, Jaroslaw; Klostermann, Stefan; Georges, Guy; Dunbar, James; Shi, Jiye; Kelm, Sebastian; Deane, Charlotte M

    2017-05-01

    Loops are often vital for protein function, however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. deane@stats.ox.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.

  6. Fabry-Pérot cavity based on chirped sampled fiber Bragg gratings.

    Science.gov (United States)

    Zheng, Jilin; Wang, Rong; Pu, Tao; Lu, Lin; Fang, Tao; Li, Weichun; Xiong, Jintian; Chen, Yingfang; Zhu, Huatao; Chen, Dalei; Chen, Xiangfei

    2014-02-10

    A novel kind of Fabry-Pérot (FP) structure based on a chirped sampled fiber Bragg grating (CSFBG) is proposed and demonstrated. In this structure, the regular chirped FBG (CFBG) that functions as a reflecting mirror in the FP cavity is replaced by a CSFBG, which is realized by chirping the sampling periods of a sampled FBG having a uniform local grating period. Realizing such CSFBG-FPs with diverse properties requires only a single uniform-pitch phase mask and a sub-micrometer-precision moving stage. Compared with the conventional CFBG-FP, CSFBG-FPs of diverse functions are more flexible to design and simpler to fabricate. As a demonstration, based on the same experimental facilities, FPs with uniform FSR (~73 pm) and chirped FSR (varying from 28 pm to 405 pm) are fabricated, showing good agreement with simulation results.

  7. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field.

    Science.gov (United States)

    Xu, Dong; Zhang, Yang

    2012-07-01

    Ab initio protein folding is one of the major unsolved problems in computational biology owing to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1-20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. QUARK prediction procedure is depicted and tested on the structure modeling of 145 nonhomologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in one-third cases of short proteins up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction experiment, QUARK server outperformed the second and third best servers by 18 and 47% based on the cumulative Z-score of global distance test-total scores in the FM category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress toward the solution of the most important problem in the field. Copyright © 2012 Wiley Periodicals, Inc.
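    The replica-exchange idea used for conformational search can be shown in skeleton form: several copies of the system run at different temperatures, and neighbouring replicas periodically attempt a swap accepted with probability min(1, exp((1/T_i - 1/T_j)(E_i - E_j))). The toy one-dimensional double-well "energy" below stands in for a real force field and is not QUARK's potential.

      import numpy as np

      # Minimal two-level sketch of replica-exchange Monte Carlo sampling.
      rng = np.random.default_rng(8)
      energy = lambda x: (x ** 2 - 1) ** 2              # toy double-well landscape
      temps = np.array([0.1, 0.3, 0.9, 2.7])            # temperature ladder (assumed)
      x = rng.normal(0, 1, size=len(temps))             # one coordinate per replica

      for sweep in range(2000):
          # local Metropolis moves within each replica
          for i, T in enumerate(temps):
              trial = x[i] + rng.normal(0, 0.2)
              dE = energy(trial) - energy(x[i])
              if dE < 0 or rng.random() < np.exp(-dE / T):
                  x[i] = trial
          # attempt a swap between a random pair of neighbouring replicas
          i = rng.integers(len(temps) - 1)
          delta = (1 / temps[i] - 1 / temps[i + 1]) * (energy(x[i]) - energy(x[i + 1]))
          if delta > 0 or rng.random() < np.exp(delta):
              x[i], x[i + 1] = x[i + 1], x[i]

      print("replica coordinates after sampling:", np.round(x, 2))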

  8. BLAST-based structural annotation of protein residues using Protein Data Bank.

    Science.gov (United States)

    Singh, Harinder; Raghava, Gajendra P S

    2016-01-25

    In the era of next-generation sequencing, where thousands of genomes have already been sequenced, the size of protein databases is growing at an exponential rate. Structural annotation of these proteins is one of the biggest challenges for the computational biologist. Although it is easy to perform a BLAST search against the Protein Data Bank (PDB), it is difficult for a biologist to annotate protein residues from the BLAST search alone. A web server, StarPDB, has been developed for structural annotation of a protein based on its similarity with known protein structures. It uses standard BLAST software to perform a similarity search of a query protein against protein structures in the PDB. The server integrates a wide range of modules for assigning different types of annotation, including Secondary-structure, Accessible surface area, Tight-turns, DNA-RNA and Ligand modules. The Secondary-structure module allows users to assign regular secondary structure states to each residue in a protein. The Accessible surface area module predicts exposed or buried residues. The Tight-turns module is designed to predict tight turns such as beta-turns. The DNA-RNA module was developed for predicting DNA- and RNA-interacting residues. Similarly, the Ligand module allows one to predict ligand-, metal- and nucleotide-interacting residues. In summary, this manuscript presents a web server for comprehensive annotation of a protein based on similarity search. It integrates a number of visualization tools that help users understand the structure and function of protein residues. This web server is freely available to the scientific community at http://crdd.osdd.net/raghava/starpdb .

  9. QCD dipole prediction for dis and diffractive structure functions

    International Nuclear Information System (INIS)

    Royon, CH.

    1996-01-01

    The F2, FG, and R = FL/FT proton structure functions are derived in the QCD dipole picture of BFKL dynamics. We get a three-parameter fit describing the 1994 H1 proton structure function F2 data in the low-x, moderate-Q2 range. Without any additional parameter, the gluon density and the longitudinal structure functions are predicted. The diffractive dissociation processes are also discussed, and a new prediction for the proton diffractive structure function is obtained. (author)

  10. Prediction of protein structural classes by Chou's pseudo amino acid composition: approached using continuous wavelet transform and principal component analysis.

    Science.gov (United States)

    Li, Zhan-Chao; Zhou, Xi-Bin; Dai, Zong; Zou, Xiao-Yong

    2009-07-01

    Prior knowledge of a protein's structural class can provide useful information about its overall structure, so quick and accurate computational determination of protein structural class is very important in protein science. One of the keys for any computational method is an accurate protein sample representation. Here, based on the concept of Chou's pseudo-amino acid composition (AAC, Chou, Proteins: Structure, Function, and Genetics, 43:246-255, 2001), a novel feature-extraction method that combines the continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for the prediction of protein structural classes. First, a digital signal was obtained by mapping each amino acid according to various physicochemical properties. Second, CWT was utilized to extract a new feature vector based on the wavelet power spectrum (WPS), which contains richer sequence-order information in both the frequency and time domains, and PCA was then used to reorganize the feature vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector representing the primary sequence was formed by coupling the AAC vector with the set of new WPS feature vectors in an orthogonal space obtained by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality was improved, and the current approach to protein representation may serve as a useful complementary vehicle in classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein type and protein secondary structure.
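    The feature-extraction idea can be sketched with PyWavelets and scikit-learn: map a sequence onto a numerical signal with a physicochemical scale, take a continuous wavelet transform, summarize the wavelet power spectrum, and compress with PCA. The Kyte-Doolittle hydropathy scale is only one possible mapping, and the sequences, scales and reduced dimensionality below are assumptions for illustration.

      import numpy as np
      import pywt
      from sklearn.decomposition import PCA

      kd = {  # Kyte-Doolittle hydropathy values (one possible physicochemical scale)
          'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5, 'Q': -3.5, 'E': -3.5,
          'G': -0.4, 'H': -3.2, 'I': 4.5, 'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8,
          'P': -1.6, 'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2,
      }

      rng = np.random.default_rng(9)
      aas = np.array(list(kd))

      def wps_features(seq, scales=np.arange(1, 17)):
          signal = np.array([kd[a] for a in seq])
          coef, _ = pywt.cwt(signal, scales, 'morl')      # continuous wavelet transform
          return (coef ** 2).mean(axis=1)                 # mean wavelet power per scale

      # 50 synthetic sequences of length 120, standing in for real proteins
      X = np.array([wps_features(rng.choice(aas, 120)) for _ in range(50)])
      X_reduced = PCA(n_components=5).fit_transform(X)    # decorrelate / compress
      print("feature matrix after PCA:", X_reduced.shape)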

  11. Alchemical and structural distribution based representation for universal quantum machine learning

    Science.gov (United States)

    Faber, Felix A.; Christensen, Anders S.; Huang, Bing; von Lilienfeld, O. Anatole

    2018-06-01

    We introduce a representation of any atom in any chemical environment for the automatized generation of universal kernel ridge regression-based quantum machine learning (QML) models of electronic properties, trained throughout chemical compound space. The representation is based on Gaussian distribution functions, scaled by power laws and explicitly accounting for structural as well as elemental degrees of freedom. The elemental components help us to lower the QML model's learning curve, and, through interpolation across the periodic table, even enable "alchemical extrapolation" to covalent bonding between elements not part of training. This point is demonstrated for the prediction of covalent binding in single, double, and triple bonds among main-group elements as well as for atomization energies in organic molecules. We present numerical evidence that resulting QML energy models, after training on a few thousand random training instances, reach chemical accuracy for out-of-sample compounds. Compound datasets studied include thousands of structurally and compositionally diverse organic molecules, non-covalently bonded protein side-chains, (H2O)40-clusters, and crystalline solids. Learning curves for QML models also indicate competitive predictive power for various other electronic ground state properties of organic molecules, calculated with hybrid density functional theory, including polarizability, heat-capacity, HOMO-LUMO eigenvalues and gap, zero point vibrational energy, dipole moment, and highest vibrational fundamental frequency.
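    The learning machinery behind such QML models is kernel ridge regression; a minimal sketch with a Gaussian (RBF) kernel is given below, where random vectors stand in for the distribution-based representation and the hyperparameters are assumed, not tuned as in the paper.

      import numpy as np
      from sklearn.kernel_ridge import KernelRidge
      from sklearn.model_selection import train_test_split

      # Kernel ridge regression on placeholder "molecular" descriptors.
      rng = np.random.default_rng(10)
      X = rng.normal(size=(2000, 50))                      # placeholder representations
      y = X[:, :5].sum(axis=1) + 0.01 * rng.standard_normal(2000)   # placeholder property

      Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)

      model = KernelRidge(kernel="rbf", alpha=1e-6, gamma=1e-2)   # hyperparameters assumed
      model.fit(Xtr, ytr)
      mae = np.abs(model.predict(Xte) - yte).mean()
      print(f"out-of-sample MAE: {mae:.3f}")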

  12. Effect of dissolved organic matter on pre-equilibrium passive sampling: A predictive QSAR modeling study.

    Science.gov (United States)

    Lin, Wei; Jiang, Ruifen; Shen, Yong; Xiong, Yaxin; Hu, Sizi; Xu, Jianqiao; Ouyang, Gangfeng

    2018-04-13

    Pre-equilibrium passive sampling is a simple and promising technique for studying sampling kinetics, which is crucial to determine the distribution, transfer and fate of hydrophobic organic compounds (HOCs) in environmental water and organisms. Environmental water samples contain complex matrices that complicate the traditional calibration process for obtaining accurate rate constants. This study proposed a QSAR model to predict the sampling rate constants of HOCs (polycyclic aromatic hydrocarbons (PAHs), polychlorinated biphenyls (PCBs) and pesticides) in aqueous systems containing complex matrices. A homemade flow-through system was established to simulate an actual aqueous environment containing dissolved organic matter (DOM), i.e. humic acid (HA) and (2-Hydroxypropyl)-β-cyclodextrin (β-HPCD), and to obtain the experimental rate constants. Then, a quantitative structure-activity relationship (QSAR) model using Genetic Algorithm-Multiple Linear Regression (GA-MLR) was found to correlate the experimental rate constants to the system state, including physicochemical parameters of the HOCs and DOM, which were calculated and selected as descriptors by Density Functional Theory (DFT) and Chem 3D. The experimental results showed that the rate constants significantly increased as the concentration of DOM increased, and enhancement factors of 70-fold and 34-fold were observed for the HOCs in HA and β-HPCD, respectively. The established QSAR model was validated as credible (adjusted R² = 0.862) and predictive (Q² = 0.835) in estimating the rate constants of HOCs for complex aqueous sampling, and a probable mechanism was developed by comparison to the reported theoretical study. The present study established a QSAR model of passive sampling rate constants and calibrated the effect of DOM on the sampling kinetics. Copyright © 2018 Elsevier B.V. All rights reserved.

  13. PREDICTING APHASIA TYPE FROM BRAIN DAMAGE MEASURED WITH STRUCTURAL MRI

    Science.gov (United States)

    Yourganov, Grigori; Smith, Kimberly G.; Fridriksson, Julius; Rorden, Chris

    2015-01-01

    Chronic aphasia is a common consequence of a left-hemisphere stroke. Since the early insights by Broca and Wernicke, studying the relationship between the loci of cortical damage and patterns of language impairment has been one of the concerns of aphasiology. We utilized multivariate classification in a cross-validation framework to predict the type of chronic aphasia from the spatial pattern of brain damage. Our sample consisted of 98 patients with five types of aphasia (Broca’s, Wernicke’s, global, conduction, and anomic), classified based on scores on the Western Aphasia Battery. Binary lesion maps were obtained from structural MRI scans (obtained at least 6 months poststroke, and within 2 days of behavioural assessment); after spatial normalization, the lesions were parcellated into a disjoint set of brain areas. The proportion of damage to the brain areas was used to classify patients’ aphasia type. To create this parcellation, we relied on five brain atlases; our classifier (support vector machine) could differentiate between different kinds of aphasia using any of the five parcellations. In our sample, the best classification accuracy was obtained when using a novel parcellation that combined two previously published brain atlases, with the first atlas providing the segmentation of grey matter, and the second atlas used to segment the white matter. For each aphasia type, we computed the relative importance of different brain areas for distinguishing it from other aphasia types; our findings were consistent with previously published reports of lesion locations implicated in different types of aphasia. Overall, our results revealed that automated multivariate classification could distinguish between aphasia types based on damage to atlas-defined brain areas. PMID:26465238

  14. Predicting aphasia type from brain damage measured with structural MRI.

    Science.gov (United States)

    Yourganov, Grigori; Smith, Kimberly G; Fridriksson, Julius; Rorden, Chris

    2015-12-01

    Chronic aphasia is a common consequence of a left-hemisphere stroke. Since the early insights by Broca and Wernicke, studying the relationship between the loci of cortical damage and patterns of language impairment has been one of the concerns of aphasiology. We utilized multivariate classification in a cross-validation framework to predict the type of chronic aphasia from the spatial pattern of brain damage. Our sample consisted of 98 patients with five types of aphasia (Broca's, Wernicke's, global, conduction, and anomic), classified based on scores on the Western Aphasia Battery (WAB). Binary lesion maps were obtained from structural MRI scans (obtained at least 6 months poststroke, and within 2 days of behavioural assessment); after spatial normalization, the lesions were parcellated into a disjoint set of brain areas. The proportion of damage to the brain areas was used to classify patients' aphasia type. To create this parcellation, we relied on five brain atlases; our classifier (support vector machine - SVM) could differentiate between different kinds of aphasia using any of the five parcellations. In our sample, the best classification accuracy was obtained when using a novel parcellation that combined two previously published brain atlases, with the first atlas providing the segmentation of grey matter, and the second atlas used to segment the white matter. For each aphasia type, we computed the relative importance of different brain areas for distinguishing it from other aphasia types; our findings were consistent with previously published reports of lesion locations implicated in different types of aphasia. Overall, our results revealed that automated multivariate classification could distinguish between aphasia types based on damage to atlas-defined brain areas. Copyright © 2015 Elsevier Ltd. All rights reserved.
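    A minimal sketch of the classification step described above is shown below: a linear support vector machine predicting aphasia type from the proportion of damage in atlas-defined brain areas, evaluated by cross-validation. The lesion features and labels are hypothetical placeholders.

```python
# Minimal sketch: multivariate classification of aphasia type from per-region
# lesion-damage proportions using a linear SVM in a cross-validation framework.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_patients, n_regions = 98, 120
X = rng.uniform(0, 1, size=(n_patients, n_regions))  # proportion of damage per region
y = rng.integers(0, 5, size=n_patients)              # 5 aphasia types (coded 0-4)

clf = SVC(kernel="linear", C=1.0)
scores = cross_val_score(clf, X, y, cv=5)
print("cross-validated accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```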

  15. Correlating Structural Order with Structural Rearrangement in Dusty Plasma Liquids: Can Structural Rearrangement be Predicted by Static Structural Information?

    Science.gov (United States)

    Su, Yen-Shuo; Liu, Yu-Hsuan; I, Lin

    2012-11-01

    Whether static microstructural order information is strongly correlated with subsequent structural rearrangement (SR), and its power for predicting SR, is investigated experimentally in a quenched dusty plasma liquid with microheterogeneities. Poor local structural order is found to be a good alarm for identifying soft spots and predicting short-term SR. For sites with good structural order, the persistence time for sustaining the structural memory until SR has a large mean value but a broad distribution. The deviation of the local structural order from that averaged over nearest neighbors serves as a good second alarm to further sort out the short-time SR sites. It has similar sorting power to that obtained using the temporal fluctuation of the local structural order over a small time interval.

  16. Predicting clinical symptoms of attention deficit hyperactivity disorder based on temporal patterns between and within intrinsic connectivity networks.

    Science.gov (United States)

    Wang, Xun-Heng; Jiao, Yun; Li, Lihua

    2017-10-24

    Attention deficit hyperactivity disorder (ADHD) is a common brain disorder with high prevalence in school-age children. Previously developed machine learning-based methods have discriminated patients with ADHD from normal controls by providing label information of the disease for individuals. Inattention and impulsivity are the two most significant clinical symptoms of ADHD. However, predicting clinical symptoms (i.e., inattention and impulsivity) is a challenging task based on neuroimaging data. The goal of this study is twofold: to build predictive models for clinical symptoms of ADHD based on resting-state fMRI and to mine brain networks for predictive patterns of inattention and impulsivity. To achieve this goal, a cohort of 74 boys with ADHD and a cohort of 69 age-matched normal controls were recruited from the ADHD-200 Consortium. Both structural and resting-state fMRI images were obtained for each participant. Temporal patterns between and within intrinsic connectivity networks (ICNs) were applied as raw features in the predictive models. Specifically, sample entropy was taken as an intra-ICN feature, and phase synchronization (PS) was used as an inter-ICN feature. The predictive models were based on the least absolute shrinkage and selection operator (LASSO) algorithm. The performance of the predictive model for inattention is r = 0.79 (p < 10⁻⁸), and the performance of the predictive model for impulsivity is r = 0.48 (p < 10⁻⁸). The ICN-related predictive patterns may provide valuable information for investigating the brain network mechanisms of ADHD. In summary, the predictive models for clinical symptoms could be beneficial for personalizing ADHD medications. Copyright © 2017 IBRO. Published by Elsevier Ltd. All rights reserved.
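    A minimal sketch of this type of LASSO-based symptom prediction is shown below. The intra-ICN (sample-entropy) and inter-ICN (phase-synchronization) feature values and the clinical scores are random placeholders; only the regression and evaluation steps are illustrated.

```python
# Minimal sketch: LASSO regression predicting a continuous clinical score
# (e.g., inattention) from intra-ICN and inter-ICN features, with the
# predicted-vs-observed Pearson correlation as the performance measure.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_predict
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
n_subjects = 143
intra_icn = rng.normal(size=(n_subjects, 20))    # sample entropy per ICN
inter_icn = rng.normal(size=(n_subjects, 190))   # phase synchronization per ICN pair
X = np.hstack([intra_icn, inter_icn])
score = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=n_subjects)

lasso = LassoCV(cv=5, max_iter=10000)
pred = cross_val_predict(lasso, X, score, cv=5)
r, p = pearsonr(pred, score)
print(f"predicted vs observed: r = {r:.2f}, p = {p:.1e}")
```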

  17. A multicontroller structure for teaching and designing predictive control strategies

    International Nuclear Information System (INIS)

    Hodouin, D.; Desbiens, A.

    1999-01-01

    The paper deals with the unification of existing linear control algorithms in order to facilitate their transfer to engineering students and to industry engineers. The resulting control algorithm is Global Predictive Control (GlobPC), which is now taught at the graduate and continuing-education levels. GlobPC is based on an internal model framework in which three independent control criteria are minimized: one for tracking, one for regulation and one for feedforward. This structure allows the desired tracking, regulation and feedforward behaviors to be obtained in an optimal way while keeping them perfectly separated. It also cleanly separates the deterministic and stochastic predictions of the process model output. (author)

  18. Soil-pipe interaction modeling for pipe behavior prediction with super learning based methods

    Science.gov (United States)

    Shi, Fang; Peng, Xiang; Liu, Huan; Hu, Yafei; Liu, Zheng; Li, Eric

    2018-03-01

    Underground pipelines are subject to severe distress from the surrounding expansive soil. To investigate the structural response of water mains to varying soil movements, field data, including pipe wall strains, in situ soil water content, soil pressure and temperature, were collected. Analyses of such monitoring data have been reported, but the relationship between soil properties and pipe deformation has not been well interpreted. To characterize the relationship between soil properties and pipe deformation, this paper presents a super learning based approach combining feature selection algorithms to predict the structural behavior of water mains in different soil environments. Furthermore, an automatic variable selection method, i.e. the recursive feature elimination algorithm, was used to identify the critical predictors contributing to the pipe deformations. To investigate the adaptability of super learning to different predictive models, this research applied super learning based methods to three different datasets. The predictive performance was evaluated by R-squared, root-mean-square error and mean absolute error. Based on this evaluation, the superiority of super learning was validated and demonstrated by predicting three types of pipe deformations accurately. In addition, a comprehensive understanding of the water mains' working environments becomes possible.
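    A minimal sketch of a super-learning (stacked) regressor combined with recursive feature elimination, analogous in spirit to the pipe-deformation workflow above, is given below. The soil and pipe monitoring data are replaced by random placeholders, and the particular base learners are illustrative choices, not the study's.

```python
# Minimal sketch: recursive feature elimination followed by a stacked
# ("super learning") regressor, evaluated with cross-validated R^2.
import numpy as np
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.svm import SVR
from sklearn.feature_selection import RFE
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 15))                           # soil water content, pressure, temperature, ...
y = 3 * X[:, 0] - 2 * X[:, 4] + rng.normal(size=300)     # surrogate pipe wall strain

base_learners = [("ridge", Ridge()), ("svr", SVR()),
                 ("rf", RandomForestRegressor(n_estimators=100, random_state=0))]
super_learner = StackingRegressor(estimators=base_learners, final_estimator=Ridge())

# RFE picks the critical predictors before the stacked model is fit.
model = make_pipeline(RFE(Ridge(), n_features_to_select=6), super_learner)
print("CV R^2:", cross_val_score(model, X, y, cv=5).mean())
```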

  19. Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach

    DEFF Research Database (Denmark)

    Nielsen, Morten; Lundegaard, Claus; Worning, Peder

    2004-01-01

    Prediction of which peptides will bind a specific major histocompatibility complex (MHC) constitutes an important step in identifying potential T-cell epitopes suitable as vaccine candidates. MHC class II binding peptides have a broad length distribution complicating such predictions. Thus......, identifying the correct alignment is a crucial part of identifying the core of an MHC class II binding motif. In this context, we wish to describe a novel Gibbs motif sampler method ideally suited for recognizing such weak sequence motifs. The method is based on the Gibbs sampling method, and it incorporates...
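    A minimal sketch of a Gibbs motif sampler of the kind described above is given below, for a fixed binding-core width. The actual method additionally uses sequence weighting, substitution-matrix pseudocounts and other refinements that are omitted here; the peptide sequences are placeholders.

```python
# Minimal sketch: Gibbs sampling of motif (binding-core) start positions.
# One sequence is held out at a time, a position weight matrix (PWM) is built
# from the remaining alignments, and a new start is sampled for the held-out
# sequence in proportion to its PWM score.
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"
IDX = {a: i for i, a in enumerate(AA)}
W = 9  # motif (binding core) width, as for MHC class II cores

def pwm(seqs, starts, exclude, pseudo=1.0):
    """PWM from current core alignments, leaving one sequence out."""
    counts = np.full((W, len(AA)), pseudo)
    for i, (s, st) in enumerate(zip(seqs, starts)):
        if i == exclude:
            continue
        for j, a in enumerate(s[st:st + W]):
            counts[j, IDX[a]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def gibbs_sample(seqs, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    starts = [rng.integers(0, len(s) - W + 1) for s in seqs]
    for _ in range(n_iter):
        i = rng.integers(0, len(seqs))
        p = pwm(seqs, starts, exclude=i)
        # score every possible core placement in the held-out sequence
        scores = np.array([np.prod([p[j, IDX[a]] for j, a in enumerate(seqs[i][st:st + W])])
                           for st in range(len(seqs[i]) - W + 1)])
        starts[i] = rng.choice(len(scores), p=scores / scores.sum())
    return starts

peptides = ["GELIGILNAAKVPAD", "PKYVKQNTLKLATGM", "AKFVAAWTLKAAAGG"]  # placeholders
print(gibbs_sample(peptides))
```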

  20. Collective estimation of multiple bivariate density functions with application to angular-sampling-based protein loop modeling

    KAUST Repository

    Maadooliat, Mehdi

    2015-10-21

    This paper develops a method for simultaneous estimation of density functions for a collection of populations of protein backbone angle pairs using a data-driven, shared basis that is constructed by bivariate spline functions defined on a triangulation of the bivariate domain. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across boundaries of the triangles. Maximum penalized likelihood is used to fit the model and an alternating blockwise Newton-type algorithm is developed for computation. A simulation study shows that the collective estimation approach is statistically more efficient than estimating the densities individually. The proposed method was used to estimate neighbor-dependent distributions of protein backbone dihedral angles (i.e., Ramachandran distributions). The estimated distributions were applied to protein loop modeling, one of the most challenging open problems in protein structure prediction, by feeding them into an angular-sampling-based loop structure prediction framework. Our estimated distributions compared favorably to the Ramachandran distributions estimated by fitting a hierarchical Dirichlet process model; and in particular, our distributions showed significant improvements on the hard cases where existing methods do not work well.

  1. Collective estimation of multiple bivariate density functions with application to angular-sampling-based protein loop modeling

    KAUST Repository

    Maadooliat, Mehdi; Zhou, Lan; Najibi, Seyed Morteza; Gao, Xin; Huang, Jianhua Z.

    2015-01-01

    This paper develops a method for simultaneous estimation of density functions for a collection of populations of protein backbone angle pairs using a data-driven, shared basis that is constructed by bivariate spline functions defined on a triangulation of the bivariate domain. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across boundaries of the triangles. Maximum penalized likelihood is used to fit the model and an alternating blockwise Newton-type algorithm is developed for computation. A simulation study shows that the collective estimation approach is statistically more efficient than estimating the densities individually. The proposed method was used to estimate neighbor-dependent distributions of protein backbone dihedral angles (i.e., Ramachandran distributions). The estimated distributions were applied to protein loop modeling, one of the most challenging open problems in protein structure prediction, by feeding them into an angular-sampling-based loop structure prediction framework. Our estimated distributions compared favorably to the Ramachandran distributions estimated by fitting a hierarchical Dirichlet process model; and in particular, our distributions showed significant improvements on the hard cases where existing methods do not work well.

  2. Inventory implications of using sampling variances in estimation of growth model coefficients

    Science.gov (United States)

    Albert R. Stage; William R. Wykoff

    2000-01-01

    Variables based on stand densities or stocking have sampling errors that depend on the relation of tree size to plot size and on the spatial structure of the population. Ignoring the sampling errors of such variables, which include most measures of competition used in both distance-dependent and distance-independent growth models, can bias the predictions obtained from...

  3. Prediction-based dynamic load-sharing heuristics

    Science.gov (United States)

    Goswami, Kumar K.; Devarakonda, Murthy; Iyer, Ravishankar K.

    1993-01-01

    The authors present dynamic load-sharing heuristics that use predicted resource requirements of processes to manage workloads in a distributed system. A previously developed statistical pattern-recognition method is employed for resource prediction. While nonprediction-based heuristics depend on a rapidly changing system status, the new heuristics depend on slowly changing program resource usage patterns. Furthermore, prediction-based heuristics can be more effective since they use future requirements rather than just the current system state. Four prediction-based heuristics, two centralized and two distributed, are presented. Using trace driven simulations, they are compared against random scheduling and two effective nonprediction based heuristics. Results show that the prediction-based centralized heuristics achieve up to 30 percent better response times than the nonprediction centralized heuristic, and that the prediction-based distributed heuristics achieve up to 50 percent improvements relative to their nonprediction counterpart.

  4. Prediction of degradation and fracture of structural materials

    International Nuclear Information System (INIS)

    Tomkins, B.

    1992-01-01

    Prediction of materials performance in an engineering integrity context requires the underpinning of predictive modelling tuned by inputs from design, fabrication, operating experience, and laboratory testing. In this regard, in addition to fracture resistance, four important areas of time-dependent degradation are considered: mechanical, environmental, irradiation and thermal. The status of prediction of materials performance is discussed in relation to a number of important components such as LWR reactor pressure vessels and steam generators, and Fast Reactor high temperature structures. In each case the role of materials modelling is examined and the balance of factors which contribute to the overall prediction of component integrity/reliability is noted. Structural integrity arguments must follow a clear strategy if the required level of confidence is to be established. Various strategies and their evolution are discussed. (author)

  5. Novel structures of oxygen adsorbed on a Zr(0001) surface predicted from first principles

    Energy Technology Data Exchange (ETDEWEB)

    Gao, Bo [State Key Laboratory of Superhard Materials, Jilin University, Changchun, 130012 (China); Beijing computational science research center, Beijing,100084 (China); Wang, Jianyun [State Key Laboratory of Superhard Materials, Jilin University, Changchun, 130012 (China); Lv, Jian [State Key Laboratory of Superhard Materials, Jilin University, Changchun, 130012 (China); College of Materials Science and Engineering, Jilin University, Changchun, 130012 (China); Gao, Xingyu [Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Beijing, 100088 (China); CAEP Software Center for High Performance Numerical Simulation, Beijing, 100088 (China); Zhao, Yafan [CAEP Software Center for High Performance Numerical Simulation, Beijing, 100088 (China); Wang, Yanchao, E-mail: wyc@calypso.cn [State Key Laboratory of Superhard Materials, Jilin University, Changchun, 130012 (China); Beijing computational science research center, Beijing,100084 (China); College of Materials Science and Engineering, Jilin University, Changchun, 130012 (China); Song, Haifeng, E-mail: song_haifeng@iapcm.ac.cn [Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Beijing, 100088 (China); CAEP Software Center for High Performance Numerical Simulation, Beijing, 100088 (China); Ma, Yanming [State Key Laboratory of Superhard Materials, Jilin University, Changchun, 130012 (China); Beijing computational science research center, Beijing,100084 (China)

    2017-01-30

    Highlights: • Two stable structures of O adsorbed on a Zr(0001) surface are predicted with SLAM. • A stable structure of O adsorbed on a Zr(0001) surface is proposed with MLAM. • The calculated work function change is in agreement with the experimental value. - Abstract: The structures of O atoms adsorbed on a metal surface influence the metal properties significantly. Thus, studying O chemisorption on a Zr surface is of great interest. We investigated O adsorption on a Zr(0001) surface using our newly developed structure-searching method combined with first-principles calculations. A novel structural prototype with a unique combination of surface face-centered cubic (SFCC) and surface hexagonal close-packed (SHCP) O adsorption sites was predicted using a single-layer adsorption model (SLAM) for 0.5 and 1.0 monolayer (ML) O coverages. First-principles calculations based on the SLAM revealed that the newly predicted structures are energetically favorable compared with the well-known SFCC structures for low O coverage (0.5 and 1.0 ML). Furthermore, on the basis of our predicted SFCC + SHCP structures, a new structure within a multi-layer adsorption model (MLAM) was proposed to be more stable at an O coverage of 1.0 ML, in which adsorbed O atoms occupy the SFCC + SHCP sites and the substitutional octahedral sites. The calculated work functions indicate that the SFCC + SHCP configuration has the lowest work function of all known structures at an O coverage of 0.5 ML within the SLAM, which agrees with the experimental trend of work function with variation in O coverage.

  6. Cloud prediction of protein structure and function with PredictProtein for Debian.

    Science.gov (United States)

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Staniewski, Cedric; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome.

  7. Structural habitat predicts functional dispersal habitat of a large carnivore: how leopards change spots.

    Science.gov (United States)

    Fattebert, Julien; Robinson, Hugh S; Balme, Guy; Slotow, Rob; Hunter, Luke

    2015-10-01

    Natal dispersal promotes inter-population linkage, and is key to the spatial distribution of populations. Degradation of suitable landscape structures beyond the specific threshold of an individual's ability to disperse can therefore lead to disruption of functional landscape connectivity and impact metapopulation function. Because it ignores behavioral responses of individuals, structural connectivity is easier to assess than functional connectivity and is often used as a surrogate for landscape connectivity modeling. However, using structural resource selection models as surrogates for modeling functional connectivity through dispersal could be erroneous. We tested how well a second-order resource selection function (RSF) model (structural connectivity), based on GPS telemetry data from resident adult leopards (Panthera pardus L.), could predict subadult habitat use during dispersal (functional connectivity). We created eight non-exclusive subsets of the subadult data based on differing definitions of dispersal to assess the predictive ability of our adult-based RSF model extrapolated over a broader landscape. Dispersing leopards used habitats in accordance with adult selection patterns, regardless of the definition of dispersal considered. We demonstrate that, for a wide-ranging apex carnivore, functional connectivity through natal dispersal corresponds to structural connectivity as modeled by a second-order RSF. Mapping of the adult-based habitat classes provides direct visualization of the potential linkages between populations, without the need to model paths between a priori starting and destination points. The use of such landscape scale RSFs may provide insight into predicting suitable dispersal habitat peninsulas in human-dominated landscapes where mitigation of human-wildlife conflict should be focused. We recommend the use of second-order RSFs for landscape conservation planning and propose a similar approach to the conservation of other wide-ranging large

  8. IPE data base structure and insights

    International Nuclear Information System (INIS)

    Lehner, J.; Youngblood, R.

    1993-01-01

    A data base (the "IPE Insights Data Base") has been developed that stores data obtained from the Individual Plant Examinations (IPEs) which licensees of nuclear power plants are conducting in response to the Nuclear Regulatory Commission's (NRC) Generic Letter GL88-20. The data base, which is a collection of linked dBase files, stores information about individual plant designs, core damage frequency, and containment performance in a uniform, structured way. This data base can be queried and used as a computational tool to derive insights regarding the plants for which data is stored. This paper sets out the objectives of the IPE Insights Data Base, describes its structure and contents, illustrates sample queries, and discusses possible future uses

  9. Eating Disorders among a Community-Based Sample of Chilean Female Adolescents

    Science.gov (United States)

    Granillo, M. Teresa; Grogan-Kaylor, Andrew; Delva, Jorge; Castillo, Marcela

    2011-01-01

    The purpose of this study was to explore the prevalence and correlates of eating disorders among a community-based sample of female Chilean adolescents. Data were collected through structured interviews with 420 female adolescents residing in Santiago, Chile. Approximately 4% of the sample reported ever being diagnosed with an eating disorder.…

  10. Leaf Area Prediction Using Three Alternative Sampling Methods for Seven Sierra Nevada Conifer Species

    Directory of Open Access Journals (Sweden)

    Dryw A. Jones

    2015-07-01

    Full Text Available Prediction of projected tree leaf area using allometric relationships with sapwood cross-sectional area is common in tree- and stand-level production studies. Measuring sapwood is difficult and often requires destructive sampling. This study tested multiple leaf area prediction models across seven diverse conifer species in the Sierra Nevada of California. The best-fit whole tree leaf area prediction model for overall simplicity, accuracy, and utility for all seven species was a nonlinear model with basal area as the primary covariate. A new non-destructive procedure was introduced to extend the branch summation approach to leaf area data collection on trees that cannot be destructively sampled. There were no significant differences between fixed effects assigned to sampling procedures, indicating that data from the tested sampling procedures can be combined for whole tree leaf area modeling purposes. These results indicate that, for the species sampled, accurate leaf area estimates can be obtained through partially-destructive sampling and using common forest inventory data.
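    A minimal sketch of fitting a nonlinear whole-tree leaf area model with basal area as the primary covariate is given below, using a generic power-law form. The coefficients and data are hypothetical placeholders, not the study's fitted values.

```python
# Minimal sketch: nonlinear fit of leaf area (LA) against basal area (BA),
# LA = a * BA^b, using scipy's curve_fit.
import numpy as np
from scipy.optimize import curve_fit

def leaf_area_model(basal_area, a, b):
    return a * basal_area ** b

rng = np.random.default_rng(5)
basal_area = rng.uniform(0.01, 0.5, size=40)                               # m^2
leaf_area = 120 * basal_area ** 0.9 * rng.lognormal(sigma=0.1, size=40)    # m^2

(a_hat, b_hat), _ = curve_fit(leaf_area_model, basal_area, leaf_area, p0=[100, 1.0])
print(f"LA ~ {a_hat:.1f} * BA^{b_hat:.2f}")
```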

  11. Bridge Deterioration Prediction Model Based On Hybrid Markov-System Dynamic

    Directory of Open Access Journals (Sweden)

    Widodo Soetjipto Jojok

    2017-01-01

    Full Text Available Instantaneous bridge failure tends to increase in Indonesia. To mitigate this condition, Indonesia's Bridge Management System (I-BMS) has been applied to continuously monitor the condition of bridges. However, I-BMS only implements visual inspection for maintenance prioritization of bridge structure components rather than the bridge structural system. This paper proposes a new bridge failure prediction model based on a hybrid Markov-System Dynamic (MSD) approach. System dynamics is used to represent the correlation among bridge structure components, while a Markov chain is used to calculate the temporal probability of bridge failure. Data from around 235 bridges in Indonesia were collected from the Directorate of Bridges of the Ministry of Public Works and Housing for calculating the transition probabilities of the model. To validate the model, a medium-span concrete bridge was used as a case study. The result shows that the proposed model can accurately predict the bridge condition. Besides predicting the probability of bridge failure, this model can also be used as an early warning system for bridge monitoring activity.
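    A minimal sketch of the Markov half of such a model is shown below: a transition matrix over discrete bridge condition states is propagated in time to obtain the probability of reaching the failed state. The matrix values are illustrative only, not the study's estimated probabilities.

```python
# Minimal sketch: Markov-chain deterioration model with an absorbing failed state.
import numpy as np

# States: 0 = good, 1 = fair, 2 = poor, 3 = failed (absorbing)
P = np.array([[0.90, 0.08, 0.02, 0.00],
              [0.00, 0.85, 0.12, 0.03],
              [0.00, 0.00, 0.80, 0.20],
              [0.00, 0.00, 0.00, 1.00]])

state = np.array([1.0, 0.0, 0.0, 0.0])   # bridge starts in "good" condition
for year in range(1, 31):
    state = state @ P                    # propagate one year
    if year % 10 == 0:
        print(f"year {year}: P(failed) = {state[3]:.2f}")
```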

  12. Prediction of Chloride Diffusion in Concrete Structure Using Meshless Methods

    Directory of Open Access Journals (Sweden)

    Ling Yao

    2016-01-01

    Full Text Available Degradation of RC structures due to chloride penetration followed by reinforcement corrosion is a serious problem in civil engineering. The numerical simulation methods at present mainly involve finite element methods (FEM, which are based on mesh generation. In this study, element-free Galerkin (EFG and meshless weighted least squares (MWLS methods are used to solve the problem of simulation of chloride diffusion in concrete. The range of a scaling parameter is presented using numerical examples based on meshless methods. One- and two-dimensional numerical examples validated the effectiveness and accuracy of the two meshless methods by comparing results obtained by MWLS with results computed by EFG and FEM and results calculated by an analytical method. A good agreement is obtained among MWLS and EFG numerical simulations and the experimental data obtained from an existing marine concrete structure. These results indicate that MWLS and EFG are reliable meshless methods that can be used for the prediction of chloride ingress in concrete structures.

  13. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    Directory of Open Access Journals (Sweden)

    Dobbs Drena

    2011-06-01

    Full Text Available Abstract Background Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non-partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers, including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI, can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of

  14. Model structural uncertainty quantification and hydrologic parameter and prediction error analysis using airborne electromagnetic data

    DEFF Research Database (Denmark)

    Minsley, B. J.; Christensen, Nikolaj Kruse; Christensen, Steen

    Model structure, or the spatial arrangement of subsurface lithological units, is fundamental to the hydrological behavior of Earth systems. Knowledge of geological model structure is critically important in order to make informed hydrological predictions and management decisions. Model structure...... is never perfectly known, however, and incorrect assumptions can be a significant source of error when making model predictions. We describe a systematic approach for quantifying model structural uncertainty that is based on the integration of sparse borehole observations and large-scale airborne...... electromagnetic (AEM) data. Our estimates of model structural uncertainty follow a Bayesian framework that accounts for both the uncertainties in geophysical parameter estimates given AEM data, and the uncertainties in the relationship between lithology and geophysical parameters. Using geostatistical sequential...

  15. Autoregressive Prediction with Rolling Mechanism for Time Series Forecasting with Small Sample Size

    Directory of Open Access Journals (Sweden)

    Zhihua Wang

    2014-01-01

    Full Text Available Reasonable prediction makes significant practical sense for stochastic and unstable time series analysis with small or limited sample size. Motivated by the rolling idea in grey theory and the practical relevance of very short-term forecasting or 1-step-ahead prediction, a novel autoregressive (AR) prediction approach with a rolling mechanism is proposed. In the modeling procedure, a newly developed AR equation, which can be used to model nonstationary time series, is constructed in each prediction step. Meanwhile, the data window for the next-step-ahead forecast rolls on by adding the most recently derived prediction result while deleting the first value of the previously used sample data set. This rolling mechanism is an efficient technique owing to its improved forecasting accuracy, applicability in the case of limited and unstable data situations, and small computational effort. The general performance, influence of sample size, nonlinear dynamic mechanism, and significance of the observed trends, as well as innovation variance, are illustrated and verified with Monte Carlo simulations. The proposed methodology is then applied to several practical data sets, including multiple building settlement sequences and two economic series.
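    A minimal sketch of such one-step-ahead AR prediction with a rolling window is given below: at each step an AR(p) model is refit on the current window, the next value is predicted, and the window rolls forward by appending the prediction and dropping its oldest value. The data are a synthetic placeholder series, not the study's settlement or economic data.

```python
# Minimal sketch: rolling-window AR(p) forecasting by least squares.
import numpy as np

def fit_ar(window, p):
    """Least-squares fit of AR(p) coefficients (with intercept) on a 1-D window."""
    X = np.column_stack([window[i:len(window) - p + i] for i in range(p)])
    X = np.column_stack([np.ones(len(X)), X])
    y = window[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def rolling_ar_forecast(series, window_size=12, p=3, horizon=5):
    window = list(series[-window_size:])
    preds = []
    for _ in range(horizon):
        coef = fit_ar(np.asarray(window), p)
        next_val = coef[0] + coef[1:] @ np.asarray(window[-p:])
        preds.append(next_val)
        window = window[1:] + [next_val]   # roll: drop oldest, append prediction
    return preds

t = np.arange(60)
settlement = 0.3 * t + np.random.default_rng(6).normal(scale=0.5, size=60)
print(rolling_ar_forecast(settlement))
```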

  16. Fast reconstruction and prediction of frozen flow turbulence based on structured Kalman filtering

    NARCIS (Netherlands)

    Fraanje, P.R.; Rice, J.; Verhaegen, M.; Doelman, N.J.

    2010-01-01

    Efficient and optimal prediction of frozen flow turbulence using the complete observation history of the wavefront sensor is an important issue in adaptive optics for large ground-based telescopes. At least for the sake of error budgeting and algorithm performance, the evaluation of an accurate

  17. Structural features that predict real-value fluctuations of globular proteins.

    Science.gov (United States)

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2012-05-01

    It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics (MD) trajectories of nonhomologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined the correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real value of residue fluctuations using support vector regression (SVR). It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in MD trajectories. Moreover, SVR that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson's correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed in predictions by the Gaussian network model (GNM). An advantage of the developed method over the GNMs is that the former predicts the real value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient practical way to predict fluctuations of proteins using easily computed static structural features of proteins. Copyright © 2012 Wiley Periodicals, Inc.
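    A minimal sketch of support vector regression for real-valued residue fluctuations is given below. The static structural features and fluctuation targets are random placeholders; only the regression and evaluation steps are illustrated.

```python
# Minimal sketch: SVR predicting real-valued residue fluctuations from static
# structural features (e.g., contact number, solvent accessibility flags).
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_predict
from scipy.stats import pearsonr

rng = np.random.default_rng(7)
n_residues = 5000
X = rng.normal(size=(n_residues, 10))                      # static structural features
rmsf = np.abs(1.0 + X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.5, size=n_residues))

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
pred = cross_val_predict(model, X, rmsf, cv=5)
r, _ = pearsonr(pred, rmsf)
rmse = np.sqrt(np.mean((pred - rmsf) ** 2))
print(f"Pearson r = {r:.2f}, RMSE = {rmse:.2f} (arbitrary units)")
```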

  18. A Wavelet Kernel-Based Primal Twin Support Vector Machine for Economic Development Prediction

    Directory of Open Access Journals (Sweden)

    Fang Su

    2013-01-01

    Full Text Available Economic development forecasting allows planners to choose the right strategies for the future. This study is to propose economic development prediction method based on the wavelet kernel-based primal twin support vector machine algorithm. As gross domestic product (GDP is an important indicator to measure economic development, economic development prediction means GDP prediction in this study. The wavelet kernel-based primal twin support vector machine algorithm can solve two smaller sized quadratic programming problems instead of solving a large one as in the traditional support vector machine algorithm. Economic development data of Anhui province from 1992 to 2009 are used to study the prediction performance of the wavelet kernel-based primal twin support vector machine algorithm. The comparison of mean error of economic development prediction between wavelet kernel-based primal twin support vector machine and traditional support vector machine models trained by the training samples with the 3–5 dimensional input vectors, respectively, is given in this paper. The testing results show that the economic development prediction accuracy of the wavelet kernel-based primal twin support vector machine model is better than that of traditional support vector machine.

  19. Predicting beta-turns and their types using predicted backbone dihedral angles and secondary structures.

    Science.gov (United States)

    Kountouris, Petros; Hirst, Jonathan D

    2010-07-31

    Beta-turns are secondary structure elements usually classified as coil. Their prediction is important, because of their role in protein folding and their frequent occurrence in protein chains. We have developed a novel method that predicts beta-turns and their types using information from multiple sequence alignments, predicted secondary structures and, for the first time, predicted dihedral angles. Our method uses support vector machines, a supervised classification technique, and is trained and tested on three established datasets of 426, 547 and 823 protein chains. We achieve a Matthews correlation coefficient of up to 0.49, when predicting the location of beta-turns, the highest reported value to date. Moreover, the additional dihedral information improves the prediction of beta-turn types I, II, IV, VIII and "non-specific", achieving correlation coefficients up to 0.39, 0.33, 0.27, 0.14 and 0.38, respectively. Our results are more accurate than other methods. We have created an accurate predictor of beta-turns and their types. Our method, called DEBT, is available online at http://comp.chem.nottingham.ac.uk/debt/.

  20. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles.

    Science.gov (United States)

    Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G; Gelly, Jean-Christophe

    2016-06-20

    Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation -with Protein Blocks-, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.

  1. Improved hybrid optimization algorithm for 3D protein structure prediction.

    Science.gov (United States)

    Zhou, Changjun; Hou, Caixia; Wei, Xiaopeng; Zhang, Qiang

    2014-07-01

    A new improved hybrid optimization algorithm, the PGATS algorithm, based on the toy off-lattice model, is presented for dealing with three-dimensional protein structure prediction problems. The algorithm combines particle swarm optimization (PSO), a genetic algorithm (GA), and tabu search (TS). In addition, several improved strategies are adopted: a stochastic disturbance factor is added to the particle swarm optimization to improve its search ability; the crossover and mutation operations of the genetic algorithm are changed to a random linear method; and finally the tabu search algorithm is improved by appending a mutation operator. Through the combination of a variety of strategies and algorithms, protein structure prediction (PSP) in a 3D off-lattice model is achieved. The PSP problem is NP-hard, but it can be treated as a global optimization problem with multiple extrema and multiple parameters. This is the theoretical principle of the hybrid optimization algorithm proposed in this paper. The algorithm combines local search and global search, which overcomes the shortcomings of a single algorithm, giving full play to the advantages of each algorithm. The method is validated on the standard benchmark sequences currently in use: Fibonacci sequences and real protein sequences. Experiments show that the proposed new method outperforms single algorithms on the accuracy of calculating the protein sequence energy value, which proves it to be an effective way to predict the structure of proteins.

  2. Prefrontal Cortex Structure Predicts Training-Induced Improvements in Multitasking Performance.

    Science.gov (United States)

    Verghese, Ashika; Garner, K G; Mattingley, Jason B; Dux, Paul E

    2016-03-02

    The ability to perform multiple, concurrent tasks efficiently is a much-desired cognitive skill, but one that remains elusive due to the brain's inherent information-processing limitations. Multitasking performance can, however, be greatly improved through cognitive training (Van Selst et al., 1999; Dux et al., 2009). Previous studies have examined how patterns of brain activity change following training (for review, see Kelly and Garavan, 2005). Here, in a large-scale human behavioral and imaging study of 100 healthy adults, we tested whether multitasking training benefits, assessed using a standard dual-task paradigm, are associated with variability in brain structure. We found that the volume of the rostral part of the left dorsolateral prefrontal cortex (DLPFC) predicted an individual's response to training. Critically, this association was observed exclusively in a task-specific training group, and not in an active-training control group. Our findings reveal a link between DLPFC structure and an individual's propensity to gain from training on a task that taps the limits of cognitive control. Cognitive "brain" training is a rapidly growing, multibillion dollar industry (Hayden, 2012) that has been touted as the panacea for a variety of disorders that result in cognitive decline. A key process targeted by such training is "cognitive control." Here, we combined an established cognitive control measure, multitasking ability, with structural brain imaging in a sample of 100 participants. Our goal was to determine whether individual differences in brain structure predict the extent to which people derive measurable benefits from a cognitive training regime. Ours is the first study to identify a structural brain marker, the volume of the left hemisphere dorsolateral prefrontal cortex, associated with the magnitude of multitasking performance benefits induced by training at an individual level. Copyright © 2016 the authors.

  3. Mathematical Model to Predict the Permeability of Water Transport in Concrete Structure

    OpenAIRE

    Solomon Ndubuisi Eluozo

    2013-01-01

    A mathematical model to predict the permeability of water transport in concrete has been established; the model monitors the rate of water transport in the concrete structure. The process of this water transport is based on the constituents in the concrete mixture. Permeability reflects the influence of the micropores on the constituents of the concrete, and the method of concrete placement determines the rate of permeability deposition in the concrete structure; permeability es...

  4. IPE data base structure and insights

    International Nuclear Information System (INIS)

    Lehner, J.; Youngblood, R.

    1994-01-01

    A data base (the "IPE Insights Data Base") has been developed that stores data obtained from the Individual Plant Examinations (IPEs) which licensees of nuclear power plants are conducting in response to the Nuclear Regulatory Commission's (NRC) Generic Letter GL88-20. The data base, which is a collection of linked dBase files, stores information about individual plant designs, core damage frequency, and containment performance in a uniform, structured way. This data base can be queried and used as a computational tool to derive insights regarding the plants for which data is stored. This paper sets out the objectives of the IPE Insights Data Base, describes its structure and contents, illustrates sample queries, and discusses possible future uses

  5. Trust-based collective view prediction

    CERN Document Server

    Luo, Tiejian; Xu, Guandong; Zhou, Jia

    2013-01-01

    Collective view prediction is to judge the opinions of an active web user based on unknown elements by referring to the collective mind of the whole community. Content-based recommendation and collaborative filtering are two mainstream collective view prediction techniques. They generate predictions by analyzing the text features of the target object or the similarity of users' past behaviors. Still, these techniques are vulnerable to the artificially-injected noise data, because they are not able to judge the reliability and credibility of the information sources. Trust-based Collective View

  6. Experimental Protocol to Determine the Chloride Threshold Value for Corrosion in Samples Taken from Reinforced Concrete Structures.

    Science.gov (United States)

    Angst, Ueli M; Boschmann, Carolina; Wagner, Matthias; Elsener, Bernhard

    2017-08-31

    The aging of reinforced concrete infrastructure in developed countries imposes an urgent need for methods to reliably assess the condition of these structures. Corrosion of the embedded reinforcing steel is the most frequent cause for degradation. While it is well known that the ability of a structure to withstand corrosion depends strongly on factors such as the materials used or the age, it is common practice to rely on threshold values stipulated in standards or textbooks. These threshold values for corrosion initiation (Ccrit) are independent of the actual properties of a certain structure, which clearly limits the accuracy of condition assessments and service life predictions. The practice of using tabulated values can be traced to the lack of reliable methods to determine Ccrit on-site and in the laboratory. Here, an experimental protocol to determine Ccrit for individual engineering structures or structural members is presented. A number of reinforced concrete samples are taken from structures and laboratory corrosion testing is performed. The main advantage of this method is that it ensures real conditions concerning parameters that are well known to greatly influence Ccrit, such as the steel-concrete interface, which cannot be representatively mimicked in laboratory-produced samples. At the same time, the accelerated corrosion test in the laboratory permits the reliable determination of Ccrit prior to corrosion initiation on the tested structure; this is a major advantage over all common condition assessment methods that only permit estimating the conditions for corrosion after initiation, i.e., when the structure is already damaged. The protocol yields the statistical distribution of Ccrit for the tested structure. This serves as a basis for probabilistic prediction models for the remaining time to corrosion, which is needed for maintenance planning. This method can potentially be used in material testing of civil infrastructures, similar to established

  7. Prediction of Protein Configurational Entropy (Popcoen).

    Science.gov (United States)

    Goethe, Martin; Gleixner, Jan; Fita, Ignacio; Rubi, J Miguel

    2018-03-13

    A knowledge-based method for configurational entropy prediction of proteins is presented; this methodology is extremely fast, compared to previous approaches, because it does not involve any type of configurational sampling. Instead, the configurational entropy of a query fold is estimated by evaluating an artificial neural network, which was trained on molecular-dynamics simulations of ∼1000 proteins. The predicted entropy can be incorporated into a large class of protein software based on cost-function minimization/evaluation, in which configurational entropy is currently neglected for performance reasons. Software of this type is used for all major protein tasks such as structure predictions, proteins design, NMR and X-ray refinement, docking, and mutation effect predictions. Integrating the predicted entropy can yield a significant accuracy increase as we show exemplarily for native-state identification with the prominent protein software FoldX. The method has been termed Popcoen for Prediction of Protein Configurational Entropy. An implementation is freely available at http://fmc.ub.edu/popcoen/ .
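    A minimal sketch of the general idea, a small feedforward neural network regressing configurational entropy from fold-derived features, is given below. The input features, targets, and network size are hypothetical placeholders; the actual Popcoen network and its MD-derived training data are not reproduced here.

```python
# Minimal sketch: neural-network regression of a per-protein configurational
# entropy surrogate from structural features.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
X = rng.normal(size=(1000, 30))                                    # features of ~1000 folds
S_conf = X[:, :6].sum(axis=1) + rng.normal(scale=0.3, size=1000)   # surrogate entropies

X_tr, X_te, y_tr, y_te = train_test_split(X, S_conf, test_size=0.2, random_state=0)
net = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0)
net.fit(X_tr, y_tr)
print("held-out R^2:", round(net.score(X_te, y_te), 2))
```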

  8. Researches of fruit quality prediction model based on near infrared spectrum

    Science.gov (United States)

    Shen, Yulin; Li, Lian

    2018-04-01

    With the improvement in standards for food quality and safety, people pay more attention to the internal quality of fruits; therefore, the measurement of fruit internal quality is increasingly imperative. In general, nondestructive soluble solid content (SSC) and total acid content (TAC) analysis of fruits is vital and effective for quality measurement in global fresh produce markets, so in this paper we aim at establishing a novel fruit internal quality prediction model based on SSC and TAC for near infrared spectra. Firstly, fruit quality prediction models based on PCA + BP neural network, PCA + GRNN network, PCA + BP adaboost strong classifier, PCA + ELM, and PCA + LS_SVM classifiers are designed and implemented, respectively. Then, in the NSCT domain, the median filter and the Savitzky-Golay filter are used to preprocess the spectral signal, and the Kennard-Stone algorithm is used to automatically select the training and test samples. Thirdly, we obtain the optimal models by comparing 15 kinds of prediction models based on a multi-classifier competition mechanism; specifically, nonparametric estimation is introduced to measure the effectiveness of the proposed models, with the reliability and variance of the nonparametric estimation of each prediction model used to evaluate the prediction results, while the estimated value and confidence interval are regarded as a reference. The experimental results demonstrate that this approach can better achieve the optimal evaluation of the internal quality of fruit. Finally, we employ cat swarm optimization to optimize the two optimal models obtained from the nonparametric estimation; empirical testing indicates that the proposed method can provide more accurate and effective results than other forecasting methods.
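    A minimal sketch of one such model family is given below: Savitzky-Golay smoothing of the spectra followed by PCA and a neural-network regressor predicting SSC. The spectra and SSC values are random placeholders; the NSCT preprocessing, Kennard-Stone splitting, nonparametric model comparison, and cat swarm optimization are omitted.

```python
# Minimal sketch: smoothed NIR spectra -> PCA -> neural-network regression of SSC.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
spectra = rng.normal(size=(200, 500))            # 200 fruit samples x 500 wavelengths
ssc = spectra[:, 100:110].mean(axis=1) * 50 + 10 + rng.normal(scale=0.2, size=200)

# Savitzky-Golay smoothing along the wavelength axis
spectra = savgol_filter(spectra, window_length=11, polyorder=2, axis=1)

X_tr, X_te, y_tr, y_te = train_test_split(spectra, ssc, test_size=0.3, random_state=0)
model = make_pipeline(PCA(n_components=10),
                      MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000, random_state=0))
model.fit(X_tr, y_tr)
print("test R^2:", round(model.score(X_te, y_te), 2))
```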

  9. Validation of Accelerometer-Based Energy Expenditure Prediction Models in Structured and Simulated Free-Living Settings

    Science.gov (United States)

    Montoye, Alexander H. K.; Conger, Scott A.; Connolly, Christopher P.; Imboden, Mary T.; Nelson, M. Benjamin; Bock, Josh M.; Kaminsky, Leonard A.

    2017-01-01

    This study compared accuracy of energy expenditure (EE) prediction models from accelerometer data collected in structured and simulated free-living settings. Twenty-four adults (mean age 45.8 years, 50% female) performed two sessions of 11 to 21 activities, wearing four ActiGraph GT9X Link activity monitors (right hip, ankle, both wrists) and a…

  10. Quality assessment of protein model-structures based on structural and functional similarities.

    Science.gov (United States)

    Konopka, Bogumil M; Nebel, Jean-Christophe; Kotulska, Malgorzata

    2012-09-21

    Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. The gap between the number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins constantly broadens. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model it from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. GOBA (Gene Ontology-Based Assessment) is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using the standardized and hierarchical description of proteins provided by Gene Ontology, combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. The validation shows that the method is able to discriminate between good and bad model-structures. The best of the tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. GOBA also obtained the best result for two targets of CASP8, and

  11. Structure prediction of nanoclusters; a direct or a pre-screened search on the DFT energy landscape?

    Science.gov (United States)

    Farrow, M R; Chow, Y; Woodley, S M

    2014-10-21

    The atomic structure of inorganic nanoclusters obtained via a search for low lying minima on energy landscapes, or hypersurfaces, is reported for inorganic binary compounds: zinc oxide (ZnO)n, magnesium oxide (MgO)n, cadmium selenide (CdSe)n, and potassium fluoride (KF)n, where n = 1-12 formula units. The computational cost of each search is dominated by the effort to evaluate each sample point on the energy landscape and the number of required sample points. The effect of changing the balance between these two factors on the success of the search is investigated. The choice of sample points will also affect the number of required data points and therefore the efficiency of the search. Monte Carlo based global optimisation routines (evolutionary and stochastic quenching algorithms) within a new software package, viz. Knowledge Led Master Code (KLMC), are employed to search both directly and after pre-screening on the DFT energy landscape. Pre-screening includes structural relaxation to minimise a cheaper energy function - based on interatomic potentials - and is found to improve significantly the search efficiency, and typically reduces the number of DFT calculations required to locate the local minima by more than an order of magnitude. Although the choice of functional form is important, the approach is robust to small changes to the interatomic potential parameters. The computational cost of initial DFT calculations of each structure is reduced by employing Gaussian smearing to the electronic energy levels. Larger (KF)n nanoclusters are predicted to form cuboid cuts from the rock-salt phase, but also share many structural motifs with (MgO)n for smaller clusters. The transition from 2D rings to 3D (bubble, or fullerene-like) structures occur at a larger cluster size for (ZnO)n and (CdSe)n. Differences between the HOMO and LUMO energies, for all the compounds apart from KF, are in the visible region of the optical spectrum (2-3 eV); KF lies deep in the UV region

  12. A domain-based approach to predict protein-protein interactions

    Directory of Open Access Journals (Sweden)

    Resat Haluk

    2007-06-01

    Full Text Available Abstract Background Knowing which proteins exist in a certain organism or cell type and how these proteins interact with each other are necessary for the understanding of biological processes at the whole cell level. The determination of the protein-protein interaction (PPI networks has been the subject of extensive research. Despite the development of reasonably successful methods, serious technical difficulties still exist. In this paper we present DomainGA, a quantitative computational approach that uses the information about the domain-domain interactions to predict the interactions between proteins. Results DomainGA is a multi-parameter optimization method in which the available PPI information is used to derive a quantitative scoring scheme for the domain-domain pairs. Obtained domain interaction scores are then used to predict whether a pair of proteins interacts. Using the yeast PPI data and a series of tests, we show the robustness and insensitivity of the DomainGA method to the selection of the parameter sets, score ranges, and detection rules. Our DomainGA method achieves very high explanation ratios for the positive and negative PPIs in yeast. Based on our cross-verification tests on human PPIs, comparison of the optimized scores with the structurally observed domain interactions obtained from the iPFAM database, and sensitivity and specificity analysis; we conclude that our DomainGA method shows great promise to be applicable across multiple organisms. Conclusion We envision the DomainGA as a first step of a multiple tier approach to constructing organism specific PPIs. As it is based on fundamental structural information, the DomainGA approach can be used to create potential PPIs and the accuracy of the constructed interaction template can be further improved using complementary methods. Explanation ratios obtained in the reported test case studies clearly show that the false prediction rates of the template networks constructed
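
    A minimal sketch of the core DomainGA idea: optimized domain-domain scores are aggregated into a protein-pair decision. The domain identifiers, scores, threshold, and the max-over-domain-pairs detection rule below are hypothetical stand-ins (the paper examines several rules and score ranges).

```python
from itertools import product

# Hypothetical optimized domain-domain interaction scores (DomainGA would fit these from PPI data).
domain_scores = {
    frozenset({"PF00069", "PF00017"}): 0.92,
    frozenset({"PF00018", "PF00017"}): 0.35,
    frozenset({"PF00069", "PF00018"}): 0.10,
}

# Domain composition of each protein (illustrative).
protein_domains = {
    "ProtA": ["PF00069"],
    "ProtB": ["PF00017", "PF00018"],
    "ProtC": ["PF00018"],
}

def interaction_score(p1, p2, default=0.0):
    """Score a protein pair as the maximum score over all domain pairs (one possible detection rule)."""
    pairs = product(protein_domains[p1], protein_domains[p2])
    return max(domain_scores.get(frozenset({d1, d2}), default) for d1, d2 in pairs)

threshold = 0.5   # assumed decision threshold
for a, b in [("ProtA", "ProtB"), ("ProtA", "ProtC")]:
    s = interaction_score(a, b)
    print(a, b, round(s, 2), "interact" if s >= threshold else "no interaction")
```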

  13. Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    Directory of Open Access Journals (Sweden)

    Wang Yong

    2011-07-01

    Full Text Available Abstract Background Systematic mutagenesis studies have shown that only a few interface residues termed hot spots contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction becomes increasingly important for understanding the essence of protein interactions and helping narrow down the search space for drug design. Currently many computational methods have been developed by proposing different features. However, comparative assessment of these features and, furthermore, effective and accurate methods are still in pressing need. Results In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes). In the Ab- dataset (antigen-antibody complexes are excluded), however, there are significant differences in these two features between hot spots and non-hot spots. Secondly, we explore the predictive ability of each feature and of combinations of features by support vector machines (SVMs). The results indicate that the sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on the independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes. Conclusion Experimental results show that support vector machine

  14. Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    Science.gov (United States)

    2011-01-01

    Background Systematic mutagenesis studies have shown that only a few interface residues termed hot spots contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction becomes increasingly important for understanding the essence of protein interactions and helping narrow down the search space for drug design. Currently many computational methods have been developed by proposing different features. However, comparative assessment of these features and, furthermore, effective and accurate methods are still in pressing need. Results In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes). In the Ab- dataset (antigen-antibody complexes are excluded), however, there are significant differences in these two features between hot spots and non-hot spots. Secondly, we explore the predictive ability of each feature and of combinations of features by support vector machines (SVMs). The results indicate that the sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on the independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes. Conclusion Experimental results show that support vector machine classifiers are quite
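
    A minimal sketch of the classification step: an SVM trained on interface-residue features and evaluated with the precision, recall, F1, and AUC metrics quoted above. The synthetic features and labels stand in for the real relASA, contact, and conservation descriptors.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

# Synthetic stand-in for interface residues: columns could be relASA, delta-ASA,
# hydrogen bonds, salt bridges, atomic contacts, conservation score, sequence entropy.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 7))
y = (X[:, 0] - 0.8 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.8, size=400) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

clf = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
y_prob = clf.predict_proba(X_te)[:, 1]

print("precision", round(precision_score(y_te, y_pred), 2))
print("recall   ", round(recall_score(y_te, y_pred), 2))
print("F1       ", round(f1_score(y_te, y_pred), 2))
print("AUC      ", round(roc_auc_score(y_te, y_prob), 2))
```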

  15. The Structure of The Extended Psychosis Phenotype in Early Adolescence—A Cross-sample Replication

    Science.gov (United States)

    Wigman, Johanna T. W.; Vollebergh, Wilma A. M.; Raaijmakers, Quinten A. W.; Iedema, Jurjen; van Dorsselaer, Saskia; Ormel, Johan; Verhulst, Frank C.; van Os, Jim

    2011-01-01

    The extended psychosis phenotype, or the expression of nonclinical positive psychotic experiences, is already prevalent in adolescence and has a dose-response risk relationship with later psychotic disorder. In 2 large adolescent general population samples (n = 5422 and n = 2230), prevalence and structure of the extended psychosis phenotype was investigated. Positive psychotic experiences, broadly defined, were reported by the majority of adolescents. Exploratory analysis with Structural Equation Modelling (Exploratory Factor Analysis followed by Confirmatory Factor Analysis [CFA]) in sample 1 suggested that psychotic experiences were best represented by 5 underlying dimensions; CFA in sample 2 provided a replication of this model. Dimensions were labeled Hallucinations, Delusions, Paranoia, Grandiosity, and Paranormal beliefs. Prevalences differed strongly, Hallucinations having the lowest and Paranoia having the highest rates. Girls reported more experiences on all dimensions, except Grandiosity, and from age 12 to 16 years rates increased. Hallucinations, Delusions, and Paranoia, but not Grandiosity and Paranormal beliefs, were associated with distress and general measures of psychopathology. Thus, only some of the dimensions of the extended psychosis phenotype in young people may represent a continuum with more severe psychopathology and predict later psychiatric disorder. PMID:20044595

  16. Structure Prediction and Analysis of Neuraminidase Sequence Variants

    Science.gov (United States)

    Thayer, Kelly M.

    2016-01-01

    Analyzing protein structure has become an integral aspect of understanding systems of biochemical import. The laboratory experiment endeavors to introduce protein folding to ascertain structures of proteins for which the structure is unavailable, as well as to critically evaluate the quality of the prediction obtained. The model system used is the…

  17. The Satellite Clock Bias Prediction Method Based on Takagi-Sugeno Fuzzy Neural Network

    Science.gov (United States)

    Cai, C. L.; Yu, H. G.; Wei, Z. C.; Pan, J. D.

    2017-05-01

    The continuous improvement of the prediction accuracy of Satellite Clock Bias (SCB) is the key problem of precision navigation. In order to improve the precision of SCB prediction and better reflect the change characteristics of SCB, this paper proposes an SCB prediction method based on the Takagi-Sugeno fuzzy neural network. Firstly, the SCB values are pre-treated based on their characteristics. Then, an accurate Takagi-Sugeno fuzzy neural network model is established based on the preprocessed data to predict SCB. This paper uses the precise SCB data with different sampling intervals provided by IGS (International Global Navigation Satellite System Service) to carry out short-term prediction experiments, and the results are compared with the ARIMA (Auto-Regressive Integrated Moving Average) model, GM(1,1) model, and the quadratic polynomial model. The results show that the Takagi-Sugeno fuzzy neural network model is feasible and effective for short-term SCB prediction, and performs well for different types of clocks. The prediction results of the proposed method are clearly better than those of the conventional methods.

  18. Less-structured time in children's daily lives predicts self-directed executive functioning.

    Science.gov (United States)

    Barker, Jane E; Semenov, Andrei D; Michaelson, Laura; Provan, Lindsay S; Snyder, Hannah R; Munakata, Yuko

    2014-01-01

    Executive functions (EFs) in childhood predict important life outcomes. Thus, there is great interest in attempts to improve EFs early in life. Many interventions are led by trained adults, including structured training activities in the lab, and less-structured activities implemented in schools. Such programs have yielded gains in children's externally-driven executive functioning, where they are instructed on what goal-directed actions to carry out and when. However, it is less clear how children's experiences relate to their development of self-directed executive functioning, where they must determine on their own what goal-directed actions to carry out and when. We hypothesized that time spent in less-structured activities would give children opportunities to practice self-directed executive functioning, and lead to benefits. To investigate this possibility, we collected information from parents about their 6-7 year-old children's daily, annual, and typical schedules. We categorized children's activities as "structured" or "less-structured" based on categorization schemes from prior studies on child leisure time use. We assessed children's self-directed executive functioning using a well-established verbal fluency task, in which children generate members of a category and can decide on their own when to switch from one subcategory to another. The more time that children spent in less-structured activities, the better their self-directed executive functioning. The opposite was true of structured activities, which predicted poorer self-directed executive functioning. These relationships were robust (holding across increasingly strict classifications of structured and less-structured time) and specific (time use did not predict externally-driven executive functioning). We discuss implications, caveats, and ways in which potential interpretations can be distinguished in future work, to advance an understanding of this fundamental aspect of growing up.

  19. Interlinking backscatter, grain size and benthic community structure

    Science.gov (United States)

    McGonigle, Chris; Collier, Jenny S.

    2014-06-01

    The relationship between acoustic backscatter, sediment grain size and benthic community structure is examined using three different quantitative methods, covering image- and angular response-based approaches. Multibeam time-series backscatter (300 kHz) data acquired in 2008 off the coast of East Anglia (UK) are compared with grain size properties, macrofaunal abundance and biomass from 130 Hamon and 16 Clamshell grab samples. Three predictive methods are used: 1) image-based (mean backscatter intensity); 2) angular response-based (predicted mean grain size), and 3) image-based (1st principal component and classification) from Quester Tangent Corporation Multiview software. Relationships between grain size and backscatter are explored using linear regression. Differences in grain size and benthic community structure between acoustically defined groups are examined using ANOVA and PERMANOVA+. Results for the Hamon grab stations indicate significant correlations between measured mean grain size and mean backscatter intensity, angular response predicted mean grain size, and 1st principal component of QTC analysis (all p PERMANOVA for the Hamon abundance shows benthic community structure was significantly different between acoustic groups for all methods (p ≤ 0.001). Overall these results show considerable promise in that more than 60% of the variance in the mean grain size of the Clamshell grab samples can be explained by mean backscatter or acoustically-predicted grain size. These results show that there is significant predictive capacity for sediment characteristics from multibeam backscatter and that these acoustic classifications can have ecological validity.
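
    The grain size-backscatter relationship can be explored with ordinary linear regression, as in the abstract; the values below are purely illustrative, not the East Anglia measurements.

```python
import numpy as np
from scipy import stats

# Illustrative grab-sample data: mean backscatter intensity (dB) vs mean grain size (phi units).
backscatter = np.array([-32.1, -30.5, -28.9, -27.2, -26.0, -24.3, -22.8, -21.5])
mean_grain_size_phi = np.array([4.1, 3.8, 3.2, 2.9, 2.4, 2.1, 1.6, 1.2])

res = stats.linregress(backscatter, mean_grain_size_phi)
print(f"slope={res.slope:.3f} phi/dB, r^2={res.rvalue**2:.2f}, p={res.pvalue:.3g}")
```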

  20. Predicting taxonomic and functional structure of microbial communities in acid mine drainage.

    Science.gov (United States)

    Kuang, Jialiang; Huang, Linan; He, Zhili; Chen, Linxing; Hua, Zhengshuang; Jia, Pu; Li, Shengjin; Liu, Jun; Li, Jintian; Zhou, Jizhong; Shu, Wensheng

    2016-06-01

    Predicting the dynamics of community composition and functional attributes responding to environmental changes is an essential goal in community ecology but remains a major challenge, particularly in microbial ecology. Here, by targeting a model system with low species richness, we explore the spatial distribution of taxonomic and functional structure of 40 acid mine drainage (AMD) microbial communities across Southeast China profiled by 16S ribosomal RNA pyrosequencing and a comprehensive microarray (GeoChip). Similar environmentally dependent patterns of dominant microbial lineages and key functional genes were observed regardless of the large-scale geographical isolation. Functional and phylogenetic β-diversities were significantly correlated, whereas functional metabolic potentials were strongly influenced by environmental conditions and community taxonomic structure. Using advanced modeling approaches based on artificial neural networks, we successfully predicted the taxonomic and functional dynamics with significantly higher prediction accuracies of metabolic potentials (average Bray-Curtis similarity 87.8) as compared with relative microbial abundances (similarity 66.8), implying that natural AMD microbial assemblages may be better predicted at the functional gene level rather than at the taxonomic level. Furthermore, relative metabolic potentials of genes involved in many key ecological functions (for example, nitrogen and phosphate utilization, metal resistance and stress response) were extrapolated to increase under more acidic and metal-rich conditions, indicating a critical strategy of stress adaptation in these extraordinary communities. Collectively, our findings indicate that natural selection rather than geographic distance has a more crucial role in shaping the taxonomic and functional patterns of the AMD microbial community, which are readily predicted by modeling methods, and suggest that the model-based approach is essential to better understand natural
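
    A minimal sketch of the modeling idea, assuming a neural network that maps environmental variables to relative functional-gene abundances and scoring the predictions by Bray-Curtis similarity. The data, variable names, and network size are invented and do not correspond to the GeoChip or pyrosequencing data.

```python
import numpy as np
from scipy.spatial.distance import braycurtis
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n_sites, n_env, n_genes = 40, 5, 12

env = rng.normal(size=(n_sites, n_env))                      # e.g. pH, Fe, S, temperature, conductivity
W = rng.uniform(size=(n_env, n_genes))
genes = np.abs(env @ W + rng.normal(scale=0.3, size=(n_sites, n_genes)))
genes = genes / genes.sum(axis=1, keepdims=True)             # relative metabolic potentials per site

env_tr, env_te, g_tr, g_te = train_test_split(env, genes, test_size=0.25, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0).fit(env_tr, g_tr)
pred = np.clip(model.predict(env_te), 1e-9, None)

# Score each held-out site by Bray-Curtis similarity (1 minus the dissimilarity), as in the abstract.
similarities = [1 - braycurtis(p, o) for p, o in zip(pred, g_te)]
print("mean Bray-Curtis similarity (%):", round(float(np.mean(similarities)) * 100, 1))
```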

  1. Update on protein structure prediction: results of the 1995 IRBM workshop

    DEFF Research Database (Denmark)

    Hubbard, Tim; Tramontano, Anna; Hansen, Jan

    1996-01-01

    Computational tools for protein structure prediction are of great interest to molecular, structural and theoretical biologists due to a rapidly increasing number of protein sequences with no known structure. In October 1995, a workshop was held at IRBM to predict as much as possible about a numbe...

  2. Update on protein structure prediction: results of the 1995 IRBM workshop

    DEFF Research Database (Denmark)

    Hubbard, Tim; Tramontano, Anna; Hansen, Jan

    1996-01-01

    Computational tools for protein structure prediction are of great interest to molecular, structural and theoretical biologists due to a rapidly increasing number of protein sequences with no known structure. In October 1995, a workshop was held at IRBM to predict as much as possible about a number...

  3. SU-F-E-09: Respiratory Signal Prediction Based On Multi-Layer Perceptron Neural Network Using Adjustable Training Samples

    Energy Technology Data Exchange (ETDEWEB)

    Sun, W; Jiang, M; Yin, F [Duke University Medical Center, Durham, NC (United States)

    2016-06-15

    Purpose: Dynamic tracking of moving organs, such as lung and liver tumors, under radiation therapy requires prediction of organ motions prior to delivery. The displacement of a moving organ may change considerably because the respiratory pattern varies greatly between different periods. This study aims to reduce the influence of those changes using adjustable training signals and a multi-layer perceptron neural network (ASMLP). Methods: Respiratory signals obtained using a Real-time Position Management (RPM) device were used for this study. The ASMLP uses two multi-layer perceptron neural networks (MLPs) to infer the respiration position alternately, and the training samples are updated over time. Firstly, a Savitzky-Golay finite impulse response smoothing filter was established to smooth the respiratory signal. Secondly, two identical MLPs were developed to estimate the respiratory position from its previous positions separately. Weights and thresholds were updated to minimize network errors according to the Levenberg-Marquardt optimization algorithm through the backpropagation method. Finally, MLP 1 was used to predict the 120∼150 s respiration positions using the 0∼120 s training signals. At the same time, MLP 2 was trained using the 30∼150 s training signals. Then MLP 2 was used to predict the 150∼180 s respiration positions from the 30∼150 s training signals. The respiration position was predicted in this alternating way until the end of the signal. Results: In this experiment, the two methods were used to predict 2.5 minutes of respiratory signals. For predicting 1 s ahead of response time, the correlation coefficient was improved from 0.8250 (MLP method) to 0.8856 (ASMLP method). Besides, a 30% improvement of mean absolute error between MLP (0.1798 on average) and ASMLP (0.1267 on average) was achieved. For predicting 2 s ahead of response time, the correlation coefficient was improved from 0.61415 to 0.7098. Mean absolute error of the MLP method (0.3111 on average) was reduced by 35% using the ASMLP method (0.2020 on average). Conclusion: The preliminary results
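
    A minimal sketch of the workflow under stated assumptions: Savitzky-Golay smoothing, lagged-window features, and an MLP that is periodically re-trained on a sliding window of recent samples (the "adjustable training samples" idea). The sampling rate, window lengths, and network size are illustrative, and a single re-trained MLP stands in for the paper's alternating pair.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.neural_network import MLPRegressor

fs = 25                                          # assumed sampling rate (samples per second)
t = np.arange(0, 180, 1 / fs)
raw = np.sin(2 * np.pi * 0.25 * t) + 0.05 * np.random.default_rng(3).normal(size=t.size)
smooth = savgol_filter(raw, window_length=25, polyorder=3)   # step 1: Savitzky-Golay smoothing

lag, horizon = 50, fs                            # predict ~1 s ahead from the previous 2 s of samples

def make_xy(seg):
    """Turn a signal segment into lagged feature windows and ahead-of-time targets."""
    X = np.array([seg[i:i + lag] for i in range(len(seg) - lag - horizon)])
    y = seg[lag + horizon:]
    return X, y

# Sliding training window (60 s here, shorter than the paper's 120 s, for speed),
# re-trained every 30 s so the training samples stay adjusted to the current breathing pattern.
window, step = 60 * fs, 30 * fs
preds, truth = [], []
for start in range(0, len(smooth) - window - step + 1, step):
    X_tr, y_tr = make_xy(smooth[start:start + window])
    mlp = MLPRegressor(hidden_layer_sizes=(20,), max_iter=1000, random_state=0).fit(X_tr, y_tr)
    test_seg = smooth[start + window - lag - horizon:start + window + step]   # next 30 s of targets
    X_te, y_te = make_xy(test_seg)
    preds.append(mlp.predict(X_te))
    truth.append(y_te)

preds, truth = np.concatenate(preds), np.concatenate(truth)
print("correlation coefficient:", round(float(np.corrcoef(preds, truth)[0, 1]), 3))
print("mean absolute error:    ", round(float(np.mean(np.abs(preds - truth))), 3))
```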

  4. Prediction by graph theoretic measures of structural effects in proteins arising from non-synonymous single nucleotide polymorphisms.

    Directory of Open Access Journals (Sweden)

    Tammy M K Cheng

    Full Text Available Recent analyses of human genome sequences have given rise to impressive advances in identifying non-synonymous single nucleotide polymorphisms (nsSNPs). By contrast, the annotation of nsSNPs and their links to diseases are progressing at a much slower pace. Many of the current approaches to analysing disease-associated nsSNPs use primarily sequence and evolutionary information, while structural information is relatively less exploited. In order to explore the potential of such information, we developed a structure-based approach, Bongo (Bonds ON Graph), to predict structural effects of nsSNPs. Bongo considers protein structures as residue-residue interaction networks and applies graph theoretical measures to identify the residues that are critical for maintaining structural stability by assessing the consequences of single point mutations on the interaction network. Our results show that Bongo is able to identify mutations that cause both local and global structural effects, with a remarkably low false positive rate. Application of the Bongo method to the prediction of 506 disease-associated nsSNPs resulted in a performance (positive predictive value, PPV, 78.5%) similar to that of PolyPhen (PPV, 77.2%) and PANTHER (PPV, 72.2%). As the Bongo method is solely structure-based, our results indicate that the structural changes resulting from nsSNPs are closely associated with their pathological consequences.
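
    This record does not spell out Bongo's exact graph measures, so the sketch below only illustrates the general idea: treat the protein as a residue-residue interaction network and flag residues whose removal most perturbs it, here using betweenness centrality as one possible criticality measure. The residue names and contacts are invented.

```python
import networkx as nx

# Toy residue-residue interaction network (nodes = residues, edges = contacts / hydrogen bonds).
contacts = [
    ("A10", "A14"), ("A14", "A18"), ("A18", "A42"), ("A42", "A77"),
    ("A77", "A80"), ("A10", "A80"), ("A42", "A45"), ("A45", "A50"),
    ("A50", "A77"), ("A18", "A45"),
]
G = nx.Graph(contacts)

# Residues that carry many shortest paths are treated as structurally critical;
# an nsSNP at such a position is more likely to have a structural effect.
centrality = nx.betweenness_centrality(G)
critical = sorted(centrality, key=centrality.get, reverse=True)[:3]
print("most critical residues:", critical)

# Crudely probe the consequence of mutating one residue by removing it from the network.
mutated = "A42"
H = G.copy()
H.remove_node(mutated)
print(f"connected components after removing {mutated}:", nx.number_connected_components(H))
```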

  5. MemBrain: An Easy-to-Use Online Webserver for Transmembrane Protein Structure Prediction

    Science.gov (United States)

    Yin, Xi; Yang, Jing; Xiao, Feng; Yang, Yang; Shen, Hong-Bin

    2018-03-01

    Membrane proteins are an important class of proteins embedded in the membranes of cells and play crucial roles in living organisms, for example as ion channels, transporters, and receptors. Because it is difficult to determine a membrane protein's structure by wet-lab experiments, accurate and fast amino acid sequence-based computational methods are highly desired. In this paper, we report an online prediction tool called MemBrain, whose input is the amino acid sequence. MemBrain consists of specialized modules for predicting transmembrane helices, residue-residue contacts and relative accessible surface area of α-helical membrane proteins. MemBrain achieves a prediction accuracy of 97.9% (A_TMH) and 87.1% (A_P), with an N-score of 3.2 ± 3.0 and a C-score of 3.1 ± 2.8. MemBrain-Contact obtains 62%/64.1% prediction accuracy for top L/5 contact prediction on the training and independent datasets, respectively, and MemBrain-Rasa achieves a Pearson correlation coefficient of 0.733 with a mean absolute error of 13.593. These prediction results provide valuable hints for revealing the structure and function of membrane proteins. The MemBrain web server is free for academic use and available at www.csbio.sjtu.edu.cn/bioinf/MemBrain/.

  6. A sampling and metagenomic sequencing-based methodology for monitoring antimicrobial resistance in swine herds

    DEFF Research Database (Denmark)

    Munk, Patrick; Dalhoff Andersen, Vibe; de Knegt, Leonardo

    2016-01-01

    Objectives Reliable methods for monitoring antimicrobial resistance (AMR) in livestock and other reservoirs are essential to understand the trends, transmission and importance of agricultural resistance. Quantification of AMR is mostly done using culture-based techniques, but metagenomic read...... mapping shows promise for quantitative resistance monitoring. Methods We evaluated the ability of: (i) MIC determination for Escherichia coli; (ii) cfu counting of E. coli; (iii) cfu counting of aerobic bacteria; and (iv) metagenomic shotgun sequencing to predict expected tetracycline resistance based...... cultivation-based techniques in terms of predicting expected tetracycline resistance based on antimicrobial consumption. Our metagenomic approach had sufficient resolution to detect antimicrobial-induced changes to individual resistance gene abundances. Pen floor manure samples were found to represent rectal...

  7. Optimization of Decision-Making for Spatial Sampling in the North China Plain, Based on Remote-Sensing a Priori Knowledge

    Science.gov (United States)

    Feng, J.; Bai, L.; Liu, S.; Su, X.; Hu, H.

    2012-07-01

    In this paper, MODIS remote sensing data, which are low-cost, timely, and of moderate-to-low spatial resolution, over the North China Plain (NCP) study region were first used for mixed-pixel spectral decomposition to extract a useful regionalized indicator parameter (RIP) from the initially selected indicators, namely the fraction (percentage) of winter wheat planting area in each pixel, which serves as the regionalized indicator variable (RIV) for spatial sampling. The RIV values were then analyzed spatially to obtain the spatial structure characteristics (i.e., spatial correlation and variation) of the NCP, which were further processed to derive scale-fitting, valid a priori knowledge for spatial sampling. Subsequently, based on the idea of rationally integrating probability-based and model-based sampling techniques and effectively utilizing the obtained a priori knowledge, spatial sampling models and design schemes, together with their optimization and optimal selection, were developed, providing a scientific basis for improving and optimizing the existing spatial sampling schemes for large-scale cropland remote sensing monitoring. Additionally, through an adaptive analysis and decision strategy, optimal local spatial prediction and a gridded system of extrapolation results were used to implement an adaptive reporting pattern of spatial sampling in accordance with the reporting units, in order to satisfy the actual needs of sampling surveys.
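
    One way to quantify the "spatial structure characteristics" (spatial correlation and variation) of the RIV is an empirical semivariogram, sketched below under stated assumptions; the coordinates, winter-wheat fractions, and bin settings are invented and this is only one possible choice of spatial-structure tool.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
xy = rng.uniform(0, 100, size=(n, 2))                     # pixel centres (km, illustrative)
trend = np.sin(xy[:, 0] / 15.0) * np.cos(xy[:, 1] / 20.0)
riv = 0.4 + 0.2 * trend + 0.05 * rng.normal(size=n)       # winter-wheat fraction per pixel

def empirical_variogram(coords, values, n_bins=12, max_lag=60.0):
    """Classical Matheron estimator: gamma(h) = mean of 0.5*(z_i - z_j)^2 within each lag bin."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = 0.5 * (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)
    lags, gammas = d[iu], sq[iu]
    edges = np.linspace(0, max_lag, n_bins + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    gamma = [gammas[(lags >= lo) & (lags < hi)].mean() for lo, hi in zip(edges[:-1], edges[1:])]
    return centres, np.array(gamma)

h, gamma = empirical_variogram(xy, riv)
for hc, g in zip(h, gamma):
    print(f"lag {hc:5.1f} km   semivariance {g:.4f}")
```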

  8. Evaluation of Multiple Linear Regression-Based Limited Sampling Strategies for Enteric-Coated Mycophenolate Sodium in Adult Kidney Transplant Recipients.

    Science.gov (United States)

    Brooks, Emily K; Tett, Susan E; Isbel, Nicole M; McWhinney, Brett; Staatz, Christine E

    2018-04-01

    Although multiple linear regression-based limited sampling strategies (LSSs) have been published for enteric-coated mycophenolate sodium, none have been evaluated for the prediction of subsequent mycophenolic acid (MPA) exposure. This study aimed to examine the predictive performance of the published LSS for the estimation of future MPA area under the concentration-time curve from 0 to 12 hours (AUC0-12) in renal transplant recipients. Total MPA plasma concentrations were measured in 20 adult renal transplant patients on 2 occasions a week apart. All subjects received concomitant tacrolimus and were approximately 1 month after transplant. Samples were taken at 0, 0.33, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 6, and 8 hours and 0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 2, 3, 4, 6, 9, and 12 hours after dose on the first and second sampling occasion, respectively. Predicted MPA AUC0-12 was calculated using 19 published LSSs and data from the first or second sampling occasion for each patient and compared with the second occasion full MPA AUC0-12 calculated using the linear trapezoidal rule. Bias (median percentage prediction error) and imprecision (median absolute prediction error) were determined. Median percentage prediction error and median absolute prediction error for the prediction of full MPA AUC0-12 were multiple linear regression-based LSS was not possible without concentrations up to at least 8 hours after the dose.
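
    The benchmark calculation, the full AUC0-12 by the linear trapezoidal rule, and the percentage prediction error used to express bias can be sketched as follows; the concentrations and LSS coefficients below are hypothetical and are not values from the study.

```python
import numpy as np

# Full sampling profile (hours) and MPA concentrations (mg/L) for one illustrative patient.
t_full = np.array([0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 2, 3, 4, 6, 9, 12])
c_full = np.array([1.2, 2.0, 6.5, 9.8, 8.1, 6.0, 4.4, 3.0, 2.2, 1.9, 1.5, 1.1, 0.9])

# "Observed" AUC0-12 by the linear trapezoidal rule.
auc_full = np.sum(np.diff(t_full) * (c_full[:-1] + c_full[1:]) / 2)

# Hypothetical published LSS equation of the usual form AUC = b0 + b1*C0 + b2*C1 + b3*C3.
c0, c1, c3 = c_full[0], c_full[4], c_full[8]
auc_lss = 5.0 + 3.1 * c0 + 1.9 * c1 + 4.2 * c3            # illustrative coefficients only

pe = 100 * (auc_lss - auc_full) / auc_full                # percentage prediction error (bias)
print(f"observed AUC0-12 = {auc_full:.1f}, LSS-predicted = {auc_lss:.1f}")
print(f"prediction error = {pe:+.1f}% (|PE| = {abs(pe):.1f}%)")
```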

  9. Use of a multigrid technique to study effects of limited sampling of heterogeneity on transport prediction

    International Nuclear Information System (INIS)

    Cole, C.R.; Foote, H.P.

    1987-02-01

    Reliable ground water transport prediction requires accurate spatial and temporal characterization of a hydrogeologic system. However, cost constraints and the desire to maintain site integrity by minimizing drilling can restrict the amount of spatial sampling that can be obtained to resolve the flow parameter variability associated with heterogeneities. This study quantifies the errors in subsurface transport predictions resulting from incomplete characterization of hydraulic conductivity heterogeneity. A multigrid technique was used to simulate two-dimensional flow velocity fields with high resolution. To obtain these velocity fields, the finite difference code MGRID, which implements a multigrid solution technique, was applied to compute stream functions on a 256-by-256 grid for a variety of hypothetical systems having detailed distributions of hydraulic conductivity. Spatial variability in hydraulic conductivity distributions was characterized by the components in the spectrum of spatial frequencies. A low-pass spatial filtering technique was applied to the base case hydraulic conductivity distribution to produce a data set with lower spatial frequency content. Arrival time curves were then calculated for filtered hydraulic conductivity distribution and compared to base case results to judge the relative importance of the higher spatial frequency components. Results indicate a progression from multimode to single-mode arrival time curves as the number and extent of distinct flow pathways are reduced by low-pass filtering. This relationship between transport predictions and spatial frequencies was used to judge the consequences of sampling the hydraulic conductivity with reduced spatial resolution. 22 refs., 17 figs
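
    The low-pass spatial filtering step can be sketched with a 2D FFT: frequency components above a cutoff are removed from a log-conductivity field, mimicking what limited sampling can resolve. The random field and cutoff are illustrative and this stands in for, rather than reproduces, the MGRID workflow.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 256
ln_k = rng.normal(size=(n, n))                 # log hydraulic conductivity on a 256 x 256 grid

def low_pass(field, cutoff_fraction=0.1):
    """Zero out spatial-frequency components above a radial cutoff and transform back."""
    F = np.fft.fftshift(np.fft.fft2(field))
    ky, kx = np.indices(field.shape)
    centre = np.array(field.shape) / 2
    radius = np.hypot(kx - centre[1], ky - centre[0])
    F[radius > cutoff_fraction * field.shape[0] / 2] = 0.0
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

smoothed = low_pass(ln_k, cutoff_fraction=0.1)  # a "limited sampling" view of the heterogeneity
print("variance retained after filtering:", round(smoothed.var() / ln_k.var(), 3))
```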

  10. Computational prediction of drug-drug interactions based on drugs functional similarities.

    Science.gov (United States)

    Ferdousi, Reza; Safdari, Reza; Omidi, Yadollah

    2017-06-01

    Therapeutic activities of drugs are often influenced by co-administration of drugs that may cause inevitable drug-drug interactions (DDIs) and inadvertent side effects. Prediction and identification of DDIs are extremely vital for the patient safety and success of treatment modalities. A number of computational methods have been employed for the prediction of DDIs based on drugs structures and/or functions. Here, we report on a computational method for DDIs prediction based on functional similarity of drugs. The model was set based on key biological elements including carriers, transporters, enzymes and targets (CTET). The model was applied for 2189 approved drugs. For each drug, all the associated CTETs were collected, and the corresponding binary vectors were constructed to determine the DDIs. Various similarity measures were conducted to detect DDIs. Of the examined similarity methods, the inner product-based similarity measures (IPSMs) were found to provide improved prediction values. Altogether, 2,394,766 potential drug pairs interactions were studied. The model was able to predict over 250,000 unknown potential DDIs. Upon our findings, we propose the current method as a robust, yet simple and fast, universal in silico approach for identification of DDIs. We envision that this proposed method can be used as a practical technique for the detection of possible DDIs based on the functional similarities of drugs. Copyright © 2017. Published by Elsevier Inc.
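
    A minimal sketch of the CTET-vector idea: each drug becomes a binary vector over carriers, transporters, enzymes, and targets, and an inner-product-based similarity flags potential DDIs. The CTET identifiers, drug assignments, and decision threshold are invented for illustration.

```python
import numpy as np

# Universe of CTET elements (carriers, transporters, enzymes, targets) with illustrative IDs.
ctet = ["CYP3A4", "CYP2D6", "P-gp", "OATP1B1", "SLC22A1", "ADRB1", "HMGCR"]
idx = {e: i for i, e in enumerate(ctet)}

drug_ctet = {
    "drugA": ["CYP3A4", "P-gp", "HMGCR"],
    "drugB": ["CYP3A4", "OATP1B1", "HMGCR"],
    "drugC": ["CYP2D6", "ADRB1"],
}

def binary_vector(elements):
    v = np.zeros(len(ctet))
    v[[idx[e] for e in elements]] = 1
    return v

vectors = {d: binary_vector(els) for d, els in drug_ctet.items()}

def inner_product_similarity(a, b):
    """One of the inner-product-based measures (IPSMs); here a plain dot product of binary vectors."""
    return float(vectors[a] @ vectors[b])

threshold = 2   # assumed decision rule: flag a potential DDI if the drugs share >= 2 CTET elements
for pair in [("drugA", "drugB"), ("drugA", "drugC")]:
    s = inner_product_similarity(*pair)
    print(pair, s, "potential DDI" if s >= threshold else "no predicted DDI")
```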

  11. QCD predictions for weak neutral current structure functions

    International Nuclear Information System (INIS)

    Wu Jimin

    1987-01-01

    Employing the analytic expression (to next-to-leading order) for the non-singlet component of the structure function obtained by the author from QCD theory, and taking recent experimental results for the neutral current structure function at Q^2 = 11 (GeV/c)^2 as input, the QCD prediction for the scaling violation behaviour of the neutral current structure functions is given.

  12. The ordered network structure and prediction summary for M ≥ 7 earthquakes in Xinjiang region of China

    International Nuclear Information System (INIS)

    Men, Ke-Pei; Zhao, Kai

    2014-01-01

    M ≥ 7 earthquakes have shown an obvious commensurability and orderliness in Xinjiang, China, and its adjacent region since 1800. The main orderly values are 30 a × k (k = 1, 2, 3), 11∼12 a, 41∼43 a, 18∼19 a, and 5∼6 a. Under the guidance of the information forecasting theory of Wen-Bo Weng, based on previous research results, and combining ordered network structure analysis with complex network technology, we focus on the prediction summary of M ≥ 7 earthquakes by using the ordered network structure, and add new information to further optimize the network, hence constructing the 2D- and 3D-ordered network structure of M ≥ 7 earthquakes. In this paper, the network structure fully revealed the regularity of seismic activity of M ≥ 7 earthquakes in the study region during the past 210 years. Based on this, the Karakorum M7.1 earthquake in 1996, the M7.9 earthquake on the border of Russia, Mongolia, and China in 2003, and the two Yutian M7.3 earthquakes in 2008 and 2014 were predicted successfully. At the same time, a new prediction is presented: the next two M ≥ 7 earthquakes will probably occur around 2019-2020 and 2025-2026 in this region. The results show that large earthquakes occurring in a defined region can be predicted. The method of ordered network structure analysis produces satisfactory results for the mid- and long-term prediction of M ≥ 7 earthquakes.

  13. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    Directory of Open Access Journals (Sweden)

    Xiaoxia Yang

    Full Text Available Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  14. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    Science.gov (United States)

    Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong

    2015-01-01

    Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  15. The experimental search for new predicted binary-alloy structures

    Science.gov (United States)

    Erb, K. C.; Richey, Lauren; Lang, Candace; Campbell, Branton; Hart, Gus

    2010-10-01

    Predicting new ordered phases in metallic alloys is a productive line of inquiry because configurational ordering in an alloy can dramatically alter its useful material properties. One is able to infer the existence of an ordered phase in an alloy using first-principles calculated formation enthalpies [G. L. W. Hart, "Where are Nature's missing structures?", Nature Materials 6, 941-945 (2007)]. Using this approach, we have been able to identify stable (i.e. lowest energy) orderings in a variety of binary metallic alloys. Many of these phases have been observed experimentally in the past, though others have not. In pursuit of several of the missing structures, we have characterized potential orderings in PtCd, PtPd and PtMo alloys using synchrotron x-ray powder diffraction and symmetry-analysis tools [B. J. Campbell, H. T. Stokes, D. E. Tanner, and D. M. Hatch, "ISODISPLACE: a web-based tool for exploring structural distortions", J. Appl. Cryst. 39, 607-614 (2006)].

  16. Application of clustering analysis in the prediction of photovoltaic power generation based on neural network

    Science.gov (United States)

    Cheng, K.; Guo, L. M.; Wang, Y. K.; Zafar, M. T.

    2017-11-01

    In order to select effective samples from many years of PV power generation data and improve the accuracy of PV power generation forecasting models, this paper studies the application of clustering analysis in this field and establishes a forecasting model based on a neural network. For three different weather types (sunny, cloudy, and rainy days), historical data samples are screened by clustering analysis. After screening, BP neural network prediction models are established using the screened data as training data. The six photovoltaic power generation prediction models obtained before and after data screening are then compared. Results show that combining clustering analysis with BP neural networks is an effective way to improve the precision of photovoltaic power generation forecasting.
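
    A minimal sketch of the screen-then-train idea, assuming k-means as the clustering step and scikit-learn's MLP as the BP network; the features, targets, and cluster count are illustrative, not the paper's data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(6)
n_days = 600
# Illustrative daily features: irradiance, temperature, cloud cover; target: PV energy (kWh).
X = np.column_stack([
    rng.uniform(1, 8, n_days),        # irradiance (kWh/m^2)
    rng.uniform(0, 35, n_days),       # temperature (C)
    rng.uniform(0, 1, n_days),        # cloud fraction
])
y = 12 * X[:, 0] * (1 - 0.6 * X[:, 2]) + 0.1 * X[:, 1] + rng.normal(scale=2, size=n_days)

# Step 1: cluster the historical days into three weather types (sunny / cloudy / rainy analogues).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Step 2: train one BP (MLP) network per cluster, using only that cluster's screened samples.
models = {}
for c in range(3):
    mask = km.labels_ == c
    models[c] = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0).fit(X[mask], y[mask])

# Step 3: forecast a new day with the model of its nearest weather type.
new_day = np.array([[6.5, 28.0, 0.1]])
cluster = int(km.predict(new_day)[0])
print(f"weather type {cluster}, predicted generation: {models[cluster].predict(new_day)[0]:.1f} kWh")
```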

  17. Predicting protein structures with a multiplayer online game.

    Science.gov (United States)

    Cooper, Seth; Khatib, Firas; Treuille, Adrien; Barbero, Janos; Lee, Jeehyung; Beenen, Michael; Leaver-Fay, Andrew; Baker, David; Popović, Zoran; Players, Foldit

    2010-08-05

    People exert large amounts of problem-solving effort playing computer games. Simple image- and text-recognition tasks have been successfully 'crowd-sourced' through games, but it is not clear if more complex scientific problems can be solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search space. Here we describe Foldit, a multiplayer online game that engages non-scientists in solving hard prediction problems. Foldit players interact with protein structures using direct manipulation tools and user-friendly versions of algorithms from the Rosetta structure prediction methodology, while they compete and collaborate to optimize the computed energy. We show that top-ranked Foldit players excel at solving challenging structure refinement problems in which substantial backbone rearrangements are necessary to achieve the burial of hydrophobic residues. Players working collaboratively develop a rich assortment of new strategies and algorithms; unlike computational approaches, they explore not only the conformational space but also the space of possible search strategies. The integration of human visual problem-solving and strategy development capabilities with traditional computational algorithms through interactive multiplayer games is a powerful new approach to solving computationally-limited scientific problems.

  18. Predicting Hepatotoxicity of Drug Metabolites Via an Ensemble Approach Based on Support Vector Machine

    Science.gov (United States)

    Lu, Yin; Liu, Lili; Lu, Dong; Cai, Yudong; Zheng, Mingyue; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian

    2017-11-20

    Drug-induced liver injury (DILI) is a major cause of drug withdrawal. The chemical properties of the drug, especially drug metabolites, play key roles in DILI. Our goal is to construct a QSAR model to predict drug hepatotoxicity based on drug metabolites. 64 hepatotoxic drug metabolites and 3,339 non-hepatotoxic drug metabolites were gathered from the MDL Metabolite Database. Considering the imbalance of the dataset, we randomly split the negative samples and combined each portion with all the positive samples to construct individually balanced datasets for constructing independent classifiers. Then, we adopted an ensemble approach to make predictions based on the results of all individual classifiers and applied the minimum Redundancy Maximum Relevance (mRMR) feature selection method to select the molecular descriptors. Eventually, for the drugs in the external test set, a Bayesian inference method was used to predict the hepatotoxicity of a drug based on its metabolites. The model showed an average balanced accuracy of 78.47%, a sensitivity of 74.17%, and a specificity of 82.77%. Five molecular descriptors characterizing molecular polarity, intramolecular bonding strength, and molecular frontier orbital energy were obtained. When predicting the hepatotoxicity of a drug based on all its metabolites, the sensitivity, specificity and balanced accuracy were 60.38%, 70.00%, and 65.19%, respectively, indicating that this method is useful for identifying the hepatotoxicity of drugs. We developed an in silico model to predict the hepatotoxicity of drug metabolites. Moreover, Bayesian inference was applied to predict the hepatotoxicity of a drug based on its metabolites, which yielded useful sensitivity and specificity. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
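
    The balancing-and-ensemble step can be sketched as below: the negative set is split into positive-sized chunks, one SVM is trained per balanced subset, and predictions are combined by majority vote (a simple stand-in for the paper's ensemble rule). The feature values and class ratio are synthetic.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(7)
n_pos, n_neg, n_feat = 64, 640, 5            # imbalanced metabolite dataset (ratio chosen for illustration)
X_pos = rng.normal(loc=0.8, size=(n_pos, n_feat))
X_neg = rng.normal(loc=0.0, size=(n_neg, n_feat))

# Split the negatives into chunks the size of the positive set; each chunk plus all positives
# gives one balanced training set and one SVM classifier.
classifiers = []
for chunk in np.array_split(rng.permutation(X_neg), n_neg // n_pos):
    X_bal = np.vstack([X_pos, chunk])
    y_bal = np.r_[np.ones(len(X_pos)), np.zeros(len(chunk))]
    classifiers.append(SVC(kernel="rbf", gamma="scale").fit(X_bal, y_bal))

def ensemble_predict(x):
    """Majority vote over the individual balanced classifiers."""
    votes = np.array([clf.predict(x.reshape(1, -1))[0] for clf in classifiers])
    return int(votes.mean() >= 0.5)

test_metabolite = rng.normal(loc=0.8, size=n_feat)      # drawn from the "hepatotoxic" distribution
print("predicted hepatotoxic:", bool(ensemble_predict(test_metabolite)))
```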

  19. [Assessment of biological corrosion of ferroconcrete of ground-based industrial structures].

    Science.gov (United States)

    Rozhanskaia, A M; Piliashenko-Novokhatnyĭ, A I; Purish, L M; Durcheva, V N; Kozlova, I A

    2001-01-01

    A structure of a nuclear plant built in 1983 and kept in dead storage for 15 years was the subject of a microbiological investigation, with the purpose of estimating the degree of contamination of its ferroconcrete structures by corrosion-hazardous microorganisms and predicting their biocorrosion state after the plant is put into operation. Sulphur-cycle bacteria (thionic and sulphate-reducing bacteria) were shown to be distributed everywhere on the surface and in the bulk of the concrete structures, and to be associated with the corrosion products of the concrete and reinforcement bars of the investigated building. Sulphate-reducing bacteria were demonstrated to be the most widespread group at all sampling points. An indirect estimate of the degree of participation of the microbial communities in the biological damage of the ferroconcrete was made, based on the accumulation rate of aggressive gaseous metabolites (carbon dioxide and hydrogen). The probability that the biocorrosion situation will deteriorate under full-scale operation of the facility has been substantiated.

  20. Physics-Based Hazard Assessment for Critical Structures Near Large Earthquake Sources

    Science.gov (United States)

    Hutchings, L.; Mert, A.; Fahjan, Y.; Novikova, T.; Golara, A.; Miah, M.; Fergany, E.; Foxall, W.

    2017-09-01

    We argue that for critical structures near large earthquake sources: (1) the ergodic assumption, recent history, and simplified descriptions of the hazard are not appropriate to rely on for earthquake ground motion prediction and can lead to a mis-estimation of the hazard and risk to structures; (2) a physics-based approach can address these issues; (3) a physics-based source model must be provided to generate realistic phasing effects from finite rupture and model near-source ground motion correctly; (4) wave propagations and site response should be site specific; (5) a much wider search of possible sources of ground motion can be achieved computationally with a physics-based approach; (6) unless one utilizes a physics-based approach, the hazard and risk to structures has unknown uncertainties; (7) uncertainties can be reduced with a physics-based approach, but not with an ergodic approach; (8) computational power and computer codes have advanced to the point that risk to structures can be calculated directly from source and site-specific ground motions. Spanning the variability of potential ground motion in a predictive situation is especially difficult for near-source areas, but that is the distance at which the hazard is the greatest. The basis of a "physical-based" approach is ground-motion syntheses derived from physics and an understanding of the earthquake process. This is an overview paper and results from previous studies are used to make the case for these conclusions. Our premise is that 50 years of strong motion records is insufficient to capture all possible ranges of site and propagation path conditions, rupture processes, and spatial geometric relationships between source and site. Predicting future earthquake scenarios is necessary; models that have little or no physical basis but have been tested and adjusted to fit available observations can only "predict" what happened in the past, which should be considered description as opposed to prediction

  1. RNA folding: structure prediction, folding kinetics and ion electrostatics.

    Science.gov (United States)

    Tan, Zhijie; Zhang, Wenbing; Shi, Yazhou; Wang, Fenghua

    2015-01-01

    Beyond the "traditional" functions such as gene storage, transport and protein synthesis, recent discoveries reveal that RNAs have important "new" biological functions, including RNA silencing and riboswitch-mediated gene regulation. Such functions of noncoding RNAs are strongly coupled to RNA structures and proper structural changes, which naturally leads to the RNA folding problem, including structure prediction and folding kinetics. Due to the polyanionic nature of RNAs, RNA folding structure, stability and kinetics are strongly coupled to the ionic condition of the solution. The main focus of this chapter is to review the recent progress in the three major aspects of the RNA folding problem: structure prediction, folding kinetics and ion electrostatics. This chapter introduces both the recent experimental and theoretical progress, while emphasizing theoretical modelling of these three aspects of RNA folding.

  2. Constraint Logic Programming approach to protein structure prediction

    Directory of Open Access Journals (Sweden)

    Fogolari Federico

    2004-11-01

    Full Text Available Abstract Background The protein structure prediction problem is one of the most challenging problems in biological sciences. Many approaches have been proposed using database information and/or simplified protein models. The protein structure prediction problem can be cast in the form of an optimization problem. Notwithstanding its importance, the problem has very seldom been tackled by Constraint Logic Programming, a declarative programming paradigm suitable for solving combinatorial optimization problems. Results Constraint Logic Programming techniques have been applied to the protein structure prediction problem on the face-centered cube lattice model. Molecular dynamics techniques, endowed with the notion of constraint, have been also exploited. Even using a very simplified model, Constraint Logic Programming on the face-centered cube lattice model allowed us to obtain acceptable results for a few small proteins. As a test implementation their (known) secondary structure and the presence of disulfide bridges are used as constraints. Simplified structures obtained in this way have been converted to all atom models with plausible structure. Results have been compared with a similar approach using a well-established technique as molecular dynamics. Conclusions The results obtained on small proteins show that Constraint Logic Programming techniques can be employed for studying protein simplified models, which can be converted into realistic all atom models. The advantage of Constraint Logic Programming over other, much more explored, methodologies, resides in the rapid software prototyping, in the easy way of encoding heuristics, and in exploiting all the advances made in this research area, e.g. in constraint propagation and its use for pruning the huge search space.

  3. Constraint Logic Programming approach to protein structure prediction.

    Science.gov (United States)

    Dal Palù, Alessandro; Dovier, Agostino; Fogolari, Federico

    2004-11-30

    The protein structure prediction problem is one of the most challenging problems in biological sciences. Many approaches have been proposed using database information and/or simplified protein models. The protein structure prediction problem can be cast in the form of an optimization problem. Notwithstanding its importance, the problem has very seldom been tackled by Constraint Logic Programming, a declarative programming paradigm suitable for solving combinatorial optimization problems. Constraint Logic Programming techniques have been applied to the protein structure prediction problem on the face-centered cube lattice model. Molecular dynamics techniques, endowed with the notion of constraint, have been also exploited. Even using a very simplified model, Constraint Logic Programming on the face-centered cube lattice model allowed us to obtain acceptable results for a few small proteins. As a test implementation their (known) secondary structure and the presence of disulfide bridges are used as constraints. Simplified structures obtained in this way have been converted to all atom models with plausible structure. Results have been compared with a similar approach using a well-established technique as molecular dynamics. The results obtained on small proteins show that Constraint Logic Programming techniques can be employed for studying protein simplified models, which can be converted into realistic all atom models. The advantage of Constraint Logic Programming over other, much more explored, methodologies, resides in the rapid software prototyping, in the easy way of encoding heuristics, and in exploiting all the advances made in this research area, e.g. in constraint propagation and its use for pruning the huge search space.

  4. Association Rule-based Predictive Model for Machine Failure in Industrial Internet of Things

    Science.gov (United States)

    Kwon, Jung-Hyok; Lee, Sol-Bee; Park, Jaehoon; Kim, Eui-Jik

    2017-09-01

    This paper proposes an association rule-based predictive model for machine failure in the industrial Internet of things (IIoT), which can accurately predict machine failure in a real manufacturing environment by investigating the relationship between the cause and type of machine failure. To develop the predictive model, we consider three major steps: 1) binarization, 2) rule creation, 3) visualization. The binarization step translates item values in a dataset into one or zero, then the rule creation step creates association rules as IF-THEN structures using the Lattice model and Apriori algorithm. Finally, the created rules are visualized in various ways for users’ understanding. An experimental implementation was conducted using R Studio version 3.3.2. The results show that the proposed predictive model realistically predicts machine failure based on association rules.
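
    A minimal sketch of the rule-creation step on binarized maintenance records, using a hand-rolled Apriori and confidence-filtered IF-THEN rules. The item names, transactions, and thresholds are invented, and this Python sketch only illustrates the technique (the paper's own implementation used R Studio).

```python
from itertools import combinations

# Binarized maintenance records: each transaction lists the conditions/failure observed on one machine.
transactions = [
    {"high_vibration", "overheating", "bearing_failure"},
    {"high_vibration", "bearing_failure"},
    {"overheating", "lubricant_low", "bearing_failure"},
    {"high_vibration", "overheating", "bearing_failure"},
    {"lubricant_low"},
    {"overheating"},
]

def apriori(transactions, min_support=0.3):
    """Return frequent itemsets and their support (fraction of transactions containing them)."""
    items = sorted({i for t in transactions for i in t})
    frequent, k = {}, 1
    candidates = [frozenset([i]) for i in items]
    while candidates:
        counts = {c: sum(c <= t for t in transactions) / len(transactions) for c in candidates}
        level = {c: s for c, s in counts.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Candidate generation: unions of frequent itemsets from the previous level with size k.
        candidates = list({a | b for a in level for b in level if len(a | b) == k})
    return frequent

frequent = apriori(transactions, min_support=0.3)

# Rule creation: IF antecedent THEN consequent, with confidence = support(itemset)/support(antecedent).
for itemset, supp in frequent.items():
    if len(itemset) < 2:
        continue
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            confidence = supp / frequent[antecedent]
            if confidence >= 0.8:
                print(f"IF {set(antecedent)} THEN {set(itemset - antecedent)} "
                      f"(support={supp:.2f}, confidence={confidence:.2f})")
```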

  5. Operating Comfort Prediction Model of Human-Machine Interface Layout for Cabin Based on GEP

    Directory of Open Access Journals (Sweden)

    Li Deng

    2015-01-01

    Full Text Available In view of the evaluation and decision-making problem of human-machine interface layout design for cabin, the operating comfort prediction model is proposed based on GEP (Gene Expression Programming), using operating comfort to evaluate layout scheme. Through joint angles to describe operating posture of upper limb, the joint angles are taken as independent variables to establish the comfort model of operating posture. Factor analysis is adopted to decrease the variable dimension; the model’s input variables are reduced from 16 joint angles to 4 comfort impact factors, and the output variable is operating comfort score. The Chinese virtual human body model is built by CATIA software, which will be used to simulate and evaluate the operators’ operating comfort. With 22 groups of evaluation data as training sample and validation sample, GEP algorithm is used to obtain the best fitting function between the joint angles and the operating comfort; then, operating comfort can be predicted quantitatively. The operating comfort prediction result of human-machine interface layout of driller control room shows that operating comfort prediction model based on GEP is fast and efficient, it has good prediction effect, and it can improve the design efficiency.

  6. Operating Comfort Prediction Model of Human-Machine Interface Layout for Cabin Based on GEP.

    Science.gov (United States)

    Deng, Li; Wang, Guohua; Chen, Bo

    2015-01-01

    In view of the evaluation and decision-making problem of human-machine interface layout design for cabin, the operating comfort prediction model is proposed based on GEP (Gene Expression Programming), using operating comfort to evaluate layout scheme. Through joint angles to describe operating posture of upper limb, the joint angles are taken as independent variables to establish the comfort model of operating posture. Factor analysis is adopted to decrease the variable dimension; the model's input variables are reduced from 16 joint angles to 4 comfort impact factors, and the output variable is operating comfort score. The Chinese virtual human body model is built by CATIA software, which will be used to simulate and evaluate the operators' operating comfort. With 22 groups of evaluation data as training sample and validation sample, GEP algorithm is used to obtain the best fitting function between the joint angles and the operating comfort; then, operating comfort can be predicted quantitatively. The operating comfort prediction result of human-machine interface layout of driller control room shows that operating comfort prediction model based on GEP is fast and efficient, it has good prediction effect, and it can improve the design efficiency.
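
    A partial sketch of the pipeline described above: factor analysis reduces the 16 joint angles to 4 comfort impact factors, and a function from factors to comfort score is then fitted. A plain linear model stands in for the GEP-evolved function purely for illustration, and all numbers are synthetic.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
n_postures = 22                                                 # matches the 22 evaluation groups
joint_angles = rng.uniform(0, 120, size=(n_postures, 16))       # 16 upper-limb joint angles (degrees)
comfort = 10 - 0.03 * joint_angles[:, :4].sum(axis=1) + rng.normal(scale=0.5, size=n_postures)

# Step 1: factor analysis reduces the 16 joint angles to 4 comfort impact factors.
fa = FactorAnalysis(n_components=4, random_state=0)
factors = fa.fit_transform(joint_angles)

# Step 2: fit a function from the 4 factors to the comfort score.
# (The paper evolves this function with GEP; a linear fit is used here only as a placeholder.)
reg = LinearRegression().fit(factors, comfort)

new_posture = rng.uniform(0, 120, size=(1, 16))
print("predicted operating comfort:", round(float(reg.predict(fa.transform(new_posture))[0]), 2))
```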

  7. Prediction of Protein-Protein Interactions Related to Protein Complexes Based on Protein Interaction Networks

    Directory of Open Access Journals (Sweden)

    Peng Liu

    2015-01-01

    Full Text Available A method for predicting protein-protein interactions based on detected protein complexes is proposed to repair deficient interactions derived from high-throughput biological experiments. Protein complexes are pruned and decomposed into small parts based on the adaptive k-cores method to predict protein-protein interactions associated with the complexes. The proposed method is adaptive to protein complexes with different structure, number, and size of nodes in a protein-protein interaction network. Based on different complex sets detected by various algorithms, we can obtain different prediction sets of protein-protein interactions. The reliability of the predicted interaction sets is proved by using estimations with statistical tests and direct confirmation of the biological data. In comparison with the approaches which predict the interactions based on the cliques, the overlap of the predictions is small. Similarly, the overlaps among the predicted sets of interactions derived from various complex sets are also small. Thus, every predicted set of interactions may complement and improve the quality of the original network data. Meanwhile, the predictions from the proposed method replenish protein-protein interactions associated with protein complexes using only the network topology.
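
    A minimal sketch of the pruning-and-prediction idea using a fixed k-core (the paper's adaptive k-cores method would choose k per complex); the toy network, complex membership, and the missing edge are invented.

```python
import networkx as nx
from itertools import combinations

# Toy PPI network; one detected complex has an interaction missing from the experimental data.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("E", "F")]                      # the A-D edge is absent from the observed network
G = nx.Graph(edges)
complex_members = ["A", "B", "C", "D", "E"]           # a complex reported by some detection algorithm

# Prune the complex to its densely connected core (here a fixed k; the paper adapts k per complex).
core = nx.k_core(G.subgraph(complex_members), k=2)

# Predict every unobserved pair inside the pruned core as a candidate interaction.
predicted = [(u, v) for u, v in combinations(core.nodes, 2) if not G.has_edge(u, v)]
print("predicted intra-complex interactions:", predicted)
```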

  8. Understanding the Molecular Determinant of Reversible Human Monoamine Oxidase B Inhibitors Containing 2H-Chromen-2-One Core: Structure-Based and Ligand-Based Derived Three-Dimensional Quantitative Structure-Activity Relationships Predictive Models.

    Science.gov (United States)

    Mladenović, Milan; Patsilinakos, Alexandros; Pirolli, Adele; Sabatino, Manuela; Ragno, Rino

    2017-04-24

    Monoamine oxidase B (MAO B) catalyzes the oxidative deamination of arylalkylamine neurotransmitters with concomitant reduction of oxygen to hydrogen peroxide. Consequently, the enzyme's malfunction can induce oxidative damage to mitochondrial DNA and mediate the development of Parkinson's disease. Thus, MAO B emerges as a promising target for developing pharmaceuticals potentially useful to treat this vicious neurodegenerative condition. Aiming to contribute to the development of drugs with the reversible mechanism of MAO B inhibition only, an extended in silico-in vitro procedure for the selection of novel MAO B inhibitors is demonstrated herein, including the following: (1) definition of optimized and validated structure-based three-dimensional (3-D) quantitative structure-activity relationship (QSAR) models derived from available cocrystallized inhibitor-MAO B complexes; (2) elaboration of SAR features for either irreversible or reversible MAO B inhibitors to characterize and improve coumarin-based inhibitor activity (Protein Data Bank ID: 2V61) as the most potent reversible lead compound; (3) definition of structure-based (SB) and ligand-based (LB) alignment rule assessments by which virtually any untested potential MAO B inhibitor might be evaluated; (4) predictive ability validation of the best 3-D QSAR model through SB/LB modeling of four coumarin-based external test sets (267 compounds); (5) design and SB/LB alignment of novel coumarin-based scaffolds experimentally validated through synthesis and biological evaluation in vitro. Due to the wide range of molecular diversity within the 3-D QSAR training set and derived features, the selected N probe-derived 3-D QSAR model proves to be a valuable tool for virtual screening (VS) of novel MAO B inhibitors and a platform for the design, synthesis and evaluation of novel active structures. Accordingly, six highly active and selective MAO B inhibitors (picomolar to low nanomolar range of activity) were disclosed as a

  9. Multivariate Methods for Prediction of Geologic Sample Composition with Laser-Induced Breakdown Spectroscopy

    Science.gov (United States)

    Morris, Richard; Anderson, R.; Clegg, S. M.; Bell, J. F., III

    2010-01-01

    Laser-induced breakdown spectroscopy (LIBS) uses pulses of laser light to ablate a material from the surface of a sample and produce an expanding plasma. The optical emission from the plasma produces a spectrum which can be used to classify target materials and estimate their composition. The ChemCam instrument on the Mars Science Laboratory (MSL) mission will use LIBS to rapidly analyze targets remotely, allowing more resource- and time-intensive in-situ analyses to be reserved for targets of particular interest. ChemCam will also be used to analyze samples that are not reachable by the rover's in-situ instruments. Due to these tactical and scientific roles, it is important that ChemCam-derived sample compositions are as accurate as possible. We have compared the results of partial least squares (PLS), multilayer perceptron (MLP) artificial neural networks (ANNs), and cascade correlation (CC) ANNs to determine which technique yields better estimates of quantitative element abundances in rock and mineral samples. The number of hidden nodes in the MLP ANNs was optimized using a genetic algorithm. The influence of two data preprocessing techniques was also investigated: genetic algorithm feature selection and averaging the spectra for each training sample prior to training the PLS and ANN algorithms. We used a ChemCam-like laboratory stand-off LIBS system to collect spectra of 30 pressed powder geostandards and a diverse suite of 196 geologic slab samples of known bulk composition. We tested the performance of PLS and ANNs on a subset of these samples, choosing to focus on silicate rocks and minerals with a loss on ignition of less than 2 percent. This resulted in a set of 22 pressed powder geostandards and 80 geologic samples. Four of the geostandards were used as a validation set and 18 were used as the training set for the algorithms. We found that PLS typically resulted in the lowest average absolute error in its predictions, but that the optimized MLP ANN and
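    A hedged sketch of the PLS step on synthetic spectra (the record also compares MLP and cascade-correlation networks). All array sizes and the SiO2 target below are made up for illustration.

    ```python
    # Illustrative PLS regression mapping synthetic LIBS-like spectra to an
    # element abundance; not the study's data or calibration.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    rng = np.random.default_rng(1)
    spectra = rng.random((102, 2048))                       # 102 samples x 2048 spectral channels (synthetic)
    sio2 = 40 + 35 * spectra[:, 100] + rng.normal(0, 1, 102)  # synthetic SiO2 wt% tied to one channel

    X_tr, X_te, y_tr, y_te = train_test_split(spectra, sio2, test_size=0.2, random_state=1)
    pls = PLSRegression(n_components=10).fit(X_tr, y_tr)
    print(mean_absolute_error(y_te, pls.predict(X_te).ravel()))
    ```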

  10. Intelligent-based Structural Damage Detection Model

    International Nuclear Information System (INIS)

    Lee, Eric Wai Ming; Yu, K.F.

    2010-01-01

    This paper presents the application of a novel Artificial Neural Network (ANN) model for the diagnosis of structural damage. The ANN model, denoted as the GRNNFA, is a hybrid model combining the General Regression Neural Network Model (GRNN) and the Fuzzy ART (FA) model. It not only retains the important features of the GRNN and FA models (i.e. fast and stable network training and incremental growth of network structure) but also facilitates the removal of the noise embedded in the training samples. Structural damage alters the stiffness distribution of the structure and so changes the natural frequencies and mode shapes of the system. The measured modal parameter changes due to a particular damage are treated as patterns for that damage. The proposed GRNNFA model was trained to learn those patterns in order to detect the possible damage location of the structure. Simulated data are employed to verify and illustrate the procedures of the proposed ANN-based damage diagnosis methodology. The results of this study have demonstrated the feasibility of applying the GRNNFA model to structural damage diagnosis even when the training samples were noise contaminated.

  11. Intelligent-based Structural Damage Detection Model

    Science.gov (United States)

    Lee, Eric Wai Ming; Yu, Kin Fung

    2010-05-01

    This paper presents the application of a novel Artificial Neural Network (ANN) model for the diagnosis of structural damage. The ANN model, denoted as the GRNNFA, is a hybrid model combining the General Regression Neural Network Model (GRNN) and the Fuzzy ART (FA) model. It not only retains the important features of the GRNN and FA models (i.e. fast and stable network training and incremental growth of network structure) but also facilitates the removal of the noise embedded in the training samples. Structural damage alters the stiffness distribution of the structure and so changes the natural frequencies and mode shapes of the system. The measured modal parameter changes due to a particular damage are treated as patterns for that damage. The proposed GRNNFA model was trained to learn those patterns in order to detect the possible damage location of the structure. Simulated data are employed to verify and illustrate the procedures of the proposed ANN-based damage diagnosis methodology. The results of this study have demonstrated the feasibility of applying the GRNNFA model to structural damage diagnosis even when the training samples were noise contaminated.

  12. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.

    Science.gov (United States)

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-11

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  13. G-cimp status prediction of glioblastoma samples using mRNA expression data.

    Science.gov (United States)

    Baysan, Mehmet; Bozdag, Serdar; Cam, Margaret C; Kotliarova, Svetlana; Ahn, Susie; Walling, Jennifer; Killian, Jonathan K; Stevenson, Holly; Meltzer, Paul; Fine, Howard A

    2012-01-01

    Glioblastoma Multiforme (GBM) is a tumor with high mortality and no known cure. The dramatic molecular and clinical heterogeneity seen in this tumor has led to attempts to define genetically similar subgroups of GBM with the hope of developing tumor specific therapies targeted to the unique biology within each of these subgroups. Recently, a subset of relatively favorable prognosis GBMs has been identified. These glioma CpG island methylator phenotype, or G-CIMP tumors, have distinct genomic copy number aberrations, DNA methylation patterns, and (mRNA) expression profiles compared to other GBMs. While the standard method for identifying G-CIMP tumors is based on genome-wide DNA methylation data, such data is often not available compared to the more widely available gene expression data. In this study, we have developed and evaluated a method to predict the G-CIMP status of GBM samples based solely on gene expression data.

  14. SGC method for predicting the standard enthalpy of formation of pure compounds from their molecular structures

    International Nuclear Information System (INIS)

    Albahri, Tareq A.; Aljasmi, Abdulla F.

    2013-01-01

    Highlights: • ΔH°f is predicted from the molecular structure of the compounds alone. • ANN-SGC model predicts ΔH°f with a correlation coefficient of 0.99. • ANN-MNLR model predicts ΔH°f with a correlation coefficient of 0.90. • Better definition of the atom-type molecular groups is presented. • The method is better than others in terms of combined simplicity, accuracy and generality. - Abstract: A theoretical method for predicting the standard enthalpy of formation of pure compounds from various chemical families is presented. Back propagation artificial neural networks were used to investigate several structural group contribution (SGC) methods available in literature. The networks were used to probe the structural groups that have significant contribution to the overall enthalpy of formation property of pure compounds and arrive at the set of groups that can best represent the enthalpy of formation for about 584 substances. The 51 atom-type structural groups listed provide better definitions of group contributions than others in the literature. The proposed method can predict the standard enthalpy of formation of pure compounds with an AAD of 11.38 kJ/mol and a correlation coefficient of 0.9934 from only their molecular structure. The results are further compared with those of the traditional SGC method based on MNLR as well as other methods in the literature.

  15. A Comparative Taxonomy of Parallel Algorithms for RNA Secondary Structure Prediction

    Science.gov (United States)

    Al-Khatib, Ra’ed M.; Abdullah, Rosni; Rashid, Nur’Aini Abdul

    2010-01-01

    RNA molecules have been discovered to play crucial roles in numerous biological and medical procedures and processes. RNA structure determination has become a major problem in biology. Recently, computer scientists have empowered biologists with RNA secondary structures that ease the understanding of RNA functions and roles. Detecting RNA secondary structure is an NP-hard problem, especially for pseudoknotted RNA structures. The detection process is also time-consuming; as a result, an alternative approach such as using parallel architectures is a desirable option. The main goal of this paper is an intensive investigation of the parallel methods used in the literature to address the demanding issues related to RNA secondary structure prediction. We then introduce a new taxonomy for parallel RNA folding methods. Based on this proposed taxonomy, a systematic and scientific comparison is performed among the existing methods. PMID:20458364

  16. CNNH_PSS: protein 8-class secondary structure prediction by convolutional neural network with highway.

    Science.gov (United States)

    Zhou, Jiyun; Wang, Hongpeng; Zhao, Zhishan; Xu, Ruifeng; Lu, Qin

    2018-05-08

    Protein secondary structure is the three-dimensional form of local segments of proteins, and its prediction is an important problem in protein tertiary structure prediction. Developing computational approaches for protein secondary structure prediction is becoming increasingly urgent. We present a novel deep learning based model, referred to as CNNH_PSS, using a multi-scale CNN with highway connections. In CNNH_PSS, any two neighboring convolutional layers have a highway that delivers information from the current layer to the output of the next one in order to keep local contexts. As lower layers extract local context while higher layers extract long-range interdependencies, the highways between neighboring layers allow CNNH_PSS to extract both local contexts and long-range interdependencies. We evaluate CNNH_PSS on two commonly used datasets: CB6133 and CB513. CNNH_PSS outperforms the multi-scale CNN without highway by at least 0.010 Q8 accuracy and also performs better than CNF, DeepCNF and SSpro8, which cannot extract long-range interdependencies, by at least 0.020 Q8 accuracy, demonstrating that both local contexts and long-range interdependencies are indeed useful for prediction. Furthermore, CNNH_PSS also performs better than GSM and DCRNN, which need extra complex models to extract long-range interdependencies. This demonstrates that CNNH_PSS not only costs fewer computing resources but also achieves better predictive performance. CNNH_PSS extracts both local contexts and long-range interdependencies by combining a multi-scale CNN with a highway network. The evaluations on common datasets and comparisons with state-of-the-art methods indicate that CNNH_PSS is a useful and efficient tool for protein secondary structure prediction.
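    A rough PyTorch sketch of the general idea (a 1-D convolution mixed with its input through a learned highway gate, followed by per-residue 8-class prediction); layer sizes, kernel width and feature counts are assumptions, and this is not the CNNH_PSS code.

    ```python
    # Minimal highway-gated 1-D CNN for per-residue Q8 logits (illustrative only).
    import torch
    import torch.nn as nn

    class HighwayConvBlock(nn.Module):
        def __init__(self, channels, kernel_size=11):
            super().__init__()
            pad = kernel_size // 2
            self.conv = nn.Conv1d(channels, channels, kernel_size, padding=pad)
            self.gate = nn.Conv1d(channels, channels, kernel_size, padding=pad)

        def forward(self, x):                      # x: (batch, channels, length)
            t = torch.sigmoid(self.gate(x))        # transform gate
            return t * torch.relu(self.conv(x)) + (1 - t) * x   # highway mix of new and old features

    features, classes, length = 42, 8, 300          # e.g. PSSM + one-hot features per residue (assumed)
    model = nn.Sequential(nn.Conv1d(features, 64, 1),
                          HighwayConvBlock(64),
                          HighwayConvBlock(64),
                          nn.Conv1d(64, classes, 1))
    logits = model(torch.randn(4, features, length))  # (4, 8, 300): per-residue 8-class logits
    print(logits.shape)
    ```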

  17. External validation of structure-biodegradation relationship (SBR) models for predicting the biodegradability of xenobiotics.

    Science.gov (United States)

    Devillers, J; Pandard, P; Richard, B

    2013-01-01

    Biodegradation is an important mechanism for eliminating xenobiotics by biotransforming them into simple organic and inorganic products. Faced with the ever-growing number of chemicals available on the market, structure-biodegradation relationship (SBR) and quantitative structure-biodegradation relationship (QSBR) models are increasingly used as surrogates for biodegradation tests. Such models have great potential for a quick and cheap estimation of the biodegradation potential of chemicals. The Estimation Programs Interface (EPI) Suite™ includes different models for predicting the potential aerobic biodegradability of organic substances. They are based on different endpoints, methodologies and/or statistical approaches. Among them, Biowin 5 and 6 appeared to be the most robust, being derived from the largest biodegradation database with results obtained only from the Ministry of International Trade and Industry (MITI) test. The aim of this study was to assess the predictive performance of these two models on a set of 356 chemicals extracted from notification dossiers including compatible biodegradation data. Another set of molecules with no more than four carbon atoms, substituted by various heteroatoms and/or functional groups, was also included in the validation exercise. Comparisons were made with the predictions obtained with START (Structural Alerts for Reactivity in Toxtree). Biowin 5 and Biowin 6 gave satisfactory prediction results except for readily degradable chemicals. A consensus model built with Biowin 1 reduced this tendency.

  18. Comparing Structural Identification Methodologies for Fatigue Life Prediction of a Highway Bridge

    OpenAIRE

    Pai, Sai G.S.; Nussbaumer, Alain; Smith, Ian F.C.

    2018-01-01

    Accurate measurement-data interpretation leads to increased understanding of structural behavior and enhanced asset-management decision making. In this paper, four data-interpretation methodologies, residual minimization, traditional Bayesian model updating, modified Bayesian model updating (with an L∞-norm-based Gaussian likelihood function), and error-domain model falsification (EDMF), a method that rejects models that have unlikely differences between predictions and measurements, are comp...

  19. Predictive modelling for shelf life determination of nutricereal based fermented baby food.

    Science.gov (United States)

    Rasane, Prasad; Jha, Alok; Sharma, Nitya

    2015-08-01

    A shelf life model based on storage temperatures was developed for a nutricereal based fermented baby food formulation. The formulated baby food samples were packaged and stored at 10, 25, 37 and 45 °C for a test storage period of 180 days. A shelf life study was conducted using consumer and semi-trained panels, along with chemical analysis (moisture and acidity). The chemical parameters (moisture and titratable acidity) were found inadequate in determining the shelf life of the formulated product. Weibull hazard analysis was used to determine the shelf life of the product based on sensory evaluation. Considering 25 and 50 % rejection probability, the shelf life of the baby food formulation was predicted to be 98 and 322 days, 84 and 271 days, 71 and 221 days and 58 and 171 days for the samples stored at 10, 25, 37 and 45 °C, respectively. A shelf life equation was proposed using the rejection times obtained from the consumer study. Finally, the formulated baby food samples were subjected to microbial analysis for the predicted shelf life period and were found microbiologically safe for consumption during the storage period of 360 days.
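    An illustrative sketch of the Weibull-hazard step: fit a Weibull distribution to hypothetical consumer rejection times and read off storage times at 25% and 50% rejection probability. The rejection times below are invented, not the study's data.

    ```python
    # Fit a Weibull distribution to synthetic rejection times and query the
    # storage time at given rejection probabilities (shelf-life style estimate).
    import numpy as np
    from scipy import stats

    rejection_days = np.array([60, 75, 90, 110, 130, 150, 180, 210, 260, 320])  # synthetic

    shape, loc, scale = stats.weibull_min.fit(rejection_days, floc=0)  # location fixed at zero
    for p in (0.25, 0.50):
        t = stats.weibull_min.ppf(p, shape, loc, scale)
        print(f"shelf life at {p:.0%} rejection probability: {t:.0f} days")
    ```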

  20. Nonlinear Model Predictive Control Based on a Self-Organizing Recurrent Neural Network.

    Science.gov (United States)

    Han, Hong-Gui; Zhang, Lu; Hou, Ying; Qiao, Jun-Fei

    2016-02-01

    A nonlinear model predictive control (NMPC) scheme is developed in this paper based on a self-organizing recurrent radial basis function (SR-RBF) neural network, whose structure and parameters are adjusted concurrently in the training process. The proposed SR-RBF neural network is represented in a general nonlinear form for predicting the future dynamic behaviors of nonlinear systems. To improve the modeling accuracy, a spiking-based growing and pruning algorithm and an adaptive learning algorithm are developed to tune the structure and parameters of the SR-RBF neural network, respectively. Meanwhile, for the control problem, an improved gradient method is utilized for the solution of the optimization problem in NMPC. The stability of the resulting control system is proved based on the Lyapunov stability theory. Finally, the proposed SR-RBF neural network-based NMPC (SR-RBF-NMPC) is used to control the dissolved oxygen (DO) concentration in a wastewater treatment process (WWTP). Comparisons with other existing methods demonstrate that the SR-RBF-NMPC can achieve a considerably better model fitting for WWTP and a better control performance for DO concentration.
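    A minimal sketch of a fixed-structure RBF network used as a one-step-ahead predictor; the SR-RBF in this record additionally grows and prunes neurons and sits inside an NMPC loop, neither of which is shown. Data and network sizes are assumptions.

    ```python
    # Fixed-structure RBF network: Gaussian hidden layer with preset centers,
    # output weights fitted by least squares (illustrative stand-in for SR-RBF).
    import numpy as np

    class RBFNet:
        def __init__(self, centers, width):
            self.centers, self.width = centers, width
            self.weights = None

        def _phi(self, X):
            d = np.linalg.norm(X[:, None, :] - self.centers[None, :, :], axis=2)
            return np.exp(-(d / self.width) ** 2)

        def fit(self, X, y):                                   # analytic output weights
            self.weights, *_ = np.linalg.lstsq(self._phi(X), y, rcond=None)
            return self

        def predict(self, X):
            return self._phi(X) @ self.weights

    rng = np.random.default_rng(2)
    u = rng.random((200, 2))                       # past input/output pairs (synthetic)
    y = np.sin(u[:, 0]) + 0.5 * u[:, 1]            # plant response to be learned (synthetic)
    net = RBFNet(centers=u[rng.choice(200, 20, replace=False)], width=0.3).fit(u, y)
    print(net.predict(u[:5]))                      # one-step-ahead predictions for the model in an MPC loop
    ```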

  1. Prediction of backbone dihedral angles and protein secondary structure using support vector machines

    Directory of Open Access Journals (Sweden)

    Hirst Jonathan D

    2009-12-01

    Full Text Available Abstract Background The prediction of the secondary structure of a protein is a critical step in the prediction of its tertiary structure and, potentially, its function. Moreover, the backbone dihedral angles, highly correlated with secondary structures, provide crucial information about the local three-dimensional structure. Results We predict independently both the secondary structure and the backbone dihedral angles and combine the results in a loop to enhance each prediction reciprocally. Support vector machines, a state-of-the-art supervised classification technique, achieve secondary structure predictive accuracy of 80% on a non-redundant set of 513 proteins, significantly higher than other methods on the same dataset. The dihedral angle space is divided into a number of regions using two unsupervised clustering techniques in order to predict the region in which a new residue belongs. The performance of our method is comparable to, and in some cases more accurate than, other multi-class dihedral prediction methods. Conclusions We have created an accurate predictor of backbone dihedral angles and secondary structure. Our method, called DISSPred, is available online at http://comp.chem.nottingham.ac.uk/disspred/.

  2. FRAGSION: ultra-fast protein fragment library generation by IOHMM sampling.

    Science.gov (United States)

    Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin

    2016-07-01

    Speed, accuracy and robustness of building a protein fragment library have important implications in de novo protein structure prediction, since fragment-based methods are one of the most successful approaches in template-free modeling (FM). The majority of existing fragment detection methods rely on database-driven search strategies to identify candidate fragments, which are inherently time-consuming and often hinder the possibility of locating longer fragments due to the limited sizes of databases. Also, it is difficult to alleviate the effect of noisy sequence-based predicted features, such as secondary structures, on the quality of fragments. Here, we present FRAGSION, a database-free method to efficiently generate a protein fragment library by sampling from an Input-Output Hidden Markov Model. FRAGSION offers some unique features compared to existing approaches in that it (i) is lightning-fast, consuming only a few seconds of CPU time to generate a fragment library for a protein of typical length (300 residues); (ii) can generate dynamic-size fragments of any length (even for the whole protein sequence) and (iii) offers ways to handle noise in predicted secondary structure during fragment sampling. On a FM dataset from the most recent Critical Assessment of Structure Prediction, we demonstrate that FRAGSION provides advantages over the state-of-the-art fragment picking protocol of the ROSETTA suite by speeding up computation by several orders of magnitude while achieving comparable performance in fragment quality. Source code and executable versions of FRAGSION for Linux and MacOS are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/FRAGSION/. It is bundled with a manual and example data. chengji@missouri.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  3. Predicting Liaison: an Example-Based Approach

    NARCIS (Netherlands)

    Greefhorst, A.P.M.; Bosch, A.P.J. van den

    2016-01-01

    Predicting liaison in French is a non-trivial problem to model. We compare a memory-based machine-learning algorithm with a rule-based baseline. The memory-based learner is trained to predict whether liaison occurs between two words on the basis of lexical, orthographic, morphosyntactic, and

  4. The Structural and Predictive Properties of the Psychopathy Checklist-Revised in Canadian Aboriginal and Non-Aboriginal Offenders

    Science.gov (United States)

    Olver, Mark E.; Neumann, Craig S.; Wong, Stephen C. P.; Hare, Robert D.

    2013-01-01

    We examined the structural and predictive properties of the Psychopathy Checklist-Revised (PCL-R) in large samples of Canadian male Aboriginal and non-Aboriginal offenders. The PCL-R ratings were part of a risk assessment for criminal recidivism, with a mean follow-up of 26 months postrelease. Using multigroup confirmatory factor analysis, we were…

  5. Remaining useful life prediction based on variation coefficient consistency test of a Wiener process

    Directory of Open Access Journals (Sweden)

    Juan LI

    2018-01-01

    Full Text Available High-cost equipment is often reused after maintenance, and whether the information from before the maintenance can be used for Remaining Useful Life (RUL) prediction after the maintenance is directly determined by the consistency of the degradation pattern before and after the maintenance. Aiming at this problem, an RUL prediction method based on the consistency test of a Wiener process is proposed. Firstly, the parameters of the Wiener process estimated by Maximum Likelihood Estimation (MLE) are proved to be biased, and a modified unbiased estimation method is proposed and verified by derivation and simulations. Then, the h statistic is constructed according to the reciprocal of the variation coefficient of the Wiener process, and its sampling distribution is derived. Meanwhile, a universal method for the consistency test is proposed based on the sampling distribution theorem, which is verified by simulation data and classical crack degradation data. Finally, based on the consistency test of the degradation model, a weighted fusion RUL prediction method is presented for the fuel pump of an airplane, and the validity of the presented method is verified by accurate computation results on real data, which provides theoretical and practical guidance for engineers to predict the RUL of equipment after maintenance.
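    A hedged sketch of conventional Wiener-process RUL estimation (standard estimators, not the bias-corrected ones proposed in this record): estimate drift and diffusion from degradation increments and predict the mean remaining life to a failure threshold. All numbers are synthetic.

    ```python
    # Wiener degradation model X(t) = mu*t + sigma*B(t): estimate drift/diffusion
    # from increments and use the inverse-Gaussian mean first-passage time as RUL.
    import numpy as np

    rng = np.random.default_rng(3)
    dt, mu_true, sigma_true = 1.0, 0.08, 0.05
    increments = mu_true * dt + sigma_true * np.sqrt(dt) * rng.standard_normal(60)
    path = np.cumsum(increments)                         # synthetic degradation record

    drift = np.mean(increments) / dt                     # standard MLE of the drift
    diffusion2 = np.var(increments, ddof=1) / dt         # near-MLE of the diffusion (slightly biased)
    threshold, current = 10.0, path[-1]
    mean_rul = (threshold - current) / drift             # mean first-passage time to the threshold
    print(f"estimated drift {drift:.3f}, mean RUL {mean_rul:.1f} time units")
    ```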

  6. Immobilized metal-affinity chromatography protein-recovery screening is predictive of crystallographic structure success

    International Nuclear Information System (INIS)

    Choi, Ryan; Kelley, Angela; Leibly, David; Nakazawa Hewitt, Stephen; Napuli, Alberto; Van Voorhis, Wesley

    2011-01-01

    An overview of the methods used for high-throughput cloning and protein-expression screening of SSGCID hexahistidine recombinant proteins is provided. It is demonstrated that screening for recombinant proteins that are highly recoverable from immobilized metal-affinity chromatography improves the likelihood that a protein will produce a structure. The recombinant expression of soluble proteins in Escherichia coli continues to be a major bottleneck in structural genomics. The establishment of reliable protocols for the performance of small-scale expression and solubility testing is an essential component of structural genomics pipelines. The SSGCID Protein Production Group at the University of Washington (UW-PPG) has developed a high-throughput screening (HTS) protocol for the measurement of protein recovery from immobilized metal-affinity chromatography (IMAC) which predicts successful purification of hexahistidine-tagged proteins. The protocol is based on manual transfer of samples using multichannel pipettors and 96-well plates and does not depend on the use of robotic platforms. This protocol has been applied to evaluate the expression and solubility of more than 4000 proteins expressed in E. coli. The UW-PPG also screens large-scale preparations for recovery from IMAC prior to purification. Analysis of these results shows that our low-cost non-automated approach is a reliable method for the HTS demands typical of large structural genomics projects. This paper provides a detailed description of these protocols and a statistical analysis of the SSGCID screening results. The results demonstrate that screening for proteins that yield high recovery after IMAC, both after small-scale and large-scale expression, improves the selection of proteins that can be successfully purified and will yield a crystal structure.

  7. Structure-based prediction of free energy changes of binding of PTP1B inhibitors

    Science.gov (United States)

    Wang, Jing; Ling Chan, Shek; Ramnarayan, Kal

    2003-08-01

    The goals were (1) to understand the driving forces in the binding of small molecule inhibitors to the active site of PTP1B and (2) to develop a molecular mechanics-based empirical free energy function for compound potency prediction. A set of compounds with known activities was docked onto the active site. The related energy components and molecular surface areas were calculated. The bridging water molecules were identified and their contributions were considered. Linear relationships were explored between the above terms and the binding free energies of the compounds derived from experimental inhibition constants. We found that at least three terms are required to give a good correlation (0.86) with predictive power in a five-group cross-validation test (q² = 0.70). The dominant terms are the electrostatic energy and non-electrostatic energy stemming from the intra- and intermolecular interactions of solutes and from those of bridging water molecules in complexes.
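    A toy sketch of a three-term linear free-energy model of the kind described above, with synthetic energy components and a five-fold cross-validated r² as a q²-style check; coefficients and data are invented.

    ```python
    # Linear regression of binding free energy on three synthetic energy terms,
    # scored by cross-validation (illustrative, not the paper's fitted function).
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(4)
    n = 40
    terms = rng.standard_normal((n, 3))     # electrostatic, non-electrostatic, bridging-water terms (synthetic)
    dG = terms @ np.array([1.2, 0.6, 0.4]) + 0.3 * rng.standard_normal(n)   # synthetic binding free energies

    q2 = cross_val_score(LinearRegression(), terms, dG, cv=5, scoring="r2")
    print(q2.mean())
    ```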

  8. Energy and carbon emissions analysis and prediction of complex petrochemical systems based on an improved extreme learning machine integrated interpretative structural model

    International Nuclear Information System (INIS)

    Han, Yongming; Zhu, Qunxiong; Geng, Zhiqiang; Xu, Yuan

    2017-01-01

    Highlights: • The ELM integrated ISM (ISM-ELM) method is proposed. • The proposed method is more efficient and accurate than the ELM on UCI data sets. • Energy and carbon emissions analysis and prediction of petrochemical industries based on ISM-ELM is obtained. • The proposed method is valid in improving energy efficiency and reducing carbon emissions of ethylene plants. - Abstract: Energy saving and carbon emissions reduction of the petrochemical industry are affected by many factors. Thus, it is difficult to analyze and optimize the energy of complex petrochemical systems accurately. This paper proposes an energy and carbon emissions analysis and prediction approach based on an improved extreme learning machine (ELM) integrated with an interpretative structural model (ISM) (ISM-ELM). ISM based on the partial correlation coefficient is utilized to analyze key parameters that affect the energy and carbon emissions of the complex petrochemical system, and can denoise and reduce the dimensions of the data to decrease the training time and errors of the ELM prediction model. Meanwhile, in terms of model accuracy and training time, the robustness and effectiveness of the ISM-ELM model are better than those of the ELM on standard data sets from the University of California Irvine (UCI) repository. Moreover, a multi-input single-output (MISO) model of the energy and carbon emissions of complex ethylene systems is established based on the ISM-ELM. Finally, detailed analyses and simulations using real ethylene plant data demonstrate the effectiveness of the ISM-ELM and can guide the direction of improvement in energy saving and carbon emissions reduction in complex petrochemical systems.
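    A minimal extreme learning machine sketch (without the ISM-based variable screening of this record): a random hidden layer followed by least-squares output weights. Variable names and data are assumptions.

    ```python
    # Basic ELM: the hidden-layer weights stay random; only the output weights
    # are solved analytically (illustrative, not the paper's improved ELM).
    import numpy as np

    def elm_fit(X, y, hidden=50, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.standard_normal((X.shape[1], hidden))   # random input-to-hidden weights, never trained
        b = rng.standard_normal(hidden)
        H = np.tanh(X @ W + b)                          # hidden-layer activations
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)    # analytic output weights
        return W, b, beta

    def elm_predict(X, W, b, beta):
        return np.tanh(X @ W + b) @ beta

    rng = np.random.default_rng(5)
    X = rng.random((300, 6))                                   # screened process variables (synthetic)
    y = X @ rng.random(6) + 0.05 * rng.standard_normal(300)    # energy-use-style target (synthetic)
    W, b, beta = elm_fit(X[:250], y[:250])
    print(np.mean((elm_predict(X[250:], W, b, beta) - y[250:]) ** 2))   # held-out mean squared error
    ```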

  9. Do Culture-based Segments Predict Selection of Market Strategy?

    Directory of Open Access Journals (Sweden)

    Veronika Jadczaková

    2015-01-01

    Full Text Available Academics and practitioners have already acknowledged the importance of unobservable segmentation bases (such as psychographics), yet they still focus on how well these bases are capable of describing relevant segments (the identifiability criterion) rather than on how precisely these segments can predict (the predictability criterion). Therefore, this paper intends to add to the debate on this topic by exploring whether culture-based segments account for the selection of market strategy. To do so, a set of market strategy variables over a sample of 251 manufacturing firms was first regressed on a set of 19 cultural variables using canonical correlation analysis. Having found a significant relationship in the first canonical function, we further examined by means of correspondence analysis which cultural segments – if any – are linked to which market strategies. However, as correspondence analysis failed to find a significant relationship, it may be concluded that business culture might relate to the adoption of market strategy but not to the cultural groupings presented in the paper.

  10. Fine-grained parallelism accelerating for RNA secondary structure prediction with pseudoknots based on FPGA.

    Science.gov (United States)

    Xia, Fei; Jin, Guoqing

    2014-06-01

    PKNOTS is one of the most famous benchmark programs and has been widely used to predict RNA secondary structures including pseudoknots. It adopts the standard four-dimensional (4D) dynamic programming (DP) method and is the basis of many variants and improved algorithms. Unfortunately, the O(N^6) computing requirements and complicated data dependencies greatly limit the usefulness of the PKNOTS package given the explosion in gene database size. In this paper, we present a fine-grained parallel PKNOTS package and prototype system for accelerating the RNA folding application based on an FPGA chip. We adopted a series of storage optimization strategies to resolve the "Memory Wall" problem. We aggressively exploit parallel computing strategies to improve computational efficiency. We also propose several methods that collectively reduce the storage requirements for FPGA on-chip memory. To the best of our knowledge, our design is the first FPGA implementation for accelerating the 4D DP problem for RNA folding including pseudoknots. The experimental results show an average speedup of more than 50x over the PKNOTS-1.08 software running on a PC platform with an Intel Core2 Q9400 quad-core CPU for the input RNA sequences. However, the power consumption of our FPGA accelerator is only about 50% of that of general-purpose microprocessors.

  11. Comparative analysis of QSAR models for predicting pK(a) of organic oxygen acids and nitrogen bases from molecular structure.

    Science.gov (United States)

    Yu, Haiying; Kühne, Ralph; Ebert, Ralf-Uwe; Schüürmann, Gerrit

    2010-11-22

    For 1143 organic compounds comprising 580 oxygen acids and 563 nitrogen bases that cover more than 17 orders of experimental pK(a) (from -5.00 to 12.23), the pK(a) prediction performances of ACD, SPARC, and two calibrations of a semiempirical quantum chemical (QC) AM1 approach have been analyzed. The overall root-mean-square errors (rms) for the acids are 0.41, 0.58 (0.42 without ortho-substituted phenols with intramolecular H-bonding), and 0.55 and for the bases are 0.65, 0.70, 1.17, and 1.27 for ACD, SPARC, and both QC methods, respectively. Method-specific performances are discussed in detail for six acid subsets (phenols and aromatic and aliphatic carboxylic acids with different substitution patterns) and nine base subsets (anilines, primary, secondary and tertiary amines, meta/para-substituted and ortho-substituted pyridines, pyrimidines, imidazoles, and quinolines). The results demonstrate an overall better performance for acids than for bases but also a substantial variation across subsets. For the overall best-performing ACD, rms ranges from 0.12 to 1.11 and 0.40 to 1.21 pK(a) units for the acid and base subsets, respectively. With regard to the squared correlation coefficient r², the results are 0.86 to 0.96 (acids) and 0.79 to 0.95 (bases) for ACD, 0.77 to 0.95 (acids) and 0.85 to 0.97 (bases) for SPARC, and 0.64 to 0.87 (acids) and 0.43 to 0.83 (bases) for the QC methods, respectively. Attention is paid to structural and method-specific causes for observed pitfalls. The significant subset dependence of the prediction performances suggests a consensus modeling approach.

  12. Prediction of potential drug targets based on simple sequence properties

    Directory of Open Access Journals (Sweden)

    Lai Luhua

    2007-09-01

    Full Text Available Abstract Background During the past decades, research and development in drug discovery have attracted much attention and efforts. However, only 324 drug targets are known for clinical drugs up to now. Identifying potential drug targets is the first step in the process of modern drug discovery for developing novel therapeutic agents. Therefore, the identification and validation of new and effective drug targets are of great value for drug discovery in both academia and pharmaceutical industry. If a protein can be predicted in advance for its potential application as a drug target, the drug discovery process targeting this protein will be greatly speeded up. In the current study, based on the properties of known drug targets, we have developed a sequence-based drug target prediction method for fast identification of novel drug targets. Results Based on simple physicochemical properties extracted from protein sequences of known drug targets, several support vector machine models have been constructed in this study. The best model can distinguish currently known drug targets from non drug targets at an accuracy of 84%. Using this model, potential protein drug targets of human origin from Swiss-Prot were predicted, some of which have already attracted much attention as potential drug targets in pharmaceutical research. Conclusion We have developed a drug target prediction method based solely on protein sequence information without the knowledge of family/domain annotation, or the protein 3D structure. This method can be applied in novel drug target identification and validation, as well as genome scale drug target predictions.
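    A hedged sketch of the general approach: derive simple physicochemical features from protein sequences and train an SVM to separate targets from non-targets. The feature set, sequences and labels below are illustrative assumptions, not the authors' descriptors or data.

    ```python
    # Toy sequence-based drug target classifier: a few physicochemical features
    # per sequence feed an RBF-kernel SVM (illustrative only).
    from sklearn.svm import SVC
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    HYDROPHOBIC, CHARGED = set("AILMFWV"), set("DEKRH")

    def features(seq):
        n = len(seq)
        return [n,
                sum(c in HYDROPHOBIC for c in seq) / n,   # hydrophobic fraction
                sum(c in CHARGED for c in seq) / n,       # charged fraction
                seq.count("C") / n]                       # cysteine content

    sequences = ["MKTLLVAAGLLCAV", "MDEKRRHHDSEEDD", "MAILWFVVILMFAA", "MRRKKDEDEHHSSE"]
    labels = [1, 0, 1, 0]                                 # 1 = known drug target (toy labels)

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit([features(s) for s in sequences], labels)
    print(clf.predict([features("MKWLLFAVCCVLAA")]))
    ```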

  13. CASP10-BCL::Fold efficiently samples topologies of large proteins.

    Science.gov (United States)

    Heinze, Sten; Putnam, Daniel K; Fischer, Axel W; Kohlmann, Tim; Weiner, Brian E; Meiler, Jens

    2015-03-01

    During CASP10 in summer 2012, we tested BCL::Fold for prediction of free modeling (FM) and template-based modeling (TBM) targets. BCL::Fold assembles the tertiary structure of a protein from predicted secondary structure elements (SSEs) omitting more flexible loop regions early on. This approach enables the sampling of conformational space for larger proteins with more complex topologies. In preparation of CASP11, we analyzed the quality of CASP10 models throughout the prediction pipeline to understand BCL::Fold's ability to sample the native topology, identify native-like models by scoring and/or clustering approaches, and our ability to add loop regions and side chains to initial SSE-only models. The standout observation is that BCL::Fold sampled topologies with a GDT_TS score > 33% for 12 of 18 and with a topology score > 0.8 for 11 of 18 test cases de novo. Despite the sampling success of BCL::Fold, significant challenges still exist in clustering and loop generation stages of the pipeline. The clustering approach employed for model selection often failed to identify the most native-like assembly of SSEs for further refinement and submission. It was also observed that for some β-strand proteins model refinement failed as β-strands were not properly aligned to form hydrogen bonds removing otherwise accurate models from the pool. Further, BCL::Fold samples frequently non-natural topologies that require loop regions to pass through the center of the protein. © 2015 Wiley Periodicals, Inc.

  14. Silicon based ultrafast optical waveform sampling

    DEFF Research Database (Denmark)

    Ji, Hua; Galili, Michael; Pu, Minhao

    2010-01-01

    A 300 nm × 450 nm × 5 mm silicon nanowire is designed and fabricated for a four-wave-mixing based non-linear optical gate. Based on this silicon nanowire, an ultra-fast optical sampling system is successfully demonstrated using a free-running fiber laser with a carbon nanotube-based mode-locker as the sampling source. A clear eye-diagram of a 320 Gbit/s data signal is obtained. The temporal resolution of the sampling system is estimated to be 360 fs.

  15. Using quantitative structure-activity relationships (QSAR) to predict toxic endpoints for polycyclic aromatic hydrocarbons (PAH).

    Science.gov (United States)

    Bruce, Erica D; Autenrieth, Robin L; Burghardt, Robert C; Donnelly, K C; McDonald, Thomas J

    2008-01-01

    Quantitative structure-activity relationships (QSAR) offer a reliable, cost-effective alternative to the time, money, and animal lives necessary to determine chemical toxicity by traditional methods. Additionally, humans are exposed to tens of thousands of chemicals in their lifetimes, necessitating the determination of chemical toxicity and screening for those posing the greatest risk to human health. This study developed models to predict toxic endpoints for three bioassays specific to several stages of carcinogenesis. The ethoxyresorufin O-deethylase assay (EROD), the Salmonella/microsome assay, and a gap junction intercellular communication (GJIC) assay were chosen for their ability to measure toxic endpoints specific to activation-, induction-, and promotion-related effects of polycyclic aromatic hydrocarbons (PAH). Shape-electronic, spatial, information content, and topological descriptors proved to be important descriptors in predicting the toxicity of PAH in these bioassays. Bioassay-based toxic equivalency factors (TEF(B)) were developed for several PAH using the quantitative structure-toxicity relationships (QSTR) developed. Predicting toxicity for a specific PAH compound, such as a bioassay-based potential potency (PP(B)) or a TEF(B), is possible by combining the predicted behavior from the QSTR models. These toxicity estimates may then be incorporated into a risk assessment for compounds that lack toxicity data. Accurate toxicity predictions are made by examining each type of endpoint important to the process of carcinogenicity, and a clearer understanding between composition and toxicity can be obtained.

  16. Reliability analysis of offshore structures using OMA based fatigue stresses

    DEFF Research Database (Denmark)

    Silva Nabuco, Bruna; Aissani, Amina; Glindtvad Tarpø, Marius

    2017-01-01

    focus is on the uncertainty observed on the different stresses used to predict the damage. This uncertainty can be reduced by Modal Based Fatigue Monitoring, which is a technique based on continuously measuring the accelerations in a few points of the structure with the use of accelerometers known...... points of the structure, the stress history can be calculated in any arbitrary point of the structure. The accuracy of the estimated actual stress is analyzed by experimental tests on a scale model where the obtained stresses are compared to strain gauge measurements. After evaluating the fatigue...... stresses directly from the operational response of the structure, a reliability analysis is performed in order to estimate the reliability of using Modal Based Fatigue Monitoring for long-term fatigue studies....

  17. Predicting community structure in snakes on Eastern Nearctic islands using ecological neutral theory and phylogenetic methods.

    Science.gov (United States)

    Burbrink, Frank T; McKelvy, Alexander D; Pyron, R Alexander; Myers, Edward A

    2015-11-22

    Predicting species presence and richness on islands is important for understanding the origins of communities and how likely it is that species will disperse and resist extinction. The equilibrium theory of island biogeography (ETIB) and, as a simple model of sampling abundances, the unified neutral theory of biodiversity (UNTB), predict that in situations where mainland to island migration is high, species-abundance relationships explain the presence of taxa on islands. Thus, more abundant mainland species should have a higher probability of occurring on adjacent islands. In contrast to UNTB, if certain groups have traits that permit them to disperse to islands better than other taxa, then phylogeny may be more predictive of which taxa will occur on islands. Taking surveys of 54 island snake communities in the Eastern Nearctic along with mainland communities that have abundance data for each species, we use phylogenetic assembly methods and UNTB estimates to predict island communities. Species richness is predicted by island area, whereas turnover from the mainland to island communities is random with respect to phylogeny. Community structure appears to be ecologically neutral and abundance on the mainland is the best predictor of presence on islands. With regard to young and proximate islands, where allopatric or cladogenetic speciation is not a factor, we find that simple neutral models following UNTB and ETIB predict the structure of island communities. © 2015 The Author(s).

  18. A Knowledge-Base for a Personalized Infectious Disease Risk Prediction System.

    Science.gov (United States)

    Vinarti, Retno; Hederman, Lucy

    2018-01-01

    We present a knowledge-base to represent collated infectious disease risk (IDR) knowledge. The knowledge is about personal and contextual risk of contracting an infectious disease obtained from declarative sources (e.g. Atlas of Human Infectious Diseases). Automated prediction requires encoding this knowledge in a form that can produce risk probabilities (e.g. Bayesian Network - BN). The knowledge-base presented in this paper feeds an algorithm that can auto-generate the BN. The knowledge from 234 infectious diseases was compiled. From this compilation, we designed an ontology and five rule types for modelling IDR knowledge in general. The evaluation aims to assess whether the knowledge-base structure, and its application to three disease-country contexts, meets the needs of personalized IDR prediction system. From the evaluation results, the knowledge-base conforms to the system's purpose: personalization of infectious disease risk.

  19. Methodology for predicting ultimate pressure capacity of the ACR-1000 containment structure

    International Nuclear Information System (INIS)

    Saudy, A.M.; Awad, A.; Elgohary, M.

    2006-01-01

    The Advanced CANDU Reactor, or ACR-1000, is developed by Atomic Energy of Canada Limited (AECL) to be the next step in the evolution of the CANDU product line. It is based on proven CANDU technology and incorporates advanced design technologies. The ACR containment structure is an essential element of the overall defense-in-depth approach to reactor safety, and is a physical barrier against the release of radioactive material to the environment. Therefore, it is important to provide a robust design with an adequate margin of safety. One of the key design requirements of the ACR containment structure is to have an ultimate pressure capacity that is at least twice the design pressure. Using standard design codes, the containment structure is expected to behave elastically at least up to 1.5 times the design pressure. Beyond this pressure level, the concrete containment structure with reinforcements and post-tensioning tendons behaves in a highly non-linear manner and exhibits a complex response as cracks initiate and propagate. Predicting the structural non-linear responses involves at least two critical features: the structural idealization through the geometry and material property models, and the adopted solution algorithm. Therefore, a detailed idealization of the concrete structure is needed in order to accurately predict its ultimate pressure capacity. This paper summarizes the analysis methodology to be carried out to establish the ultimate pressure capacity of the ACR containment structure and to confirm that the structure meets the specified design requirements. (author)

  20. The Effect of Sample Size and Data Numbering on Precision of Calibration Model to predict Soil Properties

    Directory of Open Access Journals (Sweden)

    H Mohamadi Monavar

    2017-10-01

    Full Text Available Introduction Precision agriculture (PA) is a technology that measures and manages within-field variability, such as the physical and chemical properties of soil. The nondestructive and rapid VIS-NIR technology has detected a significant correlation between reflectance spectra and the physical and chemical properties of soil. On the other hand, the quantitative prediction of soil factors such as nitrogen, carbon, cation exchange capacity and the amount of clay is very important in precision farming. The emphasis of this paper is on comparing different techniques for choosing calibration samples, such as random selection, selection based on chemical data, and selection based on PCA. Since increasing the number of samples is usually time-consuming and costly, this study determined the best sampling approach among the available methods for building calibration models. In addition, the effect of sample size on the accuracy of the calibration and validation models was analyzed. Materials and Methods Two hundred and ten soil samples were collected from a cultivated farm located in Avarzaman in Hamedan province, Iran. The crop rotation was mostly potato and wheat. Samples were collected from a depth of 20 cm, passed through a 2 mm sieve and air dried at room temperature. Chemical analysis was performed in the soil science laboratory, faculty of agriculture engineering, Bu-ali Sina University, Hamadan, Iran. Two spectrometers (AvaSpec-ULS 2048 UV-VIS and FT-NIR100N) were used to measure the spectral bands covering the UV-Vis and NIR region (220-2200 nm). Each soil sample was uniformly tiled in a petri dish and was scanned 20 times. Then the pre-processing methods of multivariate scatter correction (MSC) and baseline correction (BC) were applied to the raw signals using Unscrambler software. The samples were divided into two groups: one group of 105 samples for calibration and the second group for validation. Each time, 15 samples were selected randomly and tested the accuracy of

  1. Robust self-triggered model predictive control for constrained discrete-time LTI systems based on homothetic tubes

    NARCIS (Netherlands)

    Aydiner, E.; Brunner, F.D.; Heemels, W.P.M.H.; Allgower, F.

    2015-01-01

    In this paper we present a robust self-triggered model predictive control (MPC) scheme for discrete-time linear time-invariant systems subject to input and state constraints and additive disturbances. In self-triggered model predictive control, at every sampling instant an optimization problem based

  2. Prediction of Protein-Protein Interaction By Metasample-Based Sparse Representation

    Directory of Open Access Journals (Sweden)

    Xiuquan Du

    2015-01-01

    Full Text Available Protein-protein interactions (PPIs) play key roles in many cellular processes such as transcription regulation, cell metabolism, and endocrine function. Understanding these interactions greatly advances the study of the pathogenesis and treatment of various diseases. A large amount of data has been generated by experimental techniques; however, most of these data are usually incomplete or noisy, and the current biological experimental techniques are always very time-consuming and expensive. In this paper, we propose a novel method (metasample-based sparse representation classification, MSRC) for PPI prediction. A group of metasamples is extracted from the original training samples, and the l1-regularized least-squares method is then used to express a new testing sample as a linear combination of these metasamples. PPI prediction is achieved by using a discrimination function defined on the representation coefficients. MSRC is applied to a PPI dataset; it achieves 84.9% sensitivity and 94.55% specificity, which is slightly lower than support vector machine (SVM) and much higher than naive Bayes (NB), neural networks (NN), and k-nearest neighbor (KNN). The results show that MSRC is efficient for PPI prediction.
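    A simplified sparse-representation-classification sketch: a test sample is expressed as an l1-regularized combination of training columns and assigned to the class with the smallest reconstruction residual. The metasample extraction step of MSRC is omitted and all data are synthetic.

    ```python
    # Sparse representation classification over a dictionary of training samples;
    # class assignment by per-class reconstruction residual (illustrative only).
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(6)
    X_pos = rng.normal(1.0, 0.3, size=(30, 20))   # feature vectors of interacting pairs (synthetic)
    X_neg = rng.normal(-1.0, 0.3, size=(30, 20))  # non-interacting pairs (synthetic)
    D = np.vstack([X_pos, X_neg]).T               # dictionary: one column per training sample
    labels = np.array([1] * 30 + [0] * 30)

    def src_predict(x, D, labels, alpha=0.01):
        coef = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000).fit(D, x).coef_
        residuals = {c: np.linalg.norm(x - D[:, labels == c] @ coef[labels == c])
                     for c in np.unique(labels)}
        return min(residuals, key=residuals.get)   # class with the smallest residual

    print(src_predict(rng.normal(1.0, 0.3, size=20), D, labels))
    ```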

  3. Gene function prediction based on Gene Ontology Hierarchy Preserving Hashing.

    Science.gov (United States)

    Zhao, Yingwen; Fu, Guangyuan; Wang, Jun; Guo, Maozu; Yu, Guoxian

    2018-02-23

    Gene Ontology (GO) uses structured vocabularies (or terms) to describe the molecular functions, biological roles, and cellular locations of gene products in a hierarchical ontology. GO annotations associate genes with GO terms and indicate that the given gene products carry out the biological functions described by the relevant terms. However, predicting correct GO annotations for genes from a massive set of GO terms as defined by GO is a difficult challenge. To address this challenge, we introduce a Gene Ontology Hierarchy Preserving Hashing (HPHash) based semantic method for gene function prediction. HPHash first measures the taxonomic similarity between GO terms. It then uses a hierarchy-preserving hashing technique to keep the hierarchical order between GO terms, and to optimize a series of hashing functions to encode massive GO terms via compact binary codes. After that, HPHash utilizes these hashing functions to project the gene-term association matrix into a low-dimensional one and performs semantic similarity based gene function prediction in the low-dimensional space. Experimental results on three model species (Homo sapiens, Mus musculus and Rattus norvegicus) for interspecies gene function prediction show that HPHash performs better than other related approaches and that it is robust to the number of hash functions. In addition, we also take HPHash as a plugin for BLAST based gene function prediction. From the experimental results, HPHash again significantly improves the prediction performance. The codes of HPHash are available at: http://mlda.swu.edu.cn/codes.php?name=HPHash. Copyright © 2018 Elsevier Inc. All rights reserved.

  4. Random sampling or geostatistical modelling? Choosing between design-based and model-based sampling strategies for soil (with discussion)

    NARCIS (Netherlands)

    Brus, D.J.; Gruijter, de J.J.

    1997-01-01

    Classical sampling theory has been repeatedly identified with classical statistics which assumes that data are identically and independently distributed. This explains the switch of many soil scientists from design-based sampling strategies, based on classical sampling theory, to the model-based

  5. Interior noise analysis of a construction equipment cabin based on airborne and structure-borne noise predictions

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Sung Hee; Hong, Suk Yoon [Seoul National University, Seoul (Korea, Republic of); Song, Jee Hun [Chonnam National University, Gwangju (Korea, Republic of); Joo, Won Ho [Hyundai Heavy Industries Co. Ltd, Ulsan (Korea, Republic of)

    2012-04-15

    Noise from construction equipment affects not only surrounding residents, but also the operators of the machines. Noise that affects drivers must be evaluated during the preliminary design stage. This paper suggests an interior noise analysis procedure for construction equipment cabins. The analysis procedure, which can be used in the preliminary design stage, was investigated for airborne and structure borne noise. The total interior noise of a cabin was predicted from the airborne noise analysis and structure-borne noise analysis. The analysis procedure consists of four steps: modeling, vibration analysis, acoustic analysis and total interior noise analysis. A mesh model of a cabin for numerical analysis was made at the modeling step. At the vibration analysis step, the mesh model was verified and modal analysis and frequency response analysis are performed. At the acoustic analysis step, the vibration results from the vibration analysis step were used as initial values for radiated noise analysis and noise reduction analysis. Finally, the total cabin interior noise was predicted using the acoustic results from the acoustic analysis step. Each step was applied to a cabin of a middle-sized excavator and verified by comparison with measured data. The cabin interior noise of a middle-sized wheel loader and a large-sized forklift were predicted using the analysis procedure of the four steps and were compared with measured data. The interior noise analysis procedure of construction equipment cabins is expected to be used during the preliminary design stage.

  6. Interior noise analysis of a construction equipment cabin based on airborne and structure-borne noise predictions

    International Nuclear Information System (INIS)

    Kim, Sung Hee; Hong, Suk Yoon; Song, Jee Hun; Joo, Won Ho

    2012-01-01

    Noise from construction equipment affects not only surrounding residents, but also the operators of the machines. Noise that affects drivers must be evaluated during the preliminary design stage. This paper suggests an interior noise analysis procedure for construction equipment cabins. The analysis procedure, which can be used in the preliminary design stage, was investigated for airborne and structure-borne noise. The total interior noise of a cabin was predicted from the airborne noise analysis and structure-borne noise analysis. The analysis procedure consists of four steps: modeling, vibration analysis, acoustic analysis and total interior noise analysis. A mesh model of a cabin for numerical analysis was made at the modeling step. At the vibration analysis step, the mesh model was verified, and modal analysis and frequency response analysis were performed. At the acoustic analysis step, the vibration results from the vibration analysis step were used as initial values for radiated noise analysis and noise reduction analysis. Finally, the total cabin interior noise was predicted using the acoustic results from the acoustic analysis step. Each step was applied to a cabin of a middle-sized excavator and verified by comparison with measured data. The cabin interior noise of a middle-sized wheel loader and a large-sized forklift was predicted using the analysis procedure of the four steps and compared with measured data. The interior noise analysis procedure of construction equipment cabins is expected to be used during the preliminary design stage.

  7. Data-driven sensor placement from coherent fluid structures

    Science.gov (United States)

    Manohar, Krithika; Kaiser, Eurika; Brunton, Bingni W.; Kutz, J. Nathan; Brunton, Steven L.

    2017-11-01

    Optimal sensor placement is a central challenge in the prediction, estimation and control of fluid flows. We reinterpret sensor placement as optimizing discrete samples of coherent fluid structures for full state reconstruction. This permits a drastic reduction in the number of sensors required for faithful reconstruction, since complex fluid interactions can often be described by a small number of coherent structures. Our work optimizes point sensors using the pivoted matrix QR factorization to sample coherent structures directly computed from flow data. We apply this sampling technique in conjunction with various data-driven modal identification methods, including the proper orthogonal decomposition (POD) and dynamic mode decomposition (DMD). In contrast to POD-based sensors, DMD demonstrably enables the optimization of sensors for prediction in systems exhibiting multiple scales of dynamics. Finally, reconstruction accuracy from pivot sensors is shown to be competitive with sensors obtained using traditional computationally prohibitive optimization methods.
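
    A brief sketch of the sampling idea described above, under the assumption that the coherent structures are POD modes of a snapshot matrix: the leading modes are computed by SVD, point sensors are chosen by pivoted QR, and the full state is then reconstructed from the sparse measurements. This is an illustrative reading of the approach, not the authors' code.

```python
import numpy as np
from scipy.linalg import qr

def qr_pivot_sensors(X, r):
    """Select r point sensors by pivoted QR on the leading POD modes of
    the snapshot matrix X (states x snapshots)."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    Psi = U[:, :r]                      # leading coherent structures (POD modes)
    _, _, piv = qr(Psi.T, pivoting=True)
    return Psi, piv[:r]                 # modes and sensor locations (row indices)

def reconstruct(Psi, sensors, y):
    """Estimate the full state from r point measurements y taken at 'sensors'."""
    a = np.linalg.solve(Psi[sensors, :], y)   # modal coefficients
    return Psi @ a
```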

  8. Improving the accuracy of the structure prediction of the third hypervariable loop of the heavy chains of antibodies.

    KAUST Repository

    Messih, Mario Abdel; Lepore, Rosalba; Marcatili, Paolo; Tramontano, Anna

    2014-01-01

    MOTIVATION: Antibodies are able to recognize a wide range of antigens through their complementarity-determining regions formed by six hypervariable loops. Predicting the 3D structure of these loops is essential for the analysis and reengineering of novel antibodies with enhanced affinity and specificity. The canonical structure model allows high accuracy prediction for five of the loops. The third loop of the heavy chain, H3, is the hardest to predict because of its diversity in structure, length and sequence composition. RESULTS: We describe a method, based on the Random Forest automatic learning technique, to select structural templates for H3 loops among a dataset of candidates. These can be used to predict the structure of the loop with a higher accuracy than that achieved by any of the presently available methods. The method also has the advantage of being extremely fast and returning a reliable estimate of the model quality. AVAILABILITY AND IMPLEMENTATION: The source code is freely available at http://www.biocomputing.it/H3Loopred/.
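
    A hypothetical sketch of template selection with a Random Forest, in the spirit of the method described above; the feature table and the definition of a "good" template (e.g. one that yielded a low-RMSD loop model) are assumptions for illustration, not the authors' feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rank_templates(train_X, train_y, candidate_X):
    """Train a Random Forest to recognise good H3 templates, then rank the
    candidate templates of a new query by the predicted probability of being
    near-native. train_y is 1 for templates that gave low-RMSD models, else 0."""
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(train_X, train_y)
    prob_good = rf.predict_proba(candidate_X)[:, 1]
    order = np.argsort(prob_good)[::-1]
    return order, prob_good[order]      # best-first template ranking and confidence
```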

  9. Improving the accuracy of the structure prediction of the third hypervariable loop of the heavy chains of antibodies.

    KAUST Repository

    Messih, Mario Abdel

    2014-06-13

    MOTIVATION: Antibodies are able to recognize a wide range of antigens through their complementarity-determining regions formed by six hypervariable loops. Predicting the 3D structure of these loops is essential for the analysis and reengineering of novel antibodies with enhanced affinity and specificity. The canonical structure model allows high accuracy prediction for five of the loops. The third loop of the heavy chain, H3, is the hardest to predict because of its diversity in structure, length and sequence composition. RESULTS: We describe a method, based on the Random Forest automatic learning technique, to select structural templates for H3 loops among a dataset of candidates. These can be used to predict the structure of the loop with a higher accuracy than that achieved by any of the presently available methods. The method also has the advantage of being extremely fast and returning a reliable estimate of the model quality. AVAILABILITY AND IMPLEMENTATION: The source code is freely available at http://www.biocomputing.it/H3Loopred/.

  10. Polygenic scores predict alcohol problems in an independent sample and show moderation by the environment.

    Science.gov (United States)

    Salvatore, Jessica E; Aliev, Fazil; Edwards, Alexis C; Evans, David M; Macleod, John; Hickman, Matthew; Lewis, Glyn; Kendler, Kenneth S; Loukola, Anu; Korhonen, Tellervo; Latvala, Antti; Rose, Richard J; Kaprio, Jaakko; Dick, Danielle M

    2014-04-10

    Alcohol problems represent a classic example of a complex behavioral outcome that is likely influenced by many genes of small effect. A polygenic approach, which examines aggregate measured genetic effects, can have predictive power in cases where individual genes or genetic variants do not. In the current study, we first tested whether polygenic risk for alcohol problems-derived from genome-wide association estimates of an alcohol problems factor score from the age 18 assessment of the Avon Longitudinal Study of Parents and Children (ALSPAC; n = 4304 individuals of European descent; 57% female)-predicted alcohol problems earlier in development (age 14) in an independent sample (FinnTwin12; n = 1162; 53% female). We then tested whether environmental factors (parental knowledge and peer deviance) moderated polygenic risk to predict alcohol problems in the FinnTwin12 sample. We found evidence for both polygenic association and for additive polygene-environment interaction. Higher polygenic scores predicted a greater number of alcohol problems (range of Pearson partial correlations 0.07-0.08, all p-values ≤ 0.01). Moreover, genetic influences were significantly more pronounced under conditions of low parental knowledge or high peer deviance (unstandardized regression coefficients (b), p-values (p), and percent of variance (R2) accounted for by interaction terms: b = 1.54, p = 0.02, R2 = 0.33%; b = 0.94, p = 0.04, R2 = 0.30%, respectively). Supplementary set-based analyses indicated that the individual top single nucleotide polymorphisms (SNPs) contributing to the polygenic scores were not individually enriched for gene-environment interaction. Although the magnitude of the observed effects is small, this study illustrates the usefulness of polygenic approaches for understanding the pathways by which measured genetic predispositions come together with environmental factors to predict complex behavioral outcomes.
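
    A hedged sketch of the two analysis steps described above: aggregating discovery-GWAS weights into a polygenic score and fitting a regression with a polygene-by-environment interaction term. The column names (alcohol_problems, prs, parental_knowledge, sex, age) are hypothetical, and the simple OLS via statsmodels stands in for the authors' actual models.

```python
import numpy as np
import statsmodels.formula.api as smf

def polygenic_score(dosages, weights):
    """Aggregate measured genetic effects: weighted sum of risk-allele dosages,
    with weights taken from the discovery GWAS."""
    return np.asarray(dosages) @ np.asarray(weights)

def gene_environment_model(df):
    """OLS with a polygene-by-environment interaction term (prs * moderator),
    mirroring the kind of moderation analysis described above. df is assumed to
    be a pandas DataFrame with the hypothetical columns named in the formula."""
    model = smf.ols("alcohol_problems ~ prs * parental_knowledge + sex + age", data=df)
    return model.fit()

# Example usage (synthetic data assumed):
# df["prs"] = polygenic_score(dosage_matrix, gwas_betas)
# res = gene_environment_model(df); print(res.summary())
```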

  11. Polygenic Scores Predict Alcohol Problems in an Independent Sample and Show Moderation by the Environment

    Directory of Open Access Journals (Sweden)

    Jessica E. Salvatore

    2014-04-01

    Full Text Available Alcohol problems represent a classic example of a complex behavioral outcome that is likely influenced by many genes of small effect. A polygenic approach, which examines aggregate measured genetic effects, can have predictive power in cases where individual genes or genetic variants do not. In the current study, we first tested whether polygenic risk for alcohol problems—derived from genome-wide association estimates of an alcohol problems factor score from the age 18 assessment of the Avon Longitudinal Study of Parents and Children (ALSPAC; n = 4304 individuals of European descent; 57% female)—predicted alcohol problems earlier in development (age 14) in an independent sample (FinnTwin12; n = 1162; 53% female). We then tested whether environmental factors (parental knowledge and peer deviance) moderated polygenic risk to predict alcohol problems in the FinnTwin12 sample. We found evidence for both polygenic association and for additive polygene-environment interaction. Higher polygenic scores predicted a greater number of alcohol problems (range of Pearson partial correlations 0.07–0.08, all p-values ≤ 0.01). Moreover, genetic influences were significantly more pronounced under conditions of low parental knowledge or high peer deviance (unstandardized regression coefficients (b), p-values (p), and percent of variance (R2) accounted for by interaction terms: b = 1.54, p = 0.02, R2 = 0.33%; b = 0.94, p = 0.04, R2 = 0.30%, respectively). Supplementary set-based analyses indicated that the individual top single nucleotide polymorphisms (SNPs) contributing to the polygenic scores were not individually enriched for gene-environment interaction. Although the magnitude of the observed effects is small, this study illustrates the usefulness of polygenic approaches for understanding the pathways by which measured genetic predispositions come together with environmental factors to predict complex behavioral outcomes.

  12. An auxiliary optimization method for complex public transit route network based on link prediction

    Science.gov (United States)

    Zhang, Lin; Lu, Jian; Yue, Xianfei; Zhou, Jialin; Li, Yunxuan; Wan, Qian

    2018-02-01

    Inspired by the missing (new) link prediction and the spurious existing link identification in link prediction theory, this paper establishes an auxiliary optimization method for public transit route network (PTRN) based on link prediction. First, link prediction applied to PTRN is described, and based on reviewing the previous studies, the summary indices set and its algorithms set are collected for the link prediction experiment. Second, through analyzing the topological properties of Jinan’s PTRN established by the Space R method, we found that this is a typical small-world network with a relatively large average clustering coefficient. This phenomenon indicates that the structural similarity-based link prediction will show a good performance in this network. Then, based on the link prediction experiment of the summary indices set, three indices with maximum accuracy are selected for auxiliary optimization of Jinan’s PTRN. Furthermore, these link prediction results show that the overall layout of Jinan’s PTRN is stable and orderly, except for a partial area that requires optimization and reconstruction. The above pattern conforms to the general pattern of the optimal development stage of PTRN in China. Finally, based on the missing (new) link prediction and the spurious existing link identification, we propose optimization schemes that can be used not only to optimize current PTRN but also to evaluate PTRN planning.
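
    A small sketch of the kind of structural-similarity link prediction used above, assuming a networkx graph of the transit network (nodes as stops, edges between stops served by a common route, following the Space R idea) and the resource-allocation index as one representative summary index; the actual study evaluates a whole set of indices and selects the most accurate ones.

```python
import networkx as nx

def link_prediction_audit(G, top_k=10):
    """Score candidate missing links and weakly supported existing links with a
    structural-similarity index (resource allocation), in the spirit of the PTRN
    auxiliary optimization: high-scoring non-edges suggest missing (new) links,
    low-scoring existing edges are candidates for spurious links."""
    non_edges = list(nx.non_edges(G))
    missing = sorted(nx.resource_allocation_index(G, non_edges),
                     key=lambda t: t[2], reverse=True)[:top_k]
    spurious = sorted(nx.resource_allocation_index(G, list(G.edges())),
                      key=lambda t: t[2])[:top_k]
    return missing, spurious
```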

  13. G-cimp status prediction of glioblastoma samples using mRNA expression data.

    Directory of Open Access Journals (Sweden)

    Mehmet Baysan

    Full Text Available Glioblastoma Multiforme (GBM) is a tumor with high mortality and no known cure. The dramatic molecular and clinical heterogeneity seen in this tumor has led to attempts to define genetically similar subgroups of GBM with the hope of developing tumor specific therapies targeted to the unique biology within each of these subgroups. Recently, a subset of relatively favorable prognosis GBMs has been identified. These glioma CpG island methylator phenotype, or G-CIMP tumors, have distinct genomic copy number aberrations, DNA methylation patterns, and (mRNA) expression profiles compared to other GBMs. While the standard method for identifying G-CIMP tumors is based on genome-wide DNA methylation data, such data is often not available compared to the more widely available gene expression data. In this study, we have developed and evaluated a method to predict the G-CIMP status of GBM samples based solely on gene expression data.

  14. Prediction-error of Prediction Error (PPE)-based Reversible Data Hiding

    OpenAIRE

    Wu, Han-Zhou; Wang, Hong-Xia; Shi, Yun-Qing

    2016-01-01

    This paper presents a novel reversible data hiding (RDH) algorithm for gray-scaled images, in which the prediction-error of prediction error (PPE) of a pixel is used to carry the secret data. In the proposed method, the pixels to be embedded are firstly predicted with their neighboring pixels to obtain the corresponding prediction errors (PEs). Then, by exploiting the PEs of the neighboring pixels, the prediction of the PEs of the pixels can be determined. And, a sorting technique based on th...
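
    A rough sketch of the PPE idea, assuming a simple left/upper-neighbour mean predictor for illustration: prediction errors are computed first, then the prediction errors of those prediction errors; an RDH scheme would subsequently expand or shift the small-magnitude PPE bins to embed secret bits. The predictor and the integer rounding are assumptions, not the paper's exact construction.

```python
import numpy as np

def prediction_errors(img):
    """First-level prediction errors (PE): predict each pixel from the mean of
    its left and upper neighbours (an illustrative stand-in predictor)."""
    img = img.astype(int)
    pred = np.zeros_like(img)
    pred[1:, 1:] = (img[1:, :-1] + img[:-1, 1:]) // 2
    return img - pred

def ppe(img):
    """Prediction-error of prediction error (PPE): predict each PE from the mean
    of its neighbouring PEs and take the residual."""
    pe = prediction_errors(img)
    pe_pred = np.zeros_like(pe)
    pe_pred[1:, 1:] = (pe[1:, :-1] + pe[:-1, 1:]) // 2
    return pe - pe_pred
```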

  15. Phosphate-based glasses: Prediction of acoustical properties

    Science.gov (United States)

    El-Moneim, Amin Abd

    2016-04-01

    In this work, a comprehensive study has been carried out to predict the composition dependence of bulk modulus and ultrasonic attenuation coefficient in the phosphate-based glass systems PbO-P2O5, Li2O-TeO2-B2O3-P2O5, TiO2-Na2O-CaO-P2O5 and Cr2O3-doped Na2O-ZnO-P2O5 at room temperature. The prediction is based on (i) Makishima-Mackenzie theory, which correlates the bulk modulus with packing density and dissociation energy per unit volume, and (ii) Our recently presented semi-empirical formulas, which correlate the ultrasonic attenuation coefficient with the oxygen density, mean atomic ring size, first-order stretching force constant and experimental bulk modulus. Results revealed that our recently presented semi-empirical formulas can be applied successfully to predict changes of ultrasonic attenuation coefficient in binary PbO-P2O5 glasses at 10 MHz frequency and in quaternary Li2O-TeO2-B2O3-P2O5, TiO2-Na2O-CaO-P2O5 and Cr2O3-Na2O-ZnO-P2O5 glasses at 5 MHz frequency. Also, Makishima-Mackenzie theory appears to be valid for the studied glasses if the effect of the basic structural units that present in the glass network is taken into account.

  16. Substituting random forest for multiple linear regression improves binding affinity prediction of scoring functions: Cyscore as a case study.

    Science.gov (United States)

    Li, Hongjian; Leung, Kwong-Sak; Wong, Man-Hon; Ballester, Pedro J

    2014-08-27

    State-of-the-art protein-ligand docking methods are generally limited by the traditionally low accuracy of their scoring functions, which are used to predict binding affinity and thus vital for discriminating between active and inactive compounds. Despite intensive research over the years, classical scoring functions have reached a plateau in their predictive performance. These assume a predetermined additive functional form for some sophisticated numerical features, and use standard multivariate linear regression (MLR) on experimental data to derive the coefficients. In this study we show that such a simple functional form is detrimental for the prediction performance of a scoring function, and replacing linear regression by machine learning techniques like random forest (RF) can improve prediction performance. We investigate the conditions of applying RF under various contexts and find that given sufficient training samples RF manages to comprehensively capture the non-linearity between structural features and measured binding affinities. Incorporating more structural features and training with more samples can both boost RF performance. In addition, we analyze the importance of structural features to binding affinity prediction using the RF variable importance tool. Lastly, we use Cyscore, a top performing empirical scoring function, as a baseline for comparison study. Machine-learning scoring functions are fundamentally different from classical scoring functions because the former circumvents the fixed functional form relating structural features with binding affinities. RF, but not MLR, can effectively exploit more structural features and more training samples, leading to higher prediction performance. The future availability of more X-ray crystal structures will further widen the performance gap between RF-based and MLR-based scoring functions. This further stresses the importance of substituting RF for MLR in scoring function development.
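
    A minimal sketch of the comparison described above, assuming a feature matrix X of structural descriptors and measured binding affinities y: the same data are fitted with multivariate linear regression and with a Random Forest, and the RF variable importances are returned. The hyperparameters and cross-validation scheme are illustrative choices rather than the study's protocol.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def compare_scoring_models(X, y, cv=5):
    """Compare a classical MLR-style scoring function with an RF regressor on the
    same structural features and measured binding affinities."""
    mlr = LinearRegression()
    rf = RandomForestRegressor(n_estimators=500, random_state=0)
    r2_mlr = cross_val_score(mlr, X, y, cv=cv, scoring="r2").mean()
    r2_rf = cross_val_score(rf, X, y, cv=cv, scoring="r2").mean()
    rf.fit(X, y)
    return r2_mlr, r2_rf, rf.feature_importances_   # RF variable importance
```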

  17. The Attribute for Hydrocarbon Prediction Based on Attenuation

    International Nuclear Information System (INIS)

    Hermana, Maman; Harith, Z Z T; Sum, C W; Ghosh, D P

    2014-01-01

    Hydrocarbon prediction is a crucial issue in the oil and gas industry. Currently, the prediction of pore fluid and lithology is based on amplitude interpretation, which has the potential to produce pitfalls under certain reservoir conditions. Motivated by this fact, this work is directed at finding other attributes that can be used to reduce the pitfalls of amplitude interpretation. Some seismic attributes were examined, and studies showed that the attenuation attribute is a better attribute for hydrocarbon prediction. Theoretically, the attenuation mechanism of wave propagation is associated with the movement of fluid in the pore; hence the existence of hydrocarbon in the pore will be represented directly by an attenuation attribute. In this paper we evaluated the feasibility of the quality factor ratio of P-wave and S-wave (Qp/Qs) as a hydrocarbon indicator using well data, and we also developed a new attribute based on attenuation for hydrocarbon prediction -- the Normalized Energy Reduction Stack (NERS). To achieve these goals, this work was divided into three main parts: estimating Qp/Qs from well log data, testing the new attribute on synthetic data, and applying the new attribute to real data from the Malay Basin. The results show that Qp/Qs is better than Poisson's ratio and Lambda over Mu as a hydrocarbon indicator. The curve, trend analysis and contrast of Qp/Qs are more powerful at distinguishing pore fluid than Poisson's ratio and Lambda over Mu. The NERS attribute was successful in distinguishing the hydrocarbon from brine on synthetic data. Applied to real data from the Malay Basin, the NERS attribute is qualitatively conformable with the structure and the locations where gas is predicted. The quantitative interpretation of this attribute for hydrocarbon prediction needs to be investigated further.

  18. Using the multi-objective optimization replica exchange Monte Carlo enhanced sampling method for protein-small molecule docking.

    Science.gov (United States)

    Wang, Hongrui; Liu, Hongwei; Cai, Leixin; Wang, Caixia; Lv, Qiang

    2017-07-10

    In this study, we extended the replica exchange Monte Carlo (REMC) sampling method to protein-small molecule docking conformational prediction using RosettaLigand. In contrast to the traditional Monte Carlo (MC) and REMC sampling methods, these methods use multi-objective optimization Pareto front information to facilitate the selection of replicas for exchange. The Pareto front information generated to select lower energy conformations as representative conformation structure replicas can facilitate the convergence of the available conformational space, including available near-native structures. Furthermore, our approach directly provides min-min scenario Pareto optimal solutions, as well as a hybrid of the min-min and max-min scenario Pareto optimal solutions with lower energy conformations for use as structure templates in the REMC sampling method. These methods were validated based on a thorough analysis of a benchmark data set containing 16 benchmark test cases. An in-depth comparison between MC, REMC, multi-objective optimization-REMC (MO-REMC), and hybrid MO-REMC (HMO-REMC) sampling methods was performed to illustrate the differences between the four conformational search strategies. Our findings demonstrate that the MO-REMC and HMO-REMC conformational sampling methods are powerful approaches for obtaining protein-small molecule docking conformational predictions based on the binding energy of complexes in RosettaLigand.
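
    For context, a sketch of the standard replica exchange acceptance test that underlies REMC (with k_B = 1); the MO-REMC and HMO-REMC variants described above additionally use Pareto-front information when choosing which replicas to exchange, which this snippet does not attempt to reproduce.

```python
import math
import random

def attempt_swap(E_i, E_j, T_i, T_j):
    """Standard REMC swap test between two replicas with energies E_i, E_j at
    temperatures T_i, T_j: accept with probability min(1, exp((1/T_i - 1/T_j)*(E_i - E_j)))."""
    delta = (1.0 / T_i - 1.0 / T_j) * (E_i - E_j)
    return delta >= 0 or random.random() < math.exp(delta)
```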

  19. The clustering-based case-based reasoning for imbalanced business failure prediction: a hybrid approach through integrating unsupervised process with supervised process

    Science.gov (United States)

    Li, Hui; Yu, Jun-Ling; Yu, Le-An; Sun, Jie

    2014-05-01

    Case-based reasoning (CBR) is one of the main methods in business forecasting; it performs well in prediction and can give explanations for its results. In business failure prediction (BFP), the number of failed enterprises is relatively small compared with the number of non-failed ones. However, the loss is huge when an enterprise fails. Therefore, it is necessary to develop methods (trained on imbalanced samples) which forecast well for this small proportion of failed enterprises while remaining accurate overall. Commonly used methods constructed on the assumption of balanced samples do not perform well in predicting minority samples on imbalanced samples consisting of the minority/failed enterprises and the majority/non-failed ones. This article develops a new method called clustering-based CBR (CBCBR), which integrates clustering analysis, an unsupervised process, with CBR, a supervised process, to enhance the efficiency of retrieving information from both the minority and the majority classes in CBR. In CBCBR, various case classes are first generated through hierarchical clustering of the stored experienced cases, and class centres are calculated by integrating the case information within each clustered class. When predicting the label of a target case, its nearest clustered case class is first retrieved by ranking the similarities between the target case and each clustered case class centre. Then, the nearest neighbours of the target case in the determined clustered case class are retrieved. Finally, the labels of the nearest experienced cases are used in prediction. In the empirical experiment with two imbalanced samples from China, the performance of CBCBR was compared with classical CBR, a support vector machine, a logistic regression and a multivariate discriminant analysis. The results show that, compared with the other four methods, CBCBR performed significantly better in terms of sensitivity for identifying the
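
    A compact sketch of the two-stage retrieval described above, assuming numeric case vectors: stored cases are hierarchically clustered and class centres computed (unsupervised step), then a target case is matched to its nearest class centre and labelled by a vote of its k nearest neighbours within that class (supervised step). The distance metric, linkage method and voting rule are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def build_case_classes(cases, n_classes):
    """Unsupervised step: hierarchically cluster the stored cases and compute the
    centre of each clustered case class."""
    labels = fcluster(linkage(cases, method="ward"), n_classes, criterion="maxclust")
    centres = np.array([cases[labels == c].mean(axis=0) for c in range(1, n_classes + 1)])
    return labels, centres

def cbcbr_predict(target, cases, case_labels, centres, outcomes, k=5):
    """Supervised step: retrieve the nearest case class by centre distance, then the
    k nearest cases inside it, and vote with their outcome labels
    (1 = failed enterprise, 0 = non-failed)."""
    c = np.argmin(np.linalg.norm(centres - target, axis=1)) + 1
    members = np.where(case_labels == c)[0]
    nearest = members[np.argsort(np.linalg.norm(cases[members] - target, axis=1))[:k]]
    return int(round(outcomes[nearest].mean()))
```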

  20. Very extensive nonmaternal care predicts mother-infant attachment disorganization: Convergent evidence from two samples.

    Science.gov (United States)

    Hazen, Nancy L; Allen, Sydnye D; Christopher, Caroline Heaton; Umemura, Tomotaka; Jacobvitz, Deborah B

    2015-08-01

    We examined whether a maximum threshold of time spent in nonmaternal care exists, beyond which infants have an increased risk of forming a disorganized infant-mother attachment. The hours per week infants spent in nonmaternal care at 7-8 months were examined as a continuous measure and as a dichotomous threshold (over 40, 50 and 60 hr/week) to predict infant disorganization at 12-15 months. Two different samples (Austin and NICHD) were used to replicate findings and control for critical covariates: mothers' unresolved status and frightening behavior (assessed in the Austin sample, N = 125), quality of nonmaternal caregiving (assessed in the NICHD sample, N = 1,135), and family income and infant temperament (assessed in both samples). Only very extensive hours of nonmaternal care (over 60 hr/week) and mothers' frightening behavior independently predicted attachment disorganization. A polynomial logistic regression performed on the larger NICHD sample indicated that the risk of disorganized attachment exponentially increased after exceeding 60 hr/week. In addition, very extensive hours of nonmaternal care only predicted attachment disorganization after age 6 months (not prior). Findings suggest that during a sensitive period of attachment formation, infants who spend more than 60 hr/week in nonmaternal care may be at an increased risk of forming a disorganized attachment.

  1. Prediction of Negative Conversion Days of Childhood Nephrotic Syndrome Based on the Improved Backpropagation Neural Network with Momentum

    Directory of Open Access Journals (Sweden)

    Yi-jun Liu

    2015-12-01

    Full Text Available Childhood nephrotic syndrome is a chronic disease harmful to the growth of children. Scientific and accurate prediction of negative conversion days for children with nephrotic syndrome offers potential benefits for the treatment of patients and helps achieve a better cure effect. In this study, the improved backpropagation neural network with momentum is used for prediction. Momentum speeds up convergence and maintains the generalization performance of the neural network, and therefore overcomes weaknesses of the standard backpropagation algorithm. A three-tier network structure is constructed. Eight indicators, including age, IgG, IgA and IgM, are selected as network inputs. The scientific computing software MATLAB and its neural network tools are used to create the model and make predictions. A training sample of twenty-eight cases is used to train the neural network. A test sample of six typical cases, belonging to six different age groups, is used to test the predictive model. A low mean absolute error of 0.83 is achieved for the predictions. The experimental results on this small sample show that the proposed approach is to some degree applicable for the prediction of negative conversion days of childhood nephrotic syndrome.
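
    The momentum update that distinguishes the improved algorithm from standard backpropagation can be sketched as follows; the learning rate and momentum coefficient are illustrative values, and the paper's three-tier network with eight clinical inputs is not reproduced here.

```python
def momentum_update(weights, grads, velocity, lr=0.01, mu=0.9):
    """Backpropagation weight update with a momentum term: part of the previous
    update direction is retained, which speeds convergence and damps oscillation
    relative to plain gradient descent. weights, grads and velocity are parallel
    lists of numeric arrays (one per layer)."""
    new_velocity = [mu * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity
```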

  2. Assessing Exhaustiveness of Stochastic Sampling for Integrative Modeling of Macromolecular Structures.

    Science.gov (United States)

    Viswanath, Shruthi; Chemmama, Ilan E; Cimermancic, Peter; Sali, Andrej

    2017-12-05

    Modeling of macromolecular structures involves structural sampling guided by a scoring function, resulting in an ensemble of good-scoring models. By necessity, the sampling is often stochastic, and must be exhaustive at a precision sufficient for accurate modeling and assessment of model uncertainty. Therefore, the very first step in analyzing the ensemble is an estimation of the highest precision at which the sampling is exhaustive. Here, we present an objective and automated method for this task. As a proxy for sampling exhaustiveness, we evaluate whether two independently and stochastically generated sets of models are sufficiently similar. The protocol includes testing 1) convergence of the model score, 2) whether model scores for the two samples were drawn from the same parent distribution, 3) whether each structural cluster includes models from each sample proportionally to its size, and 4) whether there is sufficient structural similarity between the two model samples in each cluster. The evaluation also provides the sampling precision, defined as the smallest clustering threshold that satisfies the third, most stringent test. We validate the protocol with the aid of enumerated good-scoring models for five illustrative cases of binary protein complexes. Passing the proposed four tests is necessary, but not sufficient for thorough sampling. The protocol is general in nature and can be applied to the stochastic sampling of any set of models, not just structural models. In addition, the tests can be used to stop stochastic sampling as soon as exhaustiveness at desired precision is reached, thereby improving sampling efficiency; they may also help in selecting a model representation that is sufficiently detailed to be informative, yet also sufficiently coarse for sampling to be exhaustive. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.
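
    Two of the four checks lend themselves to a short sketch, assuming the model scores and the per-cluster model counts of the two independent runs are already available; the full protocol, including the score-convergence test and the structural-similarity test that defines the sampling precision, is more involved than this.

```python
import numpy as np
from scipy.stats import chi2_contingency, ks_2samp

def score_distribution_test(scores_a, scores_b):
    """Test 2: were the model scores of the two independent runs drawn from the
    same parent distribution? (two-sample Kolmogorov-Smirnov)"""
    return ks_2samp(scores_a, scores_b)

def cluster_proportion_test(counts_a, counts_b):
    """Test 3: does each structural cluster contain models from both runs in
    proportion to its size? (chi-square on the cluster-by-run contingency table)"""
    table = np.vstack([counts_a, counts_b])
    chi2, p, dof, _ = chi2_contingency(table)
    return chi2, p, dof
```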

  3. Comparison of Comet Enflow and VA One Acoustic-to-Structure Power Flow Predictions

    Science.gov (United States)

    Grosveld, Ferdinand W.; Schiller, Noah H.; Cabell, Randolph H.

    2010-01-01

    Comet Enflow is a commercially available, high frequency vibroacoustic analysis software based on the Energy Finite Element Analysis (EFEA). In this method the same finite element mesh used for structural and acoustic analysis can be employed for the high frequency solutions. Comet Enflow is being validated for a floor-equipped composite cylinder by comparing the EFEA vibroacoustic response predictions with Statistical Energy Analysis (SEA) results from the commercial software program VA One from ESI Group. Early in this program a number of discrepancies became apparent in the Enflow predicted response for the power flow from an acoustic space to a structural subsystem. The power flow anomalies were studied for a simple cubic, a rectangular and a cylindrical structural model connected to an acoustic cavity. The current investigation focuses on three specific discrepancies between the Comet Enflow and the VA One predictions: the Enflow power transmission coefficient relative to the VA One coupling loss factor; the importance of the accuracy of the acoustic modal density formulation used within Enflow; and the recommended use of fast solvers in Comet Enflow. The frequency region of interest for this study covers the one-third octave bands with center frequencies from 16 Hz to 4000 Hz.

  4. Doping effect on the structural properties of Cu1-x(Ni, Zn, Al and Fe)xO samples (0

    Science.gov (United States)

    Amaral, J. B.; Araujo, R. M.; Pedra, P. P.; Meneses, C. T.; Duque, J. G. S.; dos S. Rezende, M. V.

    2016-09-01

    In this work, the effect of the insertion of transition metal, TM (=Ni, Zn, Al and Fe), ions in Cu1-xTMxO samples (0 ≤ x ≤ …) was investigated by X-ray diffraction. The XRD data show that i) all samples present a monoclinic crystal system with space group C2/c and ii) for increasing TM-doping, Ni- and Zn-doped samples show a small amount of spurious phases for concentrations above x=0.05. Based on these results, a defect disorder study using atomistic computational simulations, which are based on the lattice energy minimization technique, is employed to predict the location of the dopant ions in the structure. In agreement with the XRD data, our computational results indicate that the trivalent ions (Al and Fe) are more favorable for incorporation into the CuO matrix than the divalent ions (Ni and Zn).

  5. The equivalent thermal conductivity of lattice core sandwich structure: A predictive model

    International Nuclear Information System (INIS)

    Cheng, Xiangmeng; Wei, Kai; He, Rujie; Pei, Yongmao; Fang, Daining

    2016-01-01

    Highlights: • A predictive model of the equivalent thermal conductivity was established. • Both heat conduction and radiation were considered. • The predictive results were in good agreement with experiment and FEM. • Some methods for improving the thermal protection performance were proposed. - Abstract: The equivalent thermal conductivity of a lattice core sandwich structure was predicted using a novel model. The predictive results were in good agreement with experimental and Finite Element Method results. The thermal conductivity of the lattice core sandwich structure was attributed to both core conduction and radiation. The conduction contribution to the thermal conductivity relied only on the relative density of the structure, while the radiation contribution increased linearly with the thickness of the core. It was found that the equivalent thermal conductivity of the lattice core sandwich structure was highly dependent on temperature. At low temperatures, the structure exhibited nearly thermally insulated behavior. As the temperature increased, the thermal conductivity of the structure increased owing to radiation. Therefore, some measures, such as reducing the emissivity of the core or designing a multilayered structure, are believed to be of benefit for improving the thermal protection performance of the structure at high temperatures.

  6. A statistical method for predicting splice variants between two groups of samples using GeneChip® expression array data

    Directory of Open Access Journals (Sweden)

    Olson James M

    2006-04-01

    Full Text Available Abstract Background Alternative splicing of pre-messenger RNA results in RNA variants with combinations of selected exons. It is one of the essential biological functions and regulatory components in higher eukaryotic cells. Some of these variants are detectable with the Affymetrix GeneChip® that uses multiple oligonucleotide probes (i.e. probe set), since the target sequences for the multiple probes are adjacent within each gene. Hybridization intensity from a probe correlates with abundance of the corresponding transcript. Although the multiple-probe feature in the current GeneChip® was designed to assess expression values of individual genes, it also measures transcriptional abundance for a sub-region of a gene sequence. This additional capacity motivated us to develop a method to predict alternative splicing, taking advantage of extensive repositories of GeneChip® gene expression array data. Results We developed a two-step approach to predict alternative splicing from GeneChip® data. First, we clustered the probes from a probe set into pseudo-exons based on similarity of probe intensities and physical adjacency. A pseudo-exon is defined as a sequence in the gene within which multiple probes have comparable probe intensity values. Second, for each pseudo-exon, we assessed the statistical significance of the difference in probe intensity between two groups of samples. Differentially expressed pseudo-exons are predicted to be alternatively spliced. We applied our method to empirical data generated from GeneChip® Hu6800 arrays, which include 7129 probe sets and twenty probes per probe set. The dataset consists of sixty-nine medulloblastoma (27 metastatic and 42 non-metastatic) samples and four cerebellum samples as normal controls. We predicted that 577 genes would be alternatively spliced when we compared normal cerebellum samples to medulloblastomas, and predicted that thirteen genes would be alternatively spliced when we compared metastatic
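
    A simplified sketch of the two-step approach, with an illustrative intensity-jump rule standing in for the paper's clustering of probes into pseudo-exons and a plain t-test standing in for its statistical assessment; intensities_a and intensities_b are assumed to be samples-by-probes matrices for the two groups.

```python
import numpy as np
from scipy.stats import ttest_ind

def pseudo_exons(probe_means, gap=1.0):
    """Group physically adjacent probes into pseudo-exons: start a new pseudo-exon
    whenever the mean probe intensity jumps by more than 'gap' (illustrative rule)."""
    groups, current = [], [0]
    for i in range(1, len(probe_means)):
        if abs(probe_means[i] - probe_means[i - 1]) > gap:
            groups.append(current)
            current = []
        current.append(i)
    groups.append(current)
    return groups

def splice_variant_test(intensities_a, intensities_b, groups):
    """For each pseudo-exon, test whether probe intensities differ between the two
    sample groups; significant pseudo-exons are predicted as alternatively spliced."""
    return [ttest_ind(intensities_a[:, g].mean(axis=1),
                      intensities_b[:, g].mean(axis=1)).pvalue for g in groups]
```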

  7. A sampling approach for predicting the eating quality of apples using visible-near infrared spectroscopy.

    Science.gov (United States)

    Martínez Vega, Mabel V; Sharifzadeh, Sara; Wulfsohn, Dvoralai; Skov, Thomas; Clemmensen, Line Harder; Toldam-Andersen, Torben B

    2013-12-01

    Visible-near infrared spectroscopy remains a method of increasing interest as a fast alternative for the evaluation of fruit quality. The success of the method is assumed to be achieved by using large sets of samples to produce robust calibration models. In this study we used representative samples of an early and a late season apple cultivar to evaluate model robustness (in terms of prediction ability and error) for soluble solids content (SSC) and acidity prediction, in the wavelength range 400-1100 nm. A total of 196 middle-early season and 219 late season apples (Malus domestica Borkh.) cvs 'Aroma' and 'Holsteiner Cox' samples were used to construct spectral models for SSC and acidity. Partial least squares (PLS), ridge regression (RR) and elastic net (EN) models were used to build prediction models. Furthermore, we compared three sub-sample arrangements for forming training and test sets ('smooth fractionator', by date of measurement after harvest and random). Using the 'smooth fractionator' sampling method, fewer spectral bands (26) and elastic net resulted in improved performance for SSC models of 'Aroma' apples, with a coefficient of variation CVSSC = 13%. The model showed consistently low errors and bias (PLS/EN: R(2) cal = 0.60/0.60; SEC = 0.88/0.88°Brix; Biascal = 0.00/0.00; R(2) val = 0.33/0.44; SEP = 1.14/1.03; Biasval = 0.04/0.03). However, the predictions of acidity and of SSC (CV = 5%) for the late cultivar 'Holsteiner Cox' were inferior compared with 'Aroma'. It was possible to construct local SSC and acidity calibration models for early season apple cultivars with CVs of SSC and acidity around 10%. The overall model performance of these data sets also depends on the proper selection of training and test sets. The 'smooth fractionator' protocol provided an objective method for obtaining training and test sets that capture the existing variability of the fruit samples for construction of visible-NIR prediction models. The implication
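
    A minimal calibration sketch along the lines described above, assuming spectra and reference SSC values have already been split into training and test sets (e.g. by the 'smooth fractionator' protocol); only the PLS variant is shown, and the number of latent components is an illustrative choice.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error

def fit_ssc_model(spectra_train, ssc_train, spectra_test, ssc_test, n_components=10):
    """Fit a PLS calibration relating visible-NIR spectra to soluble solids content
    and report the standard error of prediction (SEP) and bias on the test set."""
    pls = PLSRegression(n_components=n_components)
    pls.fit(spectra_train, ssc_train)
    pred = pls.predict(spectra_test).ravel()
    sep = float(np.sqrt(mean_squared_error(ssc_test, pred)))
    bias = float(np.mean(pred - ssc_test))
    return pls, sep, bias
```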

  8. Vertical structure of predictability and information transport over the Northern Hemisphere

    International Nuclear Information System (INIS)

    Feng Ai-Xia; Wang Qi-Gang; Gong Zhi-Qiang; Feng Guo-Lin

    2014-01-01

    Based on nonlinear prediction and information theory, the vertical heterogeneity of predictability and the information loss rate in the geopotential height field are obtained over the Northern Hemisphere. On a seasonal-to-interannual time scale, the predictability is low in the lower troposphere and high in the mid-upper troposphere. However, within the mid-upper troposphere over the subtropical ocean areas, the predictability is relatively poor. These conclusions also fit the seasonal time scale. Moving to the interannual time scale, the predictability becomes high in the lower troposphere and low in the mid-upper troposphere, contrary to the former case. On the whole, the interannual trend is more predictable than the seasonal trend. The average information loss rate is low over the mid-east Pacific, west of North America, the Atlantic and Eurasia, and the atmosphere over other places has a relatively high information loss rate on all time scales. Two channels are found steadily over the Pacific Ocean and the Atlantic Ocean in the subtropics. There are also unstable channels. The influence of the four seasons on predictability and information communication is studied. The predictability is low no matter which season's data are removed, and each season except winter plays an important role in the existence of the channels. Predictability and teleconnections are paramount issues in atmospheric science, and the teleconnections may be established by communication channels. This work is therefore of interest since it reveals the vertical structure of the predictability distribution, the channel locations, the contributions of different time scales to them, and their variations across seasons.

  9. Prediction of highly expressed genes in microbes based on chromatin accessibility

    Directory of Open Access Journals (Sweden)

    Ussery David W

    2007-02-01

    Full Text Available Abstract Background It is well known that gene expression is dependent on chromatin structure in eukaryotes and it is likely that chromatin can play a role in bacterial gene expression as well. Here, we use a nucleosomal position preference measure of anisotropic DNA flexibility to predict highly expressed genes in microbial genomes. We compare these predictions with those based on codon adaptation index (CAI) values, and also with experimental data for 6 different microbial genomes, with a particular interest in experimental data from Escherichia coli. Moreover, position preference is examined further in 328 sequenced microbial genomes. Results We find that absolute gene expression levels are correlated with the position preference in many microbial genomes. It is postulated that in these regions, the DNA may be more accessible to the transcriptional machinery. Moreover, ribosomal proteins and ribosomal RNA are encoded by DNA having significantly lower position preference values than other genes in fast-replicating microbes. Conclusion This insight into DNA structure-dependent gene expression in microbes may be exploited for predicting the expression of non-translated genes such as non-coding RNAs that may not be predicted by any of the conventional codon usage bias approaches.

  10. Predicting the Resiliency in Parents with Exceptional Children Based on Their Mindfulness

    Science.gov (United States)

    Jabbari, Sosan; Firoozabadi, Somayeh Sadati; Rostami, Sedighe

    2016-01-01

    The purpose of the present study was to predict the resiliency in parents with exceptional children based on their mindfulness. This descriptive correlational study was performed on 260 parents of students (105 male and 159 female) who were selected by the cluster sampling method. Family resiliency questionnaire (Sickby, 2005) and five aspect…

  11. Combining neural networks for protein secondary structure prediction

    DEFF Research Database (Denmark)

    Riis, Søren Kamaric

    1995-01-01

    In this paper structured neural networks are applied to the problem of predicting the secondary structure of proteins. A hierarchical approach is used where specialized neural networks are designed for each structural class and then combined using another neural network. The submodels are designed by using a priori knowledge of the mapping between protein building blocks and the secondary structure and by using weight sharing. Since none of the individual networks has more than 600 adjustable weights, over-fitting is avoided. When ensembles of specialized experts are combined the performance

  12. Inferring microRNA regulation of mRNA with partially ordered samples of paired expression data and exogenous prediction algorithms.

    Directory of Open Access Journals (Sweden)

    Brian Godsey

    Full Text Available MicroRNAs (miRs) are known to play an important role in mRNA regulation, often by binding to complementary sequences in "target" mRNAs. Recently, several methods have been developed by which existing sequence-based target predictions can be combined with miR and mRNA expression data to infer true miR-mRNA targeting relationships. It has been shown that the combination of these two approaches gives more reliable results than either by itself. While a few such algorithms give excellent results, none fully addresses expression data sets with a natural ordering of the samples. If the samples in an experiment can be ordered or partially ordered by their expected similarity to one another, such as for time-series or studies of development processes, stages, or types (e.g. cell type, disease, growth, aging), there are unique opportunities to infer miR-mRNA interactions that may be specific to the underlying processes, and existing methods do not exploit this. We propose an algorithm which specifically addresses [partially] ordered expression data and takes advantage of sample similarities based on the ordering structure. This is done within a Bayesian framework which specifies posterior distributions and therefore statistical significance for each model parameter and latent variable. We apply our model to a previously published expression data set of paired miR and mRNA arrays in five partially ordered conditions, with biological replicates, related to multiple myeloma, and we show how considering potential orderings can improve the inference of miR-mRNA interactions, as measured by existing knowledge about the involved transcripts.

  13. Reduced Fragment Diversity for Alpha and Alpha-Beta Protein Structure Prediction using Rosetta.

    Science.gov (United States)

    Abbass, Jad; Nebel, Jean-Christophe

    2017-01-01

    Protein structure prediction is considered a main challenge in computational biology. The biannual international competition, Critical Assessment of protein Structure Prediction (CASP), has shown in its eleventh experiment that free modelling target predictions are still beyond reliable accuracy; therefore, much effort should be made to improve ab initio methods. Arguably, Rosetta is considered the most competitive method when it comes to targets with no homologues. Relying on fragments of length 9 and 3 from known structures, Rosetta creates putative structures by assembling candidate fragments. Generally, the structure with the lowest energy score, also known as first model, is chosen to be the "predicted one". A thorough study has been conducted on the role and diversity of 3-mers involved in Rosetta's model "refinement" phase. Usage of the standard number of 3-mers - i.e. 200 - has been shown to degrade alpha and alpha-beta protein conformations initially achieved by assembling 9-mers. Therefore, a new prediction pipeline is proposed for Rosetta where the "refinement" phase is customised according to a target's structural class prediction. Over 8% improvement in terms of first model structure accuracy is reported for alpha and alpha-beta classes when decreasing the number of 3-mers. Copyright © Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  14. Sampling and Analysis Plan for Release of the 105-C Below-Grade Structures and Underlying Soils

    International Nuclear Information System (INIS)

    Bauer, R.G.

    1998-02-01

    This sampling and analysis plan (SAP) presents the rationale and strategies for the sampling, field measurements, and analyses of the below-grade concrete structures from the Hanford 105-C Reactor Building and the underlying soils consistent with the land-use assumptions in the Record of Decision for the U.S. Department of Energy 100-BC-1, 100-DR-1, and 100-HR-1 Operable Units (ROD) (EPA 1995). This structure is one of nine surplus production reactors in the 100 Area of the Hanford Site. This SAP is based on the data quality objectives developed for the 105-C below-grade structures and underlying soils (BHI 1997)

  15. MetaGO: Predicting Gene Ontology of Non-homologous Proteins Through Low-Resolution Protein Structure Prediction and Protein-Protein Network Mapping.

    Science.gov (United States)

    Zhang, Chengxin; Zheng, Wei; Freddolino, Peter L; Zhang, Yang

    2018-03-10

    Homology-based transferal remains the major approach to computational protein function annotations, but it becomes increasingly unreliable when the sequence identity between query and template decreases below 30%. We propose a novel pipeline, MetaGO, to deduce Gene Ontology attributes of proteins by combining sequence homology-based annotation with low-resolution structure prediction and comparison, and partner's homology-based protein-protein network mapping. The pipeline was tested on a large-scale set of 1000 non-redundant proteins from the CAFA3 experiment. Under the stringent benchmark conditions where templates with >30% sequence identity to the query are excluded, MetaGO achieves average F-measures of 0.487, 0.408, and 0.598, for Molecular Function, Biological Process, and Cellular Component, respectively, which are significantly higher than those achieved by other state-of-the-art function annotations methods. Detailed data analysis shows that the major advantage of the MetaGO lies in the new functional homolog detections from partner's homology-based network mapping and structure-based local and global structure alignments, the confidence scores of which can be optimally combined through logistic regression. These data demonstrate the power of using a hybrid model incorporating protein structure and interaction networks to deduce new functional insights beyond traditional sequence homology-based referrals, especially for proteins that lack homologous function templates. The MetaGO pipeline is available at http://zhanglab.ccmb.med.umich.edu/MetaGO/. Copyright © 2018. Published by Elsevier Ltd.

  16. Predicting Patient-specific Dosimetric Benefits of Proton Therapy for Skull-base Tumors Using a Geometric Knowledge-based Method

    Energy Technology Data Exchange (ETDEWEB)

    Hall, David C.; Trofimov, Alexei V.; Winey, Brian A.; Liebsch, Norbert J.; Paganetti, Harald, E-mail: hpaganetti@mgh.harvard.edu

    2017-04-01

    Purpose: To predict the organ at risk (OAR) dose levels achievable with proton beam therapy (PBT), solely based on the geometric arrangement of the target volume in relation to the OARs. A comparison with an alternative therapy yields a prediction of the patient-specific benefits offered by PBT. This could enable physicians at hospitals without proton capabilities to make a better-informed referral decision or aid patient selection in model-based clinical trials. Methods and Materials: Skull-base tumors were chosen to test the method, owing to their geometric complexity and multitude of nearby OARs. By exploiting the correlations between the dose and distance-to-target in existing PBT plans, the models were independently trained for 6 types of OARs: brainstem, cochlea, optic chiasm, optic nerve, parotid gland, and spinal cord. Once trained, the models could estimate the feasible dose–volume histogram and generalized equivalent uniform dose (gEUD) for OAR structures of new patients. The models were trained using 20 patients and validated using an additional 21 patients. Validation was achieved by comparing the predicted gEUD to that of the actual PBT plan. Results: The predicted and planned gEUD were in good agreement. Considering all OARs, the prediction error was +1.4 ± 5.1 Gy (mean ± standard deviation), and Pearson's correlation coefficient was 93%. By comparing with an intensity modulated photon treatment plan, the model could classify whether an OAR structure would experience a gain, with a sensitivity of 93% (95% confidence interval: 87%-97%) and specificity of 63% (95% confidence interval: 38%-84%). Conclusions: We trained and validated models that could quickly and accurately predict the patient-specific benefits of PBT for skull-base tumors. Similar models could be developed for other tumor sites. Such models will be useful when an estimation of the feasible benefits of PBT is desired but the experience and/or resources required for treatment
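
    The gEUD quantity used above to compare predicted and planned OAR doses follows the standard volume-weighted power mean; a short sketch, assuming a differential dose-volume histogram (dose bins and fractional volumes) as input:

```python
import numpy as np

def geud(doses, volumes, a):
    """Generalized equivalent uniform dose for an OAR:
    gEUD = (sum_i v_i * d_i**a) ** (1/a), with v_i the fractional volume receiving
    dose d_i and 'a' the organ-specific volume-effect parameter."""
    d = np.asarray(doses, dtype=float)
    v = np.asarray(volumes, dtype=float)
    v = v / v.sum()                     # normalise to fractional volumes
    return float((v * d ** a).sum() ** (1.0 / a))
```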

  17. The prediction of airborne and structure-borne noise potential for a tire

    Science.gov (United States)

    Sakamoto, Nicholas Y.

    Tire/pavement interaction noise is a major component of both exterior pass-by noise and vehicle interior noise. The current testing methods for ranking tires from loud to quiet require expensive equipment, multiple tires, and/or long experimental set-up and run times. If a laboratory based off-vehicle test could be used to identify the airborne and structure-borne potential of a tire from its dynamic characteristics, a relative ranking of a large group of tires could be performed at relatively modest expense. This would provide a smaller sample set of tires for follow-up testing and thus save expense for automobile OEMs. The focus of this research was identifying key noise features from a tire/pavement experiment. These results were compared against a stationary tire test in which the natural response of the tire to a forced input was measured. Since speed was identified as having some effect on the noise, an input function was also developed to allow the tires to be ranked at an appropriate speed. A relative noise model was used on a second sample set of tires to verify if the ranking could be used against interior vehicle measurements. While overall level analysis of the specified spectrum had mixed success, important noise generating features were identified, and the methods used could be improved to develop a standard off-vehicle test to predict a tire's noise potential.

  18. Predicting welding distortion in a panel structure with longitudinal stiffeners using inherent deformations obtained by inverse analysis method.

    Science.gov (United States)

    Liang, Wei; Murakawa, Hidekazu

    2014-01-01

    Welding-induced deformation not only negatively affects dimension accuracy but also degrades the performance of product. If welding deformation can be accurately predicted beforehand, the predictions will be helpful for finding effective methods to improve manufacturing accuracy. Till now, there are two kinds of finite element method (FEM) which can be used to simulate welding deformation. One is the thermal elastic plastic FEM and the other is elastic FEM based on inherent strain theory. The former only can be used to calculate welding deformation for small or medium scale welded structures due to the limitation of computing speed. On the other hand, the latter is an effective method to estimate the total welding distortion for large and complex welded structures even though it neglects the detailed welding process. When the elastic FEM is used to calculate the welding-induced deformation for a large structure, the inherent deformations in each typical joint should be obtained beforehand. In this paper, a new method based on inverse analysis was proposed to obtain the inherent deformations for weld joints. Through introducing the inherent deformations obtained by the proposed method into the elastic FEM based on inherent strain theory, we predicted the welding deformation of a panel structure with two longitudinal stiffeners. In addition, experiments were carried out to verify the simulation results.

  19. Image-Based Visual Servoing for Manipulation Via Predictive Control – A Survey of Some Results

    Directory of Open Access Journals (Sweden)

    Corneliu Lazăr

    2016-09-01

    Full Text Available In this paper, a review of predictive control algorithms developed by the authors for visual servoing of robots in manipulation applications is presented. Using these algorithms, a control predictive framework was created for image-based visual servoing (IBVS) systems. Firstly, considering the point features, in the year 2008 we introduced an internal model predictor based on the interaction matrix. Secondly, distinctly from the set-point trajectory, we introduced in 2011 the reference trajectory using the concept from predictive control. Finally, minimizing a sum of squares of predicted errors, the optimal input trajectory was obtained. The new concept of predictive control for IBVS systems was employed to develop a cascade structure for motion control of robot arms. Simulation results obtained with a simulator for predictive IBVS systems are also presented.

  20. A Bayesian Spatial Model to Predict Disease Status Using Imaging Data From Various Modalities

    Directory of Open Access Journals (Sweden)

    Wenqiong Xue

    2018-03-01

    Full Text Available Relating disease status to imaging data stands to increase the clinical significance of neuroimaging studies. Many neurological and psychiatric disorders involve complex, systems-level alterations that manifest in functional and structural properties of the brain and possibly other clinical and biologic measures. We propose a Bayesian hierarchical model to predict disease status, which is able to incorporate information from both functional and structural brain imaging scans. We consider a two-stage whole brain parcellation, partitioning the brain into 282 subregions, and our model accounts for correlations between voxels from different brain regions defined by the parcellations. Our approach models the imaging data and uses posterior predictive probabilities to perform prediction. The estimates of our model parameters are based on samples drawn from the joint posterior distribution using Markov Chain Monte Carlo (MCMC) methods. We evaluate our method by examining the prediction accuracy rates based on leave-one-out cross validation, and we employ an importance sampling strategy to reduce the computation time. We conduct both whole-brain and voxel-level prediction and identify the brain regions that are highly associated with the disease based on the voxel-level prediction results. We apply our model to multimodal brain imaging data from a study of Parkinson's disease. We achieve extremely high accuracy, in general, and our model identifies key regions contributing to accurate prediction including caudate, putamen, and fusiform gyrus as well as several sensory system regions.

  1. RaptorX-Property: a web server for protein structure property prediction.

    Science.gov (United States)

    Wang, Sheng; Li, Wei; Liu, Shiwang; Xu, Jinbo

    2016-07-08

    RaptorX Property (http://raptorx2.uchicago.edu/StructurePropertyPred/predict/) is a web server predicting structure property of a protein sequence without using any templates. It outperforms other servers, especially for proteins without close homologs in PDB or with very sparse sequence profile (i.e. carries little evolutionary information). This server employs a powerful in-house deep learning model DeepCNF (Deep Convolutional Neural Fields) to predict secondary structure (SS), solvent accessibility (ACC) and disorder regions (DISO). DeepCNF not only models complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent property labels. Our experimental results show that, tested on CASP10, CASP11 and the other benchmarks, this server can obtain ∼84% Q3 accuracy for 3-state SS, ∼72% Q8 accuracy for 8-state SS, ∼66% Q3 accuracy for 3-state solvent accessibility, and ∼0.89 area under the ROC curve (AUC) for disorder prediction. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Comparing methodologies for structural identification and fatigue life prediction of a highway bridge

    OpenAIRE

    Pai, Sai Ganesh Sarvotham; Nussbaumer, Alain; Smith, Ian F. C.

    2018-01-01

    Accurate measurement-data interpretation leads to increased understanding of structural behavior and enhanced asset-management decision making. In this paper, four data-interpretation methodologies, residual minimization, traditional Bayesian model updating, modified Bayesian model updating (with an L∞-norm-based Gaussian likelihood function), and error-domain model falsification (EDMF), a method that rejects models that have unlikely differences between predictions and measurements, are comp...

  3. Sediment bacterial community structures and their predicted functions implied the impacts from natural processes and anthropogenic activities in coastal area.

    Science.gov (United States)

    Su, Zhiguo; Dai, Tianjiao; Tang, Yushi; Tao, Yile; Huang, Bei; Mu, Qinglin; Wen, Donghui

    2018-06-01

    Coastal ecosystem structures and functions are changing under natural and anthropogenic influences. In this study, surface sediment samples were collected from the disturbed zone (DZ), near-estuary zone (NEZ), and far-estuary zone (FEZ) of Hangzhou Bay, one of the most seriously polluted bays in China. The bacterial community structures and predicted functions varied significantly among the zones. Firmicutes were most abundant in DZ, highlighting the impacts of anthropogenic activities. Sediment total phosphorus was the most influential factor on the bacterial community structures. According to PICRUSt analysis, DZ significantly exceeded FEZ and NEZ in the subcategory of Xenobiotics Biodegradation and Metabolism, and DZ was enriched in all the nitrate reduction related genes except the nrfA gene. Seawater salinity and inorganic nitrogen, as the representative natural and anthropogenic factors respectively, had opposite effects on nitrogen metabolism functions. The changes in bacterial community composition and predicted functions provide new insight into human-induced pollution impacts on coastal ecosystems. Copyright © 2018 Elsevier Ltd. All rights reserved.

  4. Prediction based Greedy Perimeter Stateless Routing Protocol for Vehicular Self-organizing Network

    Science.gov (United States)

    Wang, Chunlin; Fan, Quanrun; Chen, Xiaolin; Xu, Wanjin

    2018-03-01

    PGPSR (Prediction based Greedy Perimeter Stateless Routing) extends the GPSR protocol to cope with the high-speed mobility of vehicles in a vehicular self-organizing network (VANET) and the resulting changes in network topology. When GPSR is used in a VANET environment, the packet loss rate and throughput are not ideal, and the protocol may even fail to work. To address these problems, the proposed PGPSR routing protocol redefines the hello and query packet structures to carry node speed and direction information, so that before the next update arrives a node can use this information to predict neighbour positions and the new network topology and select the appropriate next-hop route and path. In addition, outdated entries in the neighbour table are deleted in time. Simulation experiments show that the performance of PGPSR is better than that of GPSR.
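
    A minimal sketch of the prediction idea, under assumed packet fields rather than the PGPSR specification: hello packets carry position, speed and heading; between updates a node dead-reckons its neighbours' positions, drops stale entries, and greedily forwards to the predicted neighbour closest to the destination.

    ```python
    # Minimal sketch: neighbour position prediction and greedy next-hop choice.
    import math, time

    def predict_position(entry, now):
        """Dead-reckon a neighbour's position from its last hello packet."""
        dt = now - entry["t"]
        x = entry["x"] + entry["speed"] * math.cos(entry["heading"]) * dt
        y = entry["y"] + entry["speed"] * math.sin(entry["heading"]) * dt
        return x, y

    def greedy_next_hop(neighbors, dest, now, max_age=2.0):
        """Pick the fresh neighbour whose predicted position is nearest the destination."""
        best, best_d = None, float("inf")
        for nid, entry in list(neighbors.items()):
            if now - entry["t"] > max_age:      # delete outdated neighbour entries
                del neighbors[nid]
                continue
            x, y = predict_position(entry, now)
            d = math.hypot(dest[0] - x, dest[1] - y)
            if d < best_d:
                best, best_d = nid, d
        return best

    neighbors = {
        "car7": {"x": 0.0, "y": 0.0, "speed": 20.0, "heading": 0.0, "t": time.time() - 0.5},
        "car9": {"x": 5.0, "y": 40.0, "speed": 15.0, "heading": math.pi / 2, "t": time.time() - 0.2},
    }
    print(greedy_next_hop(neighbors, dest=(100.0, 10.0), now=time.time()))
    ```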

  5. GENERALISED MODEL BASED CONFIDENCE INTERVALS IN TWO STAGE CLUSTER SAMPLING

    Directory of Open Access Journals (Sweden)

    Christopher Ouma Onyango

    2010-09-01

    Full Text Available Chambers and Dorfman (2002) constructed bootstrap confidence intervals in model-based estimation for finite population totals, assuming that auxiliary values are available throughout a target population and that the auxiliary values are independent. They also assumed that the cluster sizes are known throughout the target population. We now extend this to two-stage sampling, in which the cluster sizes are known only for the sampled clusters, and we therefore predict the unobserved part of the population total. Jan and Elinor (2008) have done similar work, but unlike them, we use a general model in which the auxiliary values are not necessarily independent. We demonstrate that the asymptotic properties of our proposed estimator and its coverage rates are better than those constructed under the model-assisted local polynomial regression model.
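
    For intuition only, the sketch below shows a simplified model-based prediction of a finite-population total with a residual-bootstrap percentile interval. It assumes a single-stage sample with a simple working model, unlike the general two-stage setting of the paper, and all names and data are illustrative.

    ```python
    # Minimal sketch: model-based prediction of a population total plus bootstrap CI.
    import numpy as np

    def model_based_total(y_s, x_s, x_nonsampled, n_boot=2000, seed=0):
        """Predict the population total and a bootstrap percentile interval."""
        rng = np.random.default_rng(seed)
        X_s = np.column_stack([np.ones_like(x_s), x_s])
        beta, *_ = np.linalg.lstsq(X_s, y_s, rcond=None)
        X_r = np.column_stack([np.ones_like(x_nonsampled), x_nonsampled])
        total_hat = y_s.sum() + (X_r @ beta).sum()   # observed part + predicted part

        resid = y_s - X_s @ beta
        boots = []
        for _ in range(n_boot):                      # residual bootstrap for the CI
            y_b = X_s @ beta + rng.choice(resid, size=resid.size, replace=True)
            beta_b, *_ = np.linalg.lstsq(X_s, y_b, rcond=None)
            boots.append(y_s.sum() + (X_r @ beta_b).sum())
        lo, hi = np.percentile(boots, [2.5, 97.5])
        return total_hat, (lo, hi)

    rng = np.random.default_rng(1)
    x = rng.uniform(10, 50, size=500)
    y = 2.0 * x + rng.normal(scale=5.0, size=500)
    sample = rng.choice(500, size=80, replace=False)
    mask = np.zeros(500, bool)
    mask[sample] = True
    print(model_based_total(y[mask], x[mask], x[~mask]))
    ```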

  6. RANdom SAmple Consensus (RANSAC) algorithm for material-informatics: application to photovoltaic solar cells.

    Science.gov (United States)

    Kaspi, Omer; Yosipof, Abraham; Senderowitz, Hanoch

    2017-06-06

    An important aspect of chemoinformatics and material-informatics is the usage of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for cleaning datasets from noise. RANSAC could be used as a "one stop shop" algorithm for developing and validating QSAR models, performing outlier removal, descriptor selection, model development and predictions for test set samples using an applicability domain. For "future" predictions (i.e., for samples not included in the original test set) RANSAC provides a statistical estimate for the probability of obtaining reliable predictions, i.e., predictions within a pre-defined number of standard deviations from the true values. In this work we describe the first application of RANSAC in material informatics, focusing on the analysis of solar cells. We demonstrate that for three datasets representing different metal oxide (MO) based solar cell libraries, RANSAC-derived models select descriptors previously shown to correlate with key photovoltaic properties and lead to good predictive statistics for these properties. These models were subsequently used to predict the properties of virtual solar cell libraries, highlighting interesting dependencies of PV properties on MO compositions.
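
    The core RANSAC loop is easy to sketch for a QSAR-style linear regression: fit on random minimal subsets, keep the consensus model with the largest inlier set, and refit on those inliers. The code below is illustrative only and omits the descriptor-selection and applicability-domain stages described above.

    ```python
    # Minimal sketch: RANSAC consensus fitting for a linear regression model.
    import numpy as np

    def ransac_linear(X, y, n_iter=500, sample_size=10, inlier_tol=1.0, seed=0):
        rng = np.random.default_rng(seed)
        Xd = np.column_stack([np.ones(len(X)), X])
        best_inliers = None
        for _ in range(n_iter):
            idx = rng.choice(len(X), size=sample_size, replace=False)
            beta, *_ = np.linalg.lstsq(Xd[idx], y[idx], rcond=None)
            resid = np.abs(y - Xd @ beta)
            inliers = resid < inlier_tol
            if best_inliers is None or inliers.sum() > best_inliers.sum():
                best_inliers = inliers
        # Refit on the consensus (inlier) set only.
        beta, *_ = np.linalg.lstsq(Xd[best_inliers], y[best_inliers], rcond=None)
        return beta, best_inliers

    rng = np.random.default_rng(2)
    X = rng.uniform(-3, 3, size=(200, 2))
    y = X @ np.array([1.5, -2.0]) + 0.3 + rng.normal(scale=0.3, size=200)
    y[:20] += rng.uniform(5, 10, size=20)            # simulate noisy/outlier samples
    beta, inliers = ransac_linear(X, y)
    ```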

  7. Crystal engineering of ibuprofen compounds: From molecule to crystal structure to morphology prediction by computational simulation and experimental study

    Science.gov (United States)

    Zhang, Min; Liang, Zuozhong; Wu, Fei; Chen, Jian-Feng; Xue, Chunyu; Zhao, Hong

    2017-06-01

    We generated crystal structures of ibuprofen in seven common space groups (Cc, P21/c, P212121, P21, Pbca, Pna21, and Pbcn) from the ibuprofen molecule by molecular simulation. The predicted crystal structure with space group P21/c has the lowest total energy and the largest density and is nearly indistinguishable from the experimental result. In addition, the XRD patterns of the predicted crystal structure are highly consistent with those of ibuprofen recrystallized from solvent, indicating that the simulation can accurately predict the crystal structure of ibuprofen from the molecule. Furthermore, based on this crystal structure, we predicted the crystal habit in vacuum using the attachment energy (AE) method and treated solvent effects in a systematic way using the modified attachment energy (MAE) model. The simulation thus constructs a complete process from molecule to crystal structure to morphology prediction. Experimentally, we observed crystal morphologies in four solvents of different polarity (ethanol, acetonitrile, ethyl acetate, and toluene). The aspect ratio of the crystal habits in this ibuprofen system was found to decrease with increasing solvent relative polarity. Moreover, the modified crystal morphologies are in good agreement with the observed experimental morphologies. Finally, this work may guide computer-aided design of the desirable crystal morphology.

  8. Fragment-based quantitative structure-activity relationship (FB-QSAR) for fragment-based drug design.

    Science.gov (United States)

    Du, Qi-Shi; Huang, Ri-Bo; Wei, Yu-Tuo; Pang, Zong-Wen; Du, Li-Qin; Chou, Kuo-Chen

    2009-01-30

    In cooperation with fragment-based design, a new drug design method, the so-called "fragment-based quantitative structure-activity relationship" (FB-QSAR), is proposed. The essence of the new method is that the molecular framework in a family of drug candidates is divided into several fragments according to the substituents being investigated. The bioactivities of molecules are correlated with the physicochemical properties of the molecular fragments through two sets of coefficients in the linear free energy equations: one coefficient set for the physicochemical properties and the other for the weight factors of the molecular fragments. Meanwhile, an iterative double least square (IDLS) technique is developed to solve the two sets of coefficients on a training data set alternately and iteratively. The IDLS technique is a feedback procedure with machine learning ability. The standard two-dimensional quantitative structure-activity relationship (2D-QSAR) is a special case of FB-QSAR in which the whole molecule is treated as one entity. The FB-QSAR approach can remarkably enhance the predictive power and provide more structural insights for rational drug design. As an example, FB-QSAR is applied to build a predictive model of neuraminidase inhibitors for drug development against the H5N1 influenza virus. (c) 2008 Wiley Periodicals, Inc.
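
    A hedged sketch of the alternating estimation idea: assuming a bilinear form activity_i ≈ Σ_f w_f Σ_p c_p X[i, f, p] (an assumption for illustration, not the published FB-QSAR equations), the property coefficients c and fragment weights w can each be solved by ordinary least squares while the other set is held fixed, mimicking the IDLS procedure. Such a bilinear fit is only identified up to a common scale factor shared between w and c.

    ```python
    # Minimal sketch: alternating (iterative double) least squares for a bilinear model.
    import numpy as np

    def idls_fit(X, y, n_iter=50, seed=0):
        """X: descriptors with shape (molecules, fragments, properties); y: activities."""
        n, F, P = X.shape
        rng = np.random.default_rng(seed)
        w = np.ones(F)                      # fragment weight factors
        c = rng.standard_normal(P)          # property coefficients
        for _ in range(n_iter):
            # Step 1: with w fixed, the model is linear in c.
            A_c = np.einsum("ifp,f->ip", X, w)
            c, *_ = np.linalg.lstsq(A_c, y, rcond=None)
            # Step 2: with c fixed, the model is linear in w.
            A_w = np.einsum("ifp,p->if", X, c)
            w, *_ = np.linalg.lstsq(A_w, y, rcond=None)
        return w, c

    rng = np.random.default_rng(3)
    X = rng.standard_normal((60, 4, 3))
    w_true, c_true = np.array([1.0, 0.5, -0.8, 0.2]), np.array([0.7, -1.2, 0.4])
    y = np.einsum("ifp,f,p->i", X, w_true, c_true) + rng.normal(scale=0.05, size=60)
    w_hat, c_hat = idls_fit(X, y)
    ```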

  9. Short-Term Power Load Point Prediction Based on the Sharp Degree and Chaotic RBF Neural Network

    Directory of Open Access Journals (Sweden)

    Dongxiao Niu

    2015-01-01

    Full Text Available In order to predict and locate short-term load inflection points, this paper draws on related research in the field of computer image recognition. A load sharp-degree sequence is obtained by transforming the original load sequence with the sharp-degree algorithm. A forecasting model based on chaos theory and an RBF neural network is then designed, and the load sharp-degree sequence is predicted with this model to locate the short-term load inflection points. Finally, in an empirical analysis, the daily load points of a region are predicted using the region's actual load data to verify the effectiveness and applicability of the method. The prediction results show that most of the test-sample load points can be accurately predicted.
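
    As a stand-in for the chaotic RBF predictor, the sketch below uses a delay-coordinate embedding of a synthetic sharp-degree sequence and a Gaussian RBF network whose output weights come from a regularized linear solve; the embedding dimension, kernel width and data are assumptions for illustration, not the paper's exact model.

    ```python
    # Minimal sketch: delay embedding + Gaussian RBF network for one-step prediction.
    import numpy as np

    def embed(series, dim=4, lag=1):
        """Rows are [s_i, s_{i+lag}, ..., s_{i+(dim-1)lag}]; target is the next value."""
        rows = [series[i:i + dim * lag:lag] for i in range(len(series) - dim * lag)]
        return np.array(rows), series[dim * lag:]

    def rbf_fit_predict(X, y, x_query, n_centers=30, gamma=2.0, reg=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=min(n_centers, len(X)), replace=False)]
        def design(A):
            d2 = ((A[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            return np.exp(-gamma * d2)
        Phi = design(X)
        w = np.linalg.solve(Phi.T @ Phi + reg * np.eye(len(centers)), Phi.T @ y)
        return (design(np.atleast_2d(x_query)) @ w)[0]

    t = np.linspace(0, 20, 400)
    sharp_degree = np.sin(t) + 0.3 * np.sin(3.1 * t)   # stand-in for a sharp-degree sequence
    X, y = embed(sharp_degree)
    next_value = rbf_fit_predict(X, y, sharp_degree[-4:])   # window of the 4 most recent values
    ```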

  10. Manual for the prediction of blast and fragment loadings on structures

    Energy Technology Data Exchange (ETDEWEB)

    1980-11-01

    The purpose of this manual is to provide Architect-Engineer (AE) firms guidance for the prediction of air blast, ground shock and fragment loadings on structures as a result of accidental explosions in or near these structures. Information in this manual is the result of an extensive literature survey and data gathering effort, supplemented by some original analytical studies on various aspects of blast phenomena. Many prediction equations and graphs are presented, accompanied by numerous example problems illustrating their use. The manual is complementary to existing structural design manuals and is intended to reflect the current state-of-the-art in prediction of blast and fragment loads for accidental explosions of high explosives at the Pantex Plant. In some instances, particularly for explosions within blast-resistant structures of complex geometry, rational estimation of these loads is beyond the current state-of-the-art.

  11. Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble-based methods?

    Science.gov (United States)

    Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V

    2012-01-01

    In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
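
    A toy comparison in the same spirit, on synthetic data rather than the study's cohorts, might look like the following: a single regression tree, a random forest, boosted trees and logistic regression scored by the out-of-sample c-statistic (AUC).

    ```python
    # Minimal sketch: comparing tree ensembles against logistic regression on a binary outcome.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    X, y = make_classification(n_samples=4000, n_features=20, n_informative=8,
                               weights=[0.9, 0.1], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    models = {
        "single tree": DecisionTreeClassifier(max_depth=4, random_state=0),
        "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
        "boosted trees": GradientBoostingClassifier(random_state=0),
        "logistic regression": LogisticRegression(max_iter=1000),
    }
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
        print(f"{name:>20s}: out-of-sample AUC = {auc:.3f}")
    ```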

  12. Deficits of entropy modulation in schizophrenia are predicted by functional connectivity strength in the theta band and structural clustering.

    Science.gov (United States)

    Gomez-Pilar, Javier; de Luis-García, Rodrigo; Lubeiro, Alba; de Uribe, Nieves; Poza, Jesús; Núñez, Pablo; Ayuso, Marta; Hornero, Roberto; Molina, Vicente

    2018-01-01

    Spectral entropy (SE) allows comparing task-related modulation of the electroencephalogram (EEG) between patients and controls, i.e. spectral changes of the EEG associated with task performance. A SE modulation deficit has been replicated in different schizophrenia samples. To investigate the underpinnings of SE modulation deficits in schizophrenia, we applied graph theory to EEG recordings during a P300 task and fractional anisotropy (FA) data from diffusion tensor imaging in 48 patients (23 first episodes) and 87 healthy controls. Functional connectivity was assessed from phase-locking values among sensors in the theta band, and structural connectivity was based on FA values for the tracts connecting pairs of regions. From those data, averaged clustering coefficient (CLC), characteristic path-length (PL) and connectivity strength (CS, also known as density) were calculated for both functional and structural networks. The corresponding functional modulation values were calculated as the difference in SE and CLC, PL and CS between the pre-stimulus and response windows during the task. The results revealed a higher functional CS in the pre-stimulus window in patients, predictive of smaller modulation of SE in this group. The amount of increase in theta CS from pre-stimulus to response was related to SE modulation in patients and controls. Structural CLC was associated with SE modulation in the patients. SE modulation was predictive of negative symptoms, whereas CLC and PL modulation was associated with cognitive performance in the patients. These results support the idea that hyperactive functional connectivity and/or structural connectivity deficits in the patients hamper the dynamical modulation of connectivity underlying cognition.
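
    For readers unfamiliar with the network metrics, the sketch below computes averaged clustering coefficient (CLC), characteristic path length (PL) and connectivity strength (CS) from a weighted connectivity matrix such as one built from phase-locking values. The matrix here is random and illustrative; modulation would be the difference of each metric between the pre-stimulus and response windows.

    ```python
    # Minimal sketch: CLC, PL and CS from a weighted connectivity matrix.
    import numpy as np
    import networkx as nx

    def graph_metrics(W):
        """W: symmetric weighted connectivity matrix, zero diagonal, weights in (0, 1]."""
        G = nx.from_numpy_array(W)
        clc = nx.average_clustering(G, weight="weight")
        # Use distance = 1 / weight so stronger connections are "shorter".
        for _, _, d in G.edges(data=True):
            d["distance"] = 1.0 / d["weight"]
        pl = nx.average_shortest_path_length(G, weight="distance")
        cs = W[np.triu_indices_from(W, k=1)].mean()   # mean edge weight (strength/density)
        return clc, pl, cs

    rng = np.random.default_rng(4)
    n = 16
    W = rng.uniform(0.05, 1.0, size=(n, n))
    W = np.triu(W, 1)
    W = W + W.T                                       # symmetric, zero diagonal
    clc_pre, pl_pre, cs_pre = graph_metrics(W)
    ```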

  13. Automatic prediction of facial trait judgments: appearance vs. structural models.

    Directory of Open Access Journals (Sweden)

    Mario Rojas

    Full Text Available Evaluating other individuals with respect to personality characteristics plays a crucial role in human relations and it is the focus of attention for research in diverse fields such as psychology and interactive computer systems. In psychology, face perception has been recognized as a key component of this evaluation system. Multiple studies suggest that observers use face information to infer personality characteristics. Interactive computer systems are trying to take advantage of these findings and apply them to increase the natural aspect of interaction and to improve the performance of interactive computer systems. Here, we experimentally test whether the automatic prediction of facial trait judgments (e.g. dominance) can be made by using the full appearance information of the face and whether a reduced representation of its structure is sufficient. We evaluate two separate approaches: a holistic representation model using the facial appearance information and a structural model constructed from the relations among facial salient points. State-of-the-art machine learning methods are applied to (a) derive a facial trait judgment model from training data and (b) predict a facial trait value for any face. Furthermore, we address the issue of whether there are specific structural relations among facial points that predict perception of facial traits. Experimental results over a set of labeled data (9 different trait evaluations) and classification rules (4 rules) suggest that (a) prediction of perception of facial traits is learnable by both holistic and structural approaches; (b) the most reliable prediction of facial trait judgments is obtained by certain types of holistic descriptions of the face appearance; and (c) for some traits such as attractiveness and extroversion, there are relationships between specific structural features and social perceptions.

  14. Predicting Structure and Function for Novel Proteins of an Extremophilic Iron Oxidizing Bacterium

    Science.gov (United States)

    Wheeler, K.; Zemla, A.; Banfield, J.; Thelen, M.

    2007-12-01

    Proteins isolated from uncultivated microbial populations represent the functional components of microbial processes and contribute directly to community fitness under natural conditions. Investigations into proteins in the environment are hindered by the lack of genome data, or where available, the high proportion of proteins of unknown function. We have identified thousands of proteins from biofilms in the extremely acidic drainage outflow of an iron mine ecosystem (1). With an extensive genomic and proteomic foundation, we have focused directly on the problem of several hundred proteins of unknown function within this well-defined model system. Here we describe the geobiological insights gained by using a high throughput computational approach for predicting structure and function of 421 novel proteins from the biofilm community. We used a homology based modeling system to compare these proteins to those of known structure (AS2TS) (2). This approach has resulted in the assignment of structures to 360 proteins (85%) and provided functional information for up to 75% of the modeled proteins. Detailed examination of the modeling results enables confident, high-throughput prediction of the roles of many of the novel proteins within the microbial community. For instance, one prediction places a protein in the phosphoenolpyruvate/pyruvate domain superfamily as a carboxylase that fills in a gap in an otherwise complete carbon cycle. Particularly important for a community in such a metal rich environment is the evolution of over 25% of the novel proteins that contain a metal cofactor; of these, one third are likely Fe containing proteins. Two of the most abundant proteins in biofilm samples are unusual c-type cytochromes. Both of these proteins catalyze iron- oxidation, a key metabolic reaction supporting the energy requirements of this community. Structural models of these cytochromes verify our experimental results on heme binding and electron transfer reactivity, and

  15. Investigations of Structural Requirements for BRD4 Inhibitors through Ligand- and Structure-Based 3D QSAR Approaches

    Directory of Open Access Journals (Sweden)

    Adeena Tahir

    2018-06-01

    Full Text Available The bromodomain containing protein 4 (BRD4) recognizes acetylated histone proteins and plays numerous roles in the progression of a wide range of cancers, due to which it is under intense investigation as a novel anti-cancer drug target. In the present study, we performed three-dimensional quantitative structure activity relationship (3D-QSAR) molecular modeling on a series of 60 inhibitors of BRD4 protein using ligand- and structure-based alignment and different partial charges assignment methods by employing comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) approaches. The developed models were validated using various statistical methods, including non-cross validated correlation coefficient (r2), leave-one-out (LOO) cross validated correlation coefficient (q2), bootstrapping, and Fisher’s randomization test. The highly reliable and predictive CoMFA (q2 = 0.569, r2 = 0.979) and CoMSIA (q2 = 0.500, r2 = 0.982) models were obtained from a structure-based 3D-QSAR approach using Merck molecular force field (MMFF94). The best models demonstrate that electrostatic and steric fields play an important role in the biological activities of these compounds. Hence, based on the contour maps information, new compounds were designed, and their binding modes were elucidated in BRD4 protein’s active site. Further, the activities and physicochemical properties of the designed molecules were also predicted using the best 3D-QSAR models. We believe that predicted models will help us to understand the structural requirements of BRD4 protein inhibitors that belong to quinolinone and quinazolinone classes for the designing of better active compounds.
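
    The two statistics quoted above can be illustrated generically; the sketch below computes the non-cross-validated r2 and the leave-one-out cross-validated q2 for a simple linear QSAR-style model on synthetic data. It is not the CoMFA/CoMSIA software, and all data are illustrative.

    ```python
    # Minimal sketch: r2 and leave-one-out cross-validated q2 for a linear model.
    import numpy as np

    def r2_and_loo_q2(X, y):
        Xd = np.column_stack([np.ones(len(X)), X])
        beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ((y - Xd @ beta) ** 2).sum() / ss_tot

        press = 0.0
        for i in range(len(y)):                      # leave-one-out predictions
            mask = np.ones(len(y), bool)
            mask[i] = False
            b_i, *_ = np.linalg.lstsq(Xd[mask], y[mask], rcond=None)
            press += (y[i] - Xd[i] @ b_i) ** 2
        q2 = 1.0 - press / ss_tot
        return r2, q2

    rng = np.random.default_rng(5)
    X = rng.standard_normal((40, 5))
    y = X @ rng.standard_normal(5) + rng.normal(scale=0.5, size=40)
    print(r2_and_loo_q2(X, y))
    ```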

  16. Real-time prediction of respiratory motion based on local regression methods

    International Nuclear Information System (INIS)

    Ruan, D; Fessler, J A; Balter, J M

    2007-01-01

    Recent developments in modulation techniques enable conformal delivery of radiation doses to small, localized target volumes. One of the challenges in using these techniques is real-time tracking and predicting target motion, which is necessary to accommodate system latencies. For image-guided radiotherapy systems, it is also desirable to minimize sampling rates to reduce imaging dose. This study focuses on predicting respiratory motion, which can significantly affect the position of lung tumours. Predicting respiratory motion in real-time is challenging, due to the complexity of breathing patterns and the many sources of variability. We propose a prediction method based on local regression. There are three major ingredients of this approach: (1) forming an augmented state space to capture system dynamics, (2) local regression in the augmented space to train the predictor from previous observation data using the semi-periodicity of respiratory motion, and (3) local weighting adjustment to incorporate fading temporal correlations. To evaluate prediction accuracy, we computed the root mean square error between predicted tumour motion and its observed location for ten patients. For comparison, we also applied commonly used predictive methods, namely linear prediction, neural networks and Kalman filtering, to the same data. The proposed method reduced the prediction error for all imaging rates and latency lengths, particularly for long prediction lengths.
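
    A minimal sketch of the three ingredients, under assumed parameter values rather than the authors' exact algorithm: an augmented state built from recent samples, locally weighted linear regression over similar past states, and an extra exponential fading weight that down-weights older history.

    ```python
    # Minimal sketch: one-step-ahead prediction by locally weighted regression
    # in an augmented (delay) state space with fading temporal weights.
    import numpy as np

    def local_regression_predict(signal, dim=3, bandwidth=0.5, fade=0.995):
        s = np.asarray(signal, float)
        X, y = [], []
        for t in range(dim - 1, len(s) - 1):
            X.append(s[t - dim + 1:t + 1][::-1])   # augmented state [s_t, s_{t-1}, ...]
            y.append(s[t + 1])                     # next observation
        X, y = np.array(X), np.array(y)
        x_query = s[-dim:][::-1]                   # current augmented state

        # Similarity weights (Gaussian kernel in state space) times fading weights.
        d2 = ((X - x_query) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2 * bandwidth ** 2)) * fade ** np.arange(len(X))[::-1]

        Xd = np.column_stack([np.ones(len(X)), X])
        W = np.diag(w)
        beta = np.linalg.solve(Xd.T @ W @ Xd + 1e-8 * np.eye(dim + 1), Xd.T @ W @ y)
        return beta[0] + beta[1:] @ x_query

    t = np.arange(0, 60, 0.2)
    breathing = np.sin(2 * np.pi * t / 4.0) + 0.05 * np.random.default_rng(6).standard_normal(len(t))
    print(local_regression_predict(breathing))
    ```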

  17. ELISA-based assay for IP-10 detection from filter paper samples

    DEFF Research Database (Denmark)

    Drabe, Camilla Heldbjerg; Blauenfeldt, Thomas; Ruhwald, Morten

    2014-01-01

    IP-10 is a small pro-inflammatory chemokine secreted primarily from monocytes and fibroblasts. Alterations in IP-10 levels have been associated with inflammatory conditions including viral and bacterial infections, immune dysfunction, and tumor development. IP-10 is increasingly recognized as a biomarker that predicts severity of various diseases and can be used in the immunodiagnostics of Mycobacterium tuberculosis and cytomegalovirus infection. Here, we describe an ELISA-based method to detect IP-10 from dried blood and plasma spot samples.

  18. Development of laboratory acceleration test method for service life prediction of concrete structures

    International Nuclear Information System (INIS)

    Cho, M. S.; Song, Y. C.; Bang, K. S.; Lee, J. S.; Kim, D. K.

    1999-01-01

    Service life prediction of nuclear power plants depends on the history of the structures, field inspections and tests, and the development of laboratory acceleration tests together with their analysis methods and predictive models. In this study, a laboratory acceleration test method for service life prediction of concrete structures and the application of the experimental results are introduced. The study addresses the environmental conditions of concrete structures and aims to develop acceleration test methods for durability factors of concrete structures, e.g. carbonation, sulfate attack, freeze-thaw cycles, and shrinkage-expansion.

  19. Synthesis, characterization and nitrite ion sensing performance of reclaimable composite samples through a core-shell structure

    Science.gov (United States)

    Cui, Xiao; Yuqing, Zhao; Cui, Jiantao; Zheng, Qian; Bo, Wang

    2018-02-01

    This paper reports and discusses a nitrite ion optical sensing platform based on a core-shell structure, using superparamagnetic nanoparticles as the core, the silica molecular sieve MCM-41 as the shell, and two rhodamine derivatives as probes. The superparamagnetic core makes the sensing platform reclaimable after the nitrite ion sensing procedure. The platform was carefully characterized by means of electron microscopy images, porous structure analysis, magnetic response, IR spectra, and thermal stability analysis. Detailed analysis suggested that the emission of these composite samples was quenchable by nitrite ion, showing an emission turn-off effect. A static sensing mechanism based on an addition reaction between the chemosensors and nitrite ion was proposed. The composite samples followed the Demas quenching equation over different nitrite ion concentrations. A limit of detection as low as 0.4 μM was obtained. It was found that, after being quenched by nitrite ion, the composite samples could be reclaimed and recovered by sulphamic acid, confirming their recyclability.

  20. RNA Secondary Structure Prediction by Using Discrete Mathematics: An Interdisciplinary Research Experience for Undergraduate Students

    Science.gov (United States)

    Ellington, Roni; Wachira, James

    2010-01-01

    The focus of this Research Experience for Undergraduates (REU) project was on RNA secondary structure prediction by using a lattice walk approach. The lattice walk approach is a combinatorial and computational biology method used to enumerate possible secondary structures and predict RNA secondary structure from RNA sequences. The method uses discrete mathematical techniques and identifies specified base pairs as parameters. The goal of the REU was to introduce upper-level undergraduate students to the principles and challenges of interdisciplinary research in molecular biology and discrete mathematics. At the beginning of the project, students from the biology and mathematics departments of a mid-sized university received instruction on the role of secondary structure in the function of eukaryotic RNAs and RNA viruses, RNA related to combinatorics, and the National Center for Biotechnology Information resources. The student research projects focused on RNA secondary structure prediction on a regulatory region of the yellow fever virus RNA genome and on an untranslated region of an mRNA of a gene associated with the neurological disorder epilepsy. At the end of the project, the REU students gave poster and oral presentations, and they submitted written final project reports to the program director. The outcome of the REU was that the students gained transferable knowledge and skills in bioinformatics and an awareness of the applications of discrete mathematics to biological research problems. PMID:20810968
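
    To give a flavour of the combinatorics involved, the sketch below counts the non-crossing secondary structures of a short RNA sequence with a standard recursion over Watson-Crick and wobble pairs and a minimum hairpin-loop length. It is offered only as an illustration of the enumeration problem, not the REU's specific lattice-walk formulation.

    ```python
    # Minimal sketch: counting non-crossing RNA secondary structures by recursion.
    from functools import lru_cache

    PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

    def count_structures(seq, min_loop=3):
        @lru_cache(maxsize=None)
        def N(i, j):
            # Subsequence seq[i..j]; too short to hold any base pair -> only the empty structure.
            if j - i < min_loop + 1:
                return 1
            total = N(i, j - 1)               # base j left unpaired
            for k in range(i, j - min_loop):  # base j paired with base k
                if (seq[k], seq[j]) in PAIRS:
                    total += N(i, k - 1) * N(k + 1, j - 1)
            return total
        return N(0, len(seq) - 1)

    print(count_structures("GGGAAAUCC"))      # small toy sequence
    ```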