WorldWideScience

Sample records for search-based structured prediction

  1. A Particle Swarm Optimization-Based Approach with Local Search for Predicting Protein Folding.

    Science.gov (United States)

    Yang, Cheng-Hong; Lin, Yu-Shiun; Chuang, Li-Yeh; Chang, Hsueh-Wei

    2017-10-01

    The hydrophobic-polar (HP) model is commonly used for predicting protein folding structures and hydrophobic interactions. This study developed a particle swarm optimization (PSO)-based algorithm combined with local search algorithms; specifically, the high exploration PSO (HEPSO) algorithm (which can execute global search processes) was combined with three local search algorithms (hill-climbing algorithm, greedy algorithm, and Tabu table), yielding the proposed HE-L-PSO algorithm. By using 20 known protein structures, we evaluated the performance of the HE-L-PSO algorithm in predicting protein folding in the HP model. The proposed HE-L-PSO algorithm exhibited favorable performance in predicting both short and long amino acid sequences with high reproducibility and stability, compared with seven reported algorithms. The HE-L-PSO algorithm yielded optimal solutions for all predicted protein folding structures. All HE-L-PSO-predicted protein folding structures possessed a hydrophobic core that is similar to normal protein folding.
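    As a rough illustration of the objective that such search algorithms optimize, the sketch below scores a conformation in the HP model. It assumes a 2D square lattice (the abstract does not say which lattice was used) and the conventional scoring of -1 for every pair of H residues that are lattice neighbours but not consecutive in the chain. The function name `hp_energy` and the coordinate representation are illustrative choices, not taken from the paper.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def hp_energy(sequence: str, coords: List[Coord]) -> int:
    """Energy of a 2D square-lattice HP conformation (assumed lattice type).

    Every pair of H residues that occupy neighbouring lattice sites but are
    not consecutive in the chain contributes -1 to the energy.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    index_at = {xy: i for i, xy in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


# A four-residue chain folded into a square brings its two H ends together.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(hp_energy("HPPH", square), hp_energy("HPPH", line))
```

    A search method such as PSO or hill climbing would then propose new coordinate lists and keep those with lower energy.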

  2. Sequential search leads to faster, more efficient fragment-based de novo protein structure prediction.

    Science.gov (United States)

    de Oliveira, Saulo H P; Law, Eleanor C; Shi, Jiye; Deane, Charlotte M

    2018-04-01

    Most current de novo structure prediction methods randomly sample protein conformations and thus require large amounts of computational resources. Here, we consider a sequential sampling strategy, building on ideas from recent experimental work which shows that many proteins fold cotranslationally. We have investigated whether a pseudo-greedy search approach, which begins sequentially from one of the termini, can improve the performance and accuracy of de novo protein structure prediction. We observed that our sequential approach converges when fewer than 20 000 decoys have been produced, fewer than commonly expected. Using our software, SAINT2, we also compared the run time and quality of models produced in a sequential fashion against a standard, non-sequential approach. Sequential prediction produces an individual decoy 1.5-2.5 times faster than non-sequential prediction. When considering the quality of the best model, sequential prediction led to a better model being produced for 31 out of 41 soluble protein validation cases and for 18 out of 24 transmembrane protein cases. Correct models (TM-Score > 0.5) were produced for 29 of these cases by the sequential mode and for only 22 by the non-sequential mode. Our comparison reveals that a sequential search strategy can be used to drastically reduce computational time of de novo protein structure prediction and improve accuracy. Data are available for download from: http://opig.stats.ox.ac.uk/resources. SAINT2 is available for download from: https://github.com/sauloho/SAINT2. Contact: saulo.deoliveira@dtc.ox.ac.uk. Supplementary data are available at Bioinformatics online.

  3. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction.

    Science.gov (United States)

    Chira, Camelia; Horvath, Dragos; Dumitrescu, D

    2011-07-30

    Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic-polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optima. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact on the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.

  4. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction

    Directory of Open Access Journals (Sweden)

    Chira Camelia

    2011-07-01

    Full Text Available Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic-polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optima. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact on the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.

  5. Pep-3D-Search: a method for B-cell epitope prediction based on mimotope analysis.

    Science.gov (United States)

    Huang, Yan Xin; Bao, Yong Li; Guo, Shu Yan; Wang, Yan; Zhou, Chun Guang; Li, Yu Xin

    2008-12-16

    The prediction of conformational B-cell epitopes is one of the most important goals in immunoinformatics. The solution to this problem, even if approximate, would help in designing experiments to precisely map the residues of interaction between an antigen and an antibody. Consequently, this area of research has received considerable attention from immunologists, structural biologists and computational biologists. Phage-displayed random peptide libraries are powerful tools used to obtain mimotopes that are selected by binding to a given monoclonal antibody (mAb) in a similar way to the native epitope. These mimotopes can be considered as functional epitope mimics. Mimotope analysis based methods can predict not only linear but also conformational epitopes and this has been the focus of much research in recent years. Though some algorithms based on mimotope analysis have been proposed, the precise localization of the interaction site mimicked by the mimotopes is still a challenging task. In this study, we propose a method for B-cell epitope prediction based on mimotope analysis called Pep-3D-Search. Given the 3D structure of an antigen and a set of mimotopes (or a motif sequence derived from the set of mimotopes), Pep-3D-Search can be used in two modes: mimotope or motif. To evaluate the performance of Pep-3D-Search to predict epitopes from a set of mimotopes, 10 epitopes defined by crystallography were compared with the predicted results from Pep-3D-Search: the average Matthews correlation coefficient (MCC), sensitivity and precision were 0.1758, 0.3642 and 0.6948, respectively. Compared with other available prediction algorithms, Pep-3D-Search showed comparable MCC, specificity and precision, and could provide novel, rational results. To verify the capability of Pep-3D-Search to align a motif sequence to a 3D structure for predicting epitopes, 6 test cases were used. The predictive performance of Pep-3D-Search was demonstrated to be superior to that of other similar programs.
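    The evaluation measures named in this abstract (MCC, sensitivity, precision) can be computed from a residue-level confusion matrix. The sketch below shows the standard definitions, assuming that predictions and reference epitopes are given as sets of residue identifiers; the function name `epitope_metrics` and the toy residue sets are hypothetical and are not data from the study.

```python
import math
from typing import Dict, Set


def epitope_metrics(predicted: Set[int], actual: Set[int],
                    all_residues: Set[int]) -> Dict[str, float]:
    """Residue-level MCC, sensitivity and precision for one antigen."""
    tp = len(predicted & actual)                   # correctly predicted epitope residues
    fp = len(predicted - actual)                   # predicted but not in the epitope
    fn = len(actual - predicted)                   # epitope residues that were missed
    tn = len(all_residues - predicted - actual)    # correctly left out

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Toy example: a 10-residue antigen with a 4-residue epitope.
metrics = epitope_metrics(predicted={2, 3, 4}, actual={0, 1, 2, 3},
                          all_residues=set(range(10)))
print(metrics)
```

    Averaging these values over a benchmark of antigens would give summary figures of the kind reported above.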

  6. IRSS: a web-based tool for automatic layout and analysis of IRES secondary structure prediction and searching system in silico

    Directory of Open Access Journals (Sweden)

    Hong Jun-Jie

    2009-05-01

    Full Text Available Background: Internal ribosomal entry sites (IRESs) provide alternative, cap-independent translation initiation sites in eukaryotic cells. IRES elements are important factors in viral genomes and are also useful tools for bi-cistronic expression vectors. Most existing RNA structure prediction programs are unable to deal with IRES elements. Results: We designed an IRES search system, named IRSS, to obtain better results for IRES prediction. RNA secondary structure prediction and comparison software programs were implemented to construct our two-stage strategy for the IRSS. Two software programs formed the backbone of IRSS: the RNALfold program, used to predict local RNA secondary structures by the minimum free energy method; and the RNA Align program, used to compare predicted structures. After a complete viral genome database search, the IRSS has a low error rate and up to 72.3% sensitivity with appropriate parameters. Conclusion: IRSS is freely available at this website http://140.135.61.9/ires/. In addition, all source codes, precompiled binaries, examples and documentation are downloadable for local execution. This new search approach for IRES elements will provide a useful research tool for IRES-related studies.

  7. Snippet-based relevance predictions for federated web search

    NARCIS (Netherlands)

    Demeester, Thomas; Nguyen, Dong-Phuong; Trieschnigg, Rudolf Berend; Develder, Chris; Hiemstra, Djoerd

    How well can the relevance of a page be predicted, purely based on snippets? This would be highly useful in a Federated Web Search setting where caching large amounts of result snippets is more feasible than caching entire pages. The experiments reported in this paper make use of result snippets and

  8. Search-based model identification of smart-structure damage

    Science.gov (United States)

    Glass, B. J.; Macalou, A.

    1991-01-01

    This paper describes the use of a combined model and parameter identification approach, based on modal analysis and artificial intelligence (AI) techniques, for identifying damage or flaws in a rotating truss structure incorporating embedded piezoceramic sensors. This smart structure example is representative of a class of structures commonly found in aerospace systems and next generation space structures. Artificial intelligence techniques of classification, heuristic search, and an object-oriented knowledge base are used in an AI-based model identification approach. A finite model space is classified into a search tree, over which a variant of best-first search is used to identify the model whose stored response most closely matches that of the input. Newly-encountered models can be incorporated into the model space. This adaptiveness demonstrates the potential for learning control. Following this output-error model identification, numerical parameter identification is used to further refine the identified model. Given the rotating truss example in this paper, noisy data corresponding to various damage configurations are input to both this approach and a conventional parameter identification method. The combination of the AI-based model identification with parameter identification is shown to lead to smaller parameter corrections than required by the use of parameter identification alone.

  9. Ensemble-based prediction of RNA secondary structures.

    Science.gov (United States)

    Aghaeepour, Nima; Hoos, Holger H

    2013-04-24

    Accurate structure prediction methods play an important role for the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that has been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches. In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures. Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach. In addition, AveRNA allows an intuitive and effective control of the trade-off between

  10. A probabilistic fragment-based protein structure prediction algorithm.

    Directory of Open Access Journals (Sweden)

    David Simoncini

    Full Text Available Conformational sampling is one of the bottlenecks in fragment-based protein structure prediction approaches. They generally start with a coarse-grained optimization where mainchain atoms and centroids of side chains are considered, followed by a fine-grained optimization with an all-atom representation of proteins. It is during this coarse-grained phase that fragment-based methods sample intensely the conformational space. If the native-like region is sampled more, the accuracy of the final all-atom predictions may be improved accordingly. In this work we present EdaFold, a new method for fragment-based protein structure prediction based on an Estimation of Distribution Algorithm. Fragment-based approaches build protein models by assembling short fragments from known protein structures. Whereas the probability mass functions over the fragment libraries are uniform in the usual case, we propose an algorithm that learns from previously generated decoys and steers the search toward native-like regions. A comparison with Rosetta AbInitio protocol shows that EdaFold is able to generate models with lower energies and to enhance the percentage of near-native coarse-grained decoys on a benchmark of [Formula: see text] proteins. The best coarse-grained models produced by both methods were refined into all-atom models and used in molecular replacement. All atom decoys produced out of EdaFold's decoy set reach high enough accuracy to solve the crystallographic phase problem by molecular replacement for some test proteins. EdaFold showed a higher success rate in molecular replacement when compared to Rosetta. Our study suggests that improving low resolution coarse-grained decoys allows computational methods to avoid subsequent sampling issues during all-atom refinement and to produce better all-atom models. EdaFold can be downloaded from http://www.riken.jp/zhangiru/software.html [corrected].
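    The idea of replacing uniform fragment probabilities with ones learned from earlier decoys can be sketched with a generic univariate Estimation of Distribution update. This is not EdaFold's actual update rule, which the abstract does not give; the elite fraction, the learning rate, and the encoding of a decoy as one fragment index per insertion position are all assumptions made for illustration.

```python
from typing import List, Sequence


def update_fragment_pmfs(pmfs: Sequence[Sequence[float]],
                         decoys: Sequence[Sequence[int]],
                         energies: Sequence[float],
                         elite_fraction: float = 0.2,
                         learning_rate: float = 0.5) -> List[List[float]]:
    """Shift per-position fragment probabilities toward low-energy decoys.

    pmfs[pos][f] is the probability of choosing fragment f at position pos;
    decoys[d][pos] is the fragment index that decoy d used at that position.
    """
    n_elite = max(1, int(len(decoys) * elite_fraction))
    # Keep the lowest-energy decoys as the "elite" sample.
    ranked = sorted(zip(energies, decoys), key=lambda pair: pair[0])
    elite = [decoy for _, decoy in ranked[:n_elite]]

    new_pmfs = []
    for pos, pmf in enumerate(pmfs):
        counts = [0] * len(pmf)
        for decoy in elite:
            counts[decoy[pos]] += 1
        # Blend the old probabilities with the observed elite frequencies.
        new_pmfs.append([(1 - learning_rate) * p + learning_rate * c / n_elite
                         for p, c in zip(pmf, counts)])
    return new_pmfs


# Two positions, two candidate fragments each, starting from uniform pmfs.
uniform = [[0.5, 0.5], [0.5, 0.5]]
decoys = [[0, 1], [1, 0], [0, 0], [1, 1]]
energies = [-5.0, -1.0, -4.0, 0.0]
print(update_fragment_pmfs(uniform, decoys, energies, elite_fraction=0.5))
```

    Sampling the next round of decoys from the updated probabilities is what steers the search toward regions that produced low-energy models.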

  11. SA-Search: a web tool for protein structure mining based on a Structural Alphabet.

    Science.gov (United States)

    Guyon, Frédéric; Camproux, Anne-Claude; Hochez, Joëlle; Tufféry, Pierre

    2004-07-01

    SA-Search is a web tool that can be used to mine for protein structures and extract structural similarities. It is based on a hidden Markov model derived Structural Alphabet (SA) that allows the compression of three-dimensional (3D) protein conformations into a one-dimensional (1D) representation using a limited number of prototype conformations. Using such a representation, classical methods developed for amino acid sequences can be employed. Currently, SA-Search permits the performance of fast 3D similarity searches such as the extraction of exact words using a suffix tree approach, and the search for fuzzy words viewed as a simple 1D sequence alignment problem. SA-Search is available at http://bioserv.rpbs.jussieu.fr/cgi-bin/SA-Search.

  12. Efficient protein structure search using indexing methods.

    Science.gov (United States)

    Kim, Sungchul; Sael, Lee; Yu, Hwanjo

    2013-01-01

    Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize a reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points within distance θ of the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced by 69.6%, 77%, 77.4% and 87.9% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. In θ-based nearest neighbor search, the searching time is reduced by 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively.
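    The two-stage retrieval described here (shortlist 10 × k candidates on a reduced descriptor, then re-rank on the full descriptor) can be sketched as follows. For brevity the first stage is a plain linear scan over the first few attributes, standing in for the iDistance or iKernel index, which is not reproduced. Euclidean distance, the function names, and the toy vectors are assumptions.

```python
import heapq
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_stage_top_k(query: Sequence[float],
                    database: Dict[str, Sequence[float]],
                    k: int,
                    reduced_dims: int,
                    factor: int = 10) -> List[str]:
    """Return the names of the k nearest descriptors via a reduced shortlist."""
    # Stage 1: cheap shortlist using only the first `reduced_dims` attributes.
    q_reduced = query[:reduced_dims]
    shortlist = heapq.nsmallest(
        factor * k, database,
        key=lambda name: euclidean(q_reduced, database[name][:reduced_dims]))
    # Stage 2: exact re-ranking of the shortlist on the full descriptor.
    return heapq.nsmallest(
        k, shortlist, key=lambda name: euclidean(query, database[name]))


# Toy descriptors (real 3DZD vectors are much longer).
db = {"a": [0.0, 0.0, 0.0], "b": [0.1, 5.0, 5.0],
      "c": [1.0, 0.0, 0.0], "d": [3.0, 3.0, 3.0]}
print(two_stage_top_k([0.2, 0.0, 0.0], db, k=1, reduced_dims=1, factor=2))
```

    In the toy data, "b" is closest on the first attribute alone but far away on the full vector, which is why the shortlist must be larger than k before re-ranking.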

  13. Computational prediction of muon stopping sites using ab initio random structure searching (AIRSS)

    Science.gov (United States)

    Liborio, Leandro; Sturniolo, Simone; Jochym, Dominik

    2018-04-01

    The stopping site of the muon in a muon-spin relaxation experiment is in general unknown. There are some techniques that can be used to guess the muon stopping site, but they often rely on approximations and are not generally applicable to all cases. In this work, we propose a purely theoretical method to predict muon stopping sites in crystalline materials from first principles. The method is based on a combination of ab initio calculations, random structure searching, and machine learning, and it has successfully predicted the MuT and MuBC stopping sites of muonium in Si, diamond, and Ge, as well as the muonium stopping site in LiF, without any recourse to experimental results. The method makes use of Soprano, a Python library developed to aid ab initio computational crystallography, that was publicly released and contains all the software tools necessary to reproduce our analysis.

  14. Algorithms for Protein Structure Prediction

    DEFF Research Database (Denmark)

    Paluszewski, Martin

    -trace. Here we present three different approaches for reconstruction of C-traces from predictable measures. In our first approach [63, 62], the C-trace is positioned on a lattice and a tabu-search algorithm is applied to find minimum energy structures. The energy function is based on half-sphere-exposure (HSE......) is more robust than standard Monte Carlo search. In the second approach for reconstruction of C-traces, an exact branch and bound algorithm has been developed [67, 65]. The model is discrete and makes use of secondary structure predictions, HSE, CN and radius of gyration. We show how to compute good lower...... bounds for partial structures very fast. Using these lower bounds, we are able to find global minimum structures in a huge conformational space in reasonable time. We show that many of these global minimum structures are of good quality compared to the native structure. Our branch and bound algorithm...

  15. Structure prediction of nanoclusters; a direct or a pre-screened search on the DFT energy landscape?

    Science.gov (United States)

    Farrow, M R; Chow, Y; Woodley, S M

    2014-10-21

    The atomic structure of inorganic nanoclusters obtained via a search for low lying minima on energy landscapes, or hypersurfaces, is reported for inorganic binary compounds: zinc oxide (ZnO)n, magnesium oxide (MgO)n, cadmium selenide (CdSe)n, and potassium fluoride (KF)n, where n = 1-12 formula units. The computational cost of each search is dominated by the effort to evaluate each sample point on the energy landscape and the number of required sample points. The effect of changing the balance between these two factors on the success of the search is investigated. The choice of sample points will also affect the number of required data points and therefore the efficiency of the search. Monte Carlo based global optimisation routines (evolutionary and stochastic quenching algorithms) within a new software package, viz. Knowledge Led Master Code (KLMC), are employed to search both directly and after pre-screening on the DFT energy landscape. Pre-screening includes structural relaxation to minimise a cheaper energy function - based on interatomic potentials - and is found to improve significantly the search efficiency, and typically reduces the number of DFT calculations required to locate the local minima by more than an order of magnitude. Although the choice of functional form is important, the approach is robust to small changes to the interatomic potential parameters. The computational cost of initial DFT calculations of each structure is reduced by employing Gaussian smearing to the electronic energy levels. Larger (KF)n nanoclusters are predicted to form cuboid cuts from the rock-salt phase, but also share many structural motifs with (MgO)n for smaller clusters. The transition from 2D rings to 3D (bubble, or fullerene-like) structures occurs at a larger cluster size for (ZnO)n and (CdSe)n. Differences between the HOMO and LUMO energies, for all the compounds apart from KF, are in the visible region of the optical spectrum (2-3 eV); KF lies deep in the UV region.

  16. SA-Search: a web tool for protein structure mining based on a Structural Alphabet

    OpenAIRE

    Guyon, Frédéric; Camproux, Anne-Claude; Hochez, Joëlle; Tufféry, Pierre

    2004-01-01

    SA-Search is a web tool that can be used to mine for protein structures and extract structural similarities. It is based on a hidden Markov model derived Structural Alphabet (SA) that allows the compression of three-dimensional (3D) protein conformations into a one-dimensional (1D) representation using a limited number of prototype conformations. Using such a representation, classical methods developed for amino acid sequences can be employed. Currently, SA-Search permits the performance of f...

  17. Tree decomposition based fast search of RNA structures including pseudoknots in genomes.

    Science.gov (United States)

    Song, Yinglei; Liu, Chunmei; Malmberg, Russell; Pan, Fangfang; Cai, Liming

    2005-01-01

    Searching genomes for RNA secondary structure with computational methods has become an important approach to the annotation of non-coding RNAs. However, due to the lack of efficient algorithms for accurate RNA structure-sequence alignment, computer programs capable of searching genomes for RNA secondary structures quickly and effectively have not been available. In this paper, a novel RNA structure profiling model is introduced based on the notion of a conformational graph to specify the consensus structure of an RNA family. Tree decomposition yields a small tree width t for such conformational graphs (e.g., t = 2 for stem loops and only a slight increase for pseudoknots). Within this modelling framework, the optimal alignment of a sequence to the structure model corresponds to finding a maximum valued isomorphic subgraph and consequently can be accomplished through dynamic programming on the tree decomposition of the conformational graph in time O(k^t N^2), where k is a small parameter and N is the size of the profiled RNA structure. Experiments show that the application of the alignment algorithm to search in genomes yields the same search accuracy as methods based on a Covariance model with a significant reduction in computation time. In particular, very accurate searches of tmRNAs in bacteria genomes and of telomerase RNAs in yeast genomes can be accomplished in days, as opposed to months required by other methods. The tree decomposition based searching tool is free upon request and can be downloaded at our site http://w.uga.edu/RNA-informatics/software/index.php.

  18. Searching for an Accurate Marker-Based Prediction of an Individual Quantitative Trait in Molecular Plant Breeding.

    Science.gov (United States)

    Fu, Yong-Bi; Yang, Mo-Hua; Zeng, Fangqin; Biligetu, Bill

    2017-01-01

    Molecular plant breeding with the aid of molecular markers has played an important role in modern plant breeding over the last two decades. Many marker-based predictions for quantitative traits have been made to enhance parental selection, but the trait prediction accuracy remains generally low, even with the aid of dense, genome-wide SNP markers. To search for more accurate trait-specific prediction with informative SNP markers, we conducted a literature review on the prediction issues in molecular plant breeding and on the applicability of an RNA-Seq technique for developing function-associated specific trait (FAST) SNP markers. To understand whether and how FAST SNP markers could enhance trait prediction, we also performed a theoretical reasoning on the effectiveness of these markers in a trait-specific prediction, and verified the reasoning through computer simulation. To the end, the search yielded an alternative to regular genomic selection with FAST SNP markers that could be explored to achieve more accurate trait-specific prediction. Continuous search for better alternatives is encouraged to enhance marker-based predictions for an individual quantitative trait in molecular plant breeding.

  19. Predicting the performance of fingerprint similarity searching.

    Science.gov (United States)

    Vogt, Martin; Bajorath, Jürgen

    2011-01-01

    Fingerprints are bit string representations of molecular structure that typically encode structural fragments, topological features, or pharmacophore patterns. Various fingerprint designs are utilized in virtual screening and their search performance essentially depends on three parameters: the nature of the fingerprint, the active compounds serving as reference molecules, and the composition of the screening database. It is of considerable interest and practical relevance to predict the performance of fingerprint similarity searching. A quantitative assessment of the potential that a fingerprint search might successfully retrieve active compounds, if available in the screening database, would substantially help to select the type of fingerprint most suitable for a given search problem. The method presented herein utilizes concepts from information theory to relate the fingerprint feature distributions of reference compounds to screening libraries. If these feature distributions do not sufficiently differ, active database compounds that are similar to reference molecules cannot be retrieved because they disappear in the "background." By quantifying the difference in feature distribution using the Kullback-Leibler divergence and relating the divergence to compound recovery rates obtained for different benchmark classes, fingerprint search performance can be quantitatively predicted.
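    To make the information-theoretic comparison concrete, the sketch below estimates per-bit feature frequencies for a reference set and a screening library and sums the Kullback-Leibler divergence over bits. Treating the bits as independent Bernoulli variables and smoothing with a pseudocount are simplifying assumptions made here; the abstract does not state how the feature distributions were modelled, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def bit_frequencies(fingerprints: Sequence[Sequence[int]],
                    pseudocount: float = 1.0) -> List[float]:
    """Smoothed probability that each bit is set across a compound set."""
    n = len(fingerprints)
    n_bits = len(fingerprints[0])
    return [(sum(fp[i] for fp in fingerprints) + pseudocount)
            / (n + 2 * pseudocount)
            for i in range(n_bits)]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum over bits of the Bernoulli KL divergence KL(p_i || q_i), in bits."""
    total = 0.0
    for pi, qi in zip(p, q):
        total += pi * math.log2(pi / qi)
        total += (1 - pi) * math.log2((1 - pi) / (1 - qi))
    return total


# Toy 2-bit fingerprints: references set both bits, the library sets neither.
references = [[1, 1], [1, 1]]
library = [[0, 0], [0, 0]]
divergence = kl_divergence(bit_frequencies(references), bit_frequencies(library))
print(divergence)
```

    A divergence near zero corresponds to the situation the abstract describes, where reference compounds are indistinguishable from the database "background".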

  20. Searching for an Accurate Marker-Based Prediction of an Individual Quantitative Trait in Molecular Plant Breeding

    Directory of Open Access Journals (Sweden)

    Yong-Bi Fu

    2017-07-01

    Full Text Available Molecular plant breeding with the aid of molecular markers has played an important role in modern plant breeding over the last two decades. Many marker-based predictions for quantitative traits have been made to enhance parental selection, but the trait prediction accuracy remains generally low, even with the aid of dense, genome-wide SNP markers. To search for more accurate trait-specific prediction with informative SNP markers, we conducted a literature review on the prediction issues in molecular plant breeding and on the applicability of an RNA-Seq technique for developing function-associated specific trait (FAST) SNP markers. To understand whether and how FAST SNP markers could enhance trait prediction, we also performed a theoretical reasoning on the effectiveness of these markers in a trait-specific prediction, and verified the reasoning through computer simulation. To the end, the search yielded an alternative to regular genomic selection with FAST SNP markers that could be explored to achieve more accurate trait-specific prediction. Continuous search for better alternatives is encouraged to enhance marker-based predictions for an individual quantitative trait in molecular plant breeding.

  1. Searching for an Accurate Marker-Based Prediction of an Individual Quantitative Trait in Molecular Plant Breeding

    Science.gov (United States)

    Fu, Yong-Bi; Yang, Mo-Hua; Zeng, Fangqin; Biligetu, Bill

    2017-01-01

    Molecular plant breeding with the aid of molecular markers has played an important role in modern plant breeding over the last two decades. Many marker-based predictions for quantitative traits have been made to enhance parental selection, but the trait prediction accuracy remains generally low, even with the aid of dense, genome-wide SNP markers. To search for more accurate trait-specific prediction with informative SNP markers, we conducted a literature review on the prediction issues in molecular plant breeding and on the applicability of an RNA-Seq technique for developing function-associated specific trait (FAST) SNP markers. To understand whether and how FAST SNP markers could enhance trait prediction, we also performed a theoretical reasoning on the effectiveness of these markers in a trait-specific prediction, and verified the reasoning through computer simulation. To the end, the search yielded an alternative to regular genomic selection with FAST SNP markers that could be explored to achieve more accurate trait-specific prediction. Continuous search for better alternatives is encouraged to enhance marker-based predictions for an individual quantitative trait in molecular plant breeding. PMID:28729875

  2. A search for H/ACA snoRNAs in yeast using MFE secondary structure prediction.

    Science.gov (United States)

    Edvardsson, Sverker; Gardner, Paul P; Poole, Anthony M; Hendy, Michael D; Penny, David; Moulton, Vincent

    2003-05-01

    Noncoding RNA genes produce functional RNA molecules rather than coding for proteins. One such family is the H/ACA snoRNAs. Unlike the related C/D snoRNAs these have resisted automated detection to date. We develop an algorithm to screen the yeast genome for novel H/ACA snoRNAs. To achieve this, we introduce some new methods for facilitating the search for noncoding RNAs in genomic sequences which are based on properties of predicted minimum free-energy (MFE) secondary structures. The algorithm has been implemented and can be generalized to enable screening of other eukaryote genomes. We find that use of primary sequence alone is insufficient for identifying novel H/ACA snoRNAs. Only the use of secondary structure filters reduces the number of candidates to a manageable size. From genomic context, we identify three strong H/ACA snoRNA candidates. These together with a further 47 candidates obtained by our analysis are being experimentally screened.

  3. Protein structural similarity search by Ramachandran codes

    Directory of Open Access Journals (Sweden)

    Chang Chih-Hung

    2007-08-01

    Background: Protein structural data has increased exponentially, such that fast and accurate tools are necessary for structure similarity search. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results: We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to that of Combinatorial Extension (CE), and it works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented as a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion: As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools should be applicable to automated and high-throughput functional annotations or predictions for the ever-increasing number of published protein structures in this post-genomic era.
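    The idea of reducing a three-dimensional backbone to a one-dimensional text string can be illustrated with a minimal sketch. This is not the SARST encoding: the abstract describes a Ramachandran map organized by nearest-neighbor clustering, whereas the sketch below swaps in a plain uniform grid over the (phi, psi) plane. The function name `encode_torsions`, the 60-degree bin size, and the letter assignment are all assumptions made for illustration.

```python
import string


def encode_torsions(torsions, bin_size=60):
    """Map each (phi, psi) pair, in degrees, to one letter of a uniform grid.

    Assumption: a uniform grid stands in for the clustered Ramachandran map
    described in the abstract; the alphabet and bin size are hypothetical.
    """
    bins_per_axis = 360 // bin_size
    alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits
    if bins_per_axis ** 2 > len(alphabet):
        raise ValueError("bin_size too small for the available alphabet")
    letters = []
    for phi, psi in torsions:
        # Shift angles from [-180, 180) to [0, 360) and locate the grid cell.
        row = int((phi + 180) % 360) // bin_size
        col = int((psi + 180) % 360) // bin_size
        letters.append(alphabet[row * bins_per_axis + col])
    return "".join(letters)


# Two helix-like residues followed by one strand-like residue.
print(encode_torsions([(-60, -45), (-60, -45), (-120, 130)]))
```

    Once structures are written as strings like this, ordinary sequence-search tools can compare them, which is the step the abstract relies on for its speed-up.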

  4. A Novel Method Using Abstract Convex Underestimation in Ab-Initio Protein Structure Prediction for Guiding Search in Conformational Feature Space.

    Science.gov (United States)

    Hao, Xiao-Hu; Zhang, Gui-Jun; Zhou, Xiao-Gen; Yu, Xu-Feng

    2016-01-01

    To address the searching problem of protein conformational space in ab-initio protein structure prediction, a novel method using abstract convex underestimation (ACUE) based on the framework of evolutionary algorithm was proposed. Computing such conformations, essential to associate structural and functional information with gene sequences, is challenging due to the high-dimensionality and rugged energy surface of the protein conformational space. As a consequence, the dimension of protein conformational space should be reduced to a proper level. In this paper, the high-dimensionality original conformational space was converted into feature space whose dimension is considerably reduced by feature extraction technique. And, the underestimate space could be constructed according to abstract convex theory. Thus, the entropy effect caused by searching in the high-dimensionality conformational space could be avoided through such conversion. The tight lower bound estimate information was obtained to guide the searching direction, and the invalid searching area in which the global optimal solution is not located could be eliminated in advance. Moreover, instead of expensively calculating the energy of conformations in the original conformational space, the estimate value is employed to judge if the conformation is worth exploring to reduce the evaluation time, thereby making computational cost lower and the searching process more efficient. Additionally, fragment assembly and the Monte Carlo method are combined to generate a series of metastable conformations by sampling in the conformational space. The proposed method provides a novel technique to solve the searching problem of protein conformational space. Twenty small-to-medium structurally diverse proteins were tested, and the proposed ACUE method was compared with It Fix, HEA, Rosetta and the developed method LEDE without underestimate information. Test results show that the ACUE method can more rapidly and more

  5. Feature-Based and String-Based Models for Predicting RNA-Protein Interaction

    Directory of Open Access Journals (Sweden)

    Donald Adjeroh

    2018-03-01

    In this work, we study two approaches for the problem of RNA-Protein Interaction (RPI). In the first approach, we use a feature-based technique by combining extracted features from both sequences and secondary structures. The feature-based approach enhanced the prediction accuracy as it included much more available information about the RNA-protein pairs. In the second approach, we apply search algorithms and data structures to extract effective string patterns for prediction of RPI, using both sequence information (protein and RNA sequences) and structure information (protein and RNA secondary structures). This led to different string-based models for predicting interacting RNA-protein pairs. We show results that demonstrate the effectiveness of the proposed approaches, including comparative results against leading state-of-the-art methods.

  6. MEGADOCK-Web: an integrated database of high-throughput structure-based protein-protein interaction predictions.

    Science.gov (United States)

    Hayashi, Takanori; Matsuzaki, Yuri; Yanagisawa, Keisuke; Ohue, Masahito; Akiyama, Yutaka

    2018-05-08

    Protein-protein interactions (PPIs) play several roles in living cells, and computational PPI prediction is a major focus of many researchers. The three-dimensional (3D) structure and binding surface are important for the design of PPI inhibitors. Therefore, rigid body protein-protein docking calculations for two protein structures are expected to allow elucidation of PPIs different from known complexes in terms of 3D structures because known PPI information is not explicitly required. We have developed rapid PPI prediction software based on protein-protein docking, called MEGADOCK. In order to fully utilize the benefits of computational PPI predictions, it is necessary to construct a comprehensive database to gather prediction results and their predicted 3D complex structures and to make them easily accessible. Although several databases exist that provide predicted PPIs, the previous databases do not contain a sufficient number of entries for the purpose of discovering novel PPIs. In this study, we constructed an integrated database of MEGADOCK PPI predictions, named MEGADOCK-Web. MEGADOCK-Web provides more than 10 times the number of PPI predictions than previous databases and enables users to conduct PPI predictions that cannot be found in conventional PPI prediction databases. In MEGADOCK-Web, there are 7528 protein chains and 28,331,628 predicted PPIs from all possible combinations of those proteins. Each protein structure is annotated with PDB ID, chain ID, UniProt AC, related KEGG pathway IDs, and known PPI pairs. Additionally, MEGADOCK-Web provides four powerful functions: 1) searching precalculated PPI predictions, 2) providing annotations for each predicted protein pair with an experimentally known PPI, 3) visualizing candidates that may interact with the query protein on biochemical pathways, and 4) visualizing predicted complex structures through a 3D molecular viewer. MEGADOCK-Web provides a huge amount of comprehensive PPI predictions based on

  7. The Search Performance Evaluation and Prediction in Exploratory Search

    OpenAIRE

    LIU, FEI

    2016-01-01

    The exploratory search for complex search tasks requires an effective search behavior model to evaluate and predict user search performance. Few studies have investigated the relationship between user search behavior and search performance in exploratory search. This research adopts a mixed approach combining search system development, user search experiment, search query log analysis, and multivariate regression analysis to resolve the knowledge gap. Through this study, it is shown that expl...

  8. RNA secondary structure prediction using soft computing.

    Science.gov (United States)

    Ray, Shubhra Sankar; Pal, Sankar K

    2013-01-01

    Prediction of RNA structure is invaluable in creating new drugs and understanding genetic diseases. Several deterministic algorithms and soft computing-based techniques have been developed for more than a decade to determine the structure from a known RNA sequence. Soft computing gained importance with the need to get approximate solutions for RNA sequences by considering the issues related with kinetic effects, cotranscriptional folding, and estimation of certain energy parameters. A brief description of some of the soft computing-based techniques, developed for RNA secondary structure prediction, is presented along with their relevance. The basic concepts of RNA and its different structural elements like helix, bulge, hairpin loop, internal loop, and multiloop are described. These are followed by different methodologies, employing genetic algorithms, artificial neural networks, and fuzzy logic. The role of various metaheuristics, like simulated annealing, particle swarm optimization, ant colony optimization, and tabu search is also discussed. A relative comparison among different techniques, in predicting 12 known RNA secondary structures, is presented, as an example. Future challenging issues are then mentioned.

  9. The multi-copy simultaneous search methodology: a fundamental tool for structure-based drug design.

    Science.gov (United States)

    Schubert, Christian R; Stultz, Collin M

    2009-08-01

    Fragment-based ligand design approaches, such as the multi-copy simultaneous search (MCSS) methodology, have proven to be useful tools in the search for novel therapeutic compounds that bind pre-specified targets of known structure. MCSS offers a variety of advantages over more traditional high-throughput screening methods, and has been applied successfully to challenging targets. The methodology is quite general and can be used to construct functionality maps for proteins, DNA, and RNA. In this review, we describe the main aspects of the MCSS method and outline the general use of the methodology as a fundamental tool to guide the design of de novo lead compounds. We focus our discussion on the evaluation of MCSS results and the incorporation of protein flexibility into the methodology. In addition, we demonstrate on several specific examples how the information arising from the MCSS functionality maps has been successfully used to predict ligand binding to protein targets and RNA.

  10. Blind Test of Physics-Based Prediction of Protein Structures

    Science.gov (United States)

    Shell, M. Scott; Ozkan, S. Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A.

    2009-01-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences. PMID:19186130

  11. Improving protein fold recognition and structural class prediction accuracies using physicochemical properties of amino acids.

    Science.gov (United States)

    Raicar, Gaurav; Saini, Harsh; Dehzangi, Abdollah; Lal, Sunil; Sharma, Alok

    2016-08-07

    Predicting the three-dimensional (3-D) structure of a protein is an important task in the field of bioinformatics and biological sciences. However, directly predicting the 3-D structure from the primary structure is hard to achieve. Therefore, predicting the fold or structural class of a protein sequence is generally used as an intermediate step in determining the protein's 3-D structure. For protein fold recognition (PFR) and structural class prediction (SCP), two steps are required - feature extraction step and classification step. Feature extraction techniques generally utilize syntactical-based information, evolutionary-based information and physicochemical-based information to extract features. In this study, we explore the importance of utilizing the physicochemical properties of amino acids for improving PFR and SCP accuracies. For this, we propose a Forward Consecutive Search (FCS) scheme which aims to strategically select physicochemical attributes that will supplement the existing feature extraction techniques for PFR and SCP. An exhaustive search is conducted on all the existing 544 physicochemical attributes using the proposed FCS scheme and a subset of physicochemical attributes is identified. Features extracted from these selected attributes are then combined with existing syntactical-based and evolutionary-based features, to show an improvement in the recognition and prediction performance on benchmark datasets. Copyright © 2016 Elsevier Ltd. All rights reserved.

  12. Predicting consumer behavior with Web search.

    Science.gov (United States)

    Goel, Sharad; Hofman, Jake M; Lahaie, Sébastien; Pennock, David M; Watts, Duncan J

    2010-10-12

    Recent work has demonstrated that Web search volume can "predict the present," meaning that it can be used to accurately track outcomes such as unemployment levels, auto and home sales, and disease prevalence in near real time. Here we show that what consumers are searching for online can also predict their collective future behavior days or even weeks in advance. Specifically we use search query volume to forecast the opening weekend box-office revenue for feature films, first-month sales of video games, and the rank of songs on the Billboard Hot 100 chart, finding in all cases that search counts are highly predictive of future outcomes. We also find that search counts generally boost the performance of baseline models fit on other publicly available data, where the boost varies from modest to dramatic, depending on the application in question. Finally, we reexamine previous work on tracking flu trends and show that, perhaps surprisingly, the utility of search data relative to a simple autoregressive model is modest. We conclude that in the absence of other data sources, or where small improvements in predictive performance are material, search queries provide a useful guide to the near future.

  13. Knowledge base and neural network approach for protein secondary structure prediction.

    Science.gov (United States)

    Patel, Maulika S; Mazumdar, Himanshu S

    2014-11-21

    Protein structure prediction is of great relevance given the abundant genomic and proteomic data generated by the genome sequencing projects. Protein secondary structure prediction is addressed as a subtask in determining the protein tertiary structure and function. In this paper, a novel algorithm, KB-PROSSP-NN, which is a combination of knowledge base and modeling of the exceptions in the knowledge base using neural networks for protein secondary structure prediction (PSSP), is proposed. The knowledge base is derived from a proteomic sequence-structure database and consists of the statistics of association between the 5-residue words and corresponding secondary structure. The predicted results obtained using the knowledge base are refined with a backpropagation neural network algorithm. The neural network models the exceptions of the knowledge base. A Q3 accuracy of 90% and 82% is achieved on the RS126 and CB396 test sets respectively, which suggests improvement over existing state-of-the-art methods. Copyright © 2014 Elsevier Ltd. All rights reserved.
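    A knowledge base of 5-residue words can be sketched as a simple lookup table. The abstract does not say which residue of a word receives the secondary-structure label, so the sketch assumes the central one; the function names, the coil default "C", and the toy training pair are hypothetical, and the neural-network refinement step is left out entirely.

```python
from collections import Counter, defaultdict


def build_knowledge_base(pairs, k=5):
    """Count, for each k-residue word, the state observed at its central residue.

    Assumption: labelling the central residue is an illustrative choice,
    not a detail taken from the KB-PROSSP-NN paper.
    """
    kb = defaultdict(Counter)
    half = k // 2
    for seq, ss in pairs:
        for i in range(len(seq) - k + 1):
            kb[seq[i:i + k]][ss[i + half]] += 1
    return kb


def predict(seq, kb, k=5, default="C"):
    """Assign each central residue the most frequent state of its word."""
    half = k // 2
    out = [default] * len(seq)
    for i in range(len(seq) - k + 1):
        counts = kb.get(seq[i:i + k])
        if counts:
            out[i + half] = counts.most_common(1)[0][0]
    return "".join(out)


# Toy sequence/structure pair (H = helix, C = coil), not real data.
kb = build_knowledge_base([("AAAAAGG", "HHHHHCC")])
print(predict("AAAAAGG", kb))
```

    Words absent from the table fall back to the default state; in the approach described above, such exceptions are what the neural network is meant to model.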

  14. Protein structure database search and evolutionary classification.

    Science.gov (United States)

    Yang, Jinn-Moon; Tung, Chi-Hua

    2006-01-01

    As more protein structures become available and structural genomics efforts provide structural models in a genome-wide strategy, there is a growing need for fast and accurate methods for discovering homologous proteins and evolutionary classifications of newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. Using this method, we first identified 23 states of the structural alphabet that represent pattern profiles of the backbone fragments and then used them to represent protein structure databases as structural alphabet sequence databases (SADB). Our method enhanced BLAST as a search method, using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs from an SADB database. Using personal computers with Intel Pentium4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved a good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at http://3d-blast.life.nctu.edu.tw].

  15. The experimental search for new predicted binary-alloy structures

    Science.gov (United States)

    Erb, K. C.; Richey, Lauren; Lang, Candace; Campbell, Branton; Hart, Gus

    2010-10-01

    Predicting new ordered phases in metallic alloys is a productive line of inquiry because configurational ordering in an alloy can dramatically alter its useful material properties. One is able to infer the existence of an ordered phase in an alloy using first-principles calculated formation enthalpies [G. L. W. Hart, "Where are Nature's missing structures?," Nature Materials 6, 941-945 (2007)]. Using this approach, we have been able to identify stable (i.e. lowest energy) orderings in a variety of binary metallic alloys. Many of these phases have been observed experimentally in the past, though others have not. In pursuit of several of the missing structures, we have characterized potential orderings in PtCd, PtPd and PtMo alloys using synchrotron x-ray powder diffraction and symmetry-analysis tools [B. J. Campbell, H. T. Stokes, D. E. Tanner, and D. M. Hatch, "ISODISPLACE: a web-based tool for exploring structural distortions," J. Appl. Cryst. 39, 607-614 (2006)].

  16. Improved hybrid optimization algorithm for 3D protein structure prediction.

    Science.gov (United States)

    Zhou, Changjun; Hou, Caixia; Wei, Xiaopeng; Zhang, Qiang

    2014-07-01

    A new improved hybrid optimization algorithm, the PGATS algorithm, which is based on the toy off-lattice model, is presented for dealing with three-dimensional protein structure prediction problems. The algorithm combines the particle swarm optimization (PSO), genetic algorithm (GA), and tabu search (TS) algorithms. We also adopt several improvement strategies. A stochastic disturbance factor is added to the particle swarm optimization to improve its search ability; the crossover and mutation operations of the genetic algorithm are changed to a kind of random linear method; finally, the tabu search algorithm is improved by appending a mutation operator. Through the combination of a variety of strategies and algorithms, protein structure prediction (PSP) in a 3D off-lattice model is achieved. The PSP problem is an NP-hard problem, but it can be treated as a global optimization problem with multiple extrema and multiple parameters. This is the theoretical principle of the hybrid optimization algorithm that is proposed in this paper. The algorithm combines local search and global search, which overcomes the shortcomings of a single algorithm and gives full play to the advantages of each algorithm. The method is tested on the commonly used benchmark sequences: Fibonacci sequences and real protein sequences. Experiments show that the proposed method outperforms single algorithms in the accuracy of the calculated protein sequence energy value, and that it is an effective way to predict the structure of proteins.

  17. Knowledge-based Fragment Binding Prediction

    Science.gov (United States)

    Tang, Grace W.; Altman, Russ B.

    2014-01-01

    Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low-molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the ligand bound with 74% precision and 82% recall on average. For many protein targets, it identifies high scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or serve as refinement criteria for creating target-specific compound libraries for experimental or computational screening. PMID:24762971

  18. BLAST-based structural annotation of protein residues using Protein Data Bank.

    Science.gov (United States)

    Singh, Harinder; Raghava, Gajendra P S

    2016-01-25

    In the era of next-generation sequencing, where thousands of genomes have already been sequenced, the size of protein databases is growing at an exponential rate. Structural annotation of these proteins is one of the biggest challenges for the computational biologist. Although it is easy to perform a BLAST search against the Protein Data Bank (PDB), it is difficult for a biologist to annotate protein residues from a BLAST search. A web server, StarPDB, has been developed for structural annotation of a protein based on its similarity with known protein structures. It uses standard BLAST software to perform a similarity search of a query protein against protein structures in the PDB. This server integrates a wide range of modules for assigning different types of annotation, including Secondary-structure, Accessible surface area, Tight-turns, DNA-RNA and Ligand modules. The Secondary-structure module allows users to predict regular secondary structure states for each residue in a protein. The Accessible surface area module predicts the exposed or buried residues in a protein. The Tight-turns module is designed to predict tight turns, such as beta-turns, in a protein. The DNA-RNA module was developed for predicting DNA- and RNA-interacting residues in a protein. Similarly, the Ligand module allows one to predict ligand-, metal- and nucleotide-interacting residues in a protein. In summary, this manuscript presents a web server for comprehensive annotation of a protein based on similarity search. It integrates a number of visualization tools that help users understand the structure and function of protein residues. This web server is freely available to the scientific community at http://crdd.osdd.net/raghava/starpdb .

  19. Distance matrix-based approach to protein structure prediction.

    Science.gov (United States)

    Kloczkowski, Andrzej; Jernigan, Robert L; Wu, Zhijun; Song, Guang; Yang, Lei; Kolinski, Andrzej; Pokarowski, Piotr

    2009-03-01

    Much structural information is encoded in the internal distances; a distance matrix-based approach can be used to predict protein structure and dynamics, and for structural refinement. Our approach is based on the square distance matrix $D = [r_{ij}^2]$ containing all square distances between residues in proteins. This distance matrix contains more information than the contact matrix $C$, which has elements of either 0 or 1 depending on whether the distance $r_{ij}$ is greater or less than a cutoff value $r_{\mathrm{cutoff}}$. We have performed spectral decomposition of the distance matrices, $D = \sum_k \lambda_k v_k v_k^T$, in terms of eigenvalues $\lambda_k$ and the corresponding eigenvectors $v_k$, and found that it contains at most five nonzero terms. A dominant eigenvector is proportional to $r^2$, the square distance of points from the center of mass, with the next three being the principal components of the system of points. By predicting $r^2$ from the sequence we can approximate a distance matrix of a protein with an expected RMSD value of about 7.3 Å, and by combining it with the prediction of the first principal component we can improve this approximation to 4.0 Å. We can also explain the role of hydrophobic interactions for the protein structure, because r is highly correlated with the hydrophobic profile of the sequence. Moreover, r is highly correlated with several sequence profiles which are useful in protein structure prediction, such as contact number, the residue-wise contact order (RWCO) or mean square fluctuations (i.e. crystallographic temperature factors). We have also shown that the next three components are related to spatial directionality of the secondary structure elements, and they may be also predicted from the sequence, improving overall structure prediction. We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the
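    The claim that the squared distance matrix has at most five nonzero spectral terms can be checked on a toy example. For a symmetric matrix the number of nonzero eigenvalues equals its rank, so the sketch below computes an exact rank with rational arithmetic instead of an eigendecomposition. The eight integer points are arbitrary stand-ins for residue coordinates, and the helper names are hypothetical.

```python
from fractions import Fraction


def squared_distance_matrix(points):
    """D[i][j] = squared Euclidean distance between points i and j."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


def contact_matrix(sq_dist, cutoff):
    """C[i][j] = 1 if r_ij < cutoff, else 0 (compared via squared values)."""
    return [[1 if d < cutoff ** 2 else 0 for d in row] for row in sq_dist]


def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            if factor != 0:
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


# Arbitrary 3D points (assumption: stand-ins for residue positions).
points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3),
          (1, 1, 1), (2, 3, 5), (4, 1, 2), (3, 5, 1)]
D = squared_distance_matrix(points)
print("rank of 8x8 squared distance matrix:", rank(D))
```

    The bound follows from writing $r_{ij}^2 = |x_i|^2 + |x_j|^2 - 2\,x_i \cdot x_j$: the first two terms contribute rank at most 2 and the Gram matrix of 3D points contributes rank at most 3, whatever the number of points.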

  20. Genetic programming based quantitative structure-retention relationships for the prediction of Kovats retention indices.

    Science.gov (United States)

    Goel, Purva; Bapat, Sanket; Vyas, Renu; Tambe, Amruta; Tambe, Sanjeev S

    2015-11-13

    The development of quantitative structure-retention relationships (QSRR) aims at constructing an appropriate linear/nonlinear model for the prediction of the retention behavior (such as Kovats retention index) of a solute on a chromatographic column. Commonly, multi-linear regression and artificial neural networks are used in the QSRR development in gas chromatography (GC). In this study, an artificial intelligence based data-driven modeling formalism, namely genetic programming (GP), has been introduced for the development of quantitative structure based models predicting Kovats retention indices (KRI). The novelty of the GP formalism is that given an example dataset, it searches and optimizes both the form (structure) and the parameters of an appropriate linear/nonlinear data-fitting model. Thus, it is not necessary to pre-specify the form of the data-fitting model in the GP-based modeling. These models are also less complex, simple to understand, and easy to deploy. The effectiveness of GP in constructing QSRRs has been demonstrated by developing models predicting KRIs of light hydrocarbons (case study-I) and adamantane derivatives (case study-II). In each case study, two-, three- and four-descriptor models have been developed using the KRI data available in the literature. The results of these studies clearly indicate that the GP-based models possess an excellent KRI prediction accuracy and generalization capability. Specifically, the best performing four-descriptor models in both the case studies have yielded high (>0.9) values of the coefficient of determination ($R^2$) and low values of root mean squared error (RMSE) and mean absolute percent error (MAPE) for training, test and validation set data. The characteristic feature of this study is that it introduces a practical and effective GP-based method for developing QSRRs in gas chromatography that can be gainfully utilized for developing other types of data-driven models in chromatography science
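    The three accuracy measures named in this abstract can be written out in a few lines. The sketch assumes the common definitions, with $R^2 = 1 - SS_{res}/SS_{tot}$; the abstract does not state which variant of $R^2$ the authors used. The function name and the retention-index values are made up for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAPE in percent) for paired observations.

    Assumption: R^2 is computed as 1 - SS_res / SS_tot.
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mape = 100 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return r2, rmse, mape


# Hypothetical retention indices, not taken from the paper.
observed = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]
r2, rmse, mape = regression_metrics(observed, predicted)
print(f"R^2={r2:.3f} RMSE={rmse:.1f} MAPE={mape:.2f}%")
```

    MAPE divides by the observed value, so it is undefined when any observation is zero; this is not a concern for retention indices, which are positive.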

  1. Critical Features of Fragment Libraries for Protein Structure Prediction.

    Science.gov (United States)

    Trevizani, Raphael; Custódio, Fábio Lima; Dos Santos, Karina Baptista; Dardenne, Laurent Emmanuel

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. In particular, we analyze the effect of using secondary structure prediction to guide fragment selection, of different fragment sizes, and of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than longer ones. Libraries composed of multiple fragment lengths generate even better structures, with longer fragments proving more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction.

  2. Is Internet search better than structured instruction for web-based health education?

    Science.gov (United States)

    Finkelstein, Joseph; Bedra, McKenzie

    2013-01-01

    The Internet provides access to vast amounts of comprehensive information regarding any health-related subject. Patients increasingly use this information for health education, using a search engine to identify education materials. An alternative approach to health education via the Internet is based on utilizing a verified web site which provides structured interactive education guided by adult learning theories. A systematic comparison of these two approaches in older patients has not been performed. The aim of this study was to compare the efficacy of a web-based computer-assisted education (CO-ED) system versus searching the Internet for learning about hypertension. Sixty hypertensive older adults (age 45+) were randomized into control or intervention groups. The control patients spent 30 to 40 minutes searching the Internet using a search engine for information about hypertension. The intervention patients spent 30 to 40 minutes using the CO-ED system, which provided computer-assisted instruction about major hypertension topics. Analysis of pre- and post-knowledge scores indicated a significant improvement among CO-ED users (14.6%) as opposed to Internet users (2%). Additionally, patients using the CO-ED program rated their learning experience more positively than those using the Internet.

  3. Applications of contact predictions to structural biology

    Directory of Open Access Journals (Sweden)

    Felix Simkovic

    2017-05-01

    Evolutionary pressure on residue interactions, intramolecular or intermolecular, that are important for protein structure or function can lead to covariance between the two positions. Recent methodological advances allow much more accurate contact predictions to be derived from this evolutionary covariance signal. The practical application of contact predictions has largely been confined to structural bioinformatics, yet, as this work seeks to demonstrate, the data can be of enormous value to the structural biologist working in X-ray crystallography, cryo-EM or NMR. Integrative structural bioinformatics packages such as Rosetta can already exploit contact predictions in a variety of ways. The contribution of contact predictions begins at construct design, where structural domains may need to be expressed separately and contact predictions can help to predict domain limits. Structure solution by molecular replacement (MR) benefits from contact predictions in diverse ways: in difficult cases, more accurate search models can be constructed using ab initio modelling when predictions are available, while intermolecular contact predictions can allow the construction of larger, oligomeric search models. Furthermore, MR using supersecondary motifs or large-scale screens against the PDB can exploit information, such as the parallel or antiparallel nature of any β-strand pairing in the target, that can be inferred from contact predictions. Contact information will be particularly valuable in the determination of lower resolution structures by helping to assign sequence register. In large complexes, contact information may allow the identity of a protein responsible for a certain region of density to be determined and then assist in the orientation of an available model within that density. In NMR, predicted contacts can provide long-range information to extend the upper size limit of the technique in a manner analogous but complementary to experimental

  4. Structural protein descriptors in 1-dimension and their sequence-based predictions.

    Science.gov (United States)

    Kurgan, Lukasz; Disfani, Fatemeh Miri

    2011-09-01

    The last few decades have seen an increasing interest in the development and application of 1-dimensional (1D) descriptors of protein structure. These descriptors project 3D structural features onto 1D strings of residue-wise structural assignments. They cover a wide range of structural aspects including conformation of the backbone, burying depth/solvent exposure and flexibility of residues, and inter-chain residue-residue contacts. We perform a first-of-its-kind comprehensive comparative review of the existing 1D structural descriptors. We define, review and categorize ten structural descriptors and we also describe, summarize and contrast over eighty computational models that are used to predict these descriptors from protein sequences. We show that the majority of the recent sequence-based predictors utilize machine learning models, with the most popular being neural networks, support vector machines, hidden Markov models, and support vector and linear regressions. These methods provide high-throughput predictions and most of them are accessible to a non-expert user via web servers and/or stand-alone software packages. We empirically evaluate several recent sequence-based predictors of secondary structure, disorder, and solvent accessibility descriptors using a benchmark set based on CASP8 targets. Our analysis shows that the secondary structure can be predicted with over 80% accuracy and segment overlap (SOV), disorder with over 0.9 AUC, 0.6 Matthews Correlation Coefficient (MCC), and 75% SOV, and relative solvent accessibility with PCC of 0.7 and MCC of 0.6 (0.86 when homology is used). We demonstrate that the secondary structure predicted from sequence without the use of homology modeling is as good as the structure extracted from the 3D folds predicted by top-performing template-based methods.

  5. Systematizing Web Search through a Meta-Cognitive, Systems-Based, Information Structuring Model (McSIS)

    Science.gov (United States)

    Abuhamdieh, Ayman H.; Harder, Joseph T.

    2015-01-01

    This paper proposes a meta-cognitive, systems-based, information structuring model (McSIS) to systematize online information search behavior based on a literature review of information-seeking models. The propositions of General Systems Theory (GST) serve as its framework. Factors influencing information-seekers, such as the individual learning…

  6. Addressing special structure in the relevance feedback learning problem through aspect-based image search

    NARCIS (Netherlands)

    M.J. Huiskes (Mark)

    2004-01-01

    In this paper we focus on a number of issues regarding special structure in the relevance feedback learning problem, most notably the effects of image selection based on partial relevance on the clustering behavior of examples. We propose a simple scheme, aspect-based image search, which

  7. Contingency Table Browser - prediction of early stage protein structure.

    Science.gov (United States)

    Kalinowska, Barbara; Krzykalski, Artur; Roterman, Irena

    2015-01-01

    The Early Stage (ES) intermediate represents the starting structure in protein folding simulations based on the Fuzzy Oil Drop (FOD) model. The accuracy of FOD predictions is greatly dependent on the accuracy of the chosen intermediate. A suitable intermediate can be constructed using the sequence-structure relationship information contained in the so-called contingency table, which expresses the likelihood of encountering various structural motifs for each tetrapeptide fragment in the amino acid sequence. The limited accuracy with which such structures could previously be predicted provided the motivation for a more in-depth study of the contingency table itself. The Contingency Table Browser is a tool which can visualize, search and analyze the table. Our work presents possible applications of the Contingency Table Browser, among them the analysis of specific protein sequences from the point of view of their structural ambiguity.

  8. Improving binding mode and binding affinity predictions of docking by ligand-based search of protein conformations: evaluation in D3R grand challenge 2015

    Science.gov (United States)

    Xu, Xianjin; Yan, Chengfei; Zou, Xiaoqin

    2017-08-01

    The growing number of protein-ligand complex structures, particularly the structures of proteins co-bound with different ligands, in the Protein Data Bank helps us tackle two major challenges in molecular docking studies: protein flexibility and the scoring function. Here, we introduced a systematic strategy that uses the information embedded in the known protein-ligand complex structures to improve both binding mode and binding affinity predictions. Specifically, a ligand similarity calculation method was employed to search for a receptor structure with a bound ligand sharing high similarity with the query ligand, for use in docking. The strategy was applied to the two datasets (HSP90 and MAP4K4) in the recent D3R Grand Challenge 2015. In addition, for the HSP90 dataset, a system-specific scoring function (ITScore2_hsp90) was generated by recalibrating our statistical potential-based scoring function (ITScore2) using the known protein-ligand complex structures and the statistical mechanics-based iterative method. For the HSP90 dataset, better performance was achieved for both binding mode and binding affinity predictions compared with the original ITScore2 and with ensemble docking. For the MAP4K4 dataset, although there were only eight known protein-ligand complex structures, our docking strategy achieved performance comparable to ensemble docking. Our method for receptor conformational selection and our iterative method for the development of system-specific statistical potential-based scoring functions can be easily applied to other protein targets that have a number of protein-ligand complex structures available to improve predictions on binding.

  9. Analysis of energy-based algorithms for RNA secondary structure prediction

    Directory of Open Access Journals (Sweden)

    Hajiaghayi Monir

    2012-02-01

    Background: RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA). Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. Results: We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms
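
    The two quantities named in this abstract, the F-measure and a bootstrap percentile interval, can be illustrated with a minimal Python sketch. The function names, the resample count, and the per-RNA scores in the usage note are assumptions made for illustration; they are not taken from the paper or its software.

```python
import random


def f_measure(sensitivity, ppv):
    """Harmonic mean of sensitivity and positive predictive value."""
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def bootstrap_percentile_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values`.

    Resamples `values` with replacement, records each resample mean,
    and reads the interval off the sorted means.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

    Calling `bootstrap_percentile_ci` on a list of per-RNA F-measures gives an interval for the average accuracy; a narrow interval on a large dataset and a wide one on a small class would correspond to the contrast the abstract draws.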

  10. Querying archetype-based EHRs by search ontology-based XPath engineering.

    Science.gov (United States)

    Kropf, Stefan; Uciteli, Alexandr; Schierle, Katrin; Krücken, Peter; Denecke, Kerstin; Herre, Heinrich

    2018-05-11

    Legacy data and new structured data can be stored in a standardized format as XML-based EHRs on XML databases. Querying documents on these databases is crucial for answering research questions. Instead of using free text searches, which lead to false positive results, precision can be increased by constraining the search to certain parts of documents. A search ontology-based specification of queries on XML documents defines search concepts and relates them to parts in the XML document structure. Such a query specification method is practically introduced and evaluated by applying concrete research questions formulated in natural language on a data collection for information retrieval purposes. The search is performed by search ontology-based XPath engineering that reuses ontologies and XML-related W3C standards. The key result is that the specification of research questions can be supported by the usage of search ontology-based XPath engineering. A deeper recognition of entities and a semantic understanding of the content are necessary for a further improvement of precision and recall. A key limitation is that the application of the introduced process requires skills in ontology and software development. In the future, the time-consuming ontology development could be overcome by implementing a new clinical role: the clinical ontologist. The introduced Search Ontology XML extension connects Search Terms to certain parts in XML documents and enables an ontology-based definition of queries. Search ontology-based XPath engineering can support research question answering by the specification of complex XPath expressions without deep syntax knowledge about XPaths.
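
    The idea of constraining a search to one part of an XML document, rather than matching free text anywhere, can be shown with a small Python sketch using the standard library's limited XPath support. The element names, the `name` attribute, and the sample record are hypothetical and do not reflect any archetype schema or the authors' Search Ontology.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified record; not an actual archetype-based EHR.
EHR_XML = """
<ehr>
  <composition>
    <section name="diagnosis"><entry>Gastric carcinoma</entry></section>
    <section name="history"><entry>Family history of carcinoma</entry></section>
  </composition>
</ehr>
"""


def search_in_section(root, section_name, term):
    """Return entry texts containing `term`, looking only inside one section."""
    path = f".//section[@name='{section_name}']/entry"
    return [
        entry.text
        for entry in root.findall(path)
        if term.lower() in (entry.text or "").lower()
    ]


root = ET.fromstring(EHR_XML)

# Free-text search: matches every entry that mentions the term.
free_text_hits = [
    entry.text for entry in root.iter("entry")
    if "carcinoma" in (entry.text or "").lower()
]

# Constrained search: only the diagnosis section is considered.
constrained_hits = search_in_section(root, "diagnosis", "carcinoma")
```

    Here the free-text search also returns the family-history entry, which would be a false positive for a question about the patient's own diagnosis, while the path-constrained search does not.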

  11. Bridge Structure Deformation Prediction Based on GNSS Data Using Kalman-ARIMA-GARCH Model.

    Science.gov (United States)

    Xin, Jingzhou; Zhou, Jianting; Yang, Simon X; Li, Xiaoqing; Wang, Yu

    2018-01-19

    Bridges are an essential part of the ground transportation system. Health monitoring is fundamentally important for the safety and service life of bridges. A large amount of structural information is obtained from various sensors using sensing technology, and the data processing has become a challenging issue. To improve the prediction accuracy of bridge structure deformation based on data mining and to accurately evaluate the time-varying characteristics of bridge structure performance evolution, this paper proposes a new method for bridge structure deformation prediction, which integrates the Kalman filter, autoregressive integrated moving average model (ARIMA), and generalized autoregressive conditional heteroskedasticity (GARCH). Firstly, the raw deformation data is directly pre-processed using the Kalman filter to reduce the noise. After that, the linear recursive ARIMA model is established to analyze and predict the structure deformation. Finally, the nonlinear recursive GARCH model is introduced to further improve the accuracy of the prediction. Simulation results based on measured sensor data from the Global Navigation Satellite System (GNSS) deformation monitoring system demonstrated that: (1) the Kalman filter is capable of denoising the bridge deformation monitoring data; (2) the prediction accuracy of the proposed Kalman-ARIMA-GARCH model is satisfactory, where the mean absolute error increases only from 3.402 mm to 5.847 mm with the increment of the prediction step; and (3) in comparison to the Kalman-ARIMA model, the Kalman-ARIMA-GARCH model results in superior prediction accuracy as it includes partial nonlinear characteristics (heteroscedasticity); the mean absolute error of five-step prediction using the proposed model is improved by 10.12%. This paper provides a new way for structural behavior prediction based on data processing, which can lay a foundation for the early warning of bridge health monitoring system based on sensor data using sensing
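
    The first stage described above, Kalman-filter denoising of a raw deformation series, can be sketched in Python for the simplest case. This is a scalar filter with a random-walk state model; the model choice, the variance parameters, and the function name are assumptions for illustration and are not the authors' configuration. The ARIMA and GARCH stages are not shown.

```python
def kalman_denoise(measurements, process_var=1e-3, measurement_var=1.0):
    """Smooth a 1-D series with a scalar Kalman filter.

    Assumes a random-walk state (the true value drifts slowly) observed
    with additive noise of variance `measurement_var`.
    """
    estimate = measurements[0]   # initial state estimate
    error_var = 1.0              # initial estimate uncertainty
    filtered = [estimate]
    for z in measurements[1:]:
        # Predict: the state is carried over, uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement by the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1 - gain)
        filtered.append(estimate)
    return filtered
```

    A larger `measurement_var` relative to `process_var` makes the filter trust new readings less and smooth more heavily; in practice both values would have to be tuned to the sensor.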

  12. Bridge Structure Deformation Prediction Based on GNSS Data Using Kalman-ARIMA-GARCH Model

    Directory of Open Access Journals (Sweden)

    Jingzhou Xin

    2018-01-01

    Bridges are an essential part of the ground transportation system. Health monitoring is fundamentally important for the safety and service life of bridges. A large amount of structural information is obtained from various sensors using sensing technology, and the data processing has become a challenging issue. To improve the prediction accuracy of bridge structure deformation based on data mining and to accurately evaluate the time-varying characteristics of bridge structure performance evolution, this paper proposes a new method for bridge structure deformation prediction, which integrates the Kalman filter, autoregressive integrated moving average model (ARIMA), and generalized autoregressive conditional heteroskedasticity (GARCH). Firstly, the raw deformation data is directly pre-processed using the Kalman filter to reduce the noise. After that, the linear recursive ARIMA model is established to analyze and predict the structure deformation. Finally, the nonlinear recursive GARCH model is introduced to further improve the accuracy of the prediction. Simulation results based on measured sensor data from the Global Navigation Satellite System (GNSS) deformation monitoring system demonstrated that: (1) the Kalman filter is capable of denoising the bridge deformation monitoring data; (2) the prediction accuracy of the proposed Kalman-ARIMA-GARCH model is satisfactory, where the mean absolute error increases only from 3.402 mm to 5.847 mm with the increment of the prediction step; and (3) in comparison to the Kalman-ARIMA model, the Kalman-ARIMA-GARCH model results in superior prediction accuracy as it includes partial nonlinear characteristics (heteroscedasticity); the mean absolute error of five-step prediction using the proposed model is improved by 10.12%. This paper provides a new way for structural behavior prediction based on data processing, which can lay a foundation for the early warning of bridge health monitoring system based on sensor data

  13. Building a better fragment library for de novo protein structure prediction.

    Directory of Open Access Journals (Sweden)

    Saulo H P de Oliveira

    Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. The importance of correct testing procedures in assessing the quality of fragment libraries is demonstrated. In particular, homologs to the target must be excluded from the libraries to correctly simulate a de novo protein structure prediction scenario, something which surprisingly is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries present both a higher precision and coverage than two of the state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources.

  14. Building a Better Fragment Library for De Novo Protein Structure Prediction

    Science.gov (United States)

    de Oliveira, Saulo H. P.; Shi, Jiye; Deane, Charlotte M.

    2015-01-01

    Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. The importance of correct testing procedures in assessing the quality of fragment libraries is demonstrated. In particular, homologs to the target must be excluded from the libraries to correctly simulate a de novo protein structure prediction scenario, something which surprisingly is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries present both a higher precision and coverage than two of the state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources. PMID:25901595

  15. Mutagenesis Objective Search and Selection Tool (MOSST): an algorithm to predict structure-function related mutations in proteins

    Directory of Open Access Journals (Sweden)

    Asenjo Juan A

    2011-04-01

    Background: Functionally relevant artificial or natural mutations are difficult to assess or predict if no structure-function information is available for a protein. This is especially important to correctly identify functionally significant non-synonymous single nucleotide polymorphisms (nsSNPs) or to design a site-directed mutagenesis strategy for a target protein. A new and powerful methodology is proposed to guide these two decision strategies, based only on conservation rules of physicochemical properties of amino acids extracted from a multiple alignment of a protein family where the target protein belongs, with no need of explicit structure-function relationships. Results: A statistical analysis is performed over each amino acid position in the multiple protein alignment, based on different amino acid physical or chemical characteristics, including hydrophobicity, side-chain volume, charge and protein conformational parameters. The variances of each of these properties at each position are combined to obtain a global statistical indicator of the conservation degree of each property. Different types of physicochemical conservation are defined to characterize relevant and irrelevant positions. The differences between statistical variances are taken together as the basis of hypothesis tests at each position to search for functionally significant mutable sites and to identify specific mutagenesis targets. The outcome is used to statistically predict physicochemical consensus sequences based on different properties and to calculate the amino acid propensities at each position in a given protein. Hence, amino acid positions are identified that are putatively responsible for function, specificity, stability or binding interactions in a family of proteins. Once these key functional positions are identified, position-specific statistical distributions are applied to divide the 20 common protein amino acids in each position of the protein
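
    The per-position variance of a physicochemical property across an alignment, the basic statistic this abstract builds on, can be sketched in Python as follows. The choice of the Kyte-Doolittle hydropathy scale, the gap handling, and the function name are assumptions for illustration; the abstract does not say which scales MOSST uses, and the hypothesis tests it describes are not reproduced here.

```python
from statistics import pvariance

# Kyte-Doolittle hydropathy values, used here only as an example scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_variances(alignment, scale=KYTE_DOOLITTLE):
    """Variance of a residue property at each column of an alignment.

    `alignment` is a list of equal-length sequences; characters that are
    not in `scale` (for example gaps) are skipped.
    """
    variances = []
    for column in zip(*alignment):
        values = [scale[aa] for aa in column if aa in scale]
        variances.append(pvariance(values) if len(values) > 1 else 0.0)
    return variances
```

    A column with low variance for a given property is conserved in that property even if the residues themselves differ, which is the kind of signal the abstract proposes to test statistically.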

  16. RAG-3D: a search tool for RNA 3D substructures

    Science.gov (United States)

    Zahran, Mai; Sevim Bayrak, Cigdem; Elmetwaly, Shereef; Schlick, Tamar

    2015-01-01

    To address many challenges in RNA structure/function prediction, the characterization of RNA's modular architectural units is required. Using the RNA-As-Graphs (RAG) database, we have previously explored the existence of secondary structure (2D) submotifs within larger RNA structures. Here we present RAG-3D—a dataset of RNA tertiary (3D) structures and substructures plus a web-based search tool—designed to exploit graph representations of RNAs for the goal of searching for similar 3D structural fragments. The objects in RAG-3D consist of 3D structures translated into 3D graphs, cataloged based on the connectivity between their secondary structure elements. Each graph is additionally described in terms of its subgraph building blocks. The RAG-3D search tool then compares a query RNA 3D structure to those in the database to obtain structurally similar structures and substructures. This comparison reveals conserved 3D RNA features and thus may suggest functional connections. Though RNA search programs based on similarity in sequence, 2D, and/or 3D structural elements are available, our graph-based search tool may be advantageous for illuminating similarities that are not obvious; using motifs rather than sequence space also reduces search times considerably. Ultimately, such substructuring could be useful for RNA 3D structure prediction, structure/function inference and inverse folding. PMID:26304547

  17. Perceived breast cancer risk: heuristic reasoning and search for a dominance structure.

    Science.gov (United States)

    Katapodi, Maria C; Facione, Noreen C; Humphreys, Janice C; Dodd, Marylin J

    2005-01-01

    Studies suggest that people construct their risk perceptions by using inferential rules called heuristics. The purpose of this study was to identify heuristics that influence perceived breast cancer risk. We examined 11 interviews from women of diverse ethnic/cultural backgrounds who were recruited from community settings. Narratives in which women elaborated about their own breast cancer risk were analyzed with Argument and Heuristic Reasoning Analysis methodology, which is based on applied logic. The availability, simulation, representativeness, affect, and perceived control heuristics, and search for a dominance structure were commonly used for making risk assessments. Risk assessments were based on experiences with an abnormal breast symptom, experiences with affected family members and friends, beliefs about living a healthy lifestyle, and trust in health providers. Assessment of the potential threat of a breast symptom was facilitated by the search for a dominance structure. Experiences with family members and friends were incorporated into risk assessments through the availability, simulation, representativeness, and affect heuristics. Mistrust in health providers led to an inappropriate dependence on the perceived control heuristic. Identified heuristics appear to create predictable biases and suggest that perceived breast cancer risk is based on common cognitive patterns.

  18. Three dimensional pattern recognition using feature-based indexing and rule-based search

    Science.gov (United States)

    Lee, Jae-Kyu

    In flexible automated manufacturing, robots can perform routine operations as well as recover from atypical events, provided that process-relevant information is available to the robot controller. Real time vision is among the most versatile sensing tools, yet the reliability of machine-based scene interpretation can be questionable. The effort described here is focused on the development of machine-based vision methods to support autonomous nuclear fuel manufacturing operations in hot cells. This thesis presents a method to efficiently recognize 3D objects from 2D images based on feature-based indexing. Object recognition is the identification of correspondences between parts of a current scene and stored views of known objects, using chains of segments or indexing vectors. To create indexed object models, characteristic model image features are extracted during preprocessing. Feature vectors representing model object contours are acquired from several points of view around each object and stored. Recognition is the process of matching stored views with features or patterns detected in a test scene. Two sets of algorithms were developed, one for preprocessing and indexed database creation, and one for pattern searching and matching during recognition. At recognition time, those indexing vectors with the highest match probability are retrieved from the model image database, using a nearest neighbor search algorithm. The nearest neighbor search predicts the best possible match candidates. Extended searches are guided by a search strategy that employs knowledge-base (KB) selection criteria. The knowledge-based system simplifies the recognition process and minimizes the number of iterations and memory usage. Novel contributions include the use of a feature-based indexing data structure together with a knowledge base. Both components improve the efficiency of the recognition process by improved structuring of the database of object features and reducing database size

  19. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field.

    Science.gov (United States)

    Xu, Dong; Zhang, Yang

    2012-07-01

    Ab initio protein folding is one of the major unsolved problems in computational biology owing to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1-20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. The QUARK prediction procedure is depicted and tested on the structure modeling of 145 nonhomologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in one-third of the cases for short proteins of up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction experiment, the QUARK server outperformed the second and third best servers by 18 and 47% based on the cumulative Z-score of global distance test-total scores in the FM category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress toward the solution of the most important problem in the field. Copyright © 2012 Wiley Periodicals, Inc.
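
    The replica-exchange step mentioned above relies on a standard Metropolis-style criterion for swapping conformations between two temperatures. The Python sketch below shows that generic criterion only; the function name, the unit Boltzmann constant, and the injectable random source are assumptions for illustration, and nothing here reflects QUARK's actual force field, move set, or temperature schedule.

```python
import math
import random


def swap_accepted(energy_i, energy_j, temp_i, temp_j, rng=random, k_b=1.0):
    """Decide whether replicas i and j exchange conformations.

    Uses the usual acceptance probability
    min(1, exp((beta_i - beta_j) * (energy_i - energy_j))),
    with beta = 1 / (k_b * temperature).
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # A non-negative exponent means the swap is always accepted.
    return delta >= 0 or rng.random() < math.exp(delta)
```

    With this rule a low-energy conformation found at a high temperature tends to migrate to the colder replicas, while the hot replicas keep exploring.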

  20. From structure prediction to genomic screens for novel non-coding RNAs

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Hofacker, Ivo L.

    2011-01-01

    Abstract: Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction....... This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early...... upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other....

  1. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-05-25

    The number of available protein sequences in public databases is increasing exponentially. However, a significant fraction of these sequences lack functional annotation which is essential to our understanding of how biological systems and processes operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching these predicted models, using global and local similarities, through three independent enzyme commission (EC) and gene ontology (GO) function libraries. The method was tested on 250 “hard” proteins, which lack homologous templates in both structure and function libraries. The results show that this method outperforms the conventional prediction methods based on sequence similarity or threading. Additionally, our method could be improved even further by incorporating protein-protein interaction information. Overall, the method we use provides an efficient approach for automated functional annotation of non-homologous proteins, starting from their sequence.

  2. Evolving stochastic context-free grammars for RNA secondary structure prediction

    DEFF Research Database (Denmark)

    Anderson, James WJ; Tataru, Paula Cristina; Stains, Joe

    2012-01-01

    Background Stochastic Context-Free Grammars (SCFGs) were applied successfully to RNA secondary structure prediction in the early 90s, and used in combination with comparative methods in the late 90s. The set of SCFGs potentially useful for RNA secondary structure prediction is very large, but a few...... to structure prediction as has been previously suggested. Results These search techniques were applied to predict RNA secondary structure on a maximal data set and revealed new and interesting grammars, though none are dramatically better than classic grammars. In general, results showed that many grammars...... with quite different structure could have very similar predictive ability. Many ambiguous grammars were found which were at least as effective as the best current unambiguous grammars. Conclusions Overall the method of evolving SCFGs for RNA secondary structure prediction proved effective in finding many...

  3. Predicting protein structures with a multiplayer online game.

    Science.gov (United States)

    Cooper, Seth; Khatib, Firas; Treuille, Adrien; Barbero, Janos; Lee, Jeehyung; Beenen, Michael; Leaver-Fay, Andrew; Baker, David; Popović, Zoran; Players, Foldit

    2010-08-05

    People exert large amounts of problem-solving effort playing computer games. Simple image- and text-recognition tasks have been successfully 'crowd-sourced' through games, but it is not clear if more complex scientific problems can be solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search space. Here we describe Foldit, a multiplayer online game that engages non-scientists in solving hard prediction problems. Foldit players interact with protein structures using direct manipulation tools and user-friendly versions of algorithms from the Rosetta structure prediction methodology, while they compete and collaborate to optimize the computed energy. We show that top-ranked Foldit players excel at solving challenging structure refinement problems in which substantial backbone rearrangements are necessary to achieve the burial of hydrophobic residues. Players working collaboratively develop a rich assortment of new strategies and algorithms; unlike computational approaches, they explore not only the conformational space but also the space of possible search strategies. The integration of human visual problem-solving and strategy development capabilities with traditional computational algorithms through interactive multiplayer games is a powerful new approach to solving computationally-limited scientific problems.

  4. Similarity relations in visual search predict rapid visual categorization

    Science.gov (United States)

    Mohan, Krithika; Arun, S. P.

    2012-01-01

    How do we perform rapid visual categorization? It is widely thought that categorization involves evaluating the similarity of an object to other category items, but the underlying features and similarity relations remain unknown. Here, we hypothesized that categorization performance is based on perceived similarity relations between items within and outside the category. To this end, we measured the categorization performance of human subjects on three diverse visual categories (animals, vehicles, and tools) and across three hierarchical levels (superordinate, basic, and subordinate levels among animals). For the same subjects, we measured their perceived pair-wise similarities between objects using a visual search task. Regardless of category and hierarchical level, we found that the time taken to categorize an object could be predicted using its similarity to members within and outside its category. We were able to account for several classic categorization phenomena, such as (a) the longer times required to reject category membership; (b) the longer times to categorize atypical objects; and (c) differences in performance across tasks and across hierarchical levels. These categorization times were also accounted for by a model that extracts coarse structure from an image. The striking agreement observed between categorization and visual search suggests that these two disparate tasks depend on a shared coarse object representation. PMID:23092947

  5. Protein Tertiary Structure Prediction Based on Main Chain Angle Using a Hybrid Bees Colony Optimization Algorithm

    Science.gov (United States)

    Mahmood, Zakaria N.; Mahmuddin, Massudi; Mahmood, Mohammed Nooraldeen

    Classifying proteins into their respective families and subfamilies from their amino acid sequences is an important research area. However, for a given protein, its exact action (hormonal, enzymatic, transmembranal, or as a nuclear receptor) depends not only on the amino acid sequence but also on the way the amino acid chain folds. This study provides a prototype system that is able to predict a protein's tertiary structure. Several methods are used to develop and evaluate the system in order to improve accuracy in protein 3D structure prediction. The Bees Optimization algorithm, which is inspired by the food-foraging behavior of honey bees, is used in the searching phase. The experiments are conducted on short-sequence proteins that were used in previous research with well-known tools. The proposed approach shows promising results.

  6. Predictive Methods for Dense Polymer Networks: Combating Bias with Bio-Based Structures

    Science.gov (United States)

    2016-03-16

    Title: Predictive methods for dense polymer networks: combating bias with bio-based structures. Author(s): Andrew J. Guenthner... Topics: architectural bias; comparison of petroleum-based and bio-based chemical architectures; continuing research on structure-property relationships using

  7. Alpha complexes in protein structure prediction

    DEFF Research Database (Denmark)

    Winter, Pawel; Fonseca, Rasmus

    2015-01-01

    Reducing the computational effort and increasing the accuracy of potential energy functions is of utmost importance in modeling biological systems, for instance in protein structure prediction, docking or design. Evaluating interactions between nonbonded atoms is the bottleneck of such computations......-complexes from scratch for every configuration encountered during the search for the native structure would make this approach hopelessly slow. However, it is argued that kinetic α-complexes can be used to reduce the computational effort of determining the potential energy when "moving" from one configuration...... to a neighboring one. As a consequence, relatively expensive (initial) construction of an α-complex is expected to be compensated by subsequent fast kinetic updates during the search process. Computational results presented in this paper are limited. However, they suggest that the applicability of a...

  8. IMPROVING PERSONALIZED WEB SEARCH USING BOOKSHELF DATA STRUCTURE

    Directory of Open Access Journals (Sweden)

    S.K. Jayanthi

    2012-10-01

    Search engines play a vital role in retrieving relevant information for the web user. In this research work, a user-profile-based web search is proposed, so that web users from different domains may receive different sets of results. The main challenge is to provide relevant results at the right level of reading difficulty. Estimating user expertise and re-ranking the results are the main aspects of this paper. The retrieved results are arranged in a Bookshelf Data Structure for easy access. Better presentation of search results in visual mode thus significantly increases the usability of web search engines.

  9. From structure prediction to genomic screens for novel non-coding RNAs.

    Science.gov (United States)

    Gorodkin, Jan; Hofacker, Ivo L

    2011-08-01

    Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure is comprised of regular Watson-Crick and GU wobble base pairs. This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.
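    The role of Watson-Crick and GU wobble pairs in single-sequence folding can be illustrated with a small dynamic program. The sketch below is a deliberate simplification: it maximizes the number of nested base pairs (a Nussinov-style recursion) rather than minimizing free energy as the energy-directed methods discussed in the abstract do. The function name `max_base_pairs` and the `min_loop` parameter are illustrative choices.

```python
# Allowed pairs: Watson-Crick (A-U, G-C) plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence.

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j] inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):  # case: position j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

    For example, `max_base_pairs("GGGAAAUCC")` finds a three-pair hairpin whose innermost pair is a G-U wobble. Pseudoknots are excluded by construction, since only nested pairs are considered.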

  10. Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments.

    Science.gov (United States)

    Zheng, Ce; Kurgan, Lukasz

    2008-10-10

    beta-turn is a secondary protein structure type that plays significant role in protein folding, stability, and molecular recognition. To date, several methods for prediction of beta-turns from protein sequences were developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based beta-turn predictor stems from the usage of a window based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates with recent studies. The Asn, Asp, Gly, and Pro indicate potential beta-turns, while the remaining four amino acids are useful to predict non-beta-turns. Empirical evaluation using three nonredundant datasets shows favorable Q total, Q predicted and MCC values when compared with over a dozen of modern competing methods. Our method is the first to break the 80% Q total barrier and achieves Q total = 80.9%, MCC = 0.47, and Q predicted higher by over 6% when compared with the second best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second and two third-best (with respect to MCC) competing methods, respectively. Experiments show that the proposed method constitutes an improvement over the competing prediction
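    The quality measures quoted in this abstract can be computed from a per-residue confusion matrix. The sketch below assumes the definitions commonly used in the beta-turn literature (Q total as overall accuracy, Q predicted as the fraction of predicted turns that are correct, and the Matthews correlation coefficient); the function name and argument order are hypothetical, and values are returned as fractions rather than percentages.

```python
import math


def turn_prediction_scores(tp, tn, fp, fn):
    """Return (Q_total, Q_predicted, MCC) for a binary turn / non-turn predictor.

    tp, tn, fp, fn are counts of true positives, true negatives,
    false positives and false negatives over all residues.
    """
    total = tp + tn + fp + fn
    q_total = (tp + tn) / total
    # Fraction of residues predicted as turns that really are turns.
    q_predicted = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return q_total, q_predicted, mcc
```

    Because non-turn residues usually outnumber turn residues, Q total alone can look favorable for a predictor that rarely predicts turns, which is why MCC is reported alongside it.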

  11. Impact of Predicting Health Care Utilization Via Web Search Behavior: A Data-Driven Analysis.

    Science.gov (United States)

    Agarwal, Vibhu; Zhang, Liangliang; Zhu, Josh; Fang, Shiyuan; Cheng, Tim; Hong, Chloe; Shah, Nigam H

    2016-09-21

    By recent estimates, the steady rise in health care costs has deprived more than 45 million Americans of health care services and has encouraged health care providers to better understand the key drivers of health care utilization from a population health management perspective. Prior studies suggest the feasibility of mining population-level patterns of health care resource utilization from observational analysis of Internet search logs; however, the utility of the endeavor to the various stakeholders in a health ecosystem remains unclear. The aim was to carry out a closed-loop evaluation of the utility of health care use predictions using the conversion rates of advertisements that were displayed to the predicted future utilizers as a surrogate. The statistical models to predict the probability of user's future visit to a medical facility were built using effective predictors of health care resource utilization, extracted from a deidentified dataset of geotagged mobile Internet search logs representing searches made by users of the Baidu search engine between March 2015 and May 2015. We inferred presence within the geofence of a medical facility from location and duration information from users' search logs and putatively assigned medical facility visit labels to qualifying search logs. We constructed a matrix of general, semantic, and location-based features from search logs of users that had 42 or more search days preceding a medical facility visit as well as from search logs of users that had no medical visits and trained statistical learners for predicting future medical visits. We then carried out a closed-loop evaluation of the utility of health care use predictions using the show conversion rates of advertisements displayed to the predicted future utilizers. In the context of behaviorally targeted advertising, wherein health care providers are interested in minimizing their cost per conversion, the association between show conversion rate and predicted

  12. Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments

    Directory of Open Access Journals (Sweden)

    Kurgan Lukasz

    2008-10-01

    Background: β-turn is a secondary protein structure type that plays significant role in protein folding, stability, and molecular recognition. To date, several methods for prediction of β-turns from protein sequences were developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based β-turn predictor stems from the usage of a window based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. Results: We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates with recent studies. The Asn, Asp, Gly, and Pro indicate potential β-turns, while the remaining four amino acids are useful to predict non-β-turns. Empirical evaluation using three nonredundant datasets shows favorable Qtotal, Qpredicted and MCC values when compared with over a dozen of modern competing methods. Our method is the first to break the 80% Qtotal barrier and achieves Qtotal = 80.9%, MCC = 0.47, and Qpredicted higher by over 6% when compared with the second best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second and two third-best (with respect to MCC) competing methods, respectively. Conclusion: Experiments show that the proposed method constitutes an

  13. HDAPD: a web tool for searching the disease-associated protein structures

    Science.gov (United States)

    2010-01-01

    Background: The structures of disease-associated proteins are important for structure-based drug design against a particular disease. Until now, protein structures have usually been searched by PDB id or sequence information. In the HDAPD database presented here, however, the structure of a disease-associated protein can be searched directly by entering the associated disease name. Description: A search in HDAPD can be initiated by entering keywords for a disease, protein name, protein type, or PDB id. The protein sequence can be presented in FASTA format and copied directly for a BLAST search. HDAPD is also interfaced with Jmol so that users can view and manipulate a protein structure. Gene ontology data such as cellular components, molecular functions, and biological processes are provided via a hyperlink to Gene Ontology (GO). Further, HDAPD provides a link to the KEGG map, which shows where the protein is placed in a metabolic pathway and its relationship with other proteins. The latest literature for the protein retrieved from PubMed (titles, journals, authors, and abstracts) is presented as a list of controllable length. Conclusions: Since the HDAPD content can be routinely updated through a PHP-MySQL web page, the database is useful for finding the structures of disease-associated proteins that may play important roles in disease development, in support of structure-based drug design against those diseases. PMID:20158919

  14. From structure prediction to genomic screens for novel non-coding RNAs.

    Directory of Open Access Journals (Sweden)

    Jan Gorodkin

    2011-08-01

    Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure is comprised of regular Watson-Crick and GU wobble base pairs. This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.

  15. Predicting Drug Recalls From Internet Search Engine Queries.

    Science.gov (United States)

    Yom-Tov, Elad

    2017-01-01

    Batches of pharmaceuticals are sometimes recalled from the market when a safety issue or a defect is detected in specific production runs of a drug. Such problems are usually detected when patients or healthcare providers report abnormalities to medical authorities. Here, we test the hypothesis that defective production lots can be detected earlier by monitoring queries to Internet search engines. We extracted queries from the USA to the Bing search engine, which mentioned one of the 5195 pharmaceutical drugs during 2015 and all recall notifications issued by the Food and Drug Administration (FDA) during that year. By using attributes that quantify the change in query volume at the state level, we attempted to predict if a recall of a specific drug will be ordered by FDA in a time horizon ranging from 1 to 40 days in future. Our results show that future drug recalls can indeed be identified with an AUC of 0.791 and a lift at 5% of approximately 6 when predicting a recall occurring one day ahead. This performance degrades as prediction is made for longer periods ahead. The most indicative attributes for prediction are sudden spikes in query volume about a specific medicine in each state. Recalls of prescription drugs and those estimated to be of medium-risk are more likely to be identified using search query data. These findings suggest that aggregated Internet search engine data can be used to facilitate in early warning of faulty batches of medicines.
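    The two figures of merit quoted in this abstract, AUC and lift at 5%, can be computed from a list of prediction scores and binary labels. The sketch below is illustrative only: the function names are hypothetical, lift is taken here as the positive rate among the top-scored fraction divided by the overall positive rate, and the quadratic-time AUC is suitable only for small examples.

```python
def lift_at(scores, labels, fraction=0.05):
    """Positive rate in the top-scored fraction divided by the base rate."""
    n_top = max(1, int(round(len(scores) * fraction)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


def auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

    Under this definition, a lift of about 6 at 5% means that the drugs ranked in the top 5% by the model are recalled roughly six times more often than drugs overall.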

  16. Novel prediction- and subblock-based algorithm for fractal image compression

    International Nuclear Information System (INIS)

    Chung, K.-L.; Hsu, C.-H.

    2006-01-01

    Fractal encoding is the most time-consuming part of fractal image compression. In this paper, a novel two-phase prediction- and subblock-based fractal encoding algorithm is presented. Initially the original gray image is partitioned into a set of variable-size blocks according to the S-tree- and interpolation-based decomposition principle. In the first phase, each current variable-size range block tries to find the best matched domain block based on the proposed prediction-based search strategy, which utilizes the relevant neighboring variable-size domain blocks. The first phase leads to a significant computation-saving effect. If the domain block found within the predicted search space is unacceptable, in the second phase a subblock strategy is employed to partition the current variable-size range block into smaller blocks to improve the image quality. Experimental results show that our proposed prediction- and subblock-based fractal encoding algorithm outperforms the conventional full search algorithm and the recently published spatial-correlation-based algorithm by Truong et al. in terms of encoding time and image quality. In addition, a performance comparison among our proposed algorithm and two other algorithms, the no search-based algorithm and the quadtree-based algorithm, is also investigated.

  17. A hydronitrogen solid: high pressure ab initio evolutionary structure searches

    International Nuclear Information System (INIS)

    Hu Anguang; Zhang Fan

    2011-01-01

    High pressure ab initio evolutionary structure searches resulted in a hydronitrogen solid with a composition of (NH)₄. The structure searches also provided two molecular isomers, ammonium azide (AA) and trans-tetrazene (TTZ), which were previously discovered experimentally and can be taken as molecular precursors for high pressure synthesis of the hydronitrogen solid. The computed pressure versus enthalpy diagram showed that the transformation pressure to the hydronitrogen solid is 36 GPa from AA and 75 GPa from TTZ. Its metastability was analyzed by the phonon dispersion spectrum and room-temperature vibrational density of state together with the transformation energy barrier back to molecular phases at 298 K. The predicted energy barrier of 0.21 eV/atom means that the proposed hydronitrogen solid should be very stable at ambient conditions. (fast track communication)

  18. Viral IRES prediction system - a web server for prediction of the IRES secondary structure in silico.

    Directory of Open Access Journals (Sweden)

    Jun-Jie Hong

    The internal ribosomal entry site (IRES) functions as a cap-independent translation initiation site in eukaryotic cells. IRES elements have been applied as useful tools for bi-cistronic expression vectors. Current RNA structure prediction programs are unable to predict precisely the potential IRES element. We have designed a viral IRES prediction system (VIPS) to perform the IRES secondary structure prediction. In order to obtain better results for the IRES prediction, the VIPS can evaluate and predict for all four different groups of IRESs with a higher accuracy. RNA secondary structure prediction, comparison, and pseudoknot prediction programs were implemented to form the three-stage procedure for the VIPS. The backbone of VIPS includes: the RNALfold program, aimed to predict local RNA secondary structures by minimum free energy method; the RNA Align program, intended to compare predicted structures; and the pknotsRG program, used to calculate the pseudoknot structure. VIPS was evaluated by using UTR database, IRES database and Virus database, and the accuracy rate of VIPS was assessed as 98.53%, 90.80%, 82.36% and 80.41% for IRES groups 1, 2, 3, and 4, respectively. This advanced and useful search approach for IRES structures will facilitate IRES related studies. The VIPS on-line website service is available at http://140.135.61.250/vips/.

  19. Prediction of molecular crystal structures

    International Nuclear Information System (INIS)

    Beyer, Theresa

    2001-01-01

    The ab initio prediction of molecular crystal structures is a scientific challenge. Reliability of first-principle prediction calculations would show a fundamental understanding of crystallisation. Crystal structure prediction is also of considerable practical importance as different crystalline arrangements of the same molecule in the solid state (polymorphs) are likely to have different physical properties. A method of crystal structure prediction based on lattice energy minimisation has been developed in this work. The choice of the intermolecular potential and of the molecular model is crucial for the results of such studies and both of these criteria have been investigated. An empirical atom-atom repulsion-dispersion potential for carboxylic acids has been derived and applied in a crystal structure prediction study of formic, benzoic and the polymorphic system of tetrolic acid. As many experimental crystal structure determinations at different temperatures are available for the polymorphic system of paracetamol (acetaminophen), the influence of the variations of the molecular model on the crystal structure lattice energy minima has also been studied. The general problem of prediction methods based on the assumption that the experimental thermodynamically stable polymorph corresponds to the global lattice energy minimum is that more hypothetical low lattice energy structures are found within a few kJ mol⁻¹ of the global minimum than are likely to be experimentally observed polymorphs. This is illustrated by the results for molecule I, 3-oxabicyclo(3.2.0)hepta-1,4-diene, studied for the first international blindtest for small organic crystal structures organised by the Cambridge Crystallographic Data Centre (CCDC) in May 1999. To reduce the number of predicted polymorphs, additional factors to thermodynamic criteria have to be considered. Therefore the elastic constants and vapour growth morphologies have been calculated for the lowest lattice energy

  20. Prediction of molecular crystal structures

    Energy Technology Data Exchange (ETDEWEB)

    Beyer, Theresa

    2001-07-01

    The ab initio prediction of molecular crystal structures is a scientific challenge. Reliability of first-principle prediction calculations would show a fundamental understanding of crystallisation. Crystal structure prediction is also of considerable practical importance as different crystalline arrangements of the same molecule in the solid state (polymorphs) are likely to have different physical properties. A method of crystal structure prediction based on lattice energy minimisation has been developed in this work. The choice of the intermolecular potential and of the molecular model is crucial for the results of such studies and both of these criteria have been investigated. An empirical atom-atom repulsion-dispersion potential for carboxylic acids has been derived and applied in a crystal structure prediction study of formic, benzoic and the polymorphic system of tetrolic acid. As many experimental crystal structure determinations at different temperatures are available for the polymorphic system of paracetamol (acetaminophen), the influence of the variations of the molecular model on the crystal structure lattice energy minima has also been studied. The general problem of prediction methods based on the assumption that the experimental thermodynamically stable polymorph corresponds to the global lattice energy minimum is that more hypothetical low lattice energy structures are found within a few kJ mol⁻¹ of the global minimum than are likely to be experimentally observed polymorphs. This is illustrated by the results for molecule I, 3-oxabicyclo(3.2.0)hepta-1,4-diene, studied for the first international blindtest for small organic crystal structures organised by the Cambridge Crystallographic Data Centre (CCDC) in May 1999. To reduce the number of predicted polymorphs, additional factors to thermodynamic criteria have to be considered. Therefore the elastic constants and vapour growth morphologies have been calculated for the lowest lattice energy

  1. Utilizing knowledge base of amino acids structural neighborhoods to predict protein-protein interaction sites.

    Science.gov (United States)

    Jelínek, Jan; Škoda, Petr; Hoksza, David

    2017-12-06

    Protein-protein interactions (PPI) play a key role in an investigation of various biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect. We have developed a novel prediction method called INSPiRE which benefits from a knowledge base built from data available in Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. A structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to Matthews correlation coefficient. In this paper we introduce a new knowledge-based method for identification of protein-protein interaction sites called INSPiRE. Its knowledge base utilizes structural patterns of known interaction sites in the Protein Data Bank which are then used for PPI prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing PPI approaches.
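    The frequency-based labeling described in this abstract can be sketched as a lookup table keyed by an encoded structural neighborhood. This is a minimal illustration under stated assumptions, not the INSPiRE implementation: the class name, the string fingerprints, the simple majority rule, and the default label for unseen neighborhoods are all hypothetical.

```python
from collections import defaultdict


class NeighborhoodKnowledgeBase:
    """Counts how often each encoded neighborhood was seen at an interface."""

    def __init__(self):
        # fingerprint -> [interface count, non-interface count]
        self._counts = defaultdict(lambda: [0, 0])

    def add(self, fingerprint, is_interface):
        """Record one labeled amino acid from a known protein structure."""
        self._counts[fingerprint][0 if is_interface else 1] += 1

    def predict(self, fingerprint, default=False):
        """Label a residue as interface (True) by majority vote.

        Neighborhoods never seen before fall back to `default`.
        """
        if fingerprint not in self._counts:
            return default
        interface, non_interface = self._counts[fingerprint]
        return interface > non_interface
```

    A usage example: after `kb.add("1011", True)` twice and `kb.add("1011", False)` once, `kb.predict("1011")` returns `True`. A practical system would likely also need a strategy for near-matching neighborhoods, which this sketch omits.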

  2. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM

    Directory of Open Access Journals (Sweden)

    Yunyun Liang

    2015-01-01

    Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. This will offer an important complement to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.

  3. LoopIng: a template-based tool for predicting the structure of protein loops.

    KAUST Repository

    Messih, Mario Abdel

    2015-08-06

    Predicting the structure of protein loops is very challenging, mainly because they are not necessarily subject to strong evolutionary pressure. This implies that, unlike the rest of the protein, standard homology modeling techniques are not very effective in modeling their structure. However, loops are often involved in protein function, hence inferring their structure is important for predicting protein structure as well as function. We describe a method, LoopIng, based on the Random Forest automated learning technique, which, given a target loop, selects a structural template for it from a database of loop candidates. Compared to the most recently available methods, LoopIng is able to achieve similar accuracy for short loops (4-10 residues) and significant enhancements for long loops (11-20 residues). The quality of the predictions is robust to errors that unavoidably affect the stem regions when these are modeled. The method returns a confidence score for the predicted template loops and has the advantage of being very fast (on average: 1 min/loop). www.biocomputing.it/looping. anna.tramontano@uniroma1.it. Supplementary data are available at Bioinformatics online.

  4. Structural prediction in aphasia

    Directory of Open Access Journals (Sweden)

    Tessa Warren

    2015-05-01

    had varying levels of sentence-comprehension impairment read sentences where an upcoming disjunction either could (1b) or could not (1a) be predicted, based on the presence of either (Staub & Clifton, 2006; see Figure 1 for example). If either spurs PWA to predict an upcoming disjunction and this prediction is facilitative, then reading times on the or and second disjunct (or a beautiful portrait) should be faster in the Either condition than in the No Either condition. Results confirmed this prediction (see Figure 1; β=352, t=2.36). The magnitude of this facilitation was related to overall language-impairment severity on the Comprehensive Aphasia Test (CAT: Swinburn, et al., 2004): PWA with milder language impairments showed more facilitation for either than PWA with more severe language impairments (r=.594, p<.05). This finding represents strong and novel evidence that PWA can use a lexical cue to predict the structural form of upcoming material during comprehension. However, the lack of relation between these PWA’s degree of structural facilitation and their sentence comprehension ability may indicate that structural predictions could speed reading without improving comprehension.

  5. Fast Structural Search in Phylogenetic Databases

    Directory of Open Access Journals (Sweden)

    William H. Piel

    2005-01-01

    As the size of phylogenetic databases grows, the need for efficiently searching these databases arises. Thanks to previous and ongoing research, searching by attribute value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially for large databases and especially for approximate matches, is still an art. We propose structural search techniques that, given a query or pattern tree P and a database of phylogenies D, find trees in D that are sufficiently close to P. The “closeness” is a measure of the topological relationships in P that are found to be the same or similar in a tree D in D. We develop a filtering technique that accelerates searches and present algorithms for rooted and unrooted trees where the trees can be weighted or unweighted. Experimental results on comparing the similarity measure with existing tree metrics and on evaluating the efficiency of the search techniques demonstrate that the proposed approach is promising.

  6. Can Internet search queries help to predict stock market volatility?

    OpenAIRE

    Dimpfl, Thomas; Jank, Stephan

    2011-01-01

    This paper studies the dynamics of stock market volatility and retail investor attention measured by internet search queries. We find a strong co-movement of stock market indices’ realized volatility and the search queries for their names. Furthermore, Granger causality is bi-directional: high searches follow high volatility, and high volatility follows high searches. Using the latter feedback effect to predict volatility we find that search queries contain additional information about market...

  7. The role of space and time in object-based visual search

    NARCIS (Netherlands)

    Schreij, D.B.B.; Olivers, C.N.L.

    2013-01-01

    Recently we have provided evidence that observers more readily select a target from a visual search display if the motion trajectory of the display object suggests that the observer has dealt with it before. Here we test the prediction that this object-based memory effect on search breaks down if

  8. Search Path Evaluation Incorporating Object Placement Structure

    National Research Council Canada - National Science Library

    Baylog, John G; Wettergren, Thomas A

    2007-01-01

    This report describes a computationally robust approach to search path performance evaluation where the objects of search interest exhibit structure in the way in which they occur within the search space...

  9. Robust object tacking based on self-adaptive search area

    Science.gov (United States)

    Dong, Taihang; Zhong, Sheng

    2018-02-01

    Discriminative correlation filter (DCF) based trackers have recently achieved excellent performance with great computational efficiency. However, DCF based trackers suffer boundary effects, which result in the unstable performance in challenging situations exhibiting fast motion. In this paper, we propose a novel method to mitigate this side-effect in DCF based trackers. We change the search area according to the prediction of target motion. When the object moves fast, broad search area could alleviate boundary effects and reserve the probability of locating object. When the object moves slowly, narrow search area could prevent effect of useless background information and improve computational efficiency to attain real-time performance. This strategy can impressively soothe boundary effects in situations exhibiting fast motion and motion blur, and it can be used in almost all DCF based trackers. The experiments on OTB benchmark show that the proposed framework improves the performance compared with the baseline trackers.

  10. Prediction of Protein Structural Class Based on Gapped-Dipeptides and a Recursive Feature Selection Approach

    Directory of Open Access Journals (Sweden)

    Taigang Liu

    2015-12-01

    The prior knowledge of protein structural class may offer useful clues on understanding its functionality as well as its tertiary structure. Though various significant efforts have been made to find a fast and effective computational approach to address this problem, it is still a challenging topic in the field of bioinformatics. The position-specific score matrix (PSSM) profile has been shown to provide a useful source of information for improving the prediction performance of protein structural class. However, this information has not been adequately explored. To this end, in this study, we present a feature extraction technique which is based on gapped-dipeptides composition computed directly from PSSM. Then, a careful feature selection technique is performed based on support vector machine-recursive feature elimination (SVM-RFE). These optimal features are selected to construct a final predictor. The results of jackknife tests on four working datasets show that our method obtains satisfactory prediction accuracies by extracting features solely based on PSSM and could serve as a very promising tool to predict protein structural class.

  11. The dual role of fragments in fragment-assembly methods for de novo protein structure prediction

    Science.gov (United States)

    Handl, Julia; Knowles, Joshua; Vernon, Robert; Baker, David; Lovell, Simon C.

    2013-01-01

    In fragment-assembly techniques for protein structure prediction, models of protein structure are assembled from fragments of known protein structures. This process is typically guided by a knowledge-based energy function and uses a heuristic optimization method. The fragments play two important roles in this process: they define the set of structural parameters available, and they also assume the role of the main variation operators that are used by the optimiser. Previous analysis has typically focused on the first of these roles. In particular, the relationship between local amino acid sequence and local protein structure has been studied by a range of authors. The correlation between the two has been shown to vary with the window length considered, and the results of these analyses have informed directly the choice of fragment length in state-of-the-art prediction techniques. Here, we focus on the second role of fragments and aim to determine the effect of fragment length from an optimization perspective. We use theoretical analyses to reveal how the size and structure of the search space changes as a function of insertion length. Furthermore, empirical analyses are used to explore additional ways in which the size of the fragment insertion influences the search both in a simulation model and for the fragment-assembly technique, Rosetta. PMID:22095594

  12. PROSPECT improves cis-acting regulatory element prediction by integrating expression profile data with consensus pattern searches

    Science.gov (United States)

    Fujibuchi, Wataru; Anderson, John S. J.; Landsman, David

    2001-01-01

    Consensus pattern and matrix-based searches designed to predict cis-acting transcriptional regulatory sequences have historically been subject to large numbers of false positives. We sought to decrease false positives by incorporating expression profile data into a consensus pattern-based search method. We have systematically analyzed the expression phenotypes of over 6000 yeast genes, across 121 expression profile experiments, and correlated them with the distribution of 14 known regulatory elements over sequences upstream of the genes. Our method is based on a metric we term probabilistic element assessment (PEA), which is a ranking of potential sites based on sequence similarity in the upstream regions of genes with similar expression phenotypes. For eight of the 14 known elements that we examined, our method had a much higher selectivity than a naïve consensus pattern search. Based on our analysis, we have developed a web-based tool called PROSPECT, which allows consensus pattern-based searching of gene clusters obtained from microarray data. PMID:11574681

  13. Protein structure based prediction of catalytic residues.

    Science.gov (United States)

    Fajardo, J Eduardo; Fiser, Andras

    2013-02-22

    Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function, when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues on the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. We found that several of the graph based measures utilize the same underlying feature of protein structures, which can be simply and more effectively captured with the distance to GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile sequence conservation remains by far the most influential feature in identifying functional residues. We also found that due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
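
    The sensitivity, precision and enrichment figures quoted in this abstract follow directly from the four counts it reports. The short calculation below reproduces them; the function and variable names are illustrative only, and the enrichment is taken here to mean precision divided by the background rate of catalytic residues, which is an assumption about how the authors defined it.

```python
# Counts quoted in the abstract.
total_residues = 9262       # residues in the 29-structure test set
annotated_catalytic = 111   # annotated catalytic residues among them
predicted = 411             # residues returned by the method
true_positives = 70         # returned residues that are annotated catalytic


def enrichment_summary(tp, n_predicted, n_positive, n_total):
    """Return (sensitivity, precision, fold enrichment over background)."""
    sensitivity = tp / n_positive        # share of catalytic residues recovered
    precision = tp / n_predicted         # share of predictions that are catalytic
    background = n_positive / n_total    # catalytic rate in the whole input set
    return sensitivity, precision, precision / background


sensitivity, precision, enrichment = enrichment_summary(
    true_positives, predicted, annotated_catalytic, total_residues
)
print(f"sensitivity = {sensitivity:.0%}")
print(f"precision   = {precision:.0%}")
print(f"enrichment  = {enrichment:.1f}-fold")
```

    Under that definition the numbers agree with the abstract: roughly 63% sensitivity, 17% precision, and an enrichment of about 14-fold.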

  14. Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction.

    Science.gov (United States)

    Marks, Claire; Nowak, Jaroslaw; Klostermann, Stefan; Georges, Guy; Dunbar, James; Shi, Jiye; Kelm, Sebastian; Deane, Charlotte M

    2017-05-01

    Loops are often vital for protein function; however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations, and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. deane@stats.ox.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.

  15. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening.

    Science.gov (United States)

    Ain, Qurrat Ul; Aleksandrova, Antoniya; Roessler, Florian D; Ballester, Pedro J

    2015-01-01

    Docking tools to predict whether and how a small molecule binds to a target can be applied if a structural model of such target is available. The reliability of docking depends, however, on the accuracy of the adopted scoring function (SF). Despite intense research over the years, improving the accuracy of SFs for structure-based binding affinity prediction or virtual screening has proven to be a challenging task for any class of method. New SFs based on modern machine-learning regression models, which do not impose a predetermined functional form and thus are able to exploit effectively much larger amounts of experimental data, have recently been introduced. These machine-learning SFs have been shown to outperform a wide range of classical SFs at both binding affinity prediction and virtual screening. The emerging picture from these studies is that the classical approach of using linear regression with a small number of expert-selected structural features can be strongly improved by a machine-learning approach based on nonlinear regression allied with comprehensive data-driven feature selection. Furthermore, the performance of classical SFs does not grow with larger training datasets and hence this performance gap is expected to widen as more training data becomes available in the future. Other topics covered in this review include predicting the reliability of a SF on a particular target class, generating synthetic data to improve predictive performance and modeling guidelines for SF development. WIREs Comput Mol Sci 2015, 5:405-424. doi: 10.1002/wcms.1225 For further resources related to this article, please visit the WIREs website.

  16. Evidence-based Medicine Search: a customizable federated search engine.

    Science.gov (United States)

    Bracke, Paul J; Howse, David K; Keim, Samuel M

    2008-04-01

    This paper reports on the development of a tool by the Arizona Health Sciences Library (AHSL) for searching clinical evidence that can be customized for different user groups. The AHSL provides services to the University of Arizona's (UA's) health sciences programs and to the University Medical Center. Librarians at AHSL collaborated with UA College of Medicine faculty to create an innovative search engine, Evidence-based Medicine (EBM) Search, that provides users with a simple search interface to EBM resources and presents results organized according to an evidence pyramid. EBM Search was developed with a web-based configuration component that allows the tool to be customized for different specialties. Informal and anecdotal feedback from physicians indicates that EBM Search is a useful tool with potential in teaching evidence-based decision making. While formal evaluation is still being planned, a tool such as EBM Search, which can be configured for specific user populations, may help lower barriers to information resources in an academic health sciences center.

  17. Why Is There a Glass Ceiling for Threading Based Protein Structure Prediction Methods?

    Science.gov (United States)

    Skolnick, Jeffrey; Zhou, Hongyi

    2017-04-20

    Despite their different implementations, comparison of the best threading approaches to the prediction of evolutionarily distant protein structures reveals that they tend to succeed or fail on the same protein targets. This is true despite the fact that the structural template library has good templates for all cases. Thus, a key question is why certain protein structures are threadable while others are not. Comparison with threading results on a set of artificial sequences selected for stability further argues that the failure of threading is due to the nature of the protein structures themselves. Using a new contact map based alignment algorithm, we demonstrate that certain folds are highly degenerate in that they can have very similar coarse grained fractions of native contacts aligned and yet differ significantly from the native structure. For threadable proteins, this is not the case. Thus, contemporary threading approaches appear to have reached a plateau, and new approaches to structure prediction are required.

  18. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles.

    Science.gov (United States)

    Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G; Gelly, Jean-Christophe

    2016-06-20

    Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation -with Protein Blocks-, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.

  19. Conceptual Web Users' Actions Prediction for Ontology-Based Browsing Recommendations

    Science.gov (United States)

    Robal, Tarmo; Kalja, Ahto

    The Internet consists of thousands of web sites with different kinds of structures. However, users are browsing the web according to their informational expectations towards the web site searched, having an implicit conceptual model of the domain in their minds. Nevertheless, people tend to repeat themselves and have partially shared conceptual views while surfing the web, finding some areas of web sites more interesting than others. Herein, we take advantage of the latter and provide a model and a study on predicting users' actions based on the web ontology concepts and their relations.

  20. Optimum Design of Braced Steel Space Frames including Soil-Structure Interaction via Teaching-Learning-Based Optimization and Harmony Search Algorithms

    OpenAIRE

    Ayse T. Daloglu; Musa Artar; Korhan Ozgan; Ali İ. Karakas

    2018-01-01

    Optimum design of braced steel space frames including soil-structure interaction is studied by using harmony search (HS) and teaching-learning-based optimization (TLBO) algorithms. A three-parameter elastic foundation model is used to incorporate the soil-structure interaction effect. A 10-storey braced steel space frame example taken from literature is investigated according to four different bracing types for the cases with/without soil-structure interaction. X, V, Z, and eccentric V-shaped...

  1. Combining sequence-based prediction methods and circular dichroism and infrared spectroscopic data to improve protein secondary structure determinations

    Directory of Open Access Journals (Sweden)

    Lees Jonathan G

    2008-01-01

    Background: A number of sequence-based methods exist for protein secondary structure prediction. Protein secondary structures can also be determined experimentally from circular dichroism and infrared spectroscopic data using empirical analysis methods. It has been proposed that comparable accuracy can be obtained from sequence-based predictions as from these biophysical measurements. Here we have examined the secondary structure determination accuracies of sequence prediction methods with the empirically determined values from the spectroscopic data on datasets of proteins for which both crystal structures and spectroscopic data are available. Results: In this study we show that the sequence prediction methods have accuracies nearly comparable to those of spectroscopic methods. However, we also demonstrate that combining the spectroscopic and sequence techniques produces significant overall improvements in secondary structure determinations. In addition, combining the extra information content available from synchrotron radiation circular dichroism data with sequence methods also shows improvements. Conclusion: Combining sequence prediction with experimentally determined spectroscopic methods for protein secondary structure content significantly enhances the accuracy of the overall results obtained.

  2. RNA-SSPT: RNA Secondary Structure Prediction Tools.

    Science.gov (United States)

    Ahmad, Freed; Mahboob, Shahid; Gulzar, Tahsin; Din, Salah U; Hanif, Tanzeela; Ahmad, Hifza; Afzal, Muhammad

    2013-01-01

    The prediction of RNA structure is useful for understanding evolution for both in silico and in vitro studies. Physical methods like NMR studies to predict RNA secondary structure are expensive and difficult. Computational RNA secondary structure prediction is easier. Comparative sequence analysis provides the best solution, but secondary structure prediction of a single RNA sequence is challenging. RNA-SSPT is a tool that computationally predicts the secondary structure of a single RNA sequence. Most RNA secondary structure prediction tools do not allow pseudoknots in the structure or are unable to locate them. The Nussinov dynamic programming algorithm has been implemented in RNA-SSPT. Current studies show that only the energetically most favorable secondary structure is required, and a modification of the algorithm is also available that produces base pairs to lower the total free energy of the secondary structure. For visualization of RNA secondary structure, NAVIEW in C language is used and modified in C# for tool requirement. RNA-SSPT is built in C# using Dot Net 2.0 in Microsoft Visual Studio 2005 Professional edition. The accuracy of RNA-SSPT is tested in terms of Sensitivity and Positive Predicted Value. It is a tool which serves both secondary structure prediction and secondary structure visualization purposes.
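
    For readers unfamiliar with the Nussinov algorithm named in this abstract, a minimal sketch of its textbook base-pair-maximization form is given below. It is not RNA-SSPT's code (which the abstract says is written in C#) and it does not reproduce the free-energy modification mentioned there. The allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop of 3 unpaired bases are assumptions chosen for illustration.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return (max number of nested base pairs, dot-bracket structure)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximum number of pairs in seq[i..j]; entries with i > j stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                      # case: base j is unpaired
            for k in range(i, j - min_loop):         # case: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Traceback: recover one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue                                 # too short to hold a pair
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], "".join(structure)


pairs, fold = nussinov("GGGAAAUCC")
print(pairs, fold)
```

    Because the recurrence only combines nested sub-intervals, it cannot represent pseudoknots, which is consistent with the limitation of most prediction tools noted in the abstract.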

  3. Structure-based function prediction of the expanding mollusk tyrosinase family

    Science.gov (United States)

    Huang, Ronglian; Li, Li; Zhang, Guofan

    2017-11-01

    Tyrosinase (Ty) is a common enzyme found in many different animal groups. In our previous study, genome sequencing revealed that the Ty family is expanded in the Pacific oyster (Crassostrea gigas). Here, we examine the larger number of Ty family members in the Pacific oyster by high-level structure prediction to obtain more information about their function and evolution, especially the unknown role in biomineralization. We verified 12 Ty gene sequences from the Crassostrea gigas genome and the Pinctada fucata martensii transcriptome. By using phylogenetic analysis of these Tys with functionally known Tys from other molluscan species, eight subgroups were identified (CgTy_s1, CgTy_s2, MolTy_s1, MolTy-s2, MolTy-s3, PinTy-s1, PinTy-s2 and PviTy). Structural data and surface pockets of the dinuclear copper center in the eight subgroups of molluscan Ty were obtained using the latest versions of prediction online servers. Structural comparison with other Ty proteins from the protein databank revealed functionally important residues (HA1, HA2, HA3, HB1, HB2, HB3, Z1-Z9) and their location within these protein structures. The structural and chemical features of these pockets, which may be related to substrate binding, showed considerable variability among mollusks, which undoubtedly defines Ty substrate binding. Finally, we discuss the potential driving forces of Ty family evolution in mollusks. Based on these observations, we conclude that the Ty family has rapidly evolved as a consequence of substrate adaptation in mollusks.

  4. Parallel protein secondary structure prediction based on neural networks.

    Science.gov (United States)

    Zhong, Wei; Altun, Gulsah; Tian, Xinmin; Harrison, Robert; Tai, Phang C; Pan, Yi

    2004-01-01

    Protein secondary structure prediction has a fundamental influence on today's bioinformatics research. In this work, binary and tertiary classifiers of protein secondary structure prediction are implemented on Denoeux belief neural network (DBNN) architecture. Hydrophobicity matrix, orthogonal matrix, BLOSUM62 and PSSM (position specific scoring matrix) are experimented separately as the encoding schemes for DBNN. The experimental results contribute to the design of new encoding schemes. A new binary classifier for Helix versus not Helix (~H) for DBNN produces a prediction accuracy of 87% when PSSM is used for the input profile. The performance of the DBNN binary classifier is comparable to that of other best prediction methods. The good test results for binary classifiers open a new approach for protein structure prediction with neural networks. Because training the neural networks is time consuming, Pthread and OpenMP are employed to parallelize DBNN in the hyperthreading enabled Intel architecture. Speedup for 16 Pthreads is 4.9 and speedup for 16 OpenMP threads is 4 in the 4 processors shared memory architecture. The speedup of both OpenMP and Pthread is superior to that of other research. With the new parallel training algorithm, thousands of amino acids can be processed in a reasonable amount of time. Our research also shows that hyperthreading technology for Intel architecture is efficient for parallel biological algorithms.

  5. Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome.

    Directory of Open Access Journals (Sweden)

    Huiying Zhao

    As more and more protein sequences are uncovered from increasingly inexpensive sequencing techniques, an urgent task is to find their functions. This work presents a highly reliable computational technique for predicting DNA-binding function at the level of protein-DNA complex structures, rather than low-resolution two-state prediction of DNA-binding as most existing techniques do. The method first predicts protein-DNA complex structure by utilizing the template-based structure prediction technique HHblits, followed by binding affinity prediction based on a knowledge-based energy function (Distance-scaled finite ideal-gas reference state for protein-DNA interactions). A leave-one-out cross validation of the method based on 179 DNA-binding and 3797 non-binding protein domains achieves a Matthews correlation coefficient (MCC) of 0.77 with high precision (94%) and high sensitivity (65%). We further found 51% sensitivity for 82 newly determined structures of DNA-binding proteins and 56% sensitivity for the human proteome. In addition, the method provides a reasonably accurate prediction of DNA-binding residues in proteins based on predicted DNA-binding complex structures. Its application to human proteome leads to more than 300 novel DNA-binding proteins; some of these predicted structures were validated by known structures of homologous proteins in APO forms. The method [SPOT-Seq (DNA)] is available as an on-line server at http://sparks-lab.org.
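
    As a consistency check on the figures in this abstract, the Matthews correlation coefficient can be recomputed from a confusion matrix. The abstract reports only the class sizes (179 and 3797), the sensitivity and the precision, so the four cell counts below are back-calculated approximations and should be read as assumptions, not as values taken from the paper.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Class sizes from the abstract.
n_binding, n_nonbinding = 179, 3797

# Approximate cell counts implied by 65% sensitivity and 94% precision
# (rounded; these are assumptions, not reported values).
tp = round(0.65 * n_binding)      # true positives
fn = n_binding - tp               # binding domains that were missed
fp = round(tp / 0.94) - tp        # non-binding domains predicted as binding
tn = n_nonbinding - fp            # non-binding domains correctly rejected

value = mcc(tp, fp, tn, fn)
print(tp, fp, tn, fn, round(value, 2))
```

    With these rounded counts the coefficient comes out at about 0.77, in line with the reported value, which illustrates why MCC is preferred over plain accuracy when the two classes are as unbalanced as they are here.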

  6. [Modeling of a three-dimensional structure of cytochrome P-450 1A2 and search for its new ligands].

    Science.gov (United States)

    Belkina, N V; Skvortsov, V S; Ivanov, A S; Archakov, A I

    1998-01-01

    The substances inhibiting cytochrome P450 1A2 (CYP1A2) represent a promising class of new drugs, whose application in clinical practice can become an important part of preventive care in oncology. The present work is devoted to computer modelling of the 3-D structure of CYP1A2 and the search for new inhibitors by database mining. The modelling of CYP1A2 was based on homology with 4 bacterial cytochromes P450 with known 3-D structure. To optimize the CYP1A2 active site structure, models of its complexes with characteristic substrates (caffeine and 7-ethoxyresorufin) were designed. These complexes were optimized by molecular dynamics simulation in water. The models of 24 complexes of CYP1A2 with ligands of known Kd were designed by means of the DockSearch and LeapFrog programs. A 3D-QSAR model with good predictive power was created based on these complexes. In the final stage, a search for new CYP1A2 ligands in a test database (more than 23,000 substances from the Maybridge database and 112 known CYP1A2 ligands from the Metabolite database, MDL) was executed. 680 potential ligands of CYP1A2 with Kd values comparable with known ones were obtained. This number included 73 of the 112 known ligands, introduced into the test database as the internal control.

  7. Improving the accuracy of protein secondary structure prediction using structural alignment

    Directory of Open Access Journals (Sweden)

    Gallin Warren J

    2006-06-01

    Abstract Background The accuracy of protein secondary structure prediction has steadily improved over the past 30 years. Now many secondary structure prediction methods routinely achieve an accuracy (Q3) of about 75%. We believe this accuracy could be further improved by including structure (as opposed to sequence) database comparisons as part of the prediction process. Indeed, given the large size of the Protein Data Bank (>35,000 sequences), the probability of a newly identified sequence having a structural homologue is actually quite high. Results We have developed a method that performs structure-based sequence alignments as part of the secondary structure prediction process. By mapping the structure of a known homologue (sequence ID >25%) onto the query protein's sequence, it is possible to predict at least a portion of that query protein's secondary structure. By integrating this structural alignment approach with conventional (sequence-based) secondary structure methods and then combining it with a "jury-of-experts" system to generate a consensus result, it is possible to attain very high prediction accuracy. Using a sequence-unique test set of 1644 proteins from EVA, this new method achieves an average Q3 score of 81.3%. Extensive testing indicates this is approximately 4–5% better than any other method currently available. Assessments using non sequence-unique test sets (typical of those used in proteome annotation or structural genomics) indicate that this new method can achieve a Q3 score approaching 88%. Conclusion By using both sequence and structure databases and by exploiting the latest techniques in machine learning it is possible to routinely predict protein secondary structure with an accuracy well above 80%. A program and web server, called PROTEUS, that performs these secondary structure predictions is accessible at http://wishart.biology.ualberta.ca/proteus. For high throughput or batch sequence analyses, the PROTEUS programs

  8. A protein relational database and protein family knowledge bases to facilitate structure-based design analyses.

    Science.gov (United States)

    Mobilio, Dominick; Walker, Gary; Brooijmans, Natasja; Nilakantan, Ramaswamy; Denny, R Aldrin; Dejoannis, Jason; Feyfant, Eric; Kowticwar, Rupesh K; Mankala, Jyoti; Palli, Satish; Punyamantula, Sairam; Tatipally, Maneesh; John, Reji K; Humblet, Christine

    2010-08-01

    The Protein Data Bank is the most comprehensive source of experimental macromolecular structures. It can, however, be difficult at times to locate relevant structures with the Protein Data Bank search interface. This is particularly true when searching for complexes containing specific interactions between protein and ligand atoms. Moreover, searching within a family of proteins can be tedious. For example, one cannot search for some conserved residue as residue numbers vary across structures. We describe herein three databases, Protein Relational Database, Kinase Knowledge Base, and Matrix Metalloproteinase Knowledge Base, containing protein structures from the Protein Data Bank. In Protein Relational Database, atom-atom distances between protein and ligand have been precalculated allowing for millisecond retrieval based on atom identity and distance constraints. Ring centroids, centroid-centroid and centroid-atom distances and angles have also been included permitting queries for pi-stacking interactions and other structural motifs involving rings. Other geometric features can be searched through the inclusion of residue pair and triplet distances. In Kinase Knowledge Base and Matrix Metalloproteinase Knowledge Base, the catalytic domains have been aligned into common residue numbering schemes. Thus, by searching across Protein Relational Database and Kinase Knowledge Base, one can easily retrieve structures wherein, for example, a ligand of interest is making contact with the gatekeeper residue.

  9. Web-page Prediction for Domain Specific Web-search using Boolean Bit Mask

    OpenAIRE

    Sinha, Sukanta; Duttagupta, Rana; Mukhopadhyay, Debajyoti

    2012-01-01

    A search engine is a Web-page retrieval tool. Nowadays Web searchers save time by using an efficient search engine. To improve the performance of the search engine, we introduce a mechanism that gives Web searchers more relevant search results. In this paper, we discuss a domain-specific Web search prototype that generates the predicted Web-page list for a user-given search string using a Boolean bit mask.

  10. Predicting protein structures with a multiplayer online game

    OpenAIRE

    Cooper, Seth; Khatib, Firas; Treuille, Adrien; Barbero, Janos; Lee, Jeehyung; Beenen, Michael; Leaver-Fay, Andrew; Baker, David; Popović, Zoran

    2010-01-01

    People exert significant amounts of problem solving effort playing computer games. Simple image- and text-recognition tasks have been successfully crowd-sourced through games, but it is not clear if more complex scientific problems can be similarly solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search sp...

  11. LigSearch: a knowledge-based web server to identify likely ligands for a protein target

    Energy Technology Data Exchange (ETDEWEB)

    Beer, Tjaart A. P. de; Laskowski, Roman A. [European Bioinformatics Institute (EMBL–EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD (United Kingdom); Duban, Mark-Eugene [Northwestern University Feinberg School of Medicine, Chicago, Illinois (United States); Chan, A. W. Edith [University College London, London WC1E 6BT (United Kingdom); Anderson, Wayne F. [Northwestern University Feinberg School of Medicine, Chicago, Illinois (United States); Thornton, Janet M., E-mail: thornton@ebi.ac.uk [European Bioinformatics Institute (EMBL–EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD (United Kingdom)

    2013-12-01

    LigSearch is a web server for identifying ligands likely to bind to a given protein. Identifying which ligands might bind to a protein before crystallization trials could provide a significant saving in time and resources. LigSearch, a web server aimed at predicting ligands that might bind to and stabilize a given protein, has been developed. Using a protein sequence and/or structure, the system searches against a variety of databases, combining available knowledge, and provides a clustered and ranked output of possible ligands. LigSearch can be accessed at http://www.ebi.ac.uk/thornton-srv/databases/LigSearch.

  12. Can Google Searches Predict the Popularity and Harm of Psychoactive Agents?

    Science.gov (United States)

    Jankowski, Wojciech; Hoffmann, Marcin

    2016-02-25

    Predicting the popularity of and harm caused by psychoactive agents is a serious problem that would be difficult to do by a single simple method. However, because of the growing number of drugs it is very important to provide a simple and fast tool for predicting some characteristics of these substances. We were inspired by the Google Flu Trends study on the activity of the influenza virus, which showed that influenza virus activity worldwide can be monitored based on queries entered into the Google search engine. Our aim was to propose a fast method for ranking the most popular and most harmful drugs based on easily available data gathered from the Internet. We used the Google search engine to acquire data for the ranking lists. Subsequently, using the resulting list and the frequency of hits for the respective psychoactive drugs combined with the word "harm" or "harmful", we estimated quickly how much harm is associated with each drug. We ranked the most popular and harmful psychoactive drugs. As we conducted the research over a period of several months, we noted that the relative popularity indexes tended to change depending on when we obtained them. This suggests that the data may be useful in monitoring changes over time in the use of each of these psychoactive agents. Our data correlate well with the results from a multicriteria decision analysis of drug harms in the United Kingdom. We showed that Google search data can be a valuable source of information to assess the popularity of and harm caused by psychoactive agents and may help in monitoring drug use trends.

  13. Using Search Engine Data as a Tool to Predict Syphilis.

    Science.gov (United States)

    Young, Sean D; Torrone, Elizabeth A; Urata, John; Aral, Sevgi O

    2018-07-01

    Researchers have suggested that social media and online search data might be used to monitor and predict syphilis and other sexually transmitted diseases. Because people at risk for syphilis might seek sexual health and risk-related information on the internet, we investigated associations between internet state-level search query data (e.g., Google Trends) and reported weekly syphilis cases. We obtained weekly counts of reported primary and secondary syphilis for 50 states from 2012 to 2014 from the US Centers for Disease Control and Prevention. We collected weekly internet search query data regarding 25 risk-related keywords from 2012 to 2014 for 50 states using Google Trends. We joined 155 weeks of Google Trends data with 1-week lag to weekly syphilis data for a total of 7750 data points. Using the least absolute shrinkage and selection operator, we trained three linear mixed models on the first 10 weeks of each year. We validated models for 2012 and 2013 for the following 52 weeks and the 2014 model for the following 42 weeks. The models, consisting of different sets of keyword predictors for each year, accurately predicted 144 weeks of primary and secondary syphilis counts for each state, with an overall average R of 0.9 and overall average root mean squared error of 4.9. We used Google Trends search data from the prior week to predict cases of syphilis in the following weeks for each state. Further research could explore how search data could be integrated into public health monitoring systems.
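
    The 1-week-lag join described in this abstract can be sketched as follows. This is only an illustration of the pairing step, not the authors' pipeline: the function name `lagged_pairs` and the toy numbers are hypothetical, and the mixed models themselves are not reproduced here.

```python
# Hypothetical sketch: pair each week's case count with the search index
# from the previous week (a 1-week lag). All names and numbers are made up.

def lagged_pairs(search_index, case_counts, lag=1):
    """Return (search value at week t - lag, cases at week t) pairs."""
    if len(search_index) != len(case_counts):
        raise ValueError("series must have the same length")
    return [
        (search_index[t - lag], case_counts[t])
        for t in range(lag, len(case_counts))
    ]

# Toy weekly series for one state (illustrative values only).
search = [10, 12, 15, 11]
cases = [3, 4, 5, 6]
print(lagged_pairs(search, cases))  # [(10, 4), (12, 5), (15, 6)]

# Size check from the abstract: 155 joined weeks for each of 50 states.
print(155 * 50)  # 7750
```

    A series of n weeks yields n - lag pairs, because the first week has no preceding search value to join against.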

  14. A Combination of Terrain Prediction and Correction for Search and Rescue Robot Autonomous Navigation

    Directory of Open Access Journals (Sweden)

    Yan Guo

    2009-09-01

    This paper presents a novel two-step autonomous navigation method for search and rescue robots. A vision-based algorithm is proposed for terrain identification; it predicts the safest path using a support vector regression machine (SVRM) trained off-line with texture and color features. A correction algorithm based on vibration information is applied while the robot travels, using the judgment function given in the paper. Regions with faulty predictions are corrected with the real traversability value and used to update the SVRM. The experiment demonstrates that this method could help the robot find the optimal path and protect it from traps caused by errors between the prediction and the real environment.

  15. Novel structures of oxygen adsorbed on a Zr(0001) surface predicted from first principles

    Energy Technology Data Exchange (ETDEWEB)

    Gao, Bo [State Key Laboratory of Superhard Materials, Jilin University, Changchun, 130012 (China); Beijing computational science research center, Beijing,100084 (China); Wang, Jianyun [State Key Laboratory of Superhard Materials, Jilin University, Changchun, 130012 (China); Lv, Jian [State Key Laboratory of Superhard Materials, Jilin University, Changchun, 130012 (China); College of Materials Science and Engineering, Jilin University, Changchun, 130012 (China); Gao, Xingyu [Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Beijing, 100088 (China); CAEP Software Center for High Performance Numerical Simulation, Beijing, 100088 (China); Zhao, Yafan [CAEP Software Center for High Performance Numerical Simulation, Beijing, 100088 (China); Wang, Yanchao, E-mail: wyc@calypso.cn [State Key Laboratory of Superhard Materials, Jilin University, Changchun, 130012 (China); Beijing computational science research center, Beijing,100084 (China); College of Materials Science and Engineering, Jilin University, Changchun, 130012 (China); Song, Haifeng, E-mail: song_haifeng@iapcm.ac.cn [Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Beijing, 100088 (China); CAEP Software Center for High Performance Numerical Simulation, Beijing, 100088 (China); Ma, Yanming [State Key Laboratory of Superhard Materials, Jilin University, Changchun, 130012 (China); Beijing computational science research center, Beijing,100084 (China)

    2017-01-30

    Highlights: • Two stable structures of O adsorbed on a Zr(0001) surface are predicted with SLAM. • A stable structure of O adsorbed on a Zr(0001) surface is proposed with MLAM. • The calculated work function change is in agreement with the experimental value. - Abstract: The structures of O atoms adsorbed on a metal surface influence the metal properties significantly. Thus, studying O chemisorption on a Zr surface is of great interest. We investigated O adsorption on a Zr(0001) surface using our newly developed structure-searching method combined with first-principles calculations. A novel structural prototype with a unique combination of surface face-centered cubic (SFCC) and surface hexagonal close-packed (SHCP) O adsorption sites was predicted using a single-layer adsorption model (SLAM) for a 0.5 and 1.0 monolayer (ML) O coverage. First-principles calculations based on the SLAM revealed that the newly predicted structures are energetically favorable compared with the well-known SFCC structures for a low O coverage (0.5 and 1.0 ML). Furthermore, on the basis of our predicted SFCC + SHCP structures, a new structure within the multi-layer adsorption model (MLAM) was proposed to be more stable at the O coverage of 1.0 ML, in which adsorbed O atoms occupy the SFCC + SHCP sites and the substitutional octahedral sites. The calculated work functions indicate that the SFCC + SHCP configuration has the lowest work function of all known structures at an O coverage of 0.5 ML within the SLAM, which agrees with the experimental trend of work function with variation in O coverage.

  16. Mixing Energy Models in Genetic Algorithms for On-Lattice Protein Structure Prediction

    Directory of Open Access Journals (Sweden)

    Mahmood A. Rashid

    2013-01-01

    Protein structure prediction (PSP) is computationally a very challenging problem. The challenge largely comes from the fact that the energy function that needs to be minimised in order to obtain the native structure of a given protein is not clearly known. A high resolution 20×20 energy model could better capture the behaviour of the actual energy function than a low resolution energy model such as hydrophobic polar. However, the fine grained details of the high resolution interaction energy matrix are often not very informative for guiding the search. In contrast, a low resolution energy model could effectively bias the search towards certain promising directions. In this paper, we develop a genetic algorithm that mainly uses a high resolution energy model for protein structure evaluation but uses a low resolution HP energy model in focussing the search towards exploring structures that have hydrophobic cores. We experimentally show that this mixing of energy models leads to significantly lower energy structures compared to the state-of-the-art results.
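
    As an illustration of the low-resolution HP energy model mentioned in this abstract, the sketch below scores a conformation by counting hydrophobic contacts. It rests on assumptions not stated in the abstract: a 2D square lattice and the common convention of -1 per H-H contact between residues that are lattice neighbours but not adjacent in the chain. The function name `hp_energy` is hypothetical, and the genetic algorithm itself is not shown.

```python
# Sketch of the HP (hydrophobic-polar) contact energy on a 2D square lattice.
# Assumptions: 2D square lattice, -1 per H-H contact (a common convention).

def hp_energy(sequence, coords):
    """Sum -1 for every H-H pair on adjacent sites that is not chain-adjacent."""
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:  # adjacent lattice sites
                    energy -= 1
    return energy

# Four residues folded into a unit square: residues 0 and 3 touch.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", square))  # -1
print(hp_energy("HPPP", square))  # 0
```

    Because only H-H contacts are rewarded, minimising this score pushes hydrophobic residues together, which is the kind of bias towards a hydrophobic core that the abstract describes.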

  17. UNRES server for physics-based coarse-grained simulations and prediction of protein structure, dynamics and thermodynamics.

    Science.gov (United States)

    Czaplewski, Cezary; Karczynska, Agnieszka; Sieradzan, Adam K; Liwo, Adam

    2018-04-30

    A server implementation of the UNRES package (http://www.unres.pl) for coarse-grained simulations of protein structures with the physics-based UNRES model, named the UNRES server, is presented. In contrast to most of the protein coarse-grained models, owing to its physics-based origin, the UNRES force field can be used in simulations, including those aimed at protein-structure prediction, without ancillary information from structural databases; however, the implementation includes the possibility of using restraints. Local energy minimization, canonical molecular dynamics simulations, replica exchange and multiplexed replica exchange molecular dynamics simulations can be run with the current UNRES server; the latter are suitable for protein-structure prediction. The user-supplied input includes protein sequence and, optionally, restraints from secondary-structure prediction or small-angle X-ray scattering data, and simulation type and parameters which are selected or typed in. Oligomeric proteins, as well as those containing D-amino-acid residues and disulfide links can be treated. The output is displayed graphically (minimized structures, trajectories, final models, analysis of trajectory/ensembles); however, all output files can be downloaded by the user. The UNRES server can be freely accessed at http://unres-server.chem.ug.edu.pl.

  18. Contextual remapping in visual search after predictable target-location changes.

    Science.gov (United States)

    Conci, Markus; Sun, Luning; Müller, Hermann J

    2011-07-01

    Invariant spatial context can facilitate visual search. For instance, detection of a target is faster if it is presented within a repeatedly encountered, as compared to a novel, layout of nontargets, demonstrating a role of contextual learning for attentional guidance ('contextual cueing'). Here, we investigated how context-based learning adapts to target location (and identity) changes. Three experiments were performed in which, in an initial learning phase, observers learned to associate a given context with a given target location. A subsequent test phase then introduced identity and/or location changes to the target. The results showed that contextual cueing could not compensate for target changes that were not 'predictable' (i.e. learnable). However, for predictable changes, contextual cueing remained effective even immediately after the change. These findings demonstrate that contextual cueing is adaptive to predictable target location changes. Under these conditions, learned contextual associations can be effectively 'remapped' to accommodate new task requirements.

  19. Prediction of Seismic Damage-Based Degradation in RC Structures

    DEFF Research Database (Denmark)

    Kirkegaard, Poul Henning; Gupta, Vinay K.; Nielsen, Søren R.K.

    Estimation of structural damage from known increase in the fundamental period of a structure after an earthquake or prediction of degradation of stiffness and strength for known damage requires reliable correlations between these response functionals. This study proposes a modified Clough-Johnsto...

  20. Short-term wind power prediction based on LSSVM–GSA model

    International Nuclear Information System (INIS)

    Yuan, Xiaohui; Chen, Chen; Yuan, Yanbin; Huang, Yuehua; Tan, Qingxiong

    2015-01-01

    Highlights: • A hybrid model is developed for short-term wind power prediction. • The model is based on LSSVM and gravitational search algorithm. • Gravitational search algorithm is used to optimize parameters of LSSVM. • Effect of different kernel function of LSSVM on wind power prediction is discussed. • Comparative studies show that prediction accuracy of wind power is improved. - Abstract: Wind power forecasting can improve the economical and technical integration of wind energy into the existing electricity grid. Due to its intermittency and randomness, it is hard to forecast wind power accurately. For the purpose of utilizing wind power to the utmost extent, it is very important to make an accurate prediction of the output power of a wind farm under the premise of guaranteeing the security and the stability of the operation of the power system. In this paper, a hybrid model (LSSVM–GSA) based on the least squares support vector machine (LSSVM) and gravitational search algorithm (GSA) is proposed to forecast the short-term wind power. As the kernel function and the related parameters of the LSSVM have a great influence on the performance of the prediction model, the paper establishes LSSVM model based on different kernel functions for short-term wind power prediction. And then an optimal kernel function is determined and the parameters of the LSSVM model are optimized by using GSA. Compared with the Back Propagation (BP) neural network and support vector machine (SVM) model, the simulation results show that the hybrid LSSVM–GSA model based on exponential radial basis kernel function and GSA has higher accuracy for short-term wind power prediction. Therefore, the proposed LSSVM–GSA is a better model for short-term wind power prediction

  1. Modeling and simulation of adaptive Neuro-fuzzy based intelligent system for predictive stabilization in structured overlay networks

    Directory of Open Access Journals (Sweden)

    Ramanpreet Kaur

    2017-02-01

    Intelligent prediction of the dynamism of neighboring nodes (the k well-defined neighbors specified by the DHT protocol) helps to improve resilience and can reduce the overhead associated with topology maintenance of structured overlay networks. The dynamic behavior of overlay nodes depends on many factors, such as the underlying user’s online behavior, geographical position, time of day and day of the week, as reported in many applications. We can exploit these characteristics for efficient maintenance of structured overlay networks by implementing an intelligent predictive framework for setting stabilization parameters appropriately. Considering the fact that human-driven behavior usually goes beyond intermittent availability patterns, we use a hybrid Neuro-fuzzy based predictor to enhance the accuracy of the predictions. In this paper, we discuss our predictive stabilization approach, implement Neuro-fuzzy based prediction in MATLAB simulation and apply this predictive stabilization model in a Chord-based overlay network using OverSim as a simulation tool. The MATLAB simulation results show that the behavior of neighboring nodes is predictable to a large extent, as indicated by the very small RMSE. The OverSim-based simulation results also show significant improvements in the performance of the Chord-based overlay network in terms of lookup success ratio, lookup hop count and maintenance overhead as compared to the periodic stabilization approach.

  2. Tchebichef image moment approach to the prediction of protein secondary structures based on circular dichroism.

    Science.gov (United States)

    Li, Sha Sha; Li, Bao Qiong; Liu, Jin Jin; Lu, Shao Hua; Zhai, Hong Lin

    2018-04-20

    Circular dichroism (CD) spectroscopy is a widely used technique for the evaluation of protein secondary structures that has a significant impact on the understanding of molecular biology. However, the quantitative analysis of protein secondary structures based on CD spectra is still difficult because of the serious overlap of the spectra corresponding to different structural motifs. Here, the Tchebichef image moment (TM) approach is introduced for the first time, which can effectively extract the chemical features in CD spectra for the quantitative analysis of protein secondary structures. The proposed approach was applied to analyze a reference set, and the obtained results were evaluated by strict statistical parameters such as correlation coefficient, cross-validation correlation coefficient and root mean squared error. Compared with several specialized prediction methods, the TM approach provided satisfactory results, especially for turns and unordered structures. Our study indicates that the TM approach can be regarded as a feasible tool for the analysis of the secondary structures of proteins based on CD spectra. An available TMs package is provided and can be used directly for secondary structures prediction.

  3. An adaptive bin framework search method for a beta-sheet protein homopolymer model

    Directory of Open Access Journals (Sweden)

    Hoos Holger H

    2007-04-01

    Abstract Background The problem of protein structure prediction consists of predicting the functional or native structure of a protein given its linear sequence of amino acids. This problem has played a prominent role in the fields of biomolecular physics and algorithm design for over 50 years. Additionally, its importance increases continually as a result of an exponential growth over time in the number of known protein sequences in contrast to a linear increase in the number of determined structures. Our work focuses on the problem of searching an exponentially large space of possible conformations as efficiently as possible, with the goal of finding a global optimum with respect to a given energy function. This problem plays an important role in the analysis of systems with complex search landscapes, and particularly in the context of ab initio protein structure prediction. Results In this work, we introduce a novel approach for solving this conformation search problem based on the use of a bin framework for adaptively storing and retrieving promising locally optimal solutions. Our approach provides a rich and general framework within which a broad range of adaptive or reactive search strategies can be realized. Here, we introduce adaptive mechanisms for choosing which conformations should be stored, based on the set of conformations already stored in memory, and for biasing choices when retrieving conformations from memory in order to overcome search stagnation. Conclusion We show that our bin framework combined with a widely used optimization method, Monte Carlo search, achieves significantly better performance than state-of-the-art generalized ensemble methods for a well-known protein-like homopolymer model on the face-centered cubic lattice.

  4. Predicting the hand, foot, and mouth disease incidence using search engine query data and climate variables: an ecological study in Guangdong, China

    Science.gov (United States)

    Du, Zhicheng; Xu, Lin; Zhang, Wangjian; Zhang, Dingmei; Yu, Shicheng; Hao, Yuantao

    2017-01-01

    Objectives Hand, foot, and mouth disease (HFMD) has caused a substantial burden in China, especially in Guangdong Province. Based on the enhanced surveillance system, we aimed to explore whether the addition of temperature and search engine query data improves the risk prediction of HFMD. Design Ecological study. Setting and participants Information on the confirmed cases of HFMD, climate parameters and search engine query logs was collected. A total of 1.36 million HFMD cases were identified from the surveillance system during 2011–2014. Analyses were conducted at aggregate level and no confidential information was involved. Outcome measures A seasonal autoregressive integrated moving average (ARIMA) model with external variables (ARIMAX) was used to predict the HFMD incidence from 2011 to 2014, taking into account temperature and search engine query data (Baidu Index, BDI). Statistics of goodness-of-fit and precision of prediction were used to compare models (1) based on surveillance data only, and with the addition of (2) temperature, (3) BDI, and (4) both temperature and BDI. Results A high correlation between HFMD incidence and BDI (r=0.794, pmodel. Compared with the model based on surveillance data only, the ARIMAX model including BDI reached the best goodness-of-fit with an Akaike information criterion (AIC) value of −345.332, whereas the model including both BDI and temperature had the most accurate prediction in terms of the mean absolute percentage error (MAPE) of 101.745%. Conclusions An ARIMAX model incorporating search engine query data significantly improved the prediction of HFMD. Further studies are warranted to examine whether including search engine query data also improves the prediction of other infectious diseases in other settings. PMID:28988169

  5. Structure-Based Search for New Inhibitors of Cholinesterases

    Directory of Open Access Journals (Sweden)

    Barbara Malawska

    2013-03-01

    Cholinesterases are important biological targets responsible for regulation of cholinergic transmission, and their inhibitors are used for the treatment of Alzheimer’s disease. To design new cholinesterase inhibitors, different structure-based design strategies were followed, including the modification of compounds from a previously developed library and a fragment-based design approach. This led to the selection of heterodimeric structures as potential inhibitors. Synthesis and biological evaluation of selected candidates confirmed that the designed compounds were acetylcholinesterase inhibitors with IC50 values in the mid-nanomolar to low micromolar range, and some of them were also butyrylcholinesterase inhibitors.

  6. Prediction of RNA secondary structures: from theory to models and real molecules

    International Nuclear Information System (INIS)

    Schuster, Peter

    2006-01-01

    RNA secondary structures are derived from RNA sequences, which are strings built from the natural four letter nucleotide alphabet, {AUGC}. These coarse-grained structures, in turn, are tantamount to constrained strings over a three letter alphabet. Hence, the secondary structures are discrete objects and the number of sequences always exceeds the number of structures. The sequences built from two letter alphabets form perfect structures when the nucleotides can form a base pair, as is the case with {GC} or {AU}, but the relation between the sequences and structures differs strongly from the four letter alphabet. A comprehensive theory of RNA structure is presented, which is based on the concepts of sequence space and shape space, being a space of structures. It sets the stage for modelling processes in ensembles of RNA molecules like evolutionary optimization or kinetic folding as dynamical phenomena guided by mappings between the two spaces. The number of minimum free energy (mfe) structures is always smaller than the number of sequences, even for two letter alphabets. Folding of RNA molecules into mfe structures constitutes a non-invertible mapping from sequence space onto shape space. The preimage of a structure in sequence space is defined as its neutral network. Similarly the set of suboptimal structures is the preimage of a sequence in shape space. This set represents the conformation space of a given sequence. The evolutionary optimization of structures in populations is a process taking place in sequence space, whereas kinetic folding occurs in molecular ensembles that optimize free energy in conformation space. Efficient folding algorithms based on dynamic programming are available for the prediction of secondary structures for given sequences. The inverse problem, the computation of sequences for predefined structures, is an important tool for the design of RNA molecules with tailored properties. Simultaneous folding or cofolding of two or more RNA
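
    The dynamic-programming folding mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the simpler Nussinov-style base-pair maximization rather than the free-energy minimization the abstract refers to, and it writes the result in dot-bracket notation, a string over the three symbols "(", ")" and ".". The function name `max_base_pairs`, the allowed pair set, and the minimum loop length of 3 are illustrative assumptions, not taken from the record.

```python
# Illustrative sketch only: Nussinov-style base-pair maximization, a simplified
# stand-in for the free-energy-based folding algorithms the abstract mentions.

# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a pair


def max_base_pairs(seq):
    """Return (pair_count, dot_bracket) for one maximally paired nested structure."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = maximum number of nested pairs within seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + MIN_LOOP + 1, j + 1):  # case 2: base i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    left = best[i + 1][k - 1]
                    right = best[k + 1][j] if k < j else 0
                    score = max(score, 1 + left + right)
            best[i][j] = score

    # Trace back through the table to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue  # interval too short to hold a pair
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + MIN_LOOP + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                left = best[i + 1][k - 1]
                right = best[k + 1][j] if k < j else 0
                if best[i][j] == 1 + left + right:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    stack.append((k + 1, j))
                    break
    return best[0][n - 1], "".join(structure)
```

    Several sequences can yield the same dot-bracket string under such a mapping, which is one way to see why the sequence-to-structure map described above is not invertible.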

  7. Content Based Searching for INIS

    International Nuclear Information System (INIS)

    Jain, V.; Jain, R.K.

    2016-01-01

    Full text: Whatever a user wants is available on the internet, but to retrieve the information efficiently, a multilingual and most-relevant document search engine is a must. Most current search engines are word based or pattern based. They do not consider the meaning of the query posed to them: they rely purely on the keywords of the query, offer no support for multilingual queries, and do not dismiss non-relevant results. Current information-retrieval techniques either rely on an encoding process, using a certain perspective or classification scheme, to describe a given item, or perform a full-text analysis, searching for user-specified words. Neither case guarantees content matching because an encoded description might reflect only part of the content and the mere occurrence of a word does not necessarily reflect the document’s content. For general documents, there doesn’t yet seem to be a much better option than lazy full-text analysis, by manually going through those endless results pages. In contrast to this, a new search engine should extract the meaning of the query and then perform the search based on this extracted meaning. A new search engine should also employ Interlingua-based machine translation technology to present information in the language of choice of the user. (author)

  8. Adiabatic quantum search algorithm for structured problems

    International Nuclear Information System (INIS)

    Roland, Jeremie; Cerf, Nicolas J.

    2003-01-01

    The study of quantum computation has been motivated by the hope of finding efficient quantum algorithms for solving classically hard problems. In this context, quantum algorithms by local adiabatic evolution have been shown to solve an unstructured search problem with a quadratic speedup over a classical search, just like Grover's algorithm. In this paper, we study how the structure of the search problem may be exploited to further improve the efficiency of these quantum adiabatic algorithms. We show that by nesting a partial search over a reduced set of variables into a global search, it is possible to devise quantum adiabatic algorithms with a complexity that, although still exponential, grows with a reduced order in the problem size.

  9. Computer predictions on Rh-based double perovskites with unusual electronic and magnetic properties

    Science.gov (United States)

    Halder, Anita; Nafday, Dhani; Sanyal, Prabuddha; Saha-Dasgupta, Tanusri

    2018-03-01

    In search of new magnetic materials, we make computer predictions of the structural, electronic and magnetic properties of yet-to-be-synthesized Rh-based double perovskite compounds, Sr(Ca)2BRhO6 (B=Cr, Mn, Fe). We use a combination of an evolutionary algorithm, density functional theory, and statistical-mechanical tools for this purpose. We find that the unusual valence of Rh5+ may be stabilized in these compounds through formation of an oxygen ligand hole. Interestingly, while the Cr-Rh and Mn-Rh compounds are predicted to be ferromagnetic half-metals, the Fe-Rh compounds are found to be rare examples of antiferromagnetic and metallic transition-metal oxides with three-dimensional electronic structure. The computed magnetic transition temperatures of the predicted compounds, obtained from a finite temperature Monte Carlo study of the first principles-derived model Hamiltonian, are found to be reasonably high. The prediction of favorable growth conditions of the compounds, reported in our study and obtained through extensive thermodynamic analysis, should be useful for future synthesis of this interesting class of materials with intriguing properties.

  10. Facilitating RNA structure prediction with microarrays.

    Science.gov (United States)

    Kierzek, Elzbieta; Kierzek, Ryszard; Turner, Douglas H; Catrina, Irina E

    2006-01-17

    Determining RNA secondary structure is important for understanding structure-function relationships and identifying potential drug targets. This paper reports the use of microarrays with heptamer 2'-O-methyl oligoribonucleotides to probe the secondary structure of an RNA and thereby improve the prediction of that secondary structure. When experimental constraints from hybridization results are added to a free-energy minimization algorithm, the prediction of the secondary structure of Escherichia coli 5S rRNA improves from 27 to 92% of the known canonical base pairs. Optimization of buffer conditions for hybridization and application of 2'-O-methyl-2-thiouridine to enhance binding and improve discrimination between AU and GU pairs are also described. The results suggest that probing RNA with oligonucleotide microarrays can facilitate determination of secondary structure.

  11. Antibody structural modeling with prediction of immunoglobulin structure (PIGS)

    DEFF Research Database (Denmark)

    Marcatili, Paolo; Olimpieri, Pier Paolo; Chailyan, Anna

    2014-01-01

    Antibodies (or immunoglobulins) are crucial for defending organisms from pathogens, but they are also key players in many medical, diagnostic and biotechnological applications. The ability to predict their structure and the specific residues involved in antigen recognition has several useful applications in all of these areas. Over the years, we have developed or collaborated in developing a strategy that enables researchers to predict the 3D structure of antibodies with a very satisfactory accuracy. The strategy is completely automated and extremely fast, requiring only a few minutes (∼10 min on average) to build a structural model of an antibody. It is based on the concept of canonical structures of antibody loops and on our understanding of the way light and heavy chains pack together.

  12. Constraint Logic Programming approach to protein structure prediction

    Directory of Open Access Journals (Sweden)

    Fogolari Federico

    2004-11-01

    Background: The protein structure prediction problem is one of the most challenging problems in biological sciences. Many approaches have been proposed using database information and/or simplified protein models. The protein structure prediction problem can be cast in the form of an optimization problem. Notwithstanding its importance, the problem has very seldom been tackled by Constraint Logic Programming, a declarative programming paradigm suitable for solving combinatorial optimization problems. Results: Constraint Logic Programming techniques have been applied to the protein structure prediction problem on the face-centered cube lattice model. Molecular dynamics techniques, endowed with the notion of constraint, have been also exploited. Even using a very simplified model, Constraint Logic Programming on the face-centered cube lattice model allowed us to obtain acceptable results for a few small proteins. As a test implementation their (known) secondary structure and the presence of disulfide bridges are used as constraints. Simplified structures obtained in this way have been converted to all atom models with plausible structure. Results have been compared with a similar approach using a well-established technique as molecular dynamics. Conclusions: The results obtained on small proteins show that Constraint Logic Programming techniques can be employed for studying protein simplified models, which can be converted into realistic all atom models. The advantage of Constraint Logic Programming over other, much more explored, methodologies, resides in the rapid software prototyping, in the easy way of encoding heuristics, and in exploiting all the advances made in this research area, e.g. in constraint propagation and its use for pruning the huge search space.

  13. Constraint Logic Programming approach to protein structure prediction.

    Science.gov (United States)

    Dal Palù, Alessandro; Dovier, Agostino; Fogolari, Federico

    2004-11-30

    The protein structure prediction problem is one of the most challenging problems in biological sciences. Many approaches have been proposed using database information and/or simplified protein models. The protein structure prediction problem can be cast in the form of an optimization problem. Notwithstanding its importance, the problem has very seldom been tackled by Constraint Logic Programming, a declarative programming paradigm suitable for solving combinatorial optimization problems. Constraint Logic Programming techniques have been applied to the protein structure prediction problem on the face-centered cube lattice model. Molecular dynamics techniques, endowed with the notion of constraint, have been also exploited. Even using a very simplified model, Constraint Logic Programming on the face-centered cube lattice model allowed us to obtain acceptable results for a few small proteins. As a test implementation their (known) secondary structure and the presence of disulfide bridges are used as constraints. Simplified structures obtained in this way have been converted to all atom models with plausible structure. Results have been compared with a similar approach using a well-established technique as molecular dynamics. The results obtained on small proteins show that Constraint Logic Programming techniques can be employed for studying protein simplified models, which can be converted into realistic all atom models. The advantage of Constraint Logic Programming over other, much more explored, methodologies, resides in the rapid software prototyping, in the easy way of encoding heuristics, and in exploiting all the advances made in this research area, e.g. in constraint propagation and its use for pruning the huge search space.

  14. Chemical structure-based predictive model for methanogenic anaerobic biodegradation potential.

    Science.gov (United States)

    Meylan, William; Boethling, Robert; Aronson, Dallas; Howard, Philip; Tunkel, Jay

    2007-09-01

    Many screening-level models exist for predicting aerobic biodegradation potential from chemical structure, but anaerobic biodegradation generally has been ignored by modelers. We used a fragment contribution approach to develop a model for predicting biodegradation potential under methanogenic anaerobic conditions. The new model has 37 fragments (substructures) and classifies a substance as either fast or slow, relative to the potential to be biodegraded in the "serum bottle" anaerobic biodegradation screening test (Organization for Economic Cooperation and Development Guideline 311). The model correctly classified 90, 77, and 91% of the chemicals in the training set (n = 169) and two independent validation sets (n = 35 and 23), respectively. Accuracy of predictions of fast and slow degradation was equal for training-set chemicals, but fast-degradation predictions were less accurate than slow-degradation predictions for the validation sets. Analysis of the signs of the fragment coefficients for this and the other (aerobic) Biowin models suggests that in the context of simple group contribution models, the majority of positive and negative structural influences on ultimate degradation are the same for aerobic and methanogenic anaerobic biodegradation.

  15. Vfold: a web server for RNA structure and folding thermodynamics prediction.

    Science.gov (United States)

    Xu, Xiaojun; Zhao, Peinan; Chen, Shi-Jie

    2014-01-01

    The ever increasing discovery of non-coding RNAs leads to unprecedented demand for the accurate modeling of RNA folding, including the predictions of two-dimensional (base pair) and three-dimensional all-atom structures and folding stabilities. Accurate modeling of RNA structure and stability has far-reaching impact on our understanding of RNA functions in human health and our ability to design RNA-based therapeutic strategies. The Vfold server offers a web interface to predict (a) RNA two-dimensional structure from the nucleotide sequence, (b) three-dimensional structure from the two-dimensional structure and the sequence, and (c) folding thermodynamics (heat capacity melting curve) from the sequence. To predict the two-dimensional structure (base pairs), the server generates an ensemble of structures, including loop structures with the different intra-loop mismatches, and evaluates the free energies using the experimental parameters for the base stacks and the loop entropy parameters given by a coarse-grained RNA folding model (the Vfold model) for the loops. To predict the three-dimensional structure, the server assembles the motif scaffolds using structure templates extracted from the known PDB structures and refines the structure using all-atom energy minimization. The Vfold-based web server provides a user friendly tool for the prediction of RNA structure and stability. The web server and the source codes are freely accessible for public use at "http://rna.physics.missouri.edu".

  16. Structured prediction models for RNN based sequence labeling in clinical text.

    Science.gov (United States)

    Jagannatha, Abhyuday N; Yu, Hong

    2016-11-01

    Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experimented with various CRF-based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities.
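
    The role of pairwise potentials in CRF-based sequence labeling can be illustrated with a minimal sketch. It assumes a plain linear-chain CRF decoded exactly with the Viterbi algorithm, not the approximate skip-chain inference the authors propose; the function name `viterbi_decode` and the score layout are hypothetical, and the unary scores stand in for whatever an RNN would output.

```python
# Illustrative sketch only: Viterbi decoding for a linear-chain CRF whose unary
# scores would come from an RNN. All names and numbers here are hypothetical.


def viterbi_decode(unary, pairwise):
    """Return (best_score, best_label_path).

    unary[t][y]    : score of label y at position t (e.g. an RNN output)
    pairwise[a][b] : score of moving from label a to label b
    """
    if not unary:
        return 0.0, []
    num_labels = len(unary[0])
    score = list(unary[0])  # best score of any path ending in each label
    backptr = []            # backptr[t-1][b] = best previous label for b at t
    for t in range(1, len(unary)):
        new_score, pointers = [], []
        for b in range(num_labels):
            prev = max(range(num_labels), key=lambda a: score[a] + pairwise[a][b])
            pointers.append(prev)
            new_score.append(score[prev] + pairwise[prev][b] + unary[t][b])
        score = new_score
        backptr.append(pointers)
    # Follow the back-pointers from the best final label.
    last = max(range(num_labels), key=lambda y: score[y])
    path = [last]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path
```

    With two labels, unary scores `[[3, 0], [0, 1], [0, 3]]` and a transition penalty of -2 for switching labels, the decoder switches labels once, early, rather than paying the penalty twice; this is the kind of label consistency the pairwise terms are meant to encourage.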

  17. Cognitive factors predicting intentions to search for health information: an application of the theory of planned behaviour.

    Science.gov (United States)

    Austvoll-Dahlgren, Astrid; Falk, Ragnhild S; Helseth, Sølvi

    2012-12-01

    People's ability to obtain health information is a precondition for their effective participation in decision making about health. However, there is limited evidence describing which cognitive factors can predict the intention of people to search for health information. The objectives were to test the utility of a questionnaire in predicting intentions to search for health information, and to identify important predictors associated with this intention such that these could be targeted in an intervention. A questionnaire was developed based on the Theory of Planned Behaviour and tested on both a mixed population sample (n = 30) and a sample of parents (n = 45). The questionnaire was explored by testing for internal consistency, calculating inter-correlations between theoretically-related constructs, and by using multiple regression analysis. The reliability and validity of the questionnaire were found to be satisfactory and consistent across the two samples. The prediction of intention by the questionnaire's direct measures was high and accounted for 47% and 55% of the variance in behavioural intentions. Attitudes and perceived behavioural control were identified as important predictors of the intention to search for health information. The questionnaire may be a useful tool for understanding and evaluating behavioural intentions and beliefs related to searches for health information. © 2012 The authors. Health Information and Libraries Journal © 2012 Health Libraries Group.

  18. Antibody structural modeling with prediction of immunoglobulin structure (PIGS)

    KAUST Repository

    Marcatili, Paolo

    2014-11-06

    © 2014 Nature America, Inc. All rights reserved. Antibodies (or immunoglobulins) are crucial for defending organisms from pathogens, but they are also key players in many medical, diagnostic and biotechnological applications. The ability to predict their structure and the specific residues involved in antigen recognition has several useful applications in all of these areas. Over the years, we have developed or collaborated in developing a strategy that enables researchers to predict the 3D structure of antibodies with a very satisfactory accuracy. The strategy is completely automated and extremely fast, requiring only a few minutes (~10 min on average) to build a structural model of an antibody. It is based on the concept of canonical structures of antibody loops and on our understanding of the way light and heavy chains pack together.

  19. Characteristics and Prediction of RNA Structure

    Directory of Open Access Journals (Sweden)

    Hengwu Li

    2014-01-01

    RNA secondary structures with pseudoknots are often predicted by minimizing free energy, which is NP-hard. Most RNAs fold during transcription from DNA into RNA through a hierarchical pathway wherein secondary structures form prior to tertiary structures. Real RNA secondary structures often have local instead of global optimization because of kinetic reasons. The performance of RNA structure prediction may be improved by considering dynamic and hierarchical folding mechanisms. This study is a novel report on RNA folding that accords with the golden mean characteristic based on the statistical analysis of the real RNA secondary structures of all 480 sequences from RNA STRAND, which are validated by NMR or X-ray. The length ratios of domains in these sequences are approximately 0.382L, 0.5L, 0.618L, and L, where L is the sequence length. These points are just the important golden sections of the sequence. With this characteristic, an algorithm is designed to predict RNA hierarchical structures and simulate RNA folding by dynamically folding RNA structures according to the above golden section points. The sensitivity and number of predicted pseudoknots of our algorithm are better than those of the Mfold, HotKnots, McQfold, ProbKnot, and Lhw-Zhu algorithms. Experimental results reflect the folding rules of RNA from a new angle that is close to natural folding.
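
    The section points quoted in the abstract above (0.382L, 0.5L, 0.618L, and L) can be computed for a concrete sequence length with a short sketch. Rounding to the nearest nucleotide position and the name `section_points` are assumptions made here for illustration; the record does not say how the authors discretize these points.

```python
# Illustrative sketch: candidate section points for a sequence of length L,
# using the ratios quoted in the abstract (0.382, 0.5, 0.618, 1.0).

SECTION_RATIOS = (0.382, 0.5, 0.618, 1.0)


def section_points(length):
    """Return ratio * length, rounded to the nearest position, for each ratio."""
    if length <= 0:
        raise ValueError("length must be positive")
    # Rounding is an assumption of this sketch, not taken from the record.
    return [round(ratio * length) for ratio in SECTION_RATIOS]
```

    For a 100-nucleotide sequence this gives positions 38, 50, 62, and 100.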

  20. Modeling and prediction of human word search behavior in interactive machine translation

    Science.gov (United States)

    Ji, Duo; Yu, Bai; Ma, Bin; Ye, Na

    2017-12-01

    Interactive machine translation is a computer-aided translation method that reduces the repetitive, mechanical operations of manual translation through a variety of methods, thereby improving translation efficiency, and it plays an important role in practical translation work. In this paper, we regard users' frequent word-searching behavior during translation as the research object, and transform this behavior into a translation selection problem under the current translation. The paper presents a prediction model that makes comprehensive use of the alignment model, translation model, and language model of the word-searching behavior. It achieved highly accurate prediction of word-searching behavior and reduced the switching between mouse and keyboard operations in the users' translation process.

  1. Hidden policy ciphertext-policy attribute-based encryption with keyword search against keyword guessing attack

    Institute of Scientific and Technical Information of China (English)

    Qiu, Shuo; Liu, Jiqiang; Shi, Yanfeng; Zhang, Rui

    2017-01-01

    Attribute-based encryption with keyword search (ABKS) enables data owners to grant their search capabilities to other users by enforcing an access control policy over the outsourced encrypted data. However, existing ABKS schemes cannot guarantee the privacy of the access structures, which may contain some sensitive private information. Furthermore, resulting from the exposure of the access structures, ABKS schemes are susceptible to an off-line keyword guessing attack if the keyword space has a polynomial size. To solve these problems, we propose a novel primitive named hidden policy ciphertext-policy attribute-based encryption with keyword search (HP-CPABKS). With our primitive, the data user is unable to search on encrypted data and learn any information about the access structure if his/her attribute credentials cannot satisfy the access control policy specified by the data owner. We present a rigorous selective security analysis of the proposed HP-CPABKS scheme, which simultaneously keeps the indistinguishability of the keywords and the access structures. Finally, the performance evaluation verifies that our proposed scheme is efficient and practical.

  2. Evolutionary rate variation and RNA secondary structure prediction

    DEFF Research Database (Denmark)

    Knudsen, B.; Andersen, E.S.; Damgaard, C.

    2004-01-01

    Predicting RNA secondary structure using evolutionary history can be carried out by using an alignment of related RNA sequences with conserved structure. Accurately determining evolutionary substitution rates for base pairs and single stranded nucleotides is a concern for methods based on this type...... by applying rates derived from tRNA and rRNA to the prediction of the much more rapidly evolving 5'-region of HIV-1. We find that the HIV-1 prediction is in agreement with experimental data, even though the relative evolutionary rate between A and G is significantly increased, both in stem and loop regions...

  3. XTALOPT: An open-source evolutionary algorithm for crystal structure prediction

    Science.gov (United States)

    Lonie, David C.; Zurek, Eva

    2011-02-01

    The implementation and testing of XTALOPT, an evolutionary algorithm for crystal structure prediction, is outlined. We present our new periodic displacement (ripple) operator which is ideally suited to extended systems. It is demonstrated that hybrid operators, which combine two pure operators, reduce the number of duplicate structures in the search. This allows for better exploration of the potential energy surface of the system in question, while simultaneously zooming in on the most promising regions. A continuous workflow, which makes better use of computational resources as compared to traditional generation based algorithms, is employed. Various parameters in XTALOPT are optimized using a novel benchmarking scheme. XTALOPT is available under the GNU Public License, has been interfaced with various codes commonly used to study extended systems, and has an easy to use, intuitive graphical interface.

    Program summary
    Program title: XTALOPT
    Catalogue identifier: AEGX_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGX_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: GPL v2.1 or later [1]
    No. of lines in distributed program, including test data, etc.: 36 849
    No. of bytes in distributed program, including test data, etc.: 1 149 399
    Distribution format: tar.gz
    Programming language: C++
    Computer: PCs, workstations, or clusters
    Operating system: Linux
    Classification: 7.7
    External routines: QT [2], OpenBabel [3], AVOGADRO [4], SPGLIB [8] and one of: VASP [5], PWSCF [6], GULP [7].
    Nature of problem: Predicting the crystal structure of a system from its stoichiometry alone remains a grand challenge in computational materials science, chemistry, and physics.
    Solution method: Evolutionary algorithms are stochastic search techniques which use concepts from biological evolution in order to locate the global minimum on their potential energy surface. Our evolutionary algorithm, XTALOPT, is freely

  4. Structure-based methods to predict mutational resistance to diarylpyrimidine non-nucleoside reverse transcriptase inhibitors.

    Science.gov (United States)

    Azeem, Syeda Maryam; Muwonge, Alecia N; Thakkar, Nehaben; Lam, Kristina W; Frey, Kathleen M

    2018-01-01

    Resistance to non-nucleoside reverse transcriptase inhibitors (NNRTIs) is a leading cause of HIV treatment failure. Often included in antiviral therapy, NNRTIs are chemically diverse compounds that bind an allosteric pocket of enzyme target reverse transcriptase (RT). Several new NNRTIs incorporate flexibility in order to compensate for lost interactions with amino acid conferring mutations in RT. Unfortunately, even successful inhibitors such as diarylpyrimidine (DAPY) inhibitor rilpivirine are affected by mutations in RT that confer resistance. In order to aid drug design efforts, it would be efficient and cost effective to pre-evaluate NNRTI compounds in development using a structure-based computational approach. As proof of concept, we applied a residue scan and molecular dynamics strategy using RT crystal structures to predict mutations that confer resistance to DAPYs rilpivirine, etravirine, and investigational microbicide dapivirine. Our predictive values, changes in affinity and stability, are correlative with fold-resistance data for several RT mutants. Consistent with previous studies, mutation K101P is predicted to confer high-level resistance to DAPYs. These findings were further validated using structural analysis, molecular dynamics, and an enzymatic reverse transcription assay. Our results confirm that changes in affinity and stability for mutant complexes are predictive parameters of resistance as validated by experimental and clinical data. In future work, we believe that this computational approach may be useful to predict resistance mutations for inhibitors in development. Published by Elsevier Inc.

  5. Effects prediction guidelines for structures subjected to ground motion

    International Nuclear Information System (INIS)

    1975-07-01

    Part of the planning for an underground nuclear explosion (UNE) is determining the effects of expected ground motion on exposed structures. Because of the many types of structures and the wide variation in ground motion intensity typically encountered, no single prediction method is both adequate and feasible for a complete evaluation. Furthermore, the nature and variability of ground motion and structure damage prescribe effects predictions that are made probabilistically. Initially, prediction for a UNE involves a preliminary assessment of damage to establish overall project feasibility. Subsequent efforts require more detailed damage evaluations, based on structure inventories and analyses of specific structures, so that safety problems can be identified and safety and remedial measures can be recommended. To cover this broad range of effects prediction needs for a typical UNE project, three distinct but interrelated methods have been developed and are described. First, the fundamental practical and theoretical aspects of predicting the effects of dynamic ground motion on structures are summarized. Next, experimentally derived and theoretically determined observations of the behavior of typical structures subjected to ground motion are presented. Then, based on these fundamental considerations and on the observed behavior of structures, the formulation of the three effects prediction procedures is described, along with guidelines regarding their applicability. Example damage predictions for hypothetical UNEs demonstrate these procedures. To aid in identifying the vibration properties of complex structures, one chapter discusses alternatives in vibration testing, instrumentation, and data analysis. Finally, operational guidelines regarding data acquisition procedures, safety criteria, and remedial measures involved in conducting structure effects evaluations are discussed. (U.S.)

  6. Top-k Keyword Search Over Graphs Based On Backward Search

    Directory of Open Access Journals (Sweden)

    Zeng Jia-Hui

    2017-01-01

    Keyword search is one of the most user-friendly and intuitive information retrieval methods. Using keyword search to obtain connected subgraphs has many applications in graph-based cognitive computation, where it is a basic technology. This paper focuses on top-k keyword searching over graphs. We implemented a keyword search algorithm that applies the backward search idea. The algorithm first locates the keyword vertices, and then applies backward search to find rooted trees that contain the query keywords. The experiment shows that query time is affected by the iteration number of the algorithm.
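
    The two steps named in the abstract above (locate the keyword vertices, then search backward to find roots that reach every query keyword) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the edge-list graph encoding, the cost function (sum of shortest path lengths from a root to each keyword), and the name `top_k_roots` are all hypothetical.

```python
# Illustrative sketch only: a simple backward keyword search over a directed
# graph. The graph encoding, scoring rule, and all names are hypothetical.
from collections import deque


def top_k_roots(edges, labels, keywords, k):
    """Return up to k (cost, root) pairs, cheapest first.

    edges    : iterable of (u, v) directed edges
    labels   : dict mapping vertex -> set of keywords it contains
    keywords : list of query keywords
    cost     : sum over keywords of the shortest path length from the root to
               a vertex containing that keyword
    """
    # Reverse adjacency lists, so the search can walk edges backward.
    reverse = {}
    for u, v in edges:
        reverse.setdefault(v, []).append(u)

    totals = None  # vertex -> summed distance over the keywords seen so far
    for word in keywords:
        # Step 1: locate the vertices that contain this keyword.
        dist = {v: 0 for v, words in labels.items() if word in words}
        queue = deque(dist)
        # Step 2: breadth-first search backward along incoming edges.
        while queue:
            v = queue.popleft()
            for u in reverse.get(v, []):
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        # Keep only vertices that can reach every keyword processed so far.
        if totals is None:
            totals = dist
        else:
            totals = {v: totals[v] + d for v, d in dist.items() if v in totals}
    if not totals:
        return []
    return sorted((cost, root) for root, cost in totals.items())[:k]
```

    Each returned root, together with its shortest paths to the keyword vertices, corresponds to one rooted answer tree; a fuller implementation would also return those paths.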

  7. A domain-based approach to predict protein-protein interactions

    Directory of Open Access Journals (Sweden)

    Resat Haluk

    2007-06-01

    Background: Knowing which proteins exist in a certain organism or cell type and how these proteins interact with each other are necessary for the understanding of biological processes at the whole cell level. The determination of the protein-protein interaction (PPI) networks has been the subject of extensive research. Despite the development of reasonably successful methods, serious technical difficulties still exist. In this paper we present DomainGA, a quantitative computational approach that uses the information about the domain-domain interactions to predict the interactions between proteins. Results: DomainGA is a multi-parameter optimization method in which the available PPI information is used to derive a quantitative scoring scheme for the domain-domain pairs. Obtained domain interaction scores are then used to predict whether a pair of proteins interacts. Using the yeast PPI data and a series of tests, we show the robustness and insensitivity of the DomainGA method to the selection of the parameter sets, score ranges, and detection rules. Our DomainGA method achieves very high explanation ratios for the positive and negative PPIs in yeast. Based on our cross-verification tests on human PPIs, comparison of the optimized scores with the structurally observed domain interactions obtained from the iPFAM database, and sensitivity and specificity analysis; we conclude that our DomainGA method shows great promise to be applicable across multiple organisms. Conclusion: We envision the DomainGA as a first step of a multiple tier approach to constructing organism specific PPIs. As it is based on fundamental structural information, the DomainGA approach can be used to create potential PPIs and the accuracy of the constructed interaction template can be further improved using complementary methods. Explanation ratios obtained in the reported test case studies clearly show that the false prediction rates of the template networks constructed

  8. Cloud4Psi: cloud computing for 3D protein structure similarity searching.

    Science.gov (United States)

    Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Kłapciński, Artur

    2014-10-01

    Popular methods for 3D protein structure similarity searching, especially those that generate high-quality alignments such as Combinatorial Extension (CE) and Flexible structure Alignment by Chaining Aligned fragment pairs allowing Twists (FATCAT) are still time consuming. As a consequence, performing similarity searching against large repositories of structural data requires increased computational resources that are not always available. Cloud computing provides huge amounts of computational power that can be provisioned on a pay-as-you-go basis. We have developed the cloud-based system that allows scaling of the similarity searching process vertically and horizontally. Cloud4Psi (Cloud for Protein Similarity) was tested in the Microsoft Azure cloud environment and provided good, almost linearly proportional acceleration when scaled out onto many computational units. Cloud4Psi is available as Software as a Service for testing purposes at: http://cloud4psi.cloudapp.net/. For source code and software availability, please visit the Cloud4Psi project home page at http://zti.polsl.pl/dmrozek/science/cloud4psi.htm. © The Author 2014. Published by Oxford University Press.

  9. COGNAC: a web server for searching and annotating hydrogen-bonded base interactions in RNA three-dimensional structures.

    Science.gov (United States)

    Firdaus-Raih, Mohd; Hamdani, Hazrina Yusof; Nadzirin, Nurul; Ramlan, Effirul Ikhwan; Willett, Peter; Artymiuk, Peter J

    2014-07-01

    Hydrogen bonds are crucial factors that stabilize a complex ribonucleic acid (RNA) molecule's three-dimensional (3D) structure. Minute conformational changes can result in variations in the hydrogen bond interactions in a particular structure. Furthermore, networks of hydrogen bonds, especially those found in tight clusters, may be important elements in structure stabilization or function and can therefore be regarded as potential tertiary motifs. In this paper, we describe a graph theoretical algorithm implemented as a web server that is able to search for unbroken networks of hydrogen-bonded base interactions and thus provide an accounting of such interactions in RNA 3D structures. This server, COGNAC (COnnection tables Graphs for Nucleic ACids), is also able to compare the hydrogen bond networks between two structures and from such annotations enable the mapping of atomic level differences that may have resulted from conformational changes due to mutations or binding events. The COGNAC server can be accessed at http://mfrlab.org/grafss/cognac. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Dyniqx: a novel meta-search engine for metadata based cross search

    OpenAIRE

    Zhu, Jianhan; Song, Dawei; Eisenstadt, Marc; Barladeanu, Cristi; Rüger, Stefan

    2008-01-01

    The effect of metadata in collection fusion has not been sufficiently studied. In response to this, we present a novel meta-search engine called Dyniqx for metadata-based cross search. Dyniqx exploits the availability of metadata in academic search services such as PubMed and Google Scholar for fusing search results from heterogeneous search engines. In addition, metadata from these search engines are used for generating dynamic query controls such as sliders and tick boxes, which are ...

  11. Base pair probability estimates improve the prediction accuracy of RNA non-canonical base pairs.

    Directory of Open Access Journals (Sweden)

    Michael F Sloma

    2017-11-01

    Prediction of RNA tertiary structure from sequence is an important problem, but generating accurate structure models for even short sequences remains difficult. Predictions of RNA tertiary structure tend to be least accurate in loop regions, where non-canonical pairs are important for determining the details of structure. Non-canonical pairs can be predicted using a knowledge-based model of structure that scores nucleotide cyclic motifs, or NCMs. In this work, a partition function algorithm is introduced that allows the estimation of base pairing probabilities for both canonical and non-canonical interactions. Pairs that are predicted to be probable are more likely to be found in the true structure than pairs of lower probability. Pair probability estimates can be further improved by predicting the structure conserved across multiple homologous sequences using the TurboFold algorithm. These pairing probabilities, used in concert with prior knowledge of the canonical secondary structure, allow accurate inference of non-canonical pairs, an important step towards accurate prediction of the full tertiary structure. Software to predict non-canonical base pairs and pairing probabilities is now provided as part of the RNAstructure software package.

  12. Base pair probability estimates improve the prediction accuracy of RNA non-canonical base pairs.

    Science.gov (United States)

    Sloma, Michael F; Mathews, David H

    2017-11-01

    Prediction of RNA tertiary structure from sequence is an important problem, but generating accurate structure models for even short sequences remains difficult. Predictions of RNA tertiary structure tend to be least accurate in loop regions, where non-canonical pairs are important for determining the details of structure. Non-canonical pairs can be predicted using a knowledge-based model of structure that scores nucleotide cyclic motifs, or NCMs. In this work, a partition function algorithm is introduced that allows the estimation of base pairing probabilities for both canonical and non-canonical interactions. Pairs that are predicted to be probable are more likely to be found in the true structure than pairs of lower probability. Pair probability estimates can be further improved by predicting the structure conserved across multiple homologous sequences using the TurboFold algorithm. These pairing probabilities, used in concert with prior knowledge of the canonical secondary structure, allow accurate inference of non-canonical pairs, an important step towards accurate prediction of the full tertiary structure. Software to predict non-canonical base pairs and pairing probabilities is now provided as part of the RNAstructure software package.

  13. Real-Time Ligand Binding Pocket Database Search Using Local Surface Descriptors

    Science.gov (United States)

    Chikhi, Rayan; Sael, Lee; Kihara, Daisuke

    2010-01-01

    Due to the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As the functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of particular interest is the prediction of ligand binding to a protein, as ligand molecule recognition is a major part of the molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two-dimensional pseudo-Zernike moments or the 3D Zernike descriptors. These compact representations allow fast real-time pocket searching against a database. A thorough benchmark study employing two different datasets shows that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed. PMID:20455259
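
    The database search described in this abstract amounts to ranking stored pocket descriptors by their similarity to a query descriptor. The sketch below illustrates that idea only; the Euclidean distance, the four-component vectors, and the pocket names are assumptions made for illustration and are not taken from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_pockets(query, database, top_k=3):
    """Return the top_k (name, distance) pairs closest to the query descriptor.

    `database` maps a pocket name to its descriptor vector.
    """
    scored = sorted((euclidean(query, vec), name) for name, vec in database.items())
    return [(name, dist) for dist, name in scored[:top_k]]

# Hypothetical descriptors: real Zernike-based descriptors have many more components.
database = {
    "pocket_ATP": [0.90, 0.10, 0.40, 0.30],
    "pocket_heme": [0.20, 0.80, 0.50, 0.10],
    "pocket_NAD": [0.85, 0.15, 0.35, 0.40],
}
query = [0.88, 0.12, 0.38, 0.33]

hits = rank_pockets(query, database, top_k=2)
for name, dist in hits:
    print(f"{name}\t{dist:.3f}")
```

    Because each pocket is reduced to a fixed-length vector, a query costs one distance computation per database entry, which is what makes real-time search plausible.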

  14. Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology

    International Nuclear Information System (INIS)

    Shen Yang; Bax, Ad

    2007-01-01

    Chemical shifts of nuclei in or attached to a protein backbone are exquisitely sensitive to their local environment. A computer program, SPARTA, is described that uses this correlation with local structure to predict protein backbone chemical shifts, given an input three-dimensional structure, by searching a newly generated database for triplets of adjacent residues that provide the best match in φ/ψ/χ1 torsion angles and sequence similarity to the query triplet of interest. The database contains 15N, 1HN, 1Hα, 13Cα, 13Cβ and 13C' chemical shifts for 200 proteins for which a high-resolution X-ray (≤2.4 Å) structure is available. The relative importance of the weighting factors for the φ/ψ/χ1 angles and sequence similarity was optimized empirically. The weighted, average secondary shifts of the central residues in the 20 best-matching triplets, after inclusion of nearest neighbor, ring current, and hydrogen bonding effects, are used to predict chemical shifts for the protein of known structure. Validation shows good agreement between the SPARTA-predicted and experimental shifts, with standard deviations of 2.52, 0.51, 0.27, 0.98, 1.07 and 1.08 ppm for 15N, 1HN, 1Hα, 13Cα, 13Cβ and 13C', respectively, including outliers.
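
    The triplet search described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the SPARTA implementation: the scoring function, the weights `w_angle` and `w_seq`, the mismatch count used as a stand-in for sequence similarity, the plain (unweighted) mean, and the toy database entries are all assumptions, and the nearest-neighbor, ring-current and hydrogen-bonding corrections are omitted.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def triplet_score(query, entry, w_angle=1.0, w_seq=10.0):
    """Dissimilarity between two residue triplets (lower is better).

    Each triplet is a dict with "angles" (a flat list of torsion angles)
    and "residues" (a three-letter string of one-letter residue codes).
    """
    angle_term = sum(angle_diff(q, e) for q, e in zip(query["angles"], entry["angles"]))
    seq_term = sum(1 for q, e in zip(query["residues"], entry["residues"]) if q != e)
    return w_angle * angle_term + w_seq * seq_term

def predict_shift(query, database, k=20):
    """Average the stored secondary shifts of the k best-matching triplets."""
    best = sorted(database, key=lambda entry: triplet_score(query, entry))[:k]
    return sum(entry["shift"] for entry in best) / len(best)

# Toy data: phi/psi angles for three residues and a made-up secondary shift (ppm).
helix = [-60.0, -45.0] * 3
database = [
    {"angles": helix, "residues": "AGL", "shift": 1.0},
    {"angles": [a + 10.0 for a in helix], "residues": "AGL", "shift": 3.0},
    {"angles": [-120.0, 130.0] * 3, "residues": "VVV", "shift": -5.0},
]
query = {"angles": helix, "residues": "AGL"}

predicted = predict_shift(query, database, k=2)
print(predicted)
```

    The abstract uses k = 20 and a weighted average; `k=2` here only keeps the toy example small.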

  15. Direct glycan structure determination of intact N-linked glycopeptides by low-energy collision-induced dissociation tandem mass spectrometry and predicted spectral library searching.

    Science.gov (United States)

    Pai, Pei-Jing; Hu, Yingwei; Lam, Henry

    2016-08-31

    Intact glycopeptide MS analysis to reveal site-specific protein glycosylation is an important frontier of proteomics. However, computational tools for analyzing MS/MS spectra of intact glycopeptides are still limited and not well-integrated into existing workflows. In this work, a new computational tool which combines the spectral library building/searching tool, SpectraST (Lam et al. Nat. Methods 2008, 5, 873-875), and the glycopeptide fragmentation prediction tool, MassAnalyzer (Zhang et al. Anal. Chem. 2010, 82, 10194-10202) for intact glycopeptide analysis has been developed. Specifically, this tool enables the determination of the glycan structure directly from low-energy collision-induced dissociation (CID) spectra of intact glycopeptides. Given a list of possible glycopeptide sequences as input, a sample-specific spectral library of MassAnalyzer-predicted spectra is built using SpectraST. Glycan identification from CID spectra is achieved by spectral library searching against this library, in which both m/z and intensity information of the possible fragmentation ions are taken into consideration for improved accuracy. We validated our method using a standard glycoprotein, human transferrin, and evaluated its potential to be used in site-specific glycosylation profiling of glycoprotein datasets from LC-MS/MS. In addition, we further applied our method to reveal, for the first time, the site-specific N-glycosylation profile of recombinant human acetylcholinesterase expressed in HEK293 cells. For maximum usability, SpectraST is developed as part of the Trans-Proteomic Pipeline (TPP), a freely available and open-source software suite for MS data analysis. Copyright © 2016 Elsevier B.V. All rights reserved.

  16. A Kernel for Protein Secondary Structure Prediction

    OpenAIRE

    Guermeur, Yann; Lifchitz, Alain; Vert, Régis

    2004-01-01

    http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=10338&mode=toc; Multi-class support vector machines have already proved efficient in protein secondary structure prediction as ensemble methods, to combine the outputs of sets of classifiers based on different principles. In this chapter, their implementation as basic prediction methods, processing the primary structure or the profile of multiple alignments, is investigated. A kernel devoted to the task is in...

  17. Expedite random structure searching using objects from Wyckoff positions

    Science.gov (United States)

    Wang, Shu-Wei; Hsing, Cheng-Rong; Wei, Ching-Ming

    2018-02-01

    Random structure searching has proved to be a powerful approach for finding the global minimum and the metastable structures. Truly random sampling is needed in principle, yet for complicated systems in a high-dimensional configuration space it would be highly time-consuming, if not practically impossible, to find the global minimum this way. Thus the implementation of reasonable constraints, such as adopting system symmetries to reduce the independent dimensions of the structural space and/or imposing chemical information to reach and relax into low-energy regions, is the most essential issue in the approach. In this paper, we propose the concept of an "object", which is either an atom or a set of atoms (such as molecules or carbonates) carrying a symmetry defined by one of the Wyckoff positions of a space group. This confines the search for the global minimum of a complicated system to a greatly reduced structural space and makes it accessible in practice. We examined several representative materials, including Cd3As2 crystal, solid methanol, high-pressure carbonates (FeCO3), and the Si(111)-7 × 7 reconstructed surface, to demonstrate the power and the advantages of using the "object" concept in random structure searching.

  18. SVM-PB-Pred: SVM based protein block prediction method using sequence profiles and secondary structures.

    Science.gov (United States)

    Suresh, V; Parthasarathy, S

    2014-01-01

    We developed a support vector machine based web server called SVM-PB-Pred to predict the Protein Block for any given amino acid sequence. The input features of SVM-PB-Pred include i) sequence profiles (PSSM) and ii) actual secondary structures (SS) from the DSSP method or predicted secondary structures from the NPS@ and GOR4 methods. Three combined input features, PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4), were used to train and test the SVM models. Similarly, four datasets, RS90, DB433, LI1264 and SP1577, were used to develop the SVM models. The four SVM models developed were tested using three different benchmarking tests, namely (i) self consistency, (ii) seven-fold cross validation and (iii) independent case test. The maximum prediction accuracy of ~70% was observed in the self consistency test for the SVM models of both the LI1264 and SP1577 datasets, where PSSM+SS(DSSP) input features were used for testing. The prediction accuracies were reduced to ~53% for PSSM+SS(NPS@) and ~43% for PSSM+SS(GOR4) in the independent case test for the SVM models of the same two datasets. Using our method, it is possible to predict the protein block letters for any query protein sequence with ~53% accuracy when the SP1577 dataset and predicted secondary structure from the NPS@ server are used. The SVM-PB-Pred server can be freely accessed through http://bioinfo.bdu.ac.in/~svmpbpred.

  19. RNAstructure: software for RNA secondary structure prediction and analysis.

    Science.gov (United States)

    Reuter, Jessica S; Mathews, David H

    2010-03-15

    To understand an RNA sequence's mechanism of action, the structure must be known. Furthermore, target RNA structure is an important consideration in the design of small interfering RNAs and antisense DNA oligonucleotides. RNA secondary structure prediction, using thermodynamics, can be used to develop hypotheses about the structure of an RNA sequence. RNAstructure is a software package for RNA secondary structure prediction and analysis. It uses thermodynamics and utilizes the most recent set of nearest neighbor parameters from the Turner group. It includes methods for secondary structure prediction (using several algorithms), prediction of base pair probabilities, bimolecular structure prediction, and prediction of a structure common to two sequences. This contribution describes new extensions to the package, including a library of C++ classes for incorporation into other programs, a user-friendly graphical user interface written in JAVA, and new Unix-style text interfaces. The original graphical user interface for Microsoft Windows is still maintained. The extensions to RNAstructure serve to make RNA secondary structure prediction user-friendly. The package is available for download from the Mathews lab homepage at http://rna.urmc.rochester.edu/RNAstructure.html.

  20. From cheminformatics to structure-based design: Web services and desktop applications based on the NAOMI library.

    Science.gov (United States)

    Bietz, Stefan; Inhester, Therese; Lauck, Florian; Sommer, Kai; von Behren, Mathias M; Fährrolfes, Rainer; Flachsenberg, Florian; Meyder, Agnes; Nittinger, Eva; Otto, Thomas; Hilbig, Matthias; Schomburg, Karen T; Volkamer, Andrea; Rarey, Matthias

    2017-11-10

    Nowadays, computational approaches are an integral part of life science research. Problems related to interpretation of experimental results, data analysis, or visualization tasks highly benefit from the achievements of the digital era. Simulation methods facilitate predictions of physicochemical properties and can assist in understanding macromolecular phenomena. Here, we will give an overview of the methods developed in our group that aim at supporting researchers from all life science areas. Based on state-of-the-art approaches from structural bioinformatics and cheminformatics, we provide software covering a wide range of research questions. Our all-in-one web service platform ProteinsPlus (http://proteins.plus) offers solutions for pocket and druggability prediction, hydrogen placement, structure quality assessment, ensemble generation, protein-protein interaction classification, and 2D-interaction visualization. Additionally, we provide a software package that contains tools targeting cheminformatics problems like file format conversion, molecule data set processing, SMARTS editing, fragment space enumeration, and ligand-based virtual screening. Furthermore, it also includes structural bioinformatics solutions for inverse screening, binding site alignment, and searching interaction patterns across structure libraries. The software package is available at http://software.zbh.uni-hamburg.de. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  1. Linguistic Structure Prediction

    CERN Document Server

    Smith, Noah A

    2011-01-01

    A major part of natural language processing now depends on the use of text data to build linguistic analyzers. We consider statistical, computational approaches to modeling linguistic structure. We seek to unify across many approaches and many kinds of linguistic structures. Assuming a basic understanding of natural language processing and/or machine learning, we seek to bridge the gap between the two fields. Approaches to decoding (i.e., carrying out linguistic structure prediction) and supervised and unsupervised learning of models that predict discrete structures as outputs are the focus. W

  2. Retrospective group fusion similarity search based on eROCE evaluation metric.

    Science.gov (United States)

    Avram, Sorin I; Crisan, Luminita; Bora, Alina; Pacureanu, Liliana M; Avram, Stefana; Kurunczi, Ludovic

    2013-03-01

    In this study, a simple evaluation metric, denoted eROCE, was proposed to measure the early enrichment of predictive methods. We demonstrated the superior robustness of eROCE compared to other known metrics across several active-to-inactive ratios ranging from 1:10 to 1:1000. Group fusion similarity search was investigated by varying 16 similarity coefficients, five molecular representations (binary and non-binary) and two group fusion rules using two reference structure set sizes. We used a dataset of 3478 active and 43,938 inactive molecules, and the enrichment was analyzed by means of eROCE. This retrospective study provides optimal similarity search parameters in the case of ALDH1A1 inhibitors. Copyright © 2013 Elsevier Ltd. All rights reserved.

  3. Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    Directory of Open Access Journals (Sweden)

    Wang Yong

    2011-07-01

    Background: Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction becomes increasingly important for understanding the essence of protein interactions and helping narrow down the search space for drug design. Currently, many computational methods have been developed by proposing different features. However, a comparative assessment of these features, as well as more effective and accurate methods, is still urgently needed. Results: In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes), while in the Ab- dataset (antigen-antibody complexes are excluded) there are significant differences in these two features between hot spots and non-hot spots. Secondly, we explore the predictive ability of each feature and of combinations of features using support vector machines (SVMs). The results indicate that sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on an independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes. Conclusion: Experimental results show that support vector machine

  4. Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    Science.gov (United States)

    2011-01-01

    Background: Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction becomes increasingly important for understanding the essence of protein interactions and helping narrow down the search space for drug design. Currently, many computational methods have been developed by proposing different features. However, a comparative assessment of these features, as well as more effective and accurate methods, is still urgently needed. Results: In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes), while in the Ab- dataset (antigen-antibody complexes are excluded) there are significant differences in these two features between hot spots and non-hot spots. Secondly, we explore the predictive ability of each feature and of combinations of features using support vector machines (SVMs). The results indicate that sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on an independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes. Conclusion: Experimental results show that support vector machine classifiers are quite
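
    The precision, recall and F1 values quoted in this abstract can be checked against one another, since F1 is the harmonic mean of precision and recall. The short sketch below does only that arithmetic; the function name is illustrative and nothing here reproduces the authors' evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Values reported in the abstract for the independent test set.
f1 = f1_score(0.69, 0.68)
print(round(f1, 2))  # prints 0.68
```

    The result agrees with the reported F1 score of 0.68.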

  5. Predicting the hand, foot, and mouth disease incidence using search engine query data and climate variables: an ecological study in Guangdong, China.

    Science.gov (United States)

    Du, Zhicheng; Xu, Lin; Zhang, Wangjian; Zhang, Dingmei; Yu, Shicheng; Hao, Yuantao

    2017-10-06

    Hand, foot, and mouth disease (HFMD) has caused a substantial burden in China, especially in Guangdong Province. Based on the enhanced surveillance system, we aimed to explore whether the addition of temperature and search engine query data improves the risk prediction of HFMD. Ecological study. Information on the confirmed cases of HFMD, climate parameters and search engine query logs was collected. A total of 1.36 million HFMD cases were identified from the surveillance system during 2011-2014. Analyses were conducted at aggregate level and no confidential information was involved. A seasonal autoregressive integrated moving average (ARIMA) model with external variables (ARIMAX) was used to predict the HFMD incidence from 2011 to 2014, taking into account temperature and search engine query data (Baidu Index, BDI). Statistics of goodness-of-fit and precision of prediction were used to compare models (1) based on surveillance data only, and with the addition of (2) temperature, (3) BDI, and (4) both temperature and BDI. A high correlation between HFMD incidence and BDI (r = 0.794, p [...] engine query data significantly improved the prediction of HFMD. Further studies are warranted to examine whether including search engine query data also improves the prediction of other infectious diseases in other settings. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  6. SimShiftDB; local conformational restraints derived from chemical shift similarity searches on a large synthetic database

    Energy Technology Data Exchange (ETDEWEB)

    Ginzinger, Simon W. [Center of Applied Molecular Engineering, University of Salzburg, Department of Molecular Biology, Division of Bioinformatics (Austria)], E-mail: simon@came.sbg.ac.at; Coles, Murray [Max-Planck-Institute for Developmental Biology, Department of Protein Evolution (Germany)], E-mail: Murray.Coles@tuebingen.mpg.de

    2009-03-15

    We present SimShiftDB, a new program to extract conformational data from protein chemical shifts using structural alignments. The alignments are obtained in searches of a large database containing 13,000 structures and corresponding back-calculated chemical shifts. SimShiftDB makes use of chemical shift data to provide accurate results even in the case of low sequence similarity, and with even coverage of the conformational search space. We compare SimShiftDB to HHSearch, a state-of-the-art sequence-based search tool, and to TALOS, the current standard tool for the task. We show that for a significant fraction of the predicted similarities, SimShiftDB outperforms the other two methods. Particularly, the high coverage afforded by the larger database often allows predictions to be made for residues not involved in canonical secondary structure, where TALOS predictions are both less frequent and more error prone. Thus SimShiftDB can be seen as a complement to currently available methods.

  7. SimShiftDB; local conformational restraints derived from chemical shift similarity searches on a large synthetic database

    International Nuclear Information System (INIS)

    Ginzinger, Simon W.; Coles, Murray

    2009-01-01

    We present SimShiftDB, a new program to extract conformational data from protein chemical shifts using structural alignments. The alignments are obtained in searches of a large database containing 13,000 structures and corresponding back-calculated chemical shifts. SimShiftDB makes use of chemical shift data to provide accurate results even in the case of low sequence similarity, and with even coverage of the conformational search space. We compare SimShiftDB to HHSearch, a state-of-the-art sequence-based search tool, and to TALOS, the current standard tool for the task. We show that for a significant fraction of the predicted similarities, SimShiftDB outperforms the other two methods. Particularly, the high coverage afforded by the larger database often allows predictions to be made for residues not involved in canonical secondary structure, where TALOS predictions are both less frequent and more error prone. Thus SimShiftDB can be seen as a complement to currently available methods

  8. Prediction of the Secondary Structure of HIV-1 gp120

    DEFF Research Database (Denmark)

    Hansen, Jan; Lund, Ole; Nielsen, Jens O.

    1996-01-01

    The secondary structure of HIV-1 gp120 was predicted using multiple alignment and a combination of two independent methods based on neural network and nearest-neighbor algorithms. The methods agreed on the secondary structure for 80% of the residues in BH10 gp120. Six helices were predicted in HIV... Fourier transform infrared spectroscopy. The predicted secondary structure of gp120 compared well with data from NMR analysis of synthetic peptides from the V3 loop and the C4 region. As a first step towards modeling the tertiary structure of gp120, the predicted secondary structure may guide the design...

  9. SoftSearch: integration of multiple sequence features to identify breakpoints of structural variations.

    Directory of Open Access Journals (Sweden)

    Steven N Hart

    BACKGROUND: Structural variation (SV) represents a significant, yet poorly understood contribution to an individual's genetic makeup. Advanced next-generation sequencing technologies are widely used to discover such variations, but there is no single detection tool that is considered a community standard. In an attempt to fulfil this need, we developed an algorithm, SoftSearch, for discovering structural variant breakpoints in Illumina paired-end next-generation sequencing data. SoftSearch combines multiple strategies for detecting SV including split-read, discordant read-pair, and unmated pairs. Co-localized split-reads and discordant read pairs are used to refine the breakpoints. RESULTS: We developed and validated SoftSearch using real and synthetic datasets. SoftSearch's key features are (1) not requiring secondary (or exhaustive primary) alignment, (2) portability into established sequencing workflows, and (3) applicability to any DNA-sequencing experiment (e.g. whole genome, exome, custom capture, etc.). SoftSearch identifies breakpoints from a small number of soft-clipped bases from split reads and a few discordant read-pairs which on their own would not be sufficient to make an SV call. CONCLUSIONS: We show that SoftSearch can identify more true SVs by combining multiple sequence features. SoftSearch was able to call clinically relevant SVs in the BRCA2 gene not reported by other tools while offering significantly improved overall performance.
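
    Soft-clipped bases are recorded in the CIGAR string of a SAM/BAM alignment as `S` operations, so the split-read evidence mentioned in this abstract can be located by parsing that field. The sketch below shows only this general idea and is not SoftSearch's algorithm: the function name, the `min_clip` threshold, and the example reads are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")

def candidate_breakpoints(pos, cigar, min_clip=5):
    """Return reference positions adjacent to soft clips of at least min_clip bases.

    `pos` is the 1-based leftmost aligned position (the SAM POS field).
    Hard clips (H) are ignored so that a soft clip next to one is still seen.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return []
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    found = []
    if ops[0][1] == "S" and ops[0][0] >= min_clip:
        found.append(pos)                # clip precedes the first aligned base
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        found.append(pos + ref_len - 1)  # clip follows the last aligned base
    return found

# Hypothetical reads, all aligned starting at reference position 100.
for cigar in ["10S90M", "80M20S", "100M", "10S50M2D30M10S"]:
    print(cigar, candidate_breakpoints(100, cigar))
```

    A real caller would additionally cluster these positions across reads and combine them with discordant read-pair evidence, as the abstract describes.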

  10. Gas Emission Prediction Model of Coal Mine Based on CSBP Algorithm

    Directory of Open Access Journals (Sweden)

    Xiong Yan

    2016-01-01

    In view of the nonlinear characteristics of gas emission in a coal working face, a prediction method is proposed based on a BP neural network optimized by the cuckoo search algorithm (CSBP). In the CSBP algorithm, cuckoo search is adopted to optimize the weight and threshold parameters of the BP network and to obtain the global optimal solutions. Furthermore, the twelve main factors affecting gas emission in the coal working face are taken as the input vector of the CSBP algorithm and the gas emission as the output, and the prediction model of the BP neural network with optimal parameters is then established. The results show that the CSBP algorithm has better generalization ability and higher prediction accuracy, and can be utilized effectively in the prediction of coal mine gas emission.
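
    The idea of letting cuckoo search choose network parameters can be sketched on a toy problem. The code below is a simplified cuckoo search variant (Lévy-flight proposals plus abandonment of the worst nests), applied to a single linear neuron rather than a twelve-input BP network; the population size, step scale `alpha`, abandonment fraction `pa`, and training data are all assumptions and do not come from the paper.

```python
import math
import random

def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step length using Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

def cuckoo_search(loss, dim, n_nests=15, pa=0.25, alpha=0.1, iters=200, seed=0):
    """Minimize `loss` over `dim` parameters; return (best_params, best_loss)."""
    rng = random.Random(seed)

    def new_nest():
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    nests = [new_nest() for _ in range(n_nests)]
    scores = [loss(nest) for nest in nests]
    for _ in range(iters):
        for i in range(n_nests):
            # Propose a new solution by a Levy flight from nest i.
            trial = [x + alpha * levy_step(rng) for x in nests[i]]
            trial_score = loss(trial)
            # Compare with a randomly chosen nest and replace it if the trial is better.
            j = rng.randrange(n_nests)
            if trial_score < scores[j]:
                nests[j], scores[j] = trial, trial_score
        # Abandon the worst fraction pa of nests and rebuild them at random.
        order = sorted(range(n_nests), key=scores.__getitem__)
        for idx in order[n_nests - int(pa * n_nests):]:
            nests[idx] = new_nest()
            scores[idx] = loss(nests[idx])
    best = min(range(n_nests), key=scores.__getitem__)
    return nests[best], scores[best]

# Toy training set generated from y = 2x + 1; params = [weight, threshold/bias].
DATA = [(float(x), 2.0 * x + 1.0) for x in range(5)]

def mse(params):
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in DATA) / len(DATA)

weights, score = cuckoo_search(mse, dim=2)
print(weights, score)
```

    In a CSBP-style setup the parameter vector would hold every weight and threshold of the network, and the loss would be the network's training error; gradient-based BP training could then start from the solution found.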

  11. Post processing of protein-compound docking for fragment-based drug discovery (FBDD): in-silico structure-based drug screening and ligand-binding pose prediction.

    Science.gov (United States)

    Fukunishi, Yoshifumi

    2010-01-01

    For fragment-based drug development, both hit (active) compound prediction and docking-pose (protein-ligand complex structure) prediction of the hit compound are important, since chemical modification (fragment linking, fragment evolution) subsequent to the hit discovery must be performed based on the protein-ligand complex structure. However, the naïve protein-compound docking calculation shows poor accuracy in terms of docking-pose prediction. Thus, post-processing of the protein-compound docking is necessary. Recently, several methods for the post-processing of protein-compound docking have been proposed. In FBDD, the compounds are smaller than those for conventional drug screening. This makes it difficult to perform the protein-compound docking calculation. A method to avoid this problem has been reported. Protein-ligand binding free energy estimation is useful to reduce the procedures involved in the chemical modification of the hit fragment. Several prediction methods have been proposed for high-accuracy estimation of protein-ligand binding free energy. This paper summarizes the various computational methods proposed for docking-pose prediction and their usefulness in FBDD.

  12. Quantitative structure-activity relationship (QSAR) for insecticides: development of predictive in vivo insecticide activity models.

    Science.gov (United States)

    Naik, P K; Singh, T; Singh, H

    2009-07-01

    Quantitative structure-activity relationship (QSAR) analyses were performed independently on data sets belonging to two groups of insecticides, namely the organophosphates and carbamates. Several types of descriptors including topological, spatial, thermodynamic, information content, lead likeness and E-state indices were used to derive quantitative relationships between insecticide activities and structural properties of chemicals. A systematic search approach based on missing value, zero value, simple correlation and multi-collinearity tests as well as the use of a genetic algorithm allowed the optimal selection of the descriptors used to generate the models. The QSAR models developed for both organophosphate and carbamate groups revealed good predictability with r(2) values of 0.949 and 0.838 as well as [image omitted] values of 0.890 and 0.765, respectively. In addition, a linear correlation was observed between the predicted and experimental LD(50) values for the test set data with r(2) of 0.871 and 0.788 for both the organophosphate and carbamate groups, indicating that the prediction accuracy of the QSAR models was acceptable. The models were also tested successfully from external validation criteria. QSAR models developed in this study should help further design of novel potent insecticides.
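
    The abstract reports r(2) values without stating how they were computed. As a minimal sketch, assuming r(2) denotes the squared Pearson correlation between predicted and experimental activities, the statistic could be evaluated as follows. The function name r_squared and all numbers are hypothetical illustrations, not data from the study.

```python
def r_squared(predicted, observed):
    """Squared Pearson correlation between two equal-length sequences.

    Assumption: this is the definition behind the reported r(2);
    the abstract itself does not say which definition was used.
    """
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return (cov * cov) / (var_p * var_o)


# Hypothetical predicted vs. experimental activity values (arbitrary units).
predicted = [1.2, 2.1, 2.9, 4.2, 5.1]
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
score = r_squared(predicted, observed)
```

    A value close to 1 indicates that the predictions track the experimental values almost linearly; it says nothing about bias, so it is usually read alongside an error measure.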

  13. Physics-Based Hazard Assessment for Critical Structures Near Large Earthquake Sources

    Science.gov (United States)

    Hutchings, L.; Mert, A.; Fahjan, Y.; Novikova, T.; Golara, A.; Miah, M.; Fergany, E.; Foxall, W.

    2017-09-01

    We argue that for critical structures near large earthquake sources: (1) the ergodic assumption, recent history, and simplified descriptions of the hazard are not appropriate to rely on for earthquake ground motion prediction and can lead to a mis-estimation of the hazard and risk to structures; (2) a physics-based approach can address these issues; (3) a physics-based source model must be provided to generate realistic phasing effects from finite rupture and model near-source ground motion correctly; (4) wave propagations and site response should be site specific; (5) a much wider search of possible sources of ground motion can be achieved computationally with a physics-based approach; (6) unless one utilizes a physics-based approach, the hazard and risk to structures has unknown uncertainties; (7) uncertainties can be reduced with a physics-based approach, but not with an ergodic approach; (8) computational power and computer codes have advanced to the point that risk to structures can be calculated directly from source and site-specific ground motions. Spanning the variability of potential ground motion in a predictive situation is especially difficult for near-source areas, but that is the distance at which the hazard is the greatest. The basis of a "physical-based" approach is ground-motion syntheses derived from physics and an understanding of the earthquake process. This is an overview paper and results from previous studies are used to make the case for these conclusions. Our premise is that 50 years of strong motion records is insufficient to capture all possible ranges of site and propagation path conditions, rupture processes, and spatial geometric relationships between source and site. 
    Predicting future earthquake scenarios is necessary; models that have little or no physical basis but have been tested and adjusted to fit available observations can only "predict" what happened in the past, which should be considered description as opposed to prediction

  14. Neural Networks for protein Structure Prediction

    DEFF Research Database (Denmark)

    Bohr, Henrik

    1998-01-01

    This is a review about neural network applications in bioinformatics. Especially the applications to protein structure prediction, e.g. prediction of secondary structures, prediction of surface structure, fold class recognition and prediction of the 3-dimensional structure of protein backbones...

  15. A predictive framework for evaluating models of semantic organization in free recall

    Science.gov (United States)

    Morton, Neal W; Polyn, Sean M.

    2016-01-01

    Research in free recall has demonstrated that semantic associations reliably influence the organization of search through episodic memory. However, the specific structure of these associations and the mechanisms by which they influence memory search remain unclear. We introduce a likelihood-based model-comparison technique, which embeds a model of semantic structure within the context maintenance and retrieval (CMR) model of human memory search. Within this framework, model variants are evaluated in terms of their ability to predict the specific sequence in which items are recalled. We compare three models of semantic structure, latent semantic analysis (LSA), global vectors (GloVe), and word association spaces (WAS), and find that models using WAS have the greatest predictive power. Furthermore, we find evidence that semantic and temporal organization is driven by distinct item and context cues, rather than a single context cue. This finding provides an important constraint for theories of memory search. PMID:28331243

  16. Accurate prediction of stability changes in protein mutants by combining machine learning with structure based computational mutagenesis.

    Science.gov (United States)

    Masso, Majid; Vaisman, Iosif I

    2008-09-15

    Accurate predictive models for the impact of single amino acid substitutions on protein stability provide insight into protein structure and function. Such models are also valuable for the design and engineering of new proteins. Previously described methods have utilized properties of protein sequence or structure to predict the free energy change of mutants due to thermal (DeltaDeltaG) and denaturant (DeltaDeltaG(H2O)) denaturations, as well as mutant thermal stability (DeltaT(m)), through the application of either computational energy-based approaches or machine learning techniques. However, accuracy associated with applying these methods separately is frequently far from optimal. We detail a computational mutagenesis technique based on a four-body, knowledge-based, statistical contact potential. For any mutation due to a single amino acid replacement in a protein, the method provides an empirical normalized measure of the ensuing environmental perturbation occurring at every residue position. A feature vector is generated for the mutant by considering perturbations at the mutated position and its six ordered nearest neighbors in the 3-dimensional (3D) protein structure. These predictors of stability change are evaluated by applying machine learning tools to large training sets of mutants derived from diverse proteins that have been experimentally studied and described. Predictive models based on our combined approach are either comparable to, or in many cases significantly outperform, previously published results. A web server with supporting documentation is available at http://proteins.gmu.edu/automute.

  17. LocARNA-P: Accurate boundary prediction and improved detection of structural RNAs

    DEFF Research Database (Denmark)

    Will, Sebastian; Joshi, Tejal; Hofacker, Ivo L.

    2012-01-01

    Current genomic screens for noncoding RNAs (ncRNAs) predict a large number of genomic regions containing potential structural ncRNAs. The analysis of these data requires highly accurate prediction of ncRNA boundaries and discrimination of promising candidate ncRNAs from weak predictions. Existing...... methods struggle with these goals because they rely on sequence-based multiple sequence alignments, which regularly misalign RNA structure and therefore do not support identification of structural similarities. To overcome this limitation, we compute columnwise and global reliabilities of alignments based...... on sequence and structure similarity; we refer to these structure-based alignment reliabilities as STARs. The columnwise STARs of alignments, or STAR profiles, provide a versatile tool for the manual and automatic analysis of ncRNAs. In particular, we improve the boundary prediction of the widely used nc...

  18. A real-time all-atom structural search engine for proteins.

    Science.gov (United States)

    Gonzalez, Gabriel; Hannigan, Brett; DeGrado, William F

    2014-07-01

    Protein designers use a wide variety of software tools for de novo design, yet their repertoire still lacks a fast and interactive all-atom search engine. To solve this, we have built the Suns program: a real-time, atomic search engine integrated into the PyMOL molecular visualization system. Users build atomic-level structural search queries within PyMOL and receive a stream of search results aligned to their query within a few seconds. This instant feedback cycle enables a new "designability"-inspired approach to protein design where the designer searches for and interactively incorporates native-like fragments from proven protein structures. We demonstrate the use of Suns to interactively build protein motifs, tertiary interactions, and to identify scaffolds compatible with hot-spot residues. The official web site and installer are located at http://www.degradolab.org/suns/ and the source code is hosted at https://github.com/godotgildor/Suns (PyMOL plugin, BSD license), https://github.com/Gabriel439/suns-cmd (command line client, BSD license), and https://github.com/Gabriel439/suns-search (search engine server, GPLv2 license).

  19. Structural MRI-Based Predictions in Patients with Treatment-Refractory Depression (TRD).

    Directory of Open Access Journals (Sweden)

    Blair A Johnston

    Full Text Available The application of machine learning techniques to psychiatric neuroimaging offers the possibility to identify robust, reliable and objective disease biomarkers both within and between contemporary syndromal diagnoses that could guide routine clinical practice. The use of quantitative methods to identify psychiatric biomarkers is consequently important, particularly with a view to making predictions relevant to individual patients, rather than at a group-level. Here, we describe predictions of treatment-refractory depression (TRD) diagnosis using structural T1-weighted brain scans obtained from twenty adult participants with TRD and 21 never depressed controls. We report 85% accuracy of individual subject diagnostic prediction. Using an automated feature selection method, the major brain regions supporting this significant classification were in the caudate, insula, habenula and periventricular grey matter. It was not, however, possible to predict the degree of 'treatment resistance' in individual patients, at least as quantified by the Massachusetts General Hospital (MGH-S) clinical staging method; but the insula was again identified as a region of interest. Structural brain imaging data alone can be used to predict diagnostic status, but not MGH-S staging, with a high degree of accuracy in patients with TRD.

  20. PSPP: a protein structure prediction pipeline for computing clusters.

    Directory of Open Access Journals (Sweden)

    Michael S Lee

    2009-07-01

    Full Text Available Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. Therefore, we present a standalone protein structure prediction software package suitable for high-throughput structural genomic applications that performs all three classes of prediction methodologies: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster. The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. So far, the pipeline has been used to study viral and bacterial proteomes. The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a potentially unlimited number of queries as well as perform

  1. HTTP-based Search and Ordering Using ECHO's REST-based and OpenSearch APIs

    Science.gov (United States)

    Baynes, K.; Newman, D. J.; Pilone, D.

    2012-12-01

    Metadata is an important entity in the process of cataloging, discovering, and describing Earth science data. NASA's Earth Observing System (EOS) ClearingHOuse (ECHO) acts as the core metadata repository for EOSDIS data centers, providing a centralized mechanism for metadata and data discovery and retrieval. By supporting both the ESIP's Federated Search API and its own search and ordering interfaces, ECHO provides multiple capabilities that facilitate ease of discovery and access to its ever-increasing holdings. Users are able to search and export metadata in a variety of formats including ISO 19115, json, and ECHO10. This presentation aims to inform technically savvy clients interested in automating search and ordering of ECHO's metadata catalog. The audience will be introduced to practical and applicable examples of end-to-end workflows that demonstrate finding, sub-setting and ordering data that is bound by keyword, temporal and spatial constraints. Interaction with the ESIP OpenSearch Interface will be highlighted, as will ECHO's own REST-based API.

  2. Using internet search queries to predict human mobility in social events

    DEFF Research Database (Denmark)

    Borysov, Stanislav; Lourenco, Mariana; Rodrigues, Filipe

    2016-01-01

    While our transport systems are generally designed for habitual behavior, the dynamics of large and mega cities systematically push it to its limits. Particularly, transport planning and operations in large events are well known to be a challenge. Not only they imply stress to the system...... on an irregular basis, their associated mobility behavior is also difficult to predict. Previous studies have shown a strong correlation between number of public transport arrivals with the semi-structured data mined from online announcement websites. However, these models tend to be complex in form and demand...... substantial information retrieval, extraction and data cleaning work, and so they are difficult to generalize from city to city. In contrast, this paper focuses on enriching previously mined information about special events using automated web search queries. Since this context data comes in unstructured...

  3. SHOP: scaffold hopping by GRID-based similarity searches

    DEFF Research Database (Denmark)

    Bergmann, Rikke; Linusson, Anna; Zamora, Ismael

    2007-01-01

    A new GRID-based method for scaffold hopping (SHOP) is presented. In a fully automatic manner, scaffolds were identified in a database based on three types of 3D-descriptors. SHOP's ability to recover scaffolds was assessed and validated by searching a database spiked with fragments of known...... scaffolds were in the 31 top-ranked scaffolds. SHOP also identified new scaffolds with substantially different chemotypes from the queries. Docking analysis indicated that the new scaffolds would have similar binding modes to those of the respective query scaffolds observed in X-ray structures...

  4. Protein structure prediction using bee colony optimization metaheuristic

    DEFF Research Database (Denmark)

    Fonseca, Rasmus; Paluszewski, Martin; Winter, Pawel

    2010-01-01

    Predicting the native structure of proteins is one of the most challenging problems in molecular biology. The goal is to determine the three-dimensional structure from the one-dimensional amino acid sequence. De novo prediction algorithms seek to do this by developing a representation of the protein's structure, an energy potential and some optimization algorithm that finds the structure with minimal energy. Bee Colony Optimization (BCO) is a relatively new approach to solving optimization problems based on the foraging behaviour of bees. Several variants of BCO have been suggested...... our BCO method to generate good solutions to the protein structure prediction problem. The results show that BCO generally finds better solutions than simulated annealing which so far has been the metaheuristic of choice for this problem....

  5. Prediction of Global Damage and Reliability Based Upon Sequential Identification and Updating of RC Structures Subject to Earthquakes

    DEFF Research Database (Denmark)

    Nielsen, Søren R.K.; Skjærbæk, P. S.; Köylüoglu, H. U.

    The paper deals with the prediction of global damage and future structural reliability with special emphasis on sensitivity, bias and uncertainty of these predictions dependent on the statistically equivalent realizations of the future earthquake. The predictions are based on a modified Clough......-Johnston single-degree-of-freedom (SDOF) oscillator with three parameters which are calibrated to fit the displacement response and the damage development in the past earthquake....

  6. Search performance is better predicted by tileability than presence of a unique basic feature

    Science.gov (United States)

    Chang, Honghua; Rosenholtz, Ruth

    2016-01-01

    Traditional models of visual search such as feature integration theory (FIT; Treisman & Gelade, 1980), have suggested that a key factor determining task difficulty consists of whether or not the search target contains a “basic feature” not found in the other display items (distractors). Here we discriminate between such traditional models and our recent texture tiling model (TTM) of search (Rosenholtz, Huang, Raj, Balas, & Ilie, 2012b), by designing new experiments that directly pit these models against each other. Doing so is nontrivial, for two reasons. First, the visual representation in TTM is fully specified, and makes clear testable predictions, but its complexity makes getting intuitions difficult. Here we elucidate a rule of thumb for TTM, which enables us to easily design new and interesting search experiments. FIT, on the other hand, is somewhat ill-defined and hard to pin down. To get around this, rather than designing totally new search experiments, we start with five classic experiments that FIT already claims to explain: T among Ls, 2 among 5s, Q among Os, O among Qs, and an orientation/luminance-contrast conjunction search. We find that fairly subtle changes in these search tasks lead to significant changes in performance, in a direction predicted by TTM, providing definitive evidence in favor of the texture tiling model as opposed to traditional views of search. PMID:27548090

  7. Model-Based Prediction of Pulsed Eddy Current Testing Signals from Stratified Conductive Structures

    International Nuclear Information System (INIS)

    Zhang, Jian Hai; Song, Sung Jin; Kim, Woong Ji; Kim, Hak Joon; Chung, Jong Duk

    2011-01-01

    Excitation and propagation of the electromagnetic field of a cylindrical coil above an arbitrary number of conductive plates for pulsed eddy current testing (PECT) are very complex problems due to their complicated physical properties. In this paper, analytical modeling of PECT is established by Fourier series based on the truncated region eigenfunction expansion (TREE) method for a single air-cored coil above stratified conductive structures (SCS) to investigate their integrity. From the presented expression of PECT, the coil impedance due to SCS is calculated based on an analytical approach using the generalized reflection coefficient in series form. Then the multilayered structures manufactured from non-ferromagnetic (STS301L) and ferromagnetic materials (SS400) are investigated by the developed PECT model. Good prediction by the analytical model of PECT not only contributes to the development of an efficient solver but also can be applied to optimize the conditions of the experimental setup in PECT.

  8. RNA secondary structure prediction with pseudoknots: Contribution of algorithm versus energy model.

    Science.gov (United States)

    Jabbari, Hosna; Wark, Ian; Montemagno, Carlo

    2018-01-01

    RNA is a biopolymer with various applications inside the cell and in biotechnology. The structure of an RNA molecule largely determines its function and is essential to guide nanostructure design. Since experimental structure determination is time-consuming and expensive, accurate computational prediction of RNA structure is of great importance. Prediction of RNA secondary structure is relatively simpler than prediction of its tertiary structure and provides information about the tertiary structure; therefore, RNA secondary structure prediction has received attention in the past decades. Numerous methods with different folding approaches have been developed for RNA secondary structure prediction. While methods for prediction of RNA pseudoknot-free structure (structures with no crossing base pairs) have greatly improved in terms of their accuracy, methods for prediction of RNA pseudoknotted secondary structure (structures with crossing base pairs) still have room for improvement. A long-standing question for improving the prediction accuracy of RNA pseudoknotted secondary structure is whether to focus on the prediction algorithm or the underlying energy model, as there is a trade-off between the computational cost of the prediction algorithm and the generality of the method. The aim of this work is to argue that, when comparing different methods for RNA pseudoknotted structure prediction, the combination of algorithm and energy model should be considered, and a method should not be considered superior or inferior to others if the methods do not use the same scoring model. We demonstrate that while the folding approach is important in structure prediction, it is not the only important factor in the prediction accuracy of a given method, as the underlying energy model is of equally great value. Therefore we encourage researchers to pay particular attention when comparing methods with different energy models.

  9. MicroRNA prediction using a fixed-order Markov model based on the secondary structure pattern.

    Directory of Open Access Journals (Sweden)

    Wei Shen

    Full Text Available Predicting miRNAs is an arduous task, due to the diversity of the precursors and complexity of enzyme processes. Although several prediction approaches have reached impressive performances, few of them could achieve a full-function recognition of mature miRNA directly from the candidate hairpins across species. Therefore, researchers continue to seek a more powerful model close to biological recognition to miRNA structure. In this report, we describe a novel miRNA prediction algorithm, known as FOMmiR, using a fixed-order Markov model based on the secondary structural pattern. For a training dataset containing 809 human pre-miRNAs and 6441 human pseudo-miRNA hairpins, the model's parameters were defined and evaluated. The results showed that FOMmiR reached 91% accuracy on the human dataset through 5-fold cross-validation. Moreover, for the independent test datasets, the FOMmiR presented an outstanding prediction in human and other species including vertebrates, Drosophila, worms and viruses, even plants, in contrast to the well-known algorithms and models. Especially, the FOMmiR was not only able to distinguish the miRNA precursors from the hairpins, but also locate the position and strand of the mature miRNA. Therefore, this study provides a new generation of miRNA prediction algorithm, which successfully realizes a full-function recognition of the mature miRNAs directly from the hairpin sequences. And it presents a new understanding of the biological recognition based on the strongest signal's location detected by FOMmiR, which might be closely associated with the enzyme cleavage mechanism during the miRNA maturation.
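
    The abstract does not give the details of FOMmiR. As a generic illustration of how a fixed-order Markov model can score a secondary-structure pattern, the sketch below trains one first-order model on positive examples and one on negative examples and classifies by log-odds. The dot-bracket alphabet, the model order, the add-one smoothing, the function names and the toy sequences are all assumptions made for illustration, not the authors' parameters or data.

```python
import math
from collections import defaultdict


def train_markov(sequences, order=1, alphabet="().", pseudocount=1.0):
    """Return a smoothed estimate of P(symbol | previous `order` symbols)."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1.0

    def prob(context, symbol):
        row = counts.get(context, {})
        total = sum(row.values()) + pseudocount * len(alphabet)
        return (row.get(symbol, 0.0) + pseudocount) / total

    return prob


def log_likelihood(seq, prob, order=1):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(prob(seq[i - order:i], seq[i]))
               for i in range(order, len(seq)))


def log_odds(seq, pos_prob, neg_prob, order=1):
    """Positive score: the sequence looks more like the positive class."""
    return log_likelihood(seq, pos_prob, order) - log_likelihood(seq, neg_prob, order)


# Hypothetical toy training data in dot-bracket notation.
hairpin_like = ["((((....))))", "(((((...)))))", "((((.....))))"]
background = ["..(..)..(..)", ".(.).(.).(.)", "...().....()"]

pos = train_markov(hairpin_like)
neg = train_markov(background)
stem_loop_score = log_odds("(((....)))", pos, neg)
scattered_score = log_odds(".(.)..(.).", pos, neg)
```

    In this toy setting a clean stem-loop receives a positive log-odds score and a scattered pattern a negative one. A real predictor would need a much larger training set and an encoding that combines sequence with structure.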

  10. Differential Search Coils Based Magnetometers: Conditioning, Magnetic Sensitivity, Spatial Resolution

    Directory of Open Access Journals (Sweden)

    Timofeeva Maria

    2012-03-01

    Full Text Available A theoretical and experimental comparison of optimized search coils based magnetometers, operating either in the Flux mode or in the classical Lenz-Faraday mode, is presented. The improvements provided by the Flux mode in terms of bandwidth and measuring range of the sensor are detailed. Theory, SPICE model and measurements are in good agreement. The spatial resolution of the sensor is studied which is an important parameter for applications in non destructive evaluation. A general expression of the magnetic sensitivity of search coils sensors is derived. Solutions are proposed to design magnetometers with reduced weight and volume without degrading the magnetic sensitivity. An original differential search coil based magnetometer, made of coupled coils, operating in flux mode and connected to a differential transimpedance amplifier is proposed. It is shown that this structure is better in terms of volume occupancy than magnetometers using two separated coils without any degradation in magnetic sensitivity. Experimental results are in good agreement with calculations.

  11. Concomitant prediction of function and fold at the domain level with GO-based profiles.

    Science.gov (United States)

    Lopez, Daniel; Pazos, Florencio

    2013-01-01

    Predicting the function of newly sequenced proteins is crucial due to the pace at which these raw sequences are being obtained. Almost all resources for predicting protein function assign functional terms to whole chains, and do not distinguish which particular domain is responsible for the allocated function. This is not a limitation of the methodologies themselves but it is due to the fact that in the databases of functional annotations these methods use for transferring functional terms to new proteins, these annotations are done on a whole-chain basis. Nevertheless, domains are the basic evolutionary and often functional units of proteins. In many cases, the domains of a protein chain have distinct molecular functions, independent from each other. For that reason resources with functional annotations at the domain level, as well as methodologies for predicting function for individual domains adapted to these resources are required. We present a methodology for predicting the molecular function of individual domains, based on a previously developed database of functional annotations at the domain level. The approach, which we show outperforms a standard method based on sequence searches in assigning function, concomitantly predicts the structural fold of the domains and can give hints on the functionally important residues associated to the predicted function.

  12. Dynameomics: Data-driven methods and models for utilizing large-scale protein structure repositories for improving fragment-based loop prediction

    Science.gov (United States)

    Rysavy, Steven J; Beck, David AC; Daggett, Valerie

    2014-01-01

    Protein function is intimately linked to protein structure and dynamics yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due to protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼25–75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. PMID:25142412

  13. Dynameomics: data-driven methods and models for utilizing large-scale protein structure repositories for improving fragment-based loop prediction.

    Science.gov (United States)

    Rysavy, Steven J; Beck, David A C; Daggett, Valerie

    2014-11-01

    Protein function is intimately linked to protein structure and dynamics yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due to protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼25-75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. © 2014 The Protein Society.

  14. Compound Structure-Independent Activity Prediction in High-Dimensional Target Space.

    Science.gov (United States)

    Balfer, Jenny; Hu, Ye; Bajorath, Jürgen

    2014-08-01

    Profiling of compound libraries against arrays of targets has become an important approach in pharmaceutical research. The prediction of multi-target compound activities also represents an attractive task for machine learning with potential for drug discovery applications. Herein, we have explored activity prediction in high-dimensional target space. Different types of models were derived to predict multi-target activities. The models included naïve Bayesian (NB) and support vector machine (SVM) classifiers based upon compound structure information and NB models derived on the basis of activity profiles, without considering compound structure. Because the latter approach can be applied to incomplete training data and principally depends on the feature independence assumption, SVM modeling was not applicable in this case. Furthermore, iterative hybrid NB models making use of both activity profiles and compound structure information were built. In high-dimensional target space, NB models utilizing activity profile data were found to yield more accurate activity predictions than structure-based NB and SVM models or hybrid models. An in-depth analysis of activity profile-based models revealed the presence of correlation effects across different targets and rationalized prediction accuracy. Taken together, the results indicate that activity profile information can be effectively used to predict the activity of test compounds against novel targets. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. Mathematical programming solver based on local search

    CERN Document Server

    Gardi, Frédéric; Darlay, Julien; Estellon, Bertrand; Megel, Romain

    2014-01-01

    This book covers local search for combinatorial optimization and its extension to mixed-variable optimization. Although not yet understood from the theoretical point of view, local search is the paradigm of choice for tackling large-scale real-life optimization problems. Today's end-users demand interactivity with decision support systems. For optimization software, this means obtaining good-quality solutions quickly. Fast iterative improvement methods, like local search, are suited to satisfying such needs. Here the authors show local search in a new light, in particular presenting a new kind of mathematical programming solver, namely LocalSolver, based on neighborhood search. First, an iconoclast methodology is presented to design and engineer local search algorithms. The authors' concern about industrializing local search approaches is of particular interest for practitioners. This methodology is applied to solve two industrial problems with high economic stakes. Software based on local search induces ex...

  16. Cooperative mobile agents search using beehive partitioned structure and Tabu Random search algorithm

    Science.gov (United States)

    Ramazani, Saba; Jackson, Delvin L.; Selmic, Rastko R.

    2013-05-01

    In search and surveillance operations, deploying a team of mobile agents provides a robust solution that has multiple advantages over using a single agent in efficiency and minimizing exploration time. This paper addresses the challenge of identifying a target in a given environment when using a team of mobile agents by proposing a novel method of mapping and movement of agent teams in a cooperative manner. The approach consists of two parts. First, the region is partitioned into a hexagonal beehive structure in order to provide equidistant movements in every direction and to allow for more natural and flexible environment mapping. Additionally, in search environments that are partitioned into hexagons, mobile agents have an efficient travel path while performing searches due to this partitioning approach. Second, we use a team of mobile agents that move in a cooperative manner and utilize the Tabu Random algorithm to search for the target. Due to the ever-increasing use of robotics and Unmanned Aerial Vehicle (UAV) platforms, the field of cooperative multi-agent search has developed many applications recently that would benefit from the use of the approach presented in this work, including: search and rescue operations, surveillance, data collection, and border patrol. In this paper, the increased efficiency of the Tabu Random Search algorithm method in combination with hexagonal partitioning is simulated, analyzed, and advantages of this approach are presented and discussed.
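
    The two parts described above can be illustrated with a minimal sketch. It assumes axial (q, r) hexagon coordinates, a hexagonal search region of radius 1 or more, and a tabu memory shared by all agents; the abstract does not specify the authors' Tabu Random algorithm at this level, so the function names and the fallback rule are hypothetical.

```python
import random

# Six neighbour offsets in axial (q, r) hexagon coordinates.
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_distance(a, b):
    """Number of hexagon steps between two axial cells."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def neighbours(cell, radius):
    """Adjacent cells that stay inside a hexagonal region of the given radius."""
    q, r = cell
    candidates = [(q + dq, r + dr) for dq, dr in HEX_DIRECTIONS]
    return [c for c in candidates if hex_distance(c, (0, 0)) <= radius]


def tabu_random_search(starts, target, radius, max_steps, seed=0):
    """Move agents at random while avoiding cells any agent already visited.

    Assumes radius >= 1. Returns the step at which an agent reaches the
    target, or None if it is not found within max_steps.
    """
    rng = random.Random(seed)
    positions = list(starts)
    visited = set(positions)  # tabu memory shared by the whole team (assumption)
    if target in visited:
        return 0
    for step in range(1, max_steps + 1):
        for i, cell in enumerate(positions):
            options = neighbours(cell, radius)
            fresh = [c for c in options if c not in visited]
            # Fall back to any neighbour when every adjacent cell is tabu.
            positions[i] = rng.choice(fresh or options)
            visited.add(positions[i])
            if positions[i] == target:
                return step
    return None
```

    Every hexagon has six neighbours at equal distance, which is the equidistant-movement property the abstract attributes to the beehive partition; a square grid would mix unit and diagonal steps.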

  17. Unbiased structural search of small copper clusters within DFT

    Energy Technology Data Exchange (ETDEWEB)

    Cogollo-Olivo, Beatriz H., E-mail: bcogolloo@unicartagena.edu.co [Maestría en Ciencias Físicas, Universidad de Cartagena, 130001 Cartagena de Indias, Bolívar (Colombia); Seriani, Nicola, E-mail: nseriani@ictp.it [Condensed Matter and Statistical Physics Section, The Abdus Salam ICTP, Strada Costiera 11, 34151 Trieste (Italy); Montoya, Javier A., E-mail: jmontoyam@unicartagena.edu.co [Instituto de Matemáticas Aplicadas, Universidad de Cartagena, 130001 Cartagena de Indias, Bolívar (Colombia); Associates Program, The Abdus Salam ICTP, Strada Costiera 11, 34151 Trieste (Italy)

    2015-11-05

    Highlights: • We have been able to identify novel metastable structures for small Cu clusters. • We have shown that a linear structure reported for Cu_3 is actually a local maximum. • Some of the structures reported in literature are actually unstable within DFT. • Some of the isomer structures found show the limits of educated guesses. - Abstract: The atomic structure of small Cu clusters with 3–6 atoms has been investigated by density functional theory and random search algorithm. New metastable structures have been found that lie merely tens of meV/atom above the corresponding ground state, and could therefore be present at thermodynamic equilibrium at room temperature or slightly above. Moreover, we show that the previously proposed linear configuration for Cu_3 is in fact a local maximum of the energy. Finally, we argue that the random search algorithm also provides qualitative information about the attraction basin of each structure in the energy landscape.

  18. Unbiased structural search of small copper clusters within DFT

    International Nuclear Information System (INIS)

    Cogollo-Olivo, Beatriz H.; Seriani, Nicola; Montoya, Javier A.

    2015-01-01

    Highlights: • We have been able to identify novel metastable structures for small Cu clusters. • We have shown that a linear structure reported for Cu_3 is actually a local maximum. • Some of the structures reported in literature are actually unstable within DFT. • Some of the isomer structures found show the limits of educated guesses. - Abstract: The atomic structure of small Cu clusters with 3–6 atoms has been investigated by density functional theory and random search algorithm. New metastable structures have been found that lie merely tens of meV/atom above the corresponding ground state, and could therefore be present at thermodynamic equilibrium at room temperature or slightly above. Moreover, we show that the previously proposed linear configuration for Cu_3 is in fact a local maximum of the energy. Finally, we argue that the random search algorithm also provides qualitative information about the attraction basin of each structure in the energy landscape.

  19. Signature-based global searches at CDF

    International Nuclear Information System (INIS)

    Hocker, James Andrew

    2008-01-01

    Data collected in Run II of the Fermilab Tevatron are searched for indications of new electroweak scale physics. Rather than focusing on particular new physics scenarios, CDF data are analyzed for discrepancies with respect to the Standard Model prediction. Gross features of the data, mass bumps, and significant excesses of events with large summed transverse momentum are examined in a model-independent and quasi-model-independent approach. This global search for new physics in over three hundred exclusive final states in 2 fb⁻¹ of proton-antiproton collisions at √s = 1.96 TeV reveals no significant indication of physics beyond the Standard Model.

  20. Prediction of protein structural classes by recurrence quantification analysis based on chaos game representation.

    Science.gov (United States)

    Yang, Jian-Yi; Peng, Zhen-Ling; Yu, Zu-Guo; Zhang, Rui-Jie; Anh, Vo; Wang, Desheng

    2009-04-21

    In this paper, we intend to predict protein structural classes (alpha, beta, alpha+beta, or alpha/beta) for low-homology data sets. Two widely used data sets were employed, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins), with sequence homology of 40% and 25%, respectively. We propose to decompose the chaos game representation of proteins into two kinds of time series. Then, a novel and powerful nonlinear analysis technique, recurrence quantification analysis (RQA), is applied to analyze these time series. For a given protein sequence, a total of 16 characteristic parameters can be calculated with RQA, which are treated as the feature representation of the protein sequence. Based on this feature representation, the structural class for each protein is predicted with Fisher's linear discriminant algorithm. The jackknife test is used to test and compare our method with other existing methods. The overall accuracies with the step-by-step procedure are 65.8% and 64.2% for the 1189 and 25PDB data sets, respectively. With the widely used one-against-others procedure, we compare our method with five other existing methods; the overall accuracies of our method are 6.3% and 4.1% higher for the two data sets, respectively. Furthermore, only 16 parameters are used in our method, fewer than those used by other methods. This suggests that the current method may play a complementary role to the existing methods and is promising for the prediction of protein structural classes.

  1. Global search in photoelectron diffraction structure determination using genetic algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Viana, M L [Departamento de Fisica, Icex, UFMG, Belo Horizonte, Minas Gerais (Brazil); Muino, R Diez [Donostia International Physics Center DIPC, Paseo Manuel de Lardizabal 4, 20018 San Sebastian (Spain); Soares, E A [Departamento de Fisica, Icex, UFMG, Belo Horizonte, Minas Gerais (Brazil); Hove, M A Van [Department of Physics and Materials Science, City University of Hong Kong, Hong Kong (China); Carvalho, V E de [Departamento de Fisica, Icex, UFMG, Belo Horizonte, Minas Gerais (Brazil)

    2007-11-07

    Photoelectron diffraction (PED) is an experimental technique widely used to perform structural determinations of solid surfaces. Similarly to low-energy electron diffraction (LEED), structural determination by PED requires a fitting procedure between the experimental intensities and theoretical results obtained through simulations. Multiple scattering has been shown to be an effective approach for making such simulations. The quality of the fit can be quantified through the so-called R-factor. Therefore, the fitting procedure is, indeed, an R-factor minimization problem. However, the topography of the R-factor as a function of the structural and non-structural surface parameters to be determined is complex, and the task of finding the global minimum becomes tough, particularly for complex structures in which many parameters have to be adjusted. In this work we investigate the applicability of the genetic algorithm (GA) global optimization method to this problem. The GA is based on the evolution of species, and makes use of concepts such as crossover, elitism and mutation to perform the search. We show results of its application in the structural determination of three different systems: the Cu(111) surface through the use of energy-scanned experimental curves; the Ag(110)-c(2 x 2)-Sb system, in which a theory-theory fit was performed; and the Ag(111) surface for which angle-scanned experimental curves were used. We conclude that the GA is a highly efficient method to search for global minima in the optimization of the parameters that best fit the experimental photoelectron diffraction intensities to the theoretical ones.
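
    The crossover, elitism and mutation concepts named above can be sketched on a toy problem. The objective below is a hypothetical multi-modal stand-in (the Rastrigin function), not a real R-factor, which would require a multiple-scattering simulation; the selection scheme, parameter names and default values are likewise assumptions made for illustration rather than the authors' settings.

```python
import math
import random


def toy_r_factor(params):
    """Hypothetical stand-in for an R-factor: many local minima, value 0 at the origin."""
    return sum(x * x - 10 * math.cos(2 * math.pi * x) + 10 for x in params)


def genetic_minimise(fitness, n_params, bounds, pop_size=40, generations=100,
                     n_elite=2, mutation_rate=0.2, seed=0):
    """Minimise `fitness` with a simple real-coded genetic algorithm."""
    rng = random.Random(seed)
    lo, hi = bounds
    population = [[rng.uniform(lo, hi) for _ in range(n_params)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        # Elitism: the best individuals survive unchanged.
        next_gen = [ind[:] for ind in population[:n_elite]]
        # Truncation selection: only the better half may reproduce.
        parents = population[:pop_size // 2]
        while len(next_gen) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Uniform crossover: each parameter comes from either parent.
            child = [m if rng.random() < 0.5 else f
                     for m, f in zip(mother, father)]
            # Gaussian mutation, clipped to the allowed parameter range.
            child = [min(hi, max(lo, g + rng.gauss(0, 0.1 * (hi - lo))))
                     if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = min(population, key=fitness)
    return best, fitness(best), history
```

    Because the elite individuals are copied unchanged, the best value in `history` can never get worse from one generation to the next, which is the practical reason for including elitism in this kind of search.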

  2. IBRI-CASONTO: Ontology-based semantic search engine

    Directory of Open Access Journals (Sweden)

    Awny Sayed

    2017-11-01

    The vast amount of information being added at a very fast pace to data repositories makes it challenging to extract correct and accurate information. This has increased competition among developers to gain access to technology that seeks to understand the intent of the researcher and the contextual meaning of terms. Efforts to develop Arabic semantic search systems are still in their infancy, which can be traced back to the complexity of the Arabic language: it has complex morphological, grammatical and semantic aspects, as it is a highly inflectional and derivational language. In this paper, we present an ontological search engine called IBRI-CASONTO for the Colleges of Applied Sciences, Oman. The proposed engine supports both Arabic and English. It employs two types of search: keyword-based search and semantics-based search. IBRI-CASONTO is based on different technologies such as Resource Description Framework (RDF) data and an ontological graph. The experiments are presented in two sections: the first compares Entity-Search with Classical-Search inside IBRI-CASONTO itself; the second compares the Entity-Search of IBRI-CASONTO with currently used search engines, such as Kngine, Wolfram Alpha and Google, the most popular engine nowadays, in order to measure their performance and efficiency.

  3. Using sequence-specific chemical and structural properties of DNA to predict transcription factor binding sites.

    Directory of Open Access Journals (Sweden)

    Amy L Bauer

    2010-11-01

    An important step in understanding gene regulation is to identify the DNA binding sites recognized by each transcription factor (TF). Conventional approaches to prediction of TF binding sites involve the definition of consensus sequences or position-specific weight matrices and rely on statistical analysis of DNA sequences of known binding sites. Here, we present a method called SiteSleuth in which DNA structure prediction, computational chemistry, and machine learning are applied to develop models for TF binding sites. In this approach, binary classifiers are trained to discriminate between true and false binding sites based on the sequence-specific chemical and structural features of DNA. These features are determined via molecular dynamics calculations in which we consider each base in different local neighborhoods. For each of 54 TFs in Escherichia coli, for which at least five DNA binding sites are documented in RegulonDB, the TF binding sites and portions of the non-coding genome sequence are mapped to feature vectors and used in training. According to cross-validation analysis and a comparison of computational predictions against ChIP-chip data available for the TF Fis, SiteSleuth outperforms three conventional approaches: Match, MATRIX SEARCH, and the method of Berg and von Hippel. SiteSleuth also outperforms QPMEME, a method similar to SiteSleuth in that it involves a learning algorithm. The main advantage of SiteSleuth is a lower false positive rate.

  4. Exploiting the Past and the Future in Protein Secondary Structure Prediction

    DEFF Research Database (Denmark)

    Baldi, Pierre; Brunak, Søren; Frasconi, P

    1999-01-01

    Motivation: Predicting the secondary structure of a protein (alpha-helix, beta-sheet, coil) is an important step towards elucidating its three-dimensional structure, as well as its function. Presently, the best predictors are based on machine learning approaches, in particular neural network... predictions based on variable ranges of dependencies. These architectures extend recurrent neural networks, introducing non-causal bidirectional dynamics to capture both upstream and downstream information. The prediction algorithm is completed by the use of mixtures of estimators that leverage evolutionary...

  5. BSSF: a fingerprint based ultrafast binding site similarity search and function analysis server

    Directory of Open Access Journals (Sweden)

    Jiang Hualiang

    2010-01-01

    Background: Genome sequencing and post-genomics projects such as structural genomics are extending the frontier of the study of the sequence-structure-function relationship of genes and their products. Although many sequence/structure-based methods have been devised with the aim of deciphering this delicate relationship, there still remain large gaps in this fundamental problem, which continuously drives researchers to develop novel methods to extract relevant information from sequences and structures and to infer the functions of genes newly identified by genomics technology. Results: Here we present an ultrafast method, named BSSF (Binding Site Similarity & Function), which enables researchers to conduct similarity searches in a comprehensive three-dimensional binding site database extracted from PDB structures. This method utilizes a fingerprint representation of the binding site and a validated statistical Z-score function scheme to judge the similarity between the query and database items, even if their similarities are confined to a sub-pocket. This fingerprint-based similarity measurement was also validated on a known binding site dataset by comparing with geometric hashing, which is a standard 3D similarity method. The comparison clearly demonstrated the utility of this ultrafast method. After the database search, the hit list is further analyzed to provide basic statistical information about the occurrences of Gene Ontology terms and Enzyme Commission numbers, which may benefit researchers by helping them to design further experiments to study the query proteins. Conclusions: This ultrafast web-based system will not only help researchers interested in drug design and structural genomics to identify similar binding sites, but also assist them by providing further analysis of the hit list from database searching.

  6. Automated Clustering Analysis of Immunoglobulin Sequences in Chronic Lymphocytic Leukemia Based on 3D Structural Descriptors

    DEFF Research Database (Denmark)

    Marcatili, Paolo; Mochament, Konstantinos; Agathangelidis, Andreas

    2016-01-01

    ...determine it are extremely laborious and demanding. Hence, the ability to gain insight into the structure of Igs at large relies on the availability of tools and algorithms for producing accurate Ig structural models based on their primary sequence alone. These models can then be used to determine... to achieve an optimal solution to this task yet their results were hindered mainly due to the lack of efficient clustering methods based on the similarity of 3D structure descriptors. Here, we present a novel workflow for robust Ig 3D modeling and automated clustering. We validated our protocol in chronic... study, we used the structure prediction tools PIGS and I-TASSER for creating the 3D models and the TM-align algorithm to superpose them. The innovation of the current methodology resides in the usage of methods adapted from 3D content-based search methodologies to determine the local structural...

  7. Literature-based condition-specific miRNA-mRNA target prediction.

    Directory of Open Access Journals (Sweden)

    Minsik Oh

    miRNAs are small non-coding RNAs that regulate gene expression by binding to the 3'-UTR of genes. Many recent studies have reported that miRNAs play important biological roles by regulating specific mRNAs or genes. Many sequence-based target prediction algorithms have been developed to predict miRNA targets. However, these methods are not designed for condition-specific target predictions and produce many false positives; thus, expression-based target prediction algorithms have been developed for condition-specific target predictions. A typical strategy to utilize expression data is to leverage the negative control roles of miRNAs on genes. To control false positives, a stringent cutoff value is typically set, but in this case, these methods tend to reject many true target relationships, i.e., false negatives. To overcome these limitations, additional information should be utilized. The literature is probably the best resource that we can utilize. Recent literature mining systems compile millions of articles with experiments designed for specific biological questions, and the systems provide a function to search for specific information. To utilize the literature information, we used a literature mining system, BEST, that automatically extracts information from the literature in PubMed and that allows the user to perform searches of the literature with any English words. By integrating omics data analysis methods and BEST, we developed Context-MMIA, a miRNA-mRNA target prediction method that combines expression data analysis results and the literature information extracted based on the user-specified context. In the pathway enrichment analysis using genes included in the top 200 miRNA-targets, Context-MMIA outperformed the four existing target prediction methods that we tested. In another test on whether prediction methods can reproduce experimentally validated target relationships, Context-MMIA outperformed the four existing target prediction...

  8. Structure-activity relationship study of oxindole-based inhibitors of cyclin-dependent kinases based on least-squares support vector machines

    International Nuclear Information System (INIS)

    Li Jiazhong; Liu Huanxiang; Yao Xiaojun; Liu Mancang; Hu Zhide; Fan Botao

    2007-01-01

    The least-squares support vector machine (LS-SVM), an effective modification of the support vector machine algorithm, was used to build structure-activity relationship (SAR) models to classify the oxindole-based inhibitors of cyclin-dependent kinases (CDKs) based on their activity. Each compound was depicted by structural descriptors that encode constitutional, topological, geometrical, electrostatic and quantum-chemical features. The forward stepwise linear discriminant analysis method was used to search the descriptor space and select the structural descriptors responsible for activity. Linear discriminant analysis (LDA) and the nonlinear LS-SVM method were employed to build classification models, and the best results were obtained by the LS-SVM method, with test-set prediction accuracies of 100% and 90.91% for CDK1 and CDK2, respectively, compared with 95.45% and 86.36% for the LDA models. This paper provides an effective method to screen CDK inhibitors.

  9. Structural changes and out-of-sample prediction of realized range-based variance in the stock market

    Science.gov (United States)

    Gong, Xu; Lin, Boqiang

    2018-03-01

    This paper aims to examine the effects of structural changes on forecasting the realized range-based variance in the stock market. Considering structural changes in variance in the stock market, we develop the HAR-RRV-SC model on the basis of the HAR-RRV model. Subsequently, the HAR-RRV and HAR-RRV-SC models are used to forecast the realized range-based variance of S&P 500 Index. We find that there are many structural changes in variance in the U.S. stock market, and the period after the financial crisis contains more structural change points than the period before the financial crisis. The out-of-sample results show that the HAR-RRV-SC model significantly outperforms the HAR-BV model when they are employed to forecast the 1-day, 1-week, and 1-month realized range-based variances, which means that structural changes can improve out-of-sample prediction of realized range-based variance. The out-of-sample results remain robust across the alternative rolling fixed-window, the alternative threshold value in ICSS algorithm, and the alternative benchmark models. More importantly, we believe that considering structural changes can help improve the out-of-sample performances of most of other existing HAR-RRV-type models in addition to the models used in this paper.
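
    For readers unfamiliar with HAR-type models, the baseline regressors can be sketched as follows. This assumes the conventional heterogeneous autoregressive setup, in which next-day variance is regressed on the latest daily value and on its 5-day and 22-day averages; the window lengths and the function name are assumptions, and the structural-change terms that distinguish HAR-RRV-SC are not reproduced because the abstract does not specify them.

```python
def har_features(rv, weekly=5, monthly=22):
    """Build baseline HAR regressors from daily realized-variance values.

    Each row holds (daily value, weekly mean, monthly mean) computed from
    data up to day t and is paired with the next-day target rv[t + 1].
    """
    rows, targets = [], []
    for t in range(monthly - 1, len(rv) - 1):
        daily = rv[t]
        week = sum(rv[t - weekly + 1:t + 1]) / weekly
        month = sum(rv[t - monthly + 1:t + 1]) / monthly
        rows.append((daily, week, month))
        targets.append(rv[t + 1])
    return rows, targets
```

    Fitting ordinary least squares of `targets` on `rows` plus an intercept would give a 1-day-ahead forecast; 1-week and 1-month horizons would replace the target with the corresponding forward average.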

  10. Structural similarity-based predictions of protein interactions between HIV-1 and Homo sapiens

    Directory of Open Access Journals (Sweden)

    Gomez Shawn M

    2010-04-01

    Background: In the course of infection, viruses such as HIV-1 must enter a cell, travel to sites where they can hijack host machinery to transcribe their genes and translate their proteins, assemble, and then leave the cell again, all while evading the host immune system. Thus, successful infection depends on the pathogen's ability to manipulate the biological pathways and processes of the organism it infects. Interactions between HIV-encoded and human proteins provide one means by which HIV-1 can connect into cellular pathways to carry out these survival processes. Results: We developed and applied a computational approach to predict interactions between HIV and human proteins based on structural similarity of 9 HIV-1 proteins to human proteins having known interactions. Using functional data from RNAi studies as a filter, we generated over 2000 interaction predictions between HIV proteins and 406 unique human proteins. Additional filtering based on Gene Ontology cellular component annotation reduced the number of predictions to 502 interactions involving 137 human proteins. We find numerous known interactions as well as novel interactions showing significant functional relevance based on supporting Gene Ontology and literature evidence. Conclusions: Understanding the interplay between HIV-1 and its human host will help in understanding the viral lifecycle and the ways in which this virus is able to manipulate its host. The results shown here provide a potential set of interactions that are amenable to further experimental manipulation as well as potential targets for therapeutic intervention.

  11. Computer-based literature search in medical institutions in India

    Directory of Open Access Journals (Sweden)

    Kalita Jayantee

    2007-01-01

    Aim: To study the use of computer-based literature search and its application in clinical training and patient care as a surrogate marker of evidence-based medicine. Materials and Methods: A questionnaire comprising questions on purpose (presentation, patient management, research), realm (site accessed, nature and frequency of search), effect, infrastructure, formal training in computer-based literature search and suggestions for further improvement was sent to residents and faculty of a Postgraduate Medical Institute (PGI) and a Medical College. The responses were compared amongst different subgroups of respondents. Results: Out of 300 subjects approached, 194 responded, of whom 103 were from the PGI and 91 from the Medical College. There were 97 specialty residents, 58 super-specialty residents and 39 faculty members. Computer-based literature search was done at least once a month by 89%, though there was marked variability in frequency and extent. The motivation for computer-based literature search was presentation in 90%, research in 65% and patient management in 60.3%. The benefit of search was acknowledged in learning and teaching by 80%, research by 65% and patient care by 64.4% of respondents. Formal training in computer-based literature search was received by 41%, of whom 80% were residents. Residents from the PGI did more frequent and more extensive computer-based literature search, which was attributed to better infrastructure and training. Conclusion: Training and infrastructure are both crucial for computer-based literature search, which may translate into evidence-based medicine.

  12. G-LoSA for Prediction of Protein-Ligand Binding Sites and Structures.

    Science.gov (United States)

    Lee, Hui Sun; Im, Wonpil

    2017-01-01

    Recent advances in high-throughput structure determination and computational protein structure prediction have significantly enriched the universe of protein structure. However, there is still a large gap between the number of available protein structures and that of proteins whose function is annotated with high accuracy. Computational structure-based protein function prediction has emerged to reduce this knowledge gap. The identification of a ligand binding site and its structure is critical to the determination of a protein's molecular function. We present a computational methodology for predicting small-molecule ligand binding sites and ligand structures using G-LoSA, our protein local structure alignment and similarity measurement tool. All the computational procedures described here can be easily implemented using G-LoSA Toolkit, a package of standalone software programs and preprocessed PDB structure libraries. G-LoSA and G-LoSA Toolkit are freely available to academic users at http://compbio.lehigh.edu/GLoSA. We also present a case study to show the potential of our template-based approach harnessing G-LoSA for protein function prediction.

  13. (PS)2: protein structure prediction server version 3.0.

    Science.gov (United States)

    Huang, Tsun-Tsao; Hwang, Jenn-Kang; Chen, Chu-Huang; Chu, Chih-Sheng; Lee, Chi-Wen; Chen, Chih-Chieh

    2015-07-01

    Protein complexes are involved in many biological processes. Examining coupling between subunits of a complex would be useful to understand the molecular basis of protein function. Here, our updated (PS)2 web server predicts the three-dimensional structures of protein complexes based on comparative modeling; furthermore, this server examines the coupling between subunits of the predicted complex by combining structural and evolutionary considerations. The predicted complex structure can be displayed and visualized by Java-based 3D graphics viewers, and the structural and evolutionary profiles are shown and compared chain-by-chain. For each subunit, considerations with or without the packing contribution of other subunits cause differences in the similarities between structural and evolutionary profiles, and these differences imply which form, complex or monomeric, is preferred in the biological condition for the subunit. We believe that the (PS)2 server would be a useful tool for biologists who are interested not only in the structures of protein complexes but also in the coupling between subunits of the complexes. The (PS)2 server is freely available at http://ps2v3.life.nctu.edu.tw/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains.

    Science.gov (United States)

    Lewis, Tony E; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Chothia, Cyrus; Cuff, Alison; Dana, Jose M; Filippis, Ioannis; Gough, Julian; Hunter, Sarah; Jones, David T; Kelley, Lawrence A; Kleywegt, Gerard J; Minneci, Federico; Mitchell, Alex; Murzin, Alexey G; Ochoa-Montaño, Bernardo; Rackham, Owen J L; Smith, James; Sternberg, Michael J E; Velankar, Sameer; Yeats, Corin; Orengo, Christine

    2013-01-01

    Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).

  15. Multi-objective Search-based Mobile Testing

    OpenAIRE

    Mao, K.

    2017-01-01

    Despite the tremendous popularity of mobile applications, mobile testing still relies heavily on manual testing. This thesis presents mobile test automation approaches based on multi-objective search. We introduce three approaches: Sapienz (for native Android app testing), Octopuz (for hybrid/web JavaScript app testing) and Polariz (for using crowdsourcing to support search-based mobile testing). These three approaches represent the primary scientific and technical contributions of the thesis...

  16. PCI-SS: MISO dynamic nonlinear protein secondary structure prediction

    Directory of Open Access Journals (Sweden)

    Aboul-Magd Mohammed O

    2009-07-01

    Full Text Available. Background: Since the function of a protein is largely dictated by its three-dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one-dimensional secondary structure of proteins (distinguishing α-helices, β-strands, and non-regular structures) from primary sequence data which makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification. Results: Using PSI-BLAST divergent evolutionary profiles as input data, dynamic nonlinear systems are built through a black-box approach to model the process of protein folding. Genetic algorithms (GAs) are applied in order to optimize the architectural parameters of the PCI models. The three-state prediction problem is broken down into a combination of three binary sub-problems and protein structure classifiers are built using 2 layers of PCI classifiers. Careful construction of the optimization, training, and test datasets ensures that no homology exists between any training and testing data. A detailed comparison between PCI and 9 contemporary methods is provided over a set of 125 new protein chains guaranteed to be dissimilar to all training data. Unlike other secondary structure prediction methods, here a web service is developed to provide both human- and machine-readable interfaces to PCI-based protein secondary structure prediction. This server, called PCI-SS, is available at http://bioinf.sce.carleton.ca/PCISS. In addition to a dynamic PHP-generated web interface for humans, a Simple Object Access Protocol (SOAP) interface is added to permit invocation of the PCI-SS service remotely. This machine-readable interface facilitates incorporation of PCI-SS into multi-faceted systems biology analysis pipelines requiring protein secondary structure information, and greatly simplifies high-throughput analyses. XML is used to represent the input

  17. Protein structure determination by exhaustive search of Protein Data Bank derived databases.

    Science.gov (United States)

    Stokes-Rees, Ian; Sliz, Piotr

    2010-12-14

    Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefitting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aid of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database.

  18. Prediction of RNA secondary structure using generalized centroid estimators.

    Science.gov (United States)

    Hamada, Michiaki; Kiryu, Hisanori; Sato, Kengo; Mituyama, Toutai; Asai, Kiyoshi

    2009-02-15

    Recent studies have shown that methods for predicting secondary structures of RNAs on the basis of posterior decoding of the base-pairing probabilities have an advantage with respect to prediction accuracy over the conventionally utilized minimum free energy methods. However, there is room for improvement in the objective functions presented in previous studies, which are maximized in the posterior decoding with respect to the accuracy measures for secondary structures. We propose novel estimators which improve the accuracy of secondary structure prediction of RNAs. The proposed estimators maximize an objective function which is the weighted sum of the expected number of the true positives and that of the true negatives of the base pairs. The proposed estimators are also improved versions of the ones used in previous works, namely CONTRAfold for secondary structure prediction from a single RNA sequence and McCaskill-MEA for common secondary structure prediction from multiple alignments of RNA sequences. We clarify the relations between the proposed estimators and the estimators presented in previous works, and theoretically show that the previous estimators include additional unnecessary terms in the evaluation measures with respect to the accuracy. Furthermore, computational experiments confirm the theoretical analysis by indicating improvement in the empirical accuracy. The proposed estimators represent extensions of the centroid estimators proposed in Ding et al. and Carvalho and Lawrence, and are applicable to a wide variety of problems in bioinformatics. Supporting information and the CentroidFold software are available online at: http://www.ncrna.org/software/centroidfold/.
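
    The objective described in this abstract can be illustrated with a small sketch. It is not the CentroidFold implementation. It assumes that p[i][j] is a posterior base-pairing probability for positions i < j and that a weight gamma multiplies the expected true positives, so that the expected gain of a structure is the sum of gamma * p[i][j] over predicted pairs plus (1 - p[i][j]) over unpredicted ones. Maximizing that gain is then equivalent to maximizing the sum of (gamma + 1) * p[i][j] - 1 over the predicted pairs, which the code does with a Nussinov-style dynamic program over non-crossing pairs. The function name is hypothetical, and no minimum hairpin loop length is enforced.

```python
def centroid_structure(p, gamma=1.0):
    """Return (score, pairs) maximizing sum((gamma + 1) * p[i][j] - 1) over
    non-crossing base pairs in which each position pairs at most once.

    p is an n x n matrix; only entries p[i][j] with i < j are read.
    """
    n = len(p)
    if n == 0:
        return 0.0, []
    # best[i][j]: best score on the closed interval i..j (0 for empty intervals)
    best = [[0.0] * n for _ in range(n)]
    # choice[i][j]: partner k of position j in the optimum, or None if unpaired
    choice = [[None] * n for _ in range(n)]

    def get(i, j):
        return best[i][j] if 0 <= i <= j < n else 0.0

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Option 1: position j stays unpaired.
            score, how = get(i, j - 1), None
            # Option 2: position j pairs with some k in i..j-1.
            for k in range(i, j):
                gain = (gamma + 1.0) * p[k][j] - 1.0
                cand = get(i, k - 1) + get(k + 1, j - 1) + gain
                if cand > score:
                    score, how = cand, k
            best[i][j], choice[i][j] = score, how

    # Trace back the chosen pairs without recursion.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        k = choice[i][j]
        if k is None:
            stack.append((i, j - 1))
        else:
            pairs.append((k, j))
            stack.append((i, k - 1))
            stack.append((k + 1, j - 1))
    return best[0][n - 1], sorted(pairs)
```

    Under these assumptions a pair can only contribute positively when p[i][j] exceeds 1 / (gamma + 1), so a larger gamma yields more predicted pairs and a smaller gamma yields fewer.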

  19. Protein Loop Structure Prediction Using Conformational Space Annealing.

    Science.gov (United States)

    Heo, Seungryong; Lee, Juyong; Joo, Keehyoung; Shin, Hang-Cheol; Lee, Jooyoung

    2017-05-22

    We have developed a protein loop structure prediction method by combining a new energy function, which we call E_PLM (energy for protein loop modeling), with the conformational space annealing (CSA) global optimization algorithm. The energy function includes stereochemistry, dynamic fragment assembly, distance-scaled finite ideal gas reference (DFIRE), and generalized orientation- and distance-dependent terms. For the conformational search of loop structures, we used the CSA algorithm, which has been quite successful in dealing with various hard global optimization problems. We assessed the performance of E_PLM with two widely used loop-decoy sets, Jacobson and RAPPER, and compared the results against the DFIRE potential. The accuracy of model selection from a pool of loop decoys as well as de novo loop modeling starting from randomly generated structures was examined separately. For the selection of a nativelike structure from a decoy set, E_PLM was more accurate than DFIRE in the case of the Jacobson set and had similar accuracy in the case of the RAPPER set. In terms of sampling more nativelike loop structures, E_PLM outperformed E_DFIRE for both decoy sets. This new approach equipped with E_PLM and CSA can serve as the state-of-the-art de novo loop modeling method.

  20. Causal gene identification using combinatorial V-structure search.

    Science.gov (United States)

    Cai, Ruichu; Zhang, Zhenjie; Hao, Zhifeng

    2013-07-01

    With the advances of biomedical techniques in the last decade, the costs of human genomic sequencing and genomic activity monitoring are coming down rapidly. To support the huge genome-based business in the near future, researchers are eager to find killer applications based on human genome information. Causal gene identification is one of the most promising applications, which may help potential patients to estimate the risk of certain genetic diseases and locate the target gene for further genetic therapy. Unfortunately, existing pattern recognition techniques, such as Bayesian networks, cannot be directly applied to find the accurate causal relationship between genes and diseases. This is mainly due to the insufficient number of samples and the extremely high dimensionality of the gene space. In this paper, we present the first practical solution to causal gene identification, utilizing a new combinatorial formulation over V-Structures commonly used in conventional Bayesian networks, by exploring the combinations of significant V-Structures. We prove the NP-hardness of the combinatorial search problem under general settings of the significance measure on the V-Structures, and present a greedy algorithm to find sub-optimal results. Extensive experiments show that our proposal is both scalable and effective, particularly with interesting findings on the causal genes over real human genome data. Copyright © 2013 Elsevier Ltd. All rights reserved.

  1. Protein secondary structure prediction for a single-sequence using hidden semi-Markov models

    Directory of Open Access Journals (Sweden)

    Borodovsky Mark

    2006-03-01

    Full Text Available. Background: The accuracy of protein secondary structure prediction has been improving steadily towards the 88% estimated theoretical limit. There are two types of prediction algorithms: single-sequence prediction algorithms assume that information about other (homologous) proteins is not available, while algorithms of the second type assume that information about homologous proteins is available, and use it intensively. The single-sequence algorithms could make an important contribution to studies of proteins with no detected homologs; however, the accuracy of protein secondary structure prediction from a single sequence is not as high as when the additional evolutionary information is present. Results: In this paper, we further refine and extend the hidden semi-Markov model (HSMM) initially considered in the BSPSS algorithm. We introduce an improved residue dependency model by considering the patterns of statistically significant amino acid correlation at structural segment borders. We also derive models that specialize on different sections of the dependency structure and incorporate them into HSMM. In addition, we implement an iterative training method to refine estimates of HSMM parameters. The three-state-per-residue accuracy and other accuracy measures of the new method, IPSSP, are shown to be comparable to or better than those for BSPSS as well as for PSIPRED, tested under the single-sequence condition. Conclusions: We have shown that new dependency models and training methods bring further improvements to single-sequence protein secondary structure prediction. The results are obtained under cross-validation conditions using a dataset with no pair of sequences having significant sequence similarity. As new sequences are added to the database it is possible to augment the dependency structure and obtain even higher accuracy. Current and future advances should contribute to the improvement of function prediction for orphan proteins inscrutable
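
    For readers unfamiliar with the three-state-per-residue accuracy mentioned in this abstract, the following is a minimal sketch of how such a score is commonly computed: the fraction of residues whose predicted state matches the observed state. The function name and the one-letter state alphabet (H for helix, E for strand, C for coil) are assumptions made for illustration and are not taken from the IPSSP paper.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state equals the observed state.

    Both arguments are equal-length strings over an assumed three-letter
    alphabet such as 'H' (helix), 'E' (strand) and 'C' (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)
```

    Per-state measures can be obtained in the same way by restricting the count to residues observed in a single state.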

  2. Alephweb: a search engine based on the federated structure ...

    African Journals Online (AJOL)

    Revue d'Information Scientifique et Technique, Vol 7, No 1 (1997).

  3. Size-based predictions of food web patterns

    DEFF Research Database (Denmark)

    Zhang, Lai; Hartvig, Martin; Knudsen, Kim

    2014-01-01

    We employ size-based theoretical arguments to derive simple analytic predictions of ecological patterns and properties of natural communities: size-spectrum exponent, maximum trophic level, and susceptibility to invasive species. The predictions are brought about by assuming that an infinite number...... of species are continuously distributed on a size-trait axis. It is, however, an open question whether such predictions are valid for a food web with a finite number of species embedded in a network structure. We address this question by comparing the size-based predictions to results from dynamic food web...... simulations with varying species richness. To this end, we develop a new size- and trait-based food web model that can be simplified into an analytically solvable size-based model. We confirm existing solutions for the size distribution and derive novel predictions for maximum trophic level and invasion...

  4. Category Theory Approach to Solution Searching Based on Photoexcitation Transfer Dynamics

    Directory of Open Access Journals (Sweden)

    Makoto Naruse

    2017-07-01

    Full Text Available. Solution searching that accompanies combinatorial explosion is one of the most important issues in the age of artificial intelligence. Natural intelligence, which exploits natural processes for intelligent functions, is expected to help resolve or alleviate the difficulties of conventional computing paradigms and technologies. In fact, we have shown that a single-celled organism such as an amoeba can solve constraint satisfaction problems and related optimization problems, and have demonstrated experimental systems based on non-organic systems such as optical energy transfer involving near-field interactions. However, the fundamental mechanisms and limitations behind solution searching based on natural processes have not yet been understood. Herein, we present a theoretical background of solution searching based on optical excitation transfer from a category-theoretic standpoint. One important indication inspired by the category theory is that the satisfaction of short exact sequences is critical for an adequate computational operation that determines the flow of time for the system and is termed as “short-exact-sequence-based time.” In addition, the octahedral and braid structures known in triangulated categories provide a clear understanding of the underlying mechanisms, including a quantitative indication of the difficulties of obtaining solutions based on homology dimension. This study contributes to providing a fundamental background of natural intelligence.

  5. The Prediction of Botulinum Toxin Structure Based on in Silico and in Vitro Analysis

    Science.gov (United States)

    Suzuki, Tomonori; Miyazaki, Satoru

    2011-01-01

    Many biological systems are mediated through protein-protein interactions. Knowledge of protein-protein complex structure is required for understanding the function. The determination of large and flexible protein-protein complex structures by experimental studies remains difficult, costly and time-consuming; therefore, computational prediction of protein structures by homology modeling and docking studies is a valuable method. In addition, MD simulation is also one of the most powerful methods, allowing one to see the real dynamics of proteins. Here, we predict the protein-protein complex structure of botulinum toxin to analyze its properties. These bioinformatics methods are useful to report the relation between the flexibility of backbone structure and the activity.

  6. Prediction of CO2 Emission in China’s Power Generation Industry with Gauss Optimized Cuckoo Search Algorithm and Wavelet Neural Network Based on STIRPAT model with Ridge Regression

    Directory of Open Access Journals (Sweden)

    Weibo Zhao

    2017-12-01

    Full Text Available. The power generation industry is the key industry for carbon dioxide (CO2) emission in China. Assessing its future CO2 emissions is of great significance to the formulation and implementation of energy saving and emission reduction policies. Based on the Stochastic Impacts by Regression on Population, Affluence and Technology (STIRPAT) model, an analysis model of the factors influencing CO2 emission in the power generation industry is established. The ridge regression (RR) method is used to estimate the historical data. In addition, a wavelet neural network (WNN) prediction model based on the Cuckoo Search algorithm optimized by Gauss (GCS) is put forward to predict the factors in the STIRPAT model. Then, the predicted values are substituted into the regression model, and the CO2 emission estimation values of the power generation industry in China are obtained. It is concluded that population, per capita Gross Domestic Product (GDP), standard coal consumption and thermal power specific gravity are the key factors affecting the CO2 emission from the power generation industry. Besides, the GCS-WNN prediction model has higher prediction accuracy compared with other models. Moreover, with the development of science and technology in the future, the CO2 emission growth in the power generation industry will gradually slow down according to the prediction results.
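
    As background for the modelling steps named in this abstract, the generic STIRPAT form and the ridge estimator can be written as follows. This is the textbook three-factor form; the exact specification used in the paper (for example, how standard coal consumption and thermal power share enter) is not given in the abstract, so the symbols below are assumptions for illustration.

```latex
% Generic STIRPAT model: impact I as a function of population P,
% affluence A and technology T, with constant a and error term e.
I = a \, P^{b} A^{c} T^{d} e
% Taking logarithms gives a linear regression in the exponents b, c, d:
\ln I = \ln a + b \ln P + c \ln A + d \ln T + \ln e
% Ridge regression estimate for design matrix X, response y and
% ridge parameter k \ge 0 (\mathrm{Id} denotes the identity matrix):
\hat{\beta}(k) = \left( X^{\top} X + k \, \mathrm{Id} \right)^{-1} X^{\top} y
```

    Setting k = 0 recovers ordinary least squares; a positive k stabilizes the estimate when the explanatory factors are strongly collinear.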

  7. Efficient and accurate Greedy Search Methods for mining functional modules in protein interaction networks.

    Science.gov (United States)

    He, Jieyue; Li, Chaojun; Ye, Baoliu; Zhong, Wei

    2012-06-25

    Most computational algorithms mainly focus on detecting highly connected subgraphs in PPI networks as protein complexes but ignore their inherent organization. Furthermore, many of these algorithms are computationally expensive. However, recent analysis indicates that experimentally detected protein complexes generally contain core/attachment structures. In this paper, a Greedy Search Method based on Core-Attachment structure (GSM-CA) is proposed. The GSM-CA method detects densely connected regions in large protein-protein interaction networks based on the edge weight and two criteria for determining core nodes and attachment nodes. The GSM-CA method improves the prediction accuracy compared to other similar module detection approaches; however, it is computationally expensive. Many module detection approaches are based on traditional hierarchical methods, which are also computationally inefficient because the hierarchical tree structure produced by these approaches cannot provide adequate information to identify whether a network belongs to a module structure or not. In order to speed up the computational process, the Greedy Search Method based on Fast Clustering (GSM-FC) is proposed in this work. The edge-weight-based GSM-FC method uses a greedy procedure to traverse all edges just once to separate the network into a suitable set of modules. The proposed methods are applied to the protein interaction network of S. cerevisiae. Experimental results indicate that many significant functional modules are detected, most of which match the known complexes. Results also demonstrate that the GSM-FC algorithm is faster and more accurate than other competing algorithms. Based on the new edge weight definition, the proposed algorithm takes advantage of the greedy search procedure to separate the network into a suitable set of modules. Experimental analysis shows that the identified modules are statistically significant. The algorithm can reduce the

  8. Search for an interstellar Si2C molecule: A theoretical prediction

    Indian Academy of Sciences (India)

    63, No. 3, September 2004, journal of physics, pp. 627–631. Search for an interstellar Si2C molecule: A theoretical prediction. Suresh Chandra. School of ... top molecule as its electric dipole moment µ lies along the axis of intermediate moment of inertia. Because of differences between the molecular parameters of.

  9. Ab-initio conformational epitope structure prediction using genetic algorithm and SVM for vaccine design.

    Science.gov (United States)

    Moghram, Basem Ameen; Nabil, Emad; Badr, Amr

    2018-01-01

    T-cell epitope structure identification is a significant and challenging immunoinformatic problem within epitope-based vaccine design. Epitopes, or antigenic peptides, are sets of amino acids that bind with Major Histocompatibility Complex (MHC) molecules. The bound epitopes are presented by antigen-presenting cells to be inspected by T-cells. MHC-molecule-binding epitopes are responsible for triggering the immune response to antigens. The epitope's three-dimensional (3D) molecular structure (i.e., tertiary structure) reflects its proper function. Therefore, the identification of MHC class-II epitope structure is a significant step towards epitope-based vaccine design and understanding of the immune system. In this paper, we propose a new technique using a Genetic Algorithm for Predicting the Epitope Structure (GAPES), to predict the structure of MHC class-II epitopes based on their sequence. The proposed elitist genetic algorithm for predicting the epitope's tertiary structure is based on the ab initio Empirical Conformational Energy Program for Peptides (ECEPP) force field model. The developed secondary structure prediction technique relies on the Ramachandran plot. We used two alignment algorithms: the ROSS alignment and TM-Score alignment. We applied four different alignment approaches to calculate the similarity scores of the dataset under test. We utilized the support vector machine (SVM) classifier to evaluate the prediction performance. The prediction accuracy and the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) were calculated as measures of performance. The calculations are performed on twelve similarity-reduced datasets of the Immune Epitope Data Base (IEDB) and a large dataset of peptide-binding affinities to HLA-DRB1*0101. The results showed that GAPES was reliable and very accurate. We achieved an average prediction accuracy of 93.50% and an average AUC of 0.974 in the IEDB dataset. Also, we achieved an accuracy of 95

  10. Study of Fuze Structure and Reliability Design Based on the Direct Search Method

    Science.gov (United States)

    Lin, Zhang; Ning, Wang

    2017-03-01

    Redundant design is one of the important methods to improve the reliability of a system, but mutual coupling of multiple factors is often involved in the design. In this study, the Direct Search Method is introduced into optimum redundancy configuration for design optimization, in which reliability, cost, structural weight and other factors can be taken into account simultaneously, and the redundancy allocation and reliability design of an aircraft critical system are computed. The results show that this method is convenient and workable, and applicable to the redundancy configuration and optimization of various designs with appropriate modifications. This method has good practical value.

  11. Prediction of retention in micellar electrokinetic chromatography based on molecular structural descriptors by using the heuristic method

    International Nuclear Information System (INIS)

    Liu Huanxiang; Yao Xiaojun; Liu Mancang; Hu Zhide; Fan Botao

    2006-01-01

    Based on calculated molecular descriptors from the solutes' structure alone, the micelle-water partition coefficients of 103 solutes in micellar electrokinetic chromatography (MEKC) were predicted using the heuristic method (HM). At the same time, in order to show the influence of different molecular descriptors on the micelle-water partition of solutes and to better understand the retention mechanism in MEKC, HM was used to build several multivariable linear models using different numbers of molecular descriptors. The best 6-parameter model gave the following results: the squared correlation coefficient R² was 0.958 and the mean relative error was 3.98%, which showed that the predicted values were in good agreement with the experimental results. From the built model, it can be concluded that the hydrophobic, H-bond, and polar interactions of solutes with the micellar and aqueous phases are the main factors that determine their partitioning behavior. In addition, this paper provides a simple, fast and effective method for predicting the retention of solutes in MEKC from their structures and gives some insight into structural features related to the retention of the solutes.
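
    The two fit statistics quoted in this abstract can be computed as in the sketch below. It assumes the usual definitions, namely the squared Pearson correlation between predicted and experimental values and the mean of |predicted - experimental| / |experimental| expressed as a percentage; the abstract does not state which exact definitions the authors used, and the function names are hypothetical.

```python
def squared_correlation(predicted, experimental):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    n = len(predicted)
    if n != len(experimental) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov * cov / (var_p * var_e)


def mean_relative_error_percent(predicted, experimental):
    """Mean of |predicted - experimental| / |experimental|, in percent."""
    if len(predicted) != len(experimental) or not experimental:
        raise ValueError("need two equal-length, non-empty lists")
    if any(e == 0 for e in experimental):
        raise ValueError("relative error is undefined for a zero reference value")
    total = sum(abs(p - e) / abs(e) for p, e in zip(predicted, experimental))
    return 100.0 * total / len(experimental)
```

    Note that a high squared correlation does not by itself imply a small relative error: predictions that are exactly twice the experimental values correlate perfectly but are off by 100%.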

  12. 15 CFR 50.5 - Fee structure for age search and citizenship information.

    Science.gov (United States)

    2010-01-01

    ... THE CENSUS § 50.5 Fee structure for age search and citizenship information. Type of service Fee... 15 Commerce and Foreign Trade 1 2010-01-01 2010-01-01 false Fee structure for age search and citizenship information. 50.5 Section 50.5 Commerce and Foreign Trade Regulations Relating to Commerce and...

  13. Sparse RNA folding revisited: space-efficient minimum free energy structure prediction.

    Science.gov (United States)

    Will, Sebastian; Jabbari, Hosna

    2016-01-01

    RNA secondary structure prediction by energy minimization is the central computational tool for the analysis of structural non-coding RNAs and their interactions. Sparsification has been successfully applied to improve the time efficiency of various structure prediction algorithms while guaranteeing the same result; however, for many such folding problems, space efficiency is of even greater concern, particularly for long RNA sequences. So far, space-efficient sparsified RNA folding with fold reconstruction was solved only for simple base-pair-based pseudo-energy models. Here, we revisit the problem of space-efficient free energy minimization. Whereas the space-efficient minimization of the free energy has been sketched before, the reconstruction of the optimum structure has not even been discussed. We show that this reconstruction is not possible by a trivial extension of the method for simple energy models. Then, we present the time- and space-efficient sparsified free energy minimization algorithm SparseMFEFold that guarantees MFE structure prediction. In particular, this novel algorithm provides efficient fold reconstruction based on dynamically garbage-collected trace arrows. The complexity of our algorithm depends on two parameters, the number of candidates Z and the number of trace arrows T; both are bounded by [Formula: see text], but are typically much smaller. The time complexity of RNA folding is reduced from [Formula: see text] to [Formula: see text]; the space complexity, from [Formula: see text] to [Formula: see text]. Our empirical results show more than 80% space savings over RNAfold [Vienna RNA package] on the long RNAs from the RNA STRAND database (≥2500 bases). The presented technique is intentionally generalizable to complex prediction algorithms; due to their high space demands, algorithms like pseudoknot prediction and RNA-RNA-interaction prediction are expected to profit even more strongly than "standard" MFE folding. SparseMFEFold is free

  14. A semi-supervised learning approach for RNA secondary structure prediction.

    Science.gov (United States)

    Yonemoto, Haruka; Asai, Kiyoshi; Hamada, Michiaki

    2015-08-01

    RNA secondary structure prediction is a key technology in RNA bioinformatics. Most algorithms for RNA secondary structure prediction use probabilistic models, in which the model parameters are trained with reliable RNA secondary structures. Because of the difficulty of determining RNA secondary structures by experimental procedures, such as NMR or X-ray crystal structural analyses, there are still many RNA sequences that could be useful for training whose secondary structures have not been experimentally determined. In this paper, we introduce a novel semi-supervised learning approach for training parameters in a probabilistic model of RNA secondary structures in which we employ not only RNA sequences with annotated secondary structures but also ones with unknown secondary structures. Our model is based on a hybrid of generative (stochastic context-free grammars) and discriminative models (conditional random fields) that has been successfully applied to natural language processing. Computational experiments indicate that the accuracy of secondary structure prediction is improved by incorporating RNA sequences with unknown secondary structures into training. To our knowledge, this is the first study of a semi-supervised learning approach for RNA secondary structure prediction. This technique will be useful when the number of reliable structures is limited. Copyright © 2015 Elsevier Ltd. All rights reserved.

  15. Crack Growth-Based Predictive Methodology for the Maintenance of the Structural Integrity of Repaired and Nonrepaired Aging Engine Stationary Components

    National Research Council Canada - National Science Library

    Barron, Michael

    1999-01-01

    .... Specifically, the FAA's goal was to develop "Crack Growth-Based Predictive Methodologies for the Maintenance of the Structural Integrity of Repaired and Nonrepaired Aging Engine Stationary Components...

  16. Search for shape coexistence in ¹⁸⁸,¹⁹⁰Pb via fine structure in the alpha decay of ¹⁹²,¹⁹⁴Po

    Energy Technology Data Exchange (ETDEWEB)

    Ahmad, I.; Davids, C.; Janssens, R.V.F. [and others]

    1995-08-01

    The interaction between coexisting shapes in nuclei near closed shells was of great interest in the past decade. Excited 0⁺ states at low energy can often be identified as the bandheads of structures with differing shapes built on those states. These structures were identified in ¹⁹⁰⁻¹⁹⁸Pb via beta decay and alpha decay "fine structure" studies. Coexistence of different shapes in Pb nuclei was predicted by Nilsson-Strutinsky calculations, in which both the oblate and prolate minima were predicted to have excitation energies near 1 MeV. It was our intention to continue the systematic study of the Pb nuclides by searching for excited 0⁺ states in ¹⁸⁸Pb by observing the fine structure in the alpha decay of ¹⁹²Po.

  17. BSM searches (SUSY and Exotic) from ATLAS

    CERN Document Server

    ATLAS Collaboration; The ATLAS collaboration

    2015-01-01

    Searches for new physics beyond the Standard Model (SM) at the LHC are mainly driven by two approaches: a signature-based search where one looks for a deviation from the SM prediction in event yield or kinematic properties, and a more theory-oriented approach where the search is designed to look for specific signatures/topologies predicted by certain beyond standard model (BSM) scenarios. Typical examples of the latter are searches for Supersymmetry and other BSM theories with extended symmetries. Supersymmetry predicts a new partner for every SM particle. An extension to the SM by introducing new gauge or global symmetries (including in the Hidden Sector) usually leads to the presence of new heavy gauge bosons. Extensive searches for such particles have been performed in ATLAS at the LHC in the context of Supersymmetry, Extended Gauge models, Technicolor, Little Higgs, Extra Dimensions, Left-Right symmetric models, and many other BSM scenarios. Highlights from these searches are presented.

  18. New tips for structure prediction by comparative modeling

    Science.gov (United States)

    Rayan, Anwar

    2009-01-01

    Comparative modelling is utilized to predict the 3-dimensional conformation of a given protein (target) based on its sequence alignment to an experimentally determined protein structure (template). The use of this technique is already rewarding and increasingly widespread in biological research and drug development. The accuracy of the predictions is commonly accepted to depend on the sequence identity score of the target protein to the template. To assess the relationship between sequence identity and model quality, we carried out an analysis of a set of 4753 sequence and structure alignments. Throughout this research, the model accuracy was measured by root mean square deviations of Cα atoms of the target-template structures. Surprisingly, the results show that sequence identity of the target protein to the template is not a good descriptor to predict the accuracy of the 3-D structure model. However, in a large number of cases, comparative modelling with lower sequence identity of target to template proteins led to more accurate 3-D structure models. As a consequence of this study, we suggest new tips for improving the quality of comparative models, particularly for models whose target-template sequence identity is below 50%. PMID:19255646
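
    The accuracy measure named in this abstract, the root mean square deviation over Cα atoms, can be sketched as follows. The sketch assumes that the two structures are already superposed and that the Cα coordinates are supplied as equal-length lists of (x, y, z) tuples in matching residue order; it performs no optimal superposition, and the function name is hypothetical.

```python
import math


def ca_rmsd(model_coords, reference_coords):
    """RMSD between two pre-superposed lists of (x, y, z) C-alpha coordinates."""
    if len(model_coords) != len(reference_coords):
        raise ValueError("coordinate lists must have the same length")
    if not model_coords:
        raise ValueError("coordinate lists must not be empty")
    # Sum of squared distances between corresponding atoms.
    squared = sum(
        (a - b) ** 2
        for atom_m, atom_r in zip(model_coords, reference_coords)
        for a, b in zip(atom_m, atom_r)
    )
    return math.sqrt(squared / len(model_coords))
```

    In practice the coordinates would first be brought into a common frame by a least-squares superposition step, which is omitted here.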

  19. Object-based implicit learning in visual search: perceptual segmentation constrains contextual cueing.

    Science.gov (United States)

    Conci, Markus; Müller, Hermann J; von Mühlenen, Adrian

    2013-07-09

    In visual search, detection of a target is faster when it is presented within a spatial layout of repeatedly encountered nontarget items, indicating that contextual invariances can guide selective attention (contextual cueing; Chun & Jiang, 1998). However, perceptual regularities may interfere with contextual learning; for instance, no contextual facilitation occurs when four nontargets form a square-shaped grouping, even though the square location predicts the target location (Conci & von Mühlenen, 2009). Here, we further investigated potential causes for this interference-effect: We show that contextual cueing can reliably occur for targets located within the region of a segmented object, but not for targets presented outside of the object's boundaries. Four experiments demonstrate an object-based facilitation in contextual cueing, with a modulation of context-based learning by relatively subtle grouping cues including closure, symmetry, and spatial regularity. Moreover, the lack of contextual cueing for targets located outside the segmented region was due to an absence of (latent) learning of contextual layouts, rather than due to an attentional bias towards the grouped region. Taken together, these results indicate that perceptual segmentation provides a basic structure within which contextual scene regularities are acquired. This in turn argues that contextual learning is constrained by object-based selection.

  20. XSemantic: An Extension of LCA Based XML Semantic Search

    Science.gov (United States)

    Supasitthimethee, Umaporn; Shimizu, Toshiyuki; Yoshikawa, Masatoshi; Porkaew, Kriengkrai

    One of the most convenient ways to query XML data is a keyword search because it does not require any knowledge of XML structure or learning a new user interface. However, the keyword search is ambiguous. The users may use different terms to search for the same information. Furthermore, it is difficult for a system to decide which node is likely to be chosen as a return node and how much information should be included in the result. To address these challenges, we propose an XML semantic search based on keywords called XSemantic. On the one hand, we give three definitions to complete in terms of semantics. Firstly, through semantic term expansion, our system is robust to ambiguous keywords by using the domain ontology. Secondly, to return semantically meaningful answers, we automatically infer the return information from the user queries and take advantage of the shortest path to return meaningful connections between keywords. Thirdly, we present the semantic ranking that reflects the degree of similarity as well as the semantic relationship so that the search results with the higher relevance are presented to the users first. On the other hand, in the LCA and the proximity search approaches, we investigated the problem of information included in the search results. Therefore, we introduce the notion of the Lowest Common Element Ancestor (LCEA) and define our simple rule without any requirement on the schema information such as the DTD or XML Schema. The first experiment indicated that XSemantic not only properly infers the return information but also generates compact meaningful results. Additionally, the benefits of our proposed semantics are demonstrated by the second experiment.

  1. Predicting RNA Structure Using Mutual Information

    DEFF Research Database (Denmark)

    Freyhult, E.; Moulton, V.; Gardner, P. P.

    2005-01-01

    Background: With the ever-increasing number of sequenced RNAs and the establishment of new RNA databases, such as the Comparative RNA Web Site and Rfam, there is a growing need for accurately and automatically predicting RNA structures from multiple alignments. Since RNA secondary structure...... , to display and predict conserved RNA secondary structure (including pseudoknots) from an alignment. Results: We show that MIfold can be used to predict simple pseudoknots, and that the performance can be adjusted to make it either more sensitive or more selective. We also demonstrate that the overall...... package. Conclusion: MIfold provides a useful supplementary tool to programs such as RNA Structure Logo, RNAalifold and COVE, and should be useful for automatically generating structural predictions for databases such as Rfam. Availability: MIfold is freely available from http......
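
    The quantity in this record's title, mutual information between alignment columns, can be illustrated with a short sketch. It is not MIfold's code: it only computes the standard mutual information (in bits) between two columns of a multiple alignment, where high values hint at covarying, possibly base-paired positions. The function name and the toy columns are assumptions.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (bits) between two alignment columns.

    col_i and col_j hold one character per aligned sequence, in the
    same sequence order. Gap handling is deliberately left out.
    """
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and equally long")
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    pair_counts = Counter(zip(col_i, col_j))
    mi = 0.0
    for (x, y), count in pair_counts.items():
        p_xy = count / n
        p_x = count_i[x] / n
        p_y = count_j[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


if __name__ == "__main__":
    # Two columns that covary perfectly (G pairs with C, C with G).
    print(mutual_information("GGCC", "CCGG"))
    # A fully conserved column carries no covariation signal.
    print(mutual_information("AAAA", "GCGC"))
```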

  2. Perspective: Role of structure prediction in materials discovery and design

    Directory of Open Access Journals (Sweden)

    Richard J. Needs

    2016-05-01

    Materials informatics owes much to bioinformatics and the Materials Genome Initiative has been inspired by the Human Genome Project. But there is more to bioinformatics than genomes, and the same is true for materials informatics. Here we describe the rapidly expanding role of searching for structures of materials using first-principles electronic-structure methods. Structure searching has played an important part in unraveling structures of dense hydrogen and in identifying the record-high-temperature superconducting component in hydrogen sulfide at high pressures. We suggest that first-principles structure searching has already demonstrated its ability to determine structures of a wide range of materials and that it will play a central and increasing part in materials discovery and design.

  3. Validation of Molecular Dynamics Simulations for Prediction of Three-Dimensional Structures of Small Proteins.

    Science.gov (United States)

    Kato, Koichi; Nakayoshi, Tomoki; Fukuyoshi, Shuichi; Kurimoto, Eiji; Oda, Akifumi

    2017-10-12

    Although various higher-order protein structure prediction methods have been developed, almost all of them were developed based on the three-dimensional (3D) structure information of known proteins. Here we predicted the short protein structures by molecular dynamics (MD) simulations in which only Newton's equations of motion were used and 3D structural information of known proteins was not required. To evaluate the ability of MD simulation to predict protein structures, we calculated seven short test proteins (10-46 residues) in the denatured state and compared their predicted and experimental structures. The predicted structure for Trp-cage (20 residues) was close to the experimental structure by 200-ns MD simulation. For proteins shorter or longer than Trp-cage, root-mean-square deviation values were larger than those for Trp-cage. However, secondary structures could be reproduced by MD simulations for proteins with 10-34 residues. Simulations by replica exchange MD were performed, but the results were similar to those from normal MD simulations. These results suggest that normal MD simulations can roughly predict short protein structures and 200-ns simulations are frequently sufficient for estimating the secondary structures of proteins (approximately 20 residues). Structural prediction methods using only fundamental physical laws are useful for investigating non-natural proteins, such as primitive proteins and artificial proteins for peptide-based drug delivery systems.

  4. Similarity search processing. Paralelization and indexing technologies.

    Directory of Open Access Journals (Sweden)

    Eder Dos Santos

    2015-08-01

    This Scientific-Technical Report addresses similarity search and the implementation of metric structures in parallel environments. It also presents the state of the art related to similarity search on metric structures and parallelism technologies. Comparative analyses are also proposed, seeking to identify the behavior of a set of metric spaces and metric structures on multicore-based and GPU-based processing platforms.

  5. Prediction of protein structure with the coarse-grained UNRES force field assisted by small X-ray scattering data and knowledge-based information.

    Science.gov (United States)

    Karczyńska, Agnieszka S; Mozolewska, Magdalena A; Krupa, Paweł; Giełdoń, Artur; Liwo, Adam; Czaplewski, Cezary

    2018-03-01

    A new approach to assisted protein-structure prediction has been proposed, which is based on running multiplexed replica exchange molecular dynamics simulations with the coarse-grained UNRES force field with restraints derived from knowledge-based models and distance distribution from small angle X-ray scattering (SAXS) measurements. The latter restraints are incorporated into the target function as a maximum-likelihood term that guides the shape of the simulated structures towards that defined by SAXS. The approach was first verified with the 1KOY protein, for which the distance distribution was calculated from the experimental structure, and subsequently used to predict the structures of 11 data-assisted targets in the CASP12 experiment. Major improvement of the GDT_TS was obtained for 2 targets and minor improvement for another 2, while for 6 targets the GDT_TS deteriorated compared with that calculated for predictions without the SAXS data, partly because of assuming a wrong multimeric state (for Ts866) or because the crystal conformation was more compact than the solution conformation (for Ts942). Particularly good results were obtained for Ts909, in which use of SAXS data resulted in the selection of a correctly packed trimer and, subsequently, increased the GDT_TS of monomer prediction. It was found that running simulations with the correct oligomeric state is essential for success in SAXS-data-assisted prediction. © 2017 Wiley Periodicals, Inc.

  6. A Hybrid Neural Network Model for Sales Forecasting Based on ARIMA and Search Popularity of Article Titles.

    Science.gov (United States)

    Omar, Hani; Hoang, Van Hai; Liu, Duen-Ren

    2016-01-01

    Enhancing sales and operations planning through forecasting analysis and business intelligence is demanded in many industries and enterprises. Publishing industries usually pick attractive titles and headlines for their stories to increase sales, since popular article titles and headlines can attract readers to buy magazines. In this paper, information retrieval techniques are adopted to extract words from article titles. The popularity measures of article titles are then analyzed by using the search indexes obtained from Google search engine. Backpropagation Neural Networks (BPNNs) have successfully been used to develop prediction models for sales forecasting. In this study, we propose a novel hybrid neural network model for sales forecasting based on the prediction result of time series forecasting and the popularity of article titles. The proposed model uses the historical sales data, popularity of article titles, and the prediction result of a time series, Autoregressive Integrated Moving Average (ARIMA) forecasting method to learn a BPNN-based forecasting model. Our proposed forecasting model is experimentally evaluated by comparing with conventional sales prediction techniques. The experimental result shows that our proposed forecasting method outperforms conventional techniques which do not consider the popularity of title words.

  7. A Hybrid Neural Network Model for Sales Forecasting Based on ARIMA and Search Popularity of Article Titles

    Science.gov (United States)

    Omar, Hani; Hoang, Van Hai; Liu, Duen-Ren

    2016-01-01

    Enhancing sales and operations planning through forecasting analysis and business intelligence is demanded in many industries and enterprises. Publishing industries usually pick attractive titles and headlines for their stories to increase sales, since popular article titles and headlines can attract readers to buy magazines. In this paper, information retrieval techniques are adopted to extract words from article titles. The popularity measures of article titles are then analyzed by using the search indexes obtained from Google search engine. Backpropagation Neural Networks (BPNNs) have successfully been used to develop prediction models for sales forecasting. In this study, we propose a novel hybrid neural network model for sales forecasting based on the prediction result of time series forecasting and the popularity of article titles. The proposed model uses the historical sales data, popularity of article titles, and the prediction result of a time series, Autoregressive Integrated Moving Average (ARIMA) forecasting method to learn a BPNN-based forecasting model. Our proposed forecasting model is experimentally evaluated by comparing with conventional sales prediction techniques. The experimental result shows that our proposed forecasting method outperforms conventional techniques which do not consider the popularity of title words. PMID:27313605

  8. Hybrid Multiple Soft-Sensor Models of Grinding Granularity Based on Cuckoo Searching Algorithm and Hysteresis Switching Strategy

    Directory of Open Access Journals (Sweden)

    Jie-Sheng Wang

    2015-01-01

    According to the characteristics of the grinding process and the accuracy requirements of technical indicators, a hybrid multiple soft-sensor modeling method of grinding granularity is proposed based on the cuckoo searching (CS) algorithm and the hysteresis switching (HS) strategy. Firstly, a mechanism soft-sensor model of grinding granularity is deduced based on the technique characteristics and a large amount of experimental data from the grinding process. Meanwhile, the BP neural network soft-sensor model and the wavelet neural network (WNN) soft-sensor model are set up. Then, the hybrid multiple soft-sensor model based on the hysteresis switching strategy is realized. That is to say, the optimum model is selected as the current predictive model according to the switching performance index at each sampling instant. Finally, the cuckoo searching algorithm is adopted to optimize the performance parameters of the hysteresis switching strategy. Simulation results show that the proposed model has better generalization results and prediction precision, which can satisfy the real-time control requirements of the grinding classification process.
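
    The model-selection step described here (choose the optimum model at each sampling instant according to a switching performance index) can be sketched with a generic hysteresis rule. The multiplicative margin `h`, the "lower index is better" convention, and the function name are assumptions made for illustration; the paper's actual index and the CS-tuned parameters are not given in the abstract.

```python
def hysteresis_switch(index_history, h=0.1, start=0):
    """Return the index of the active model at each sampling instant.

    index_history: list of lists; index_history[t][m] is the performance
    index of model m at instant t (assumed: lower is better).
    h: hysteresis margin (assumed multiplicative). The active model is
    replaced only when the best candidate beats it by more than this
    margin, which suppresses rapid back-and-forth switching.
    """
    current = start
    selected = []
    for indices in index_history:
        best = min(range(len(indices)), key=indices.__getitem__)
        if indices[best] * (1.0 + h) < indices[current]:
            current = best
        selected.append(current)
    return selected


if __name__ == "__main__":
    history = [
        [1.00, 0.95],  # model 1 is only slightly better: no switch
        [1.00, 0.50],  # model 1 is clearly better: switch
        [0.60, 0.58],  # model 1 stays active
    ]
    print(hysteresis_switch(history, h=0.1, start=0))
```

    Setting `h` to zero reduces the rule to picking the best model at every instant, which is the behaviour the hysteresis margin is meant to temper.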

  9. Predicting Protein Secondary Structure with Markov Models

    DEFF Research Database (Denmark)

    Fischer, Paul; Larsen, Simon; Thomsen, Claus

    2004-01-01

    we are considering here, is to predict the secondary structure from the primary one. To this end we train a Markov model on training data and then use it to classify parts of unknown protein sequences as sheets, helices or coils. We show how to exploit the directional information contained...... in the Markov model for this task. Classifications that are purely based on statistical models might not always be biologically meaningful. We present combinatorial methods to incorporate biological background knowledge to enhance the prediction performance....
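
    The idea of training a Markov model and using it to classify sequence parts as sheets, helices or coils can be sketched as below. This is only a toy version under stated assumptions: one first-order Markov chain per class, add-one smoothing, and classification by highest log-likelihood. The authors' actual model, their use of directional information, and their combinatorial post-processing are not reproduced; all names and the two-letter alphabet are hypothetical.

```python
import math


def train_markov(sequences, alphabet, pseudocount=1.0):
    """Estimate log transition probabilities of a first-order Markov chain."""
    counts = {a: {b: pseudocount for b in alphabet} for a in alphabet}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1.0
    model = {}
    for a in alphabet:
        total = sum(counts[a].values())
        model[a] = {b: math.log(counts[a][b] / total) for b in alphabet}
    return model


def log_likelihood(model, seq):
    """Sum of log transition probabilities along the sequence."""
    return sum(model[a][b] for a, b in zip(seq, seq[1:]))


def classify(models, seq):
    """Return the class label whose chain explains the segment best."""
    return max(models, key=lambda label: log_likelihood(models[label], seq))


if __name__ == "__main__":
    alphabet = "AB"  # toy alphabet standing in for the 20 amino acids
    models = {
        "helix": train_markov(["AAAA"], alphabet),
        "sheet": train_markov(["ABAB"], alphabet),
    }
    print(classify(models, "AAA"), classify(models, "ABA"))
```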

  10. Prediction and analysis of structure, stability and unfolding of thermolysin-like proteases

    Science.gov (United States)

    Vriend, Gert; Eijsink, Vincent

    1993-08-01

    Bacillus neutral proteases (NPs) form a group of well-characterized homologous enzymes that exhibit large differences in thermostability. The three-dimensional (3D) structures of several of these enzymes have been modelled on the basis of the crystal structures of the NPs of B. thermoproteolyticus (thermolysin) and B. cereus. Several new techniques have been developed to improve the model-building procedures. Also, a 'model-building by mutagenesis' strategy was used, in which mutants were designed just to shed light on parts of the structures that were particularly hard to model. The NP models have been used for the prediction of site-directed mutations aimed at improving the thermostability of the enzymes. Predictions were made using several novel computational techniques, such as position-specific rotamer searching, packing quality analysis and property-profile database searches. Many stabilizing mutations were predicted and produced: improvement of hydrogen bonding, exclusion of buried water molecules, capping helices, improvement of hydrophobic interactions and entropic stabilization have been applied successfully. At elevated temperatures NPs are irreversibly inactivated as a result of autolysis. It has been shown that this denaturation process is independent of the protease activity and concentration and that the inactivation follows first-order kinetics. From this it has been conjectured that local unfolding of (surface) loops, which renders the protein susceptible to autolysis, is the rate-limiting step. Despite the particular nature of the thermal denaturation process, normal rules for protein stability can be applied to NPs. However, rather than stabilizing the whole protein against global unfolding, only a small region has to be protected against local unfolding. In contrast to proteins in general, mutational effects in proteases are not additive and their magnitude is strongly dependent on the location of the mutation.
Mutations that alter the stability
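
    The first-order inactivation kinetics mentioned in this abstract can be written out as a short worked equation. The symbols are notation introduced here for illustration, not taken from the source: A(t) is the residual activity at time t, A_0 the initial activity and k the inactivation rate constant.

```latex
\frac{dA}{dt} = -k\,A(t)
\quad\Longrightarrow\quad
A(t) = A_0\,e^{-kt},
\qquad
t_{1/2} = \frac{\ln 2}{k}
```

    Because the half-life depends only on k and not on A_0, such a rate law is consistent with the reported independence of the process from protease concentration.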

  11. Attribute-based proxy re-encryption with keyword search.

    Science.gov (United States)

    Shi, Yanfeng; Liu, Jiqiang; Han, Zhen; Zheng, Qingji; Zhang, Rui; Qiu, Shuo

    2014-01-01

    Keyword search on encrypted data allows one to issue the search token and conduct search operations on encrypted data while still preserving keyword privacy. In the present paper, we consider the keyword search problem further and introduce a novel notion called attribute-based proxy re-encryption with keyword search (ABRKS), which introduces a promising feature: In addition to supporting keyword search on encrypted data, it enables data owners to delegate the keyword search capability to some other data users complying with the specific access control policy. To be specific, ABRKS allows (i) the data owner to outsource his encrypted data to the cloud and then ask the cloud to conduct keyword search on outsourced encrypted data with the given search token, and (ii) the data owner to delegate keyword search capability to other data users in a fine-grained access control manner by allowing the cloud to re-encrypt stored encrypted data with a re-encrypted data (embedding with some form of access control policy). We formalize the syntax and security definitions for ABRKS, and propose two concrete constructions for ABRKS: key-policy ABRKS and ciphertext-policy ABRKS. In a nutshell, our constructions can be treated as the integration of technologies in the fields of attribute-based cryptography and proxy re-encryption cryptography.

  12. Attribute-Based Proxy Re-Encryption with Keyword Search

    Science.gov (United States)

    Shi, Yanfeng; Liu, Jiqiang; Han, Zhen; Zheng, Qingji; Zhang, Rui; Qiu, Shuo

    2014-01-01

    Keyword search on encrypted data allows one to issue the search token and conduct search operations on encrypted data while still preserving keyword privacy. In the present paper, we consider the keyword search problem further and introduce a novel notion called attribute-based proxy re-encryption with keyword search (ABRKS), which introduces a promising feature: In addition to supporting keyword search on encrypted data, it enables data owners to delegate the keyword search capability to some other data users complying with the specific access control policy. To be specific, ABRKS allows (i) the data owner to outsource his encrypted data to the cloud and then ask the cloud to conduct keyword search on outsourced encrypted data with the given search token, and (ii) the data owner to delegate keyword search capability to other data users in a fine-grained access control manner by allowing the cloud to re-encrypt stored encrypted data with a re-encrypted data (embedding with some form of access control policy). We formalize the syntax and security definitions for ABRKS, and propose two concrete constructions for ABRKS: key-policy ABRKS and ciphertext-policy ABRKS. In a nutshell, our constructions can be treated as the integration of technologies in the fields of attribute-based cryptography and proxy re-encryption cryptography. PMID:25549257

  13. TMDIM: an improved algorithm for the structure prediction of transmembrane domains of bitopic dimers

    Science.gov (United States)

    Cao, Han; Ng, Marcus C. K.; Jusoh, Siti Azma; Tai, Hio Kuan; Siu, Shirley W. I.

    2017-09-01

    α-Helical transmembrane proteins are the most important drug targets in rational drug development. However, solving the experimental structures of these proteins remains difficult, therefore computational methods to accurately and efficiently predict the structures are in great demand. We present an improved structure prediction method TMDIM based on Park et al. (Proteins 57:577-585, 2004) for predicting bitopic transmembrane protein dimers. Three major algorithmic improvements are introduction of the packing type classification, the multiple-condition decoy filtering, and the cluster-based candidate selection. In a test of predicting nine known bitopic dimers, approximately 78% of our predictions achieved a successful fit (RMSD PHP, MySQL and Apache, with all major browsers supported.

  14. Assessment and Comparison of Search capabilities of Web-based Meta-Search Engines: A Checklist Approach

    Directory of Open Access Journals (Sweden)

    Alireza Isfandiyari Moghadam

    2010-03-01

    The present investigation concerns evaluation, comparison and analysis of search options existing within web-based meta-search engines. 64 meta-search engines were identified. 19 meta-search engines that were free, accessible and compatible with the objectives of the present study were selected. An author-constructed checklist was used for data collection. Findings indicated that all meta-search engines studied used the AND operator, phrase search, number of results displayed setting, previous search query storage and help tutorials. Nevertheless, none of them demonstrated any search options for hypertext searching and displaying the size of the pages searched. 94.7% support features such as truncation, keywords in title and URL search and text summary display. The checklist used in the study could serve as a model for investigating search options in search engines, digital libraries and other internet search tools.

  15. Promoting evidence based medicine in preclinical medical students via a federated literature search tool.

    Science.gov (United States)

    Keim, Samuel Mark; Howse, David; Bracke, Paul; Mendoza, Kathryn

    2008-01-01

    Medical educators are increasingly faced with directives to teach Evidence Based Medicine (EBM) skills. Because of its nature, integrating fundamental EBM educational content is a challenge in the preclinical years. To analyse preclinical medical student user satisfaction and feedback regarding a clinical EBM search strategy. The authors introduced a custom EBM search option with a self-contained education structure to first-year medical students. The implementation took advantage of a major curricular change towards case-based instruction. Medical student views and experiences were studied regarding the tool's convenience, problems and the degree to which they used it to answer questions raised by case-based instruction. Surveys were completed by 70% of the available first-year students. Student satisfaction and experiences were strongly positive towards the EBM strategy, especially of the tool's convenience and utility for answering issues raised during case-based learning sessions. About 90% of the students responded that the tool was easy to use, productive and accessed for half or more of their search needs. This study provides evidence that the integration of an educational EBM search tool can be positively received by preclinical medical students.

  16. On the network-based emulation of human visual search

    NARCIS (Netherlands)

    Gerrissen, J.F.

    1991-01-01

    We describe the design of a computer emulator of human visual search. The emulator mechanism is eventually meant to support ergonomic assessment of the effect of display structure and protocol on search performance. As regards target identification and localization, it mimics a number of

  17. Large-scale structural and textual similarity-based mining of knowledge graph to predict drug-drug interactions

    KAUST Repository

    Abdelaziz, Ibrahim; Fokoue, Achille; Hassanzadeh, Oktie; Zhang, Ping; Sadoghi, Mohammad

    2017-01-01

    Drug-Drug Interactions (DDIs) are a major cause of preventable Adverse Drug Reactions (ADRs), causing a significant burden on the patients’ health and the healthcare system. It is widely known that clinical studies cannot sufficiently and accurately identify DDIs for new drugs before they are made available on the market. In addition, existing public and proprietary sources of DDI information are known to be incomplete and/or inaccurate and so not reliable. As a result, there is an emerging body of research on in-silico prediction of drug-drug interactions. In this paper, we present Tiresias, a large-scale similarity-based framework that predicts DDIs through link prediction. Tiresias takes in various sources of drug-related data and knowledge as inputs, and provides DDI predictions as outputs. The process starts with semantic integration of the input data that results in a knowledge graph describing drug attributes and relationships with various related entities such as enzymes, chemical structures, and pathways. The knowledge graph is then used to compute several similarity measures between all the drugs in a scalable and distributed framework. In particular, Tiresias utilizes two classes of features in a knowledge graph: local and global features. Local features are derived from the information directly associated to each drug (i.e., one hop away) while global features are learnt by minimizing a global loss function that considers the complete structure of the knowledge graph. The resulting similarity metrics are used to build features for a large-scale logistic regression model to predict potential DDIs. We highlight the novelty of our proposed Tiresias and perform thorough evaluation of the quality of the predictions. The results show the effectiveness of Tiresias in both predicting new interactions among existing drugs as well as newly developed drugs.
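
    The pipeline outlined in this abstract (similarity measures over drug attributes feeding a logistic regression model) can be illustrated with a heavily simplified sketch. It is not the Tiresias implementation: Jaccard similarity over "one hop away" attribute sets stands in for the local similarity measures, the weights are made up rather than learnt, and every name and data value below is hypothetical.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(drug_x, drug_y, local_attrs, attr_types):
    """One similarity value per attribute type for a drug pair."""
    return [
        jaccard(local_attrs[drug_x].get(t, ()), local_attrs[drug_y].get(t, ()))
        for t in attr_types
    ]


def interaction_probability(features, weights, bias):
    """Logistic model output for one feature vector."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical local attributes (entities directly linked to each drug).
    local_attrs = {
        "drug_1": {"enzyme": {"CYP3A4", "CYP2D6"}, "pathway": {"p1"}},
        "drug_2": {"enzyme": {"CYP3A4"}, "pathway": {"p2"}},
    }
    feats = pair_features("drug_1", "drug_2", local_attrs, ["enzyme", "pathway"])
    # Illustrative weights only; a real model would fit them to known DDIs.
    print(feats, interaction_probability(feats, weights=[2.0, 1.0], bias=-1.0))
```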

  18. Large-scale structural and textual similarity-based mining of knowledge graph to predict drug-drug interactions

    KAUST Repository

    Abdelaziz, Ibrahim

    2017-06-12

    Drug-Drug Interactions (DDIs) are a major cause of preventable Adverse Drug Reactions (ADRs), causing a significant burden on the patients’ health and the healthcare system. It is widely known that clinical studies cannot sufficiently and accurately identify DDIs for new drugs before they are made available on the market. In addition, existing public and proprietary sources of DDI information are known to be incomplete and/or inaccurate and so not reliable. As a result, there is an emerging body of research on in-silico prediction of drug-drug interactions. In this paper, we present Tiresias, a large-scale similarity-based framework that predicts DDIs through link prediction. Tiresias takes in various sources of drug-related data and knowledge as inputs, and provides DDI predictions as outputs. The process starts with semantic integration of the input data that results in a knowledge graph describing drug attributes and relationships with various related entities such as enzymes, chemical structures, and pathways. The knowledge graph is then used to compute several similarity measures between all the drugs in a scalable and distributed framework. In particular, Tiresias utilizes two classes of features in a knowledge graph: local and global features. Local features are derived from the information directly associated to each drug (i.e., one hop away) while global features are learnt by minimizing a global loss function that considers the complete structure of the knowledge graph. The resulting similarity metrics are used to build features for a large-scale logistic regression model to predict potential DDIs. We highlight the novelty of our proposed Tiresias and perform thorough evaluation of the quality of the predictions. The results show the effectiveness of Tiresias in both predicting new interactions among existing drugs as well as newly developed drugs.

  19. New tips for structure prediction by comparative modeling

    OpenAIRE

    Rayan, Anwar

    2009-01-01

    Comparative modelling is utilized to predict the 3-dimensional conformation of a given protein (target) based on its sequence alignment to experimentally determined protein structure (template). The use of such technique is already rewarding and increasingly widespread in biological research and drug development. The accuracy of the predictions as commonly accepted depends on the score of sequence identity of the target protein to the template. To assess the relationship between sequence iden...

  20. TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors.

    Directory of Open Access Journals (Sweden)

    Johannes Eichner

    One of the key mechanisms of transcriptional control are the specific connections between transcription factors (TF) and cis-regulatory elements in gene promoters. The elucidation of these specific protein-DNA interactions is crucial to gain insights into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and mostly performed for selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow which for a given protein sequence (1) discriminates TFs from other proteins, (2) determines the structural superclass of TFs, (3) identifies the DNA-binding domains of TFs and (4) predicts their cis-acting DNA motif. While existing tools were extended and adapted for performing the latter two prediction steps, the first two steps are based on a novel numeric sequence representation which allows for combining existing knowledge from a BLAST scan with robust machine learning-based classification. By evaluation on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation facilitates more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/ and http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/.

  1. Towards a unified fatigue life prediction method for marine structures

    CERN Document Server

    Cui, Weicheng; Wang, Fang

    2014-01-01

    In order to apply the damage tolerance design philosophy to the design of marine structures, accurate prediction of fatigue crack growth under service conditions is required. More and more people have realized that only a fatigue life prediction method based on fatigue crack propagation (FCP) theory has the potential to explain the various fatigue phenomena observed. In this book, the issues leading towards the development of a unified fatigue life prediction (UFLP) method based on FCP theory are addressed. Based on the philosophy of the UFLP method, the current inconsistency between fatigue design and inspection of marine structures could be resolved. This book presents the state of the art and recent advances, including those by the authors, in fatigue studies. It is designed to lead the future directions and to provide a useful tool in many practical applications. It is addressed to engineers, naval architects, research staff, professionals and graduates engaged in fatigue prevention design and survey ...

  2. Model-free and model-based reward prediction errors in EEG.

    Science.gov (United States)

    Sambrook, Thomas D; Hardwick, Ben; Wills, Andy J; Goslin, Jeremy

    2018-05-24

    Learning theorists posit two reinforcement learning systems: model-free and model-based. Model-based learning incorporates knowledge about structure and contingencies in the world to assign candidate actions with an expected value. Model-free learning is ignorant of the world's structure; instead, actions hold a value based on prior reinforcement, with this value updated by expectancy violation in the form of a reward prediction error. Because they use such different learning mechanisms, it has been previously assumed that model-based and model-free learning are computationally dissociated in the brain. However, recent fMRI evidence suggests that the brain may compute reward prediction errors to both model-free and model-based estimates of value, signalling the possibility that these systems interact. Because of its poor temporal resolution, fMRI risks confounding reward prediction errors with other feedback-related neural activity. In the present study, EEG was used to show the presence of both model-based and model-free reward prediction errors and their place in a temporal sequence of events including state prediction errors and action value updates. This demonstration of model-based prediction errors questions a long-held assumption that model-free and model-based learning are dissociated in the brain. Copyright © 2018 Elsevier Inc. All rights reserved.
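
    The model-free update described in this abstract (a stored value revised by a reward prediction error) can be sketched in a few lines. This is only a generic illustration under stated assumptions, not the analysis used in the study: the function name `rpe_update`, the learning rate and the reward sequence are all hypothetical.

```python
def rpe_update(value, reward, learning_rate=0.1):
    """One model-free update: return (prediction_error, new_value).

    The prediction error is the expectancy violation (received reward minus
    the value held so far); the value moves a fraction of the way toward it.
    The learning rate is an illustrative assumption.
    """
    prediction_error = reward - value
    new_value = value + learning_rate * prediction_error
    return prediction_error, new_value


# Hypothetical reward sequence: two rewarded trials, then an omission.
value = 0.0
for reward in [1.0, 1.0, 0.0]:
    delta, value = rpe_update(value, reward, learning_rate=0.5)
    print(f"reward={reward:.1f}  prediction_error={delta:+.3f}  value={value:.3f}")
```

    A positive error raises the stored value and a negative error (an omitted reward) lowers it; a model-based learner would instead derive the expected value from knowledge of the task structure.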

  3. Information search with situation-specific reward functions

    Directory of Open Access Journals (Sweden)

    Bjorn Meder

    2012-03-01

    Full Text Available can strongly conflict with the goal of obtaining information for improving payoffs. Two environments with such a conflict were identified through computer optimization. Three subsequent experiments investigated people's search behavior in these environments. Experiments 1 and 2 used a multiple-cue probabilistic category-learning task to convey environmental probabilities. In a subsequent search task subjects could query only a single feature before making a classification decision. The crucial manipulation concerned the search-task reward structure. The payoffs corresponded either to accuracy, with equal rewards associated with the two categories, or to an asymmetric payoff function, with different rewards associated with each category. In Experiment 1, in which learning-task feedback corresponded to the true category, people later preferentially searched the accuracy-maximizing feature, whether or not this would improve monetary rewards. In Experiment 2, an asymmetric reward structure was used during learning. Subjects searched the reward-maximizing feature when asymmetric payoffs were preserved in the search task. However, if search-task payoffs corresponded to accuracy, subjects preferentially searched a feature that was suboptimal for reward and accuracy alike. Importantly, this feature would have been most useful, under the learning-task payoff structure. Experiment 3 found that, if words and numbers are used to convey environmental probabilities, neither reward nor accuracy consistently predicts search. These findings emphasize the necessity of taking into account people's goals and search-and-decision processes during learning, thereby challenging current models of information search.

  4. Bi-objective integer programming for RNA secondary structure prediction with pseudoknots.

    Science.gov (United States)

    Legendre, Audrey; Angel, Eric; Tahi, Fariza

    2018-01-15

    RNA structure prediction is an important field in bioinformatics, and numerous methods and tools have been proposed. Pseudoknots are specific motifs of RNA secondary structures that are difficult to predict. Almost all existing methods are based on a single model and return one solution, often missing the real structure. An alternative approach would be to combine different models and return a (small) set of solutions, maximizing its quality and diversity in order to increase the probability that it contains the real structure. We propose here an original method for predicting RNA secondary structures with pseudoknots, based on integer programming. We developed a generic bi-objective integer programming algorithm allowing to return optimal and sub-optimal solutions optimizing simultaneously two models. This algorithm was then applied to the combination of two known models of RNA secondary structure prediction, namely MEA and MFE. The resulting tool, called BiokoP, is compared with the other methods in the literature. The results show that the best solution (structure with the highest F1-score) is, in most cases, given by BiokoP. Moreover, the results of BiokoP are homogeneous, regardless of the pseudoknot type or the presence or not of pseudoknots. Indeed, the F1-scores are always higher than 70% for any number of solutions returned. The results obtained by BiokoP show that combining the MEA and the MFE models, as well as returning several optimal and several sub-optimal solutions, allow to improve the prediction of secondary structures. One perspective of our work is to combine better mono-criterion models, in particular to combine a model based on the comparative approach with the MEA and the MFE models. This leads to develop in the future a new multi-objective algorithm to combine more than two models. BiokoP is available on the EvryRNA platform: https://EvryRNA.ibisc.univ-evry.fr.
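
    To make the idea of optimizing two models simultaneously concrete, the sketch below filters a candidate list down to its non-dominated (Pareto-optimal) members. It is not the integer programming algorithm of BiokoP; it only illustrates what "optimal under two objectives" means. The candidate names and scores are invented, and both scores are assumed to be oriented so that higher is better.

```python
def pareto_front(candidates):
    """Return names of candidates not dominated on two maximized objectives.

    Each candidate is a (name, score_a, score_b) tuple. A candidate is
    dominated when another one is at least as good on both scores and
    strictly better on at least one of them.
    """
    front = []
    for name, a, b in candidates:
        dominated = any(
            (a2 >= a and b2 >= b) and (a2 > a or b2 > b)
            for _, a2, b2 in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical structures scored by two models (higher is better for both).
structures = [("s1", 0.9, 0.2), ("s2", 0.6, 0.6), ("s3", 0.5, 0.5), ("s4", 0.2, 0.9)]
print(pareto_front(structures))
```

    Here `s3` is dropped because `s2` beats it on both scores, while the other three each represent a different trade-off between the two models.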

  5. Swarm Intelligence-Based Hybrid Models for Short-Term Power Load Prediction

    Directory of Open Access Journals (Sweden)

    Jianzhou Wang

    2014-01-01

    Full Text Available Swarm intelligence (SI) is widely and successfully applied in the engineering field to solve practical optimization problems because various hybrid models, which are based on the SI algorithm and statistical models, are developed to further improve the predictive abilities. In this paper, hybrid intelligent forecasting models based on the cuckoo search (CS) as well as the singular spectrum analysis (SSA), time series, and machine learning methods are proposed to conduct short-term power load prediction. The forecasting performance of the proposed models is augmented by a rolling multistep strategy over the prediction horizon. The test results are representative of the out-performance of the SSA and CS in tuning the seasonal autoregressive integrated moving average (SARIMA) and support vector regression (SVR) in improving load forecasting, which indicates that both the SSA-based data denoising and SI-based intelligent optimization strategy can effectively improve the model’s predictive performance. Additionally, the proposed CS-SSA-SARIMA and CS-SSA-SVR models provide very impressive forecasting results, demonstrating their strong robustness and universal forecasting capacities in terms of short-term power load prediction 24 hours in advance.

  6. JNSViewer-A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures.

    Science.gov (United States)

    Shi, Jieming; Li, Xi; Dong, Min; Graham, Mitchell; Yadav, Nehul; Liang, Chun

    2017-01-01

    Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html.

  7. JNSViewer—A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures

    Science.gov (United States)

    Dong, Min; Graham, Mitchell; Yadav, Nehul

    2017-01-01

    Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html. PMID:28582416

  8. JNSViewer-A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures.

    Directory of Open Access Journals (Sweden)

    Jieming Shi

    Full Text Available Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html.

  9. Sensitivity and predictive value of 15 PubMed search strategies to answer clinical questions rated against full systematic reviews.

    Science.gov (United States)

    Agoritsas, Thomas; Merglen, Arnaud; Courvoisier, Delphine S; Combescure, Christophe; Garin, Nicolas; Perrier, Arnaud; Perneger, Thomas V

    2012-06-12

    Clinicians perform searches in PubMed daily, but retrieving relevant studies is challenging due to the rapid expansion of medical knowledge. Little is known about the performance of search strategies when they are applied to answer specific clinical questions. To compare the performance of 15 PubMed search strategies in retrieving relevant clinical trials on therapeutic interventions. We used Cochrane systematic reviews to identify relevant trials for 30 clinical questions. Search terms were extracted from the abstract using a predefined procedure based on the population, interventions, comparison, outcomes (PICO) framework and combined into queries. We tested 15 search strategies that varied in their query (PIC or PICO), use of PubMed's Clinical Queries therapeutic filters (broad or narrow), search limits, and PubMed links to related articles. We assessed sensitivity (recall) and positive predictive value (precision) of each strategy on the first 2 PubMed pages (40 articles) and on the complete search output. The performance of the search strategies varied widely according to the clinical question. Unfiltered searches and those using the broad filter of Clinical Queries produced large outputs and retrieved few relevant articles within the first 2 pages, resulting in a median sensitivity of only 10%-25%. In contrast, all searches using the narrow filter performed significantly better, with a median sensitivity of about 50% (all P PubMed pages. These results can help clinicians apply effective strategies to answer their questions at the point of care.
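
    The two metrics used in this study can be computed as follows. This is a minimal sketch rather than the authors' evaluation code: the function name, the record identifiers and the window size are illustrative assumptions, and retrieved identifiers are assumed to be unique.

```python
def sensitivity_and_precision(retrieved, relevant, first_n=None):
    """Return (sensitivity, precision) for a ranked list of retrieved IDs.

    sensitivity (recall)  = relevant records retrieved / all relevant records
    precision (PPV)       = relevant records retrieved / records retrieved
    If first_n is given, only the first first_n results are evaluated,
    which mimics looking at the first pages of the output only.
    """
    window = retrieved if first_n is None else retrieved[:first_n]
    relevant = set(relevant)
    true_positives = len(set(window) & relevant)
    sensitivity = true_positives / len(relevant) if relevant else 0.0
    precision = true_positives / len(window) if window else 0.0
    return sensitivity, precision


# Hypothetical search output and reference set of relevant trials.
retrieved = ["a", "b", "c", "d", "e"]
relevant = {"b", "e", "x", "y"}
print(sensitivity_and_precision(retrieved, relevant, first_n=2))
print(sensitivity_and_precision(retrieved, relevant))
```

    Restricting the window usually lowers sensitivity, because relevant records further down the list are not counted; precision can move in either direction.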

  10. Structural Reliability: An Assessment Using a New and Efficient Two-Phase Method Based on Artificial Neural Network and a Harmony Search Algorithm

    Directory of Open Access Journals (Sweden)

    Naser Kazemi Elaki

    2016-06-01

    Full Text Available In this research, a two-phase algorithm based on the artificial neural network (ANN) and a harmony search (HS) algorithm has been developed with the aim of assessing the reliability of structures with implicit limit state functions. The proposed method involves the generation of datasets to be used specifically for training by Finite Element analysis, to establish an ANN model using a proven ANN model in the reliability assessment process as an analyzer for structures, and finally estimate the reliability index and failure probability by using the HS algorithm, without any requirements for the explicit form of limit state function. The proposed algorithm is investigated here, and its accuracy and efficiency are demonstrated by using several numerical examples. The results obtained show that the proposed algorithm gives an appropriate estimate for the assessment of reliability of structures.

  11. Structure Based Thermostability Prediction Models for Protein Single Point Mutations with Machine Learning Tools.

    Directory of Open Access Journals (Sweden)

    Lei Jia

    Full Text Available Thermostability issue of protein point mutations is a common occurrence in protein engineering. An application which predicts the thermostability of mutants can be helpful for guiding decision making process in protein design via mutagenesis. An in silico point mutation scanning method is frequently used to find "hot spots" in proteins for focused mutagenesis. ProTherm (http://gibk26.bio.kyutech.ac.jp/jouhou/Protherm/protherm.html) is a public database that consists of thousands of protein mutants' experimentally measured thermostability. Two data sets based on two differently measured thermostability properties of protein single point mutations, namely the unfolding free energy change (ddG) and melting temperature change (dTm) were obtained from this database. Folding free energy change calculation from Rosetta, structural information of the point mutations as well as amino acid physical properties were obtained for building thermostability prediction models with informatics modeling tools. Five supervised machine learning methods (support vector machine, random forests, artificial neural network, naïve Bayes classifier, K nearest neighbor) and partial least squares regression are used for building the prediction models. Binary and ternary classifications as well as regression models were built and evaluated. Data set redundancy and balancing, the reverse mutations technique, feature selection, and comparison to other published methods were discussed. Rosetta calculated folding free energy change ranked as the most influential features in all prediction models. Other descriptors also made significant contributions to increasing the accuracy of the prediction models.

  12. Molecule database framework: a framework for creating database applications with chemical structure search capability.

    Science.gov (United States)

    Kiener, Joos

    2013-12-11

    Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store this data and search it by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However for specific requirements of in-house databases and processes no such solutions exist. Another issue is that commercial solutions have the risk of vendor lock-in and may require an expensive license of a proprietary relational database management system. To speed up and simplify the development for applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls. Therefore software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes: • Support for multi-component compounds (mixtures) • Import and export of SD-files • Optional security (authorization). For chemical structure searching Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method level security. Molecule Database Framework supports multi-component chemical compounds (mixtures). Furthermore the design of entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files. By using a simple web application it was shown that Molecule Database Framework

  13. Fast protein tertiary structure retrieval based on global surface shape similarity.

    Science.gov (United States)

    Sael, Lee; Li, Bin; La, David; Fang, Yi; Ramani, Karthik; Rustamov, Raif; Kihara, Daisuke

    2008-09-01

    Characterization and identification of similar tertiary structure of proteins provides rich information for investigating function and evolution. The importance of structure similarity searches is increasing as structure databases continue to expand, partly due to the structural genomics projects. A crucial drawback of conventional protein structure comparison methods, which compare structures by their main-chain orientation or the spatial arrangement of secondary structure, is that a database search is too slow to be done in real-time. Here we introduce a global surface shape representation by three-dimensional (3D) Zernike descriptors, which represent a protein structure compactly as a series expansion of 3D functions. With this simplified representation, the search speed against a few thousand structures takes less than a minute. To investigate the agreement between surface representation defined by 3D Zernike descriptor and conventional main-chain based representation, a benchmark was performed against a protein classification generated by the combinatorial extension algorithm. Despite the different representation, 3D Zernike descriptor retrieved proteins of the same conformation defined by combinatorial extension in 89.6% of the cases within the top five closest structures. The real-time protein structure search by 3D Zernike descriptor will open up new possibility of large-scale global and local protein surface shape comparison. 2008 Wiley-Liss, Inc.

  14. Towards cheminformatics-based estimation of drug therapeutic index: Predicting the protective index of anticonvulsants using a new quantitative structure-index relationship approach.

    Science.gov (United States)

    Chen, Shangying; Zhang, Peng; Liu, Xin; Qin, Chu; Tao, Lin; Zhang, Cheng; Yang, Sheng Yong; Chen, Yu Zong; Chui, Wai Keung

    2016-06-01

    The overall efficacy and safety profile of a new drug is partially evaluated by the therapeutic index in clinical studies and by the protective index (PI) in preclinical studies. In-silico predictive methods may facilitate the assessment of these indicators. Although QSAR and QSTR models can be used for predicting PI, their predictive capability has not been evaluated. To test this capability, we developed QSAR and QSTR models for predicting the activity and toxicity of anticonvulsants at accuracy levels above the literature-reported threshold (LT) of good QSAR models as tested by both the internal 5-fold cross validation and external validation method. These models showed significantly compromised PI predictive capability due to the cumulative errors of the QSAR and QSTR models. Therefore, in this investigation a new quantitative structure-index relationship (QSIR) model was devised and it showed improved PI predictive capability that superseded the LT of good QSAR models. The QSAR, QSTR and QSIR models were developed using support vector regression (SVR) method with the parameters optimized by using the greedy search method. The molecular descriptors relevant to the prediction of anticonvulsant activities, toxicities and PIs were analyzed by a recursive feature elimination method. The selected molecular descriptors are primarily associated with the drug-like, pharmacological and toxicological features and those used in the published anticonvulsant QSAR and QSTR models. This study suggested that QSIR is useful for estimating the therapeutic index of drug candidates. Copyright © 2016. Published by Elsevier Inc.

  15. I-TASSER server for protein 3D structure prediction

    Directory of Open Access Journals (Sweden)

    Zhang Yang

    2008-01-01

    Full Text Available Abstract Background Prediction of 3-dimensional protein structures from amino acid sequences represents one of the most important problems in computational structural biology. The community-wide Critical Assessment of Structure Prediction (CASP) experiments have been designed to obtain an objective assessment of the state-of-the-art of the field, where I-TASSER was ranked as the best method in the server section of the recent 7th CASP experiment. Our laboratory has since then received numerous requests about the public availability of the I-TASSER algorithm and the usage of the I-TASSER predictions. Results An on-line version of I-TASSER is developed at the KU Center for Bioinformatics which has generated protein structure predictions for thousands of modeling requests from more than 35 countries. A scoring function (C-score) based on the relative clustering structural density and the consensus significance score of multiple threading templates is introduced to estimate the accuracy of the I-TASSER predictions. A large-scale benchmark test demonstrates a strong correlation between the C-score and the TM-score (a structural similarity measurement with values in [0, 1]) of the first models with a correlation coefficient of 0.91. Using a C-score cutoff > -1.5 for the models of correct topology, both false positive and false negative rates are below 0.1. Combining C-score and protein length, the accuracy of the I-TASSER models can be predicted with an average error of 0.08 for TM-score and 2 Å for RMSD. Conclusion The I-TASSER server has been developed to generate automated full-length 3D protein structural predictions where the benchmarked scoring system helps users to obtain quantitative assessments of the I-TASSER models. The output of the I-TASSER server for each query includes up to five full-length models, the confidence score, the estimated TM-score and RMSD, and the standard deviation of the estimations. The I-TASSER server is freely available

  16. Soft Computing Methods for Disulfide Connectivity Prediction.

    Science.gov (United States)

    Márquez-Chamorro, Alfonso E; Aguilar-Ruiz, Jesús S

    2015-01-01

    The problem of protein structure prediction (PSP) is one of the main challenges in structural bioinformatics. To tackle this problem, PSP can be divided into several subproblems. One of these subproblems is the prediction of disulfide bonds. The disulfide connectivity prediction problem consists in identifying which nonadjacent cysteines would be cross-linked from all possible candidates. Determining the disulfide bond connectivity between the cysteines of a protein is desirable as a previous step of the 3D PSP, as the protein conformational search space is highly reduced. The most representative soft computing approaches for the disulfide bonds connectivity prediction problem of the last decade are summarized in this paper. Certain aspects, such as the different methodologies based on soft computing approaches (artificial neural network or support vector machine) or features of the algorithms, are used for the classification of these methods.

  17. A compound structure of ELM based on feature selection and parameter optimization using hybrid backtracking search algorithm for wind speed forecasting

    International Nuclear Information System (INIS)

    Zhang, Chu; Zhou, Jianzhong; Li, Chaoshun; Fu, Wenlong; Peng, Tian

    2017-01-01

    Highlights: • A novel hybrid approach is proposed for wind speed forecasting. • The variational mode decomposition (VMD) is optimized to decompose the original wind speed series. • The input matrix and parameters of ELM are optimized simultaneously by using a hybrid BSA. • Results show that OVMD-HBSA-ELM achieves better performance in terms of prediction accuracy. - Abstract: Reliable wind speed forecasting is essential for wind power integration in wind power generation systems. The purpose of this paper is to develop a novel hybrid model for short-term wind speed forecasting and demonstrate its efficiency. In the proposed model, a compound structure of extreme learning machine (ELM) based on feature selection and parameter optimization using hybrid backtracking search algorithm (HBSA) is employed as the predictor. The real-valued BSA (RBSA) is exploited to search for the optimal combination of weights and bias of ELM while the binary-valued BSA (BBSA) is exploited as a feature selection method applied to the candidate inputs predefined by partial autocorrelation function (PACF) values to reconstruct the input-matrix. Due to the volatility and randomness of wind speed signal, an optimized variational mode decomposition (OVMD) is employed to eliminate the redundant noises. The parameters of the proposed OVMD are determined according to the center frequencies of the decomposed modes and the residual evaluation index (REI). The wind speed signal is decomposed into a few modes via OVMD. The aggregation of the forecasting results of these modes constructs the final forecasting result of the proposed model. The proposed hybrid model has been applied to the mean half-hour wind speed observation data from two wind farms in Inner Mongolia, China, and 10-min wind speed data from the Sotavento Galicia wind farm are studied as an additional case. Parallel experiments have been designed to compare with the proposed model. Results obtained from this study indicate that the

  18. Evaluation and use of in-silico structure-based epitope prediction with foot-and-mouth disease virus.

    Directory of Open Access Journals (Sweden)

    Daryl W Borley

    Full Text Available Understanding virus antigenicity is of fundamental importance for the development of better, more cross-reactive vaccines. However, as far as we are aware, no systematic work has yet been conducted using the 3D structure of a virus to identify novel epitopes. Therefore we have extended several existing structural prediction algorithms to build a method for identifying epitopes on the appropriate outer surface of intact virus capsids (which are structurally different from globular proteins in both shape and arrangement of multiple repeated elements) and applied it here as a proof of principle concept to the capsid of foot-and-mouth disease virus (FMDV). We have analysed how reliably several freely available structure-based B cell epitope prediction programs can identify already known viral epitopes of FMDV in the context of the viral capsid. To do this we constructed a simple objective metric to measure the sensitivity and discrimination of such algorithms. After optimising the parameters for five methods using an independent training set we used this measure to evaluate the methods. Individually any one algorithm performed rather poorly (three performing better than the other two) suggesting that there may be value in developing virus-specific software. Taking a very conservative approach requiring a consensus between all three top methods predicts a number of previously described antigenic residues as potential epitopes on more than one serotype of FMDV, consistent with experimental results. The consensus results identified novel residues as potential epitopes on more than one serotype. These include residues 190-192 of VP2 (not previously determined to be antigenic), residues 69-71 and 193-197 of VP3 spanning the pentamer-pentamer interface, and another region incorporating residues 83, 84 and 169-174 of VP1 (all only previously experimentally defined on serotype A). The computer programs needed to create a semi-automated procedure for carrying out

  19. Simultaneous determination of aquifer parameters and zone structures with fuzzy c-means clustering and meta-heuristic harmony search algorithm

    Science.gov (United States)

    Ayvaz, M. Tamer

    2007-11-01

    This study proposes an inverse solution algorithm through which both the aquifer parameters and the zone structure of these parameters can be determined based on a given set of observations on piezometric heads. In the zone structure identification problem fuzzy c-means (FCM) clustering method is used. The association of the zone structure with the transmissivity distribution is accomplished through an optimization model. The meta-heuristic harmony search (HS) algorithm, which is conceptualized using the musical process of searching for a perfect state of harmony, is used as an optimization technique. The optimum parameter zone structure is identified based on three criteria which are the residual error, parameter uncertainty, and structure discrimination. A numerical example given in the literature is solved to demonstrate the performance of the proposed algorithm. Also, a sensitivity analysis is performed to test the performance of the HS algorithm for different sets of solution parameters. Results indicate that the proposed solution algorithm is an effective way in the simultaneous identification of aquifer parameters and their corresponding zone structures.
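
    For readers unfamiliar with harmony search, the following is a minimal sketch of the basic algorithm applied to a toy minimization problem. It is not the aquifer model from this study: the objective function, the bounds and every parameter value (memory size, memory considering rate `hmcr`, pitch adjusting rate `par`, `bandwidth`, iteration count) are illustrative assumptions.

```python
import random


def harmony_search(objective, bounds, memory_size=10, hmcr=0.9, par=0.3,
                   bandwidth=0.05, iterations=2000, seed=0):
    """Minimize `objective` over box `bounds` with a basic harmony search."""
    rng = random.Random(seed)
    # Harmony memory: a small population of random candidate solutions.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(memory_size)]
    scores = [objective(h) for h in memory]
    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a stored value for this variable.
                x = rng.choice(memory)[d]
                if rng.random() < par:
                    # Pitch adjustment: nudge the reused value slightly.
                    x += rng.uniform(-1.0, 1.0) * bandwidth * (hi - lo)
                x = min(max(x, lo), hi)
            else:
                # Random selection: draw a fresh value from the full range.
                x = rng.uniform(lo, hi)
            new.append(x)
        new_score = objective(new)
        worst = max(range(memory_size), key=scores.__getitem__)
        if new_score < scores[worst]:
            # The improvised harmony replaces the worst stored one.
            memory[worst], scores[worst] = new, new_score
    best = min(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best]


def sphere(v):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(x * x for x in v)


best, score = harmony_search(sphere, [(-5.0, 5.0)] * 2)
print(best, score)
```

    The "different sets of solution parameters" mentioned in the abstract correspond to settings of this kind; rerunning the sketch with other `hmcr`, `par` or `memory_size` values gives a feel for how sensitive the search is to them.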

  20. Fractional order PID control design for semi-active control of smart base-isolated structures: A multi-objective cuckoo search approach.

    Science.gov (United States)

    Zamani, Abbas-Ali; Tavakoli, Saeed; Etedali, Sadegh

    2017-03-01

    Fractional order PID (FOPID) controllers are introduced as a general form of classical PID controllers using fractional calculus. As this controller provides good disturbance rejection and is robust against plant uncertainties it is appropriate for the vibration mitigation in structures. In this paper, an FOPID controller is designed to adjust the contact force of piezoelectric friction dampers for semi-active control of base-isolated structures during far-field and near-field earthquake excitations. A multi-objective cuckoo search algorithm is employed to tune the controller parameters. Considering the resulting Pareto optimal front, the best input for the FOPID controller is selected. For seven pairs of earthquakes and nine performance indices, the performance of the proposed controller is compared with those provided by several well-known control techniques. According to the simulation results, the proposed controller performs better than other controllers in terms of simultaneous reduction of the maximum base displacement and story acceleration for various types of earthquakes. Also, it provides acceptable responses in terms of inter-story drifts, root mean square of base displacements and floor acceleration. In addition, the evaluation of robustness for a stiffness uncertainty of ±10% indicates that the proposed controller gives a robust performance against such modeling errors. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.

  1. A Secured Cognitive Agent based Multi-strategic Intelligent Search System

    Directory of Open Access Journals (Sweden)

    Neha Gulati

    2018-04-01

    The Search Engine (SE) is the most preferred and most widely used information retrieval tool. Despite the large-scale use of SEs, their limited capability to understand the user/searcher context and emotions places a high cognitive, perceptual and learning load on the user to maintain the search momentum. In this regard, the present work discusses a Cognitive Agent (CA) based approach to support the user in the Web-based search process. The work suggests a framework called Secured Cognitive Agent based Multi-strategic Intelligent Search System (CAbMsISS) to assist the user in the search process. It helps to reduce the contextual and emotional mismatch between the SEs and the user. After implementation of the proposed framework, performance analysis shows that the CAbMsISS framework improves Query Retrieval Time (QRT) and effectiveness in retrieving relevant results as compared to the Present Search Engine (PSE). In addition, it provides search suggestions when the user accesses a resource previously tagged with negative emotions. Overall, the goal of the system is to enhance the search experience and keep the user motivated. The framework provides suggestions through the search log that tracks the queries searched, resources accessed and emotions experienced during the search. The implemented framework also considers user security. Keywords: BDI model, Cognitive Agent, Emotion, Information retrieval, Intelligent search, Search Engine

  2. Compressed Data Structures for Range Searching

    DEFF Research Database (Denmark)

    Bille, Philip; Gørtz, Inge Li; Vind, Søren Juhl

    2015-01-01

    matrices and web graphs. Our contribution is twofold. First, we show how to compress geometric repetitions that may appear in standard range searching data structures (such as K-D trees, Quad trees, Range trees, R-trees, Priority R-trees, and K-D-B trees), and how to implement subsequent range queries...... on the compressed representation with only a constant factor overhead. Secondly, we present a compression scheme that efficiently identifies geometric repetitions in point sets, and produces a hierarchical clustering of the point sets, which combined with the first result leads to a compressed representation...

  3. Free energy minimization to predict RNA secondary structures and computational RNA design.

    Science.gov (United States)

    Churkin, Alexander; Weinbrand, Lina; Barash, Danny

    2015-01-01

    Determining the RNA secondary structure from sequence data by computational predictions is a long-standing problem. Its solution has been approached in two distinctive ways. If a multiple sequence alignment of a collection of homologous sequences is available, the comparative method uses phylogeny to determine conserved base pairs that are more likely to form as a result of billions of years of evolution than by chance. In the case of single sequences, recursive algorithms that compute free energy structures by using empirically derived energy parameters have been developed. This latter approach of RNA folding prediction by energy minimization is widely used to predict RNA secondary structure from sequence. For a significant number of RNA molecules, the secondary structure of the RNA molecule is indicative of its function and its computational prediction by minimizing its free energy is important for its functional analysis. A general method for free energy minimization to predict RNA secondary structures is dynamic programming, although other optimization methods have been developed as well along with empirically derived energy parameters. In this chapter, we introduce and illustrate by examples the approach of free energy minimization to predict RNA secondary structures.
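
    The dynamic-programming idea mentioned above can be sketched with a deliberately simplified energy model. The function below is a Nussinov-style recursion that maximizes the number of nested base pairs (equivalent to minimizing a toy energy of -1 per pair). It is an illustration only, not the empirically parameterized energy model described in the chapter; the name `max_base_pairs` and the `min_loop` parameter are assumptions made for this example.

```python
def max_base_pairs(seq, min_loop=3):
    """Nussinov-style DP: maximum number of nested base pairs in `seq`.

    Toy stand-in for free energy minimization: every allowed pair
    contributes the same energy, so minimizing energy equals maximizing pairs.
    `min_loop` is the minimum number of unpaired bases inside a hairpin.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]           # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAACCC"))
```

    For the hairpin-like sequence GGGAAACCC the recursion finds three stacked G-C pairs around the AAA loop. Practical folding programs replace the unit pair score with loop-dependent energy terms but keep the same table-filling structure.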

  4. Stochastic search in structural optimization - Genetic algorithms and simulated annealing

    Science.gov (United States)

    Hajela, Prabhat

    1993-01-01

    An account is given of illustrative applications of genetic algorithms and simulated annealing methods in structural optimization. The advantages of such stochastic search methods over traditional mathematical programming strategies are emphasized; it is noted that these methods offer a significantly higher probability of locating the global optimum in a multimodal design space. Both genetic-search and simulated annealing can be effectively used in problems with a mix of continuous, discrete, and integer design variables.
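
    As a rough illustration of the stochastic search idea, the following is a minimal simulated annealing sketch for a one-dimensional multimodal function. It is not taken from the paper: the function name, the Gaussian proposal width, the geometric cooling schedule and all default values are assumptions chosen for brevity.

```python
import math
import random


def simulated_annealing(f, x0, lo, hi, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Minimize f on [lo, hi] starting from x0; returns (best_x, best_f)."""
    rng = random.Random(seed)          # seeded so that runs are reproducible
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        # Propose a random neighbour and clip it to the design bounds.
        cand = min(hi, max(lo, x + rng.gauss(0.0, 0.5)))
        fc = f(cand)
        # Metropolis rule: always accept improvements, and accept worse
        # points with a probability that shrinks as the temperature drops.
        if fc <= fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling                   # geometric cooling schedule
    return best_x, best_f


def bumpy(x):
    """A multimodal test function with several local minima."""
    return x * x + 10.0 * math.sin(3.0 * x)


best_x, best_f = simulated_annealing(bumpy, x0=4.0, lo=-5.0, hi=5.0)
print(best_x, best_f)
```

    Accepting occasional uphill moves at high temperature is what lets the search escape local minima, which is the advantage over purely descent-based strategies that the abstract emphasizes.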

  5. Structure-based prediction of subtype selectivity of histamine H3 receptor selective antagonists in clinical trials.

    Science.gov (United States)

    Kim, Soo-Kyung; Fristrup, Peter; Abrol, Ravinder; Goddard, William A

    2011-12-27

    Histamine receptors (HRs) are excellent drug targets for the treatment of diseases, such as schizophrenia, psychosis, depression, migraine, allergies, asthma, ulcers, and hypertension. Among them, the human H(3) histamine receptor (hH(3)HR) antagonists have been proposed for specific therapeutic applications, including treatment of Alzheimer's disease, attention deficit hyperactivity disorder (ADHD), epilepsy, and obesity. However, many of these drug candidates cause undesired side effects through the cross-reactivity with other histamine receptor subtypes. In order to develop improved selectivity and activity for such treatments, it would be useful to have the three-dimensional structures for all four HRs. We report here the predicted structures of four HR subtypes (H(1), H(2), H(3), and H(4)) using the GEnSeMBLE (GPCR ensemble of structures in membrane bilayer environment) Monte Carlo protocol, sampling ∼35 million combinations of helix packings to predict the 10 most stable packings for each of the four subtypes. Then we used these 10 best protein structures with the DarwinDock Monte Carlo protocol to sample ∼50 000 × 10(20) poses to predict the optimum ligand-protein structures for various agonists and antagonists. We find that E206(5.46) contributes most in binding H(3) selective agonists (5, 6, 7) in agreement with experimental mutation studies. We also find that conserved E5.46/S5.43 in both of hH(3)HR and hH(4)HR are involved in H(3)/ H(4) subtype selectivity. In addition, we find that M378(6.55) in hH(3)HR provides additional hydrophobic interactions different from hH(4)HR (the corresponding amino acid of T323(6.55) in hH(4)HR) to provide additional subtype bias. From these studies, we developed a pharmacophore model based on our predictions for known hH(3)HR selective antagonists in clinical study [ABT-239 1, GSK-189,254 2, PF-3654746 3, and BF2.649 (tiprolisant) 4] that suggests critical selectivity directing elements are: the basic proton

  6. Identification of Fuzzy Inference Systems by Means of a Multiobjective Opposition-Based Space Search Algorithm

    Directory of Open Access Journals (Sweden)

    Wei Huang

    2013-01-01

    We introduce a new category of fuzzy inference systems with the aid of a multiobjective opposition-based space search algorithm (MOSSA). The proposed MOSSA is essentially a multiobjective space search algorithm improved by using an opposition-based learning that employs a so-called opposite numbers mechanism to speed up the convergence of the optimization algorithm. In the identification of fuzzy inference system, the MOSSA is exploited to carry out the parametric identification of the fuzzy model as well as to realize its structural identification. Experimental results demonstrate the effectiveness of the proposed fuzzy models.
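
    The opposite numbers mechanism can be illustrated in a few lines. In opposition-based learning, the opposite of a value x in the interval [a, b] is a + b - x. The sketch below uses that definition to build an initial population by keeping the better of each random point and its opposite. It is a single-objective simplification and not the MOSSA of the paper; the function names and the sphere test objective are assumptions for illustration.

```python
import random


def opposite(x, lower, upper):
    """Opposite point: each coordinate x_i in [a_i, b_i] maps to a_i + b_i - x_i."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]


def opposition_init(f, lower, upper, size, seed=0):
    """Build a population by keeping the better of each point and its opposite."""
    rng = random.Random(seed)
    population = []
    for _ in range(size):
        x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        x_opp = opposite(x, lower, upper)
        population.append(min(x, x_opp, key=f))  # minimization: lower f is better
    return population


def sphere(x):
    """Simple test objective with its minimum at the origin."""
    return sum(v * v for v in x)


print(opposite([1.0, -2.0], [0.0, -5.0], [10.0, 5.0]))
print(opposition_init(sphere, [-1.0, -1.0], [3.0, 3.0], size=4))
```

    Evaluating a candidate together with its opposite costs one extra function call but can place the starting population closer to the optimum, which is the intuition behind the faster convergence reported in the abstract.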

  7. Pep-3D-Search: a method for B-cell epitope prediction based on mimotope analysis

    OpenAIRE

    Huang, Yan Xin; Bao, Yong Li; Guo, Shu Yan; Wang, Yan; Zhou, Chun Guang; Li, Yu Xin

    2008-01-01

    Abstract Background The prediction of conformational B-cell epitopes is one of the most important goals in immunoinformatics. The solution to this problem, even if approximate, would help in designing experiments to precisely map the residues of interaction between an antigen and an antibody. Consequently, this area of research has received considerable attention from immunologists, structural biologists and computational biologists. Phage-displayed random peptide libraries are powerful tools...

  8. Predicted crystal structures of molybdenum under high pressure

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Bing; Zhang, Guang Biao [Institute for Computational Materials Science, School of Physics and Electronics, Henan University, Kaifeng 475004 (China); Wang, Yuan Xu, E-mail: wangyx@henu.edu.cn [Institute for Computational Materials Science, School of Physics and Electronics, Henan University, Kaifeng 475004 (China); Guizhou Provincial Key Laboratory of Computational Nano-Material Science, Institute of Applied Physics, Guizhou Normal College, Guiyang 550018 (China)

    2013-04-15

    Highlights: ► A double-hexagonal close-packed (dhcp) structure of molybdenum is predicted. ► Calculated acoustic velocity confirms the bcc–dhcp phase transition at 660 GPa. ► The valence electrons of dhcp Mo are mostly localized in the interstitial sites. -- Abstract: The high-pressure structures of molybdenum (Mo) at zero temperature have been extensively explored through the newly developed particle swarm optimization (PSO) algorithm on crystal structural prediction. All the experimental and earlier theoretical structures were successfully reproduced in certain pressure ranges, validating our methodology in application to Mo. A double-hexagonal close-packed (dhcp) structure found by Mikhaylushkin et al. (2008) [12] is confirmed by the present PSO calculations. The lattice parameters and physical properties of the dhcp phase were investigated based on first principles calculations. The phase transition occurs only from bcc phase to dhcp phase at 660 GPa and at zero temperature. The calculated acoustic velocities also indicate a transition from the bcc to dhcp phases for Mo. More intriguingly, the calculated density of states (DOS) shows that the dhcp structure remains metallic. The calculated electron density difference (EDD) reveals that its valence electrons are localized in the interstitial regions.

  9. Achilles tendons from decorin- and biglycan-null mouse models have inferior mechanical and structural properties predicted by an image-based empirical damage model.

    Science.gov (United States)

    Gordon, J A; Freedman, B R; Zuskov, A; Iozzo, R V; Birk, D E; Soslowsky, L J

    2015-07-16

    Achilles tendons are a common source of pain and injury, and their pathology may originate from aberrant structure function relationships. Small leucine rich proteoglycans (SLRPs) influence mechanical and structural properties in a tendon-specific manner. However, their roles in the Achilles tendon have not been defined. The objective of this study was to evaluate the mechanical and structural differences observed in mouse Achilles tendons lacking class I SLRPs; either decorin or biglycan. In addition, empirical modeling techniques based on mechanical and image-based measures were employed. Achilles tendons from decorin-null (Dcn(-/-)) and biglycan-null (Bgn(-/-)) C57BL/6 female mice (N=102) were used. Each tendon underwent a dynamic mechanical testing protocol including simultaneous polarized light image capture to evaluate both structural and mechanical properties of each Achilles tendon. An empirical damage model was adapted for application to genetic variation and for use with image based structural properties to predict tendon dynamic mechanical properties. We found that Achilles tendons lacking decorin and biglycan had inferior mechanical and structural properties that were age dependent; and that simple empirical models, based on previously described damage models, were predictive of Achilles tendon dynamic modulus in both decorin- and biglycan-null mice. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Predicting nucleic acid binding interfaces from structural models of proteins.

    Science.gov (United States)

    Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael

    2012-02-01

    The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acid binding interface on structural models of proteins. Employing a combined patch approach, we show that patches extracted from an ensemble of models better predict the real nucleic acid binding interfaces compared with patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. Copyright © 2011 Wiley Periodicals, Inc.

  11. Generating "fragment-based virtual library" using pocket similarity search of ligand-receptor complexes.

    Science.gov (United States)

    Khashan, Raed S

    2015-01-01

    As the number of available ligand-receptor complexes is increasing, researchers are becoming more dedicated to mine these complexes to aid in the drug design and development process. We present free software which is developed as a tool for performing similarity search across ligand-receptor complexes for identifying binding pockets which are similar to that of a target receptor. The search is based on 3D-geometric and chemical similarity of the atoms forming the binding pocket. For each match identified, the ligand's fragment(s) corresponding to that binding pocket are extracted, thus forming a virtual library of fragments (FragVLib) that is useful for structure-based drug design. The program provides a very useful tool to explore available databases.

  12. PROCARB: A Database of Known and Modelled Carbohydrate-Binding Protein Structures with Sequence-Based Prediction Tools

    Directory of Open Access Journals (Sweden)

    Adeel Malik

    2010-01-01

    Understanding of the three-dimensional structures of proteins that interact with carbohydrates covalently (glycoproteins) as well as noncovalently (protein-carbohydrate complexes) is essential to many biological processes and plays a significant role in normal and disease-associated functions. It is important to have a central repository of knowledge available about these protein-carbohydrate complexes as well as preprocessed data of predicted structures. This can be significantly enhanced by de novo tools which can predict carbohydrate-binding sites for proteins in the absence of a structure with an experimentally known binding site. PROCARB is an open-access database comprising three independently working components, namely, (i) Core PROCARB module, consisting of three-dimensional structures of protein-carbohydrate complexes taken from the Protein Data Bank (PDB), (ii) Homology Models module, consisting of manually developed three-dimensional models of N-linked and O-linked glycoproteins of unknown three-dimensional structure, and (iii) CBS-Pred prediction module, consisting of web servers to predict carbohydrate-binding sites using single sequence or server-generated PSSM. Several precomputed structural and functional properties of complexes are also included in the database for quick analysis. In particular, information about function, secondary structure, solvent accessibility, hydrogen bonds, literature references, and so forth, is included. In addition, each protein in the database is mapped to Uniprot, Pfam, PDB, and so forth.

  13. Evolutionary Structure Prediction of Stoichiometric Compounds

    Science.gov (United States)

    Zhu, Qiang; Oganov, Artem

    2014-03-01

    In general, for a given ionic compound AmBn at ambient pressure, its stoichiometry reflects the ratio of valence states between the chemical species (i.e., the charges of each anion and cation). However, compounds under high pressure behave significantly differently from their analogs at ambient conditions. Here we developed a method to solve the crystal structure prediction problem based on evolutionary algorithms, which can predict both the stable compounds and their crystal structures at arbitrary P,T-conditions, given just the set of chemical elements. By applying this method to a wide range of binary ionic systems (Na-Cl, Mg-O, Xe-O, Cs-F, etc.), we discovered many compounds with brand-new stoichiometries which can become thermodynamically stable. Further electronic structure analysis of these novel compounds indicates that several factors can contribute to this extraordinary phenomenon: (1) polyatomic anions; (2) free electron localization; (3) emergence of new valence states; (4) metallization. In particular, some of the results have been confirmed by experiment, which suggests that this approach can play a crucial role in new materials design under extreme pressure conditions. This work is funded by DARPA (Grants No. W31P4Q1210008 and W31P4Q1310005), NSF (EAR-1114313 and DMR-1231586).

  14. Considerations for the development of task-based search engines

    DEFF Research Database (Denmark)

    Petcu, Paula; Dragusin, Radu

    2013-01-01

    Based on previous experience from working on a task-based search engine, we present a list of suggestions and ideas for an Information Retrieval (IR) framework that could inform the development of next generation professional search systems. The specific task that we start from is the clinicians......' information need in finding rare disease diagnostic hypotheses at the time and place where medical decisions are made. Our experience from the development of a search engine focused on supporting clinicians in completing this task has provided us valuable insights in what aspects should be considered...... by the developers of vertical search engines....

  15. Object-based target templates guide attention during visual search

    OpenAIRE

    Berggren, Nick; Eimer, Martin

    2018-01-01

    During visual search, attention is believed to be controlled in a strictly feature-based fashion, without any guidance by object-based target representations. To challenge this received view, we measured electrophysiological markers of attentional selection (N2pc component) and working memory (SPCN) in search tasks where two possible targets were defined by feature conjunctions (e.g., blue circles and green squares). Critically, some search displays also contained nontargets with two target f...

  16. Prediction of peptide drift time in ion mobility mass spectrometry from sequence-based features

    KAUST Repository

    Wang, Bing; Zhang, Jun; Chen, Peng; Ji, Zhiwei; Deng, Shuping; Li, Chi

    2013-01-01

    Background: Ion mobility-mass spectrometry (IMMS), an analytical technique which combines the features of ion mobility spectrometry (IMS) and mass spectrometry (MS), can rapidly separate ions on a millisecond time-scale. IMMS has become a powerful tool for analyzing complex mixtures, especially for the analysis of peptides in proteomics. The high-throughput nature of this technique provides a challenge for the identification of peptides in complex biological samples. As an important parameter, peptide drift time can be used for enhancing downstream data analysis in IMMS-based proteomics. Results: In this paper, a model is presented based on the least squares support vector regression (LS-SVR) method to predict peptide ion drift time in IMMS from the sequence-based features of the peptide. Four descriptors were extracted from the peptide sequence to represent peptide ions by a 34-component vector. The parameters of LS-SVR were selected by a grid searching strategy, and a 10-fold cross-validation approach was employed for the model training and testing. Our proposed method was tested on three datasets with different charge states. The high prediction performance achieved demonstrates the effectiveness and efficiency of the prediction model. Conclusions: Our proposed LS-SVR model can predict peptide drift time from sequence information with relatively high prediction accuracy, as shown by a test on a dataset of 595 peptides. This work can enhance the confidence of protein identification by combining with current protein searching techniques. 2013 Wang et al.; licensee BioMed Central Ltd.

  17. Prediction of peptide drift time in ion mobility mass spectrometry from sequence-based features

    KAUST Repository

    Wang, Bing

    2013-05-09

    Background: Ion mobility-mass spectrometry (IMMS), an analytical technique which combines the features of ion mobility spectrometry (IMS) and mass spectrometry (MS), can rapidly separate ions on a millisecond time-scale. IMMS has become a powerful tool for analyzing complex mixtures, especially for the analysis of peptides in proteomics. The high-throughput nature of this technique provides a challenge for the identification of peptides in complex biological samples. As an important parameter, peptide drift time can be used for enhancing downstream data analysis in IMMS-based proteomics. Results: In this paper, a model is presented based on the least squares support vector regression (LS-SVR) method to predict peptide ion drift time in IMMS from the sequence-based features of the peptide. Four descriptors were extracted from the peptide sequence to represent peptide ions by a 34-component vector. The parameters of LS-SVR were selected by a grid searching strategy, and a 10-fold cross-validation approach was employed for the model training and testing. Our proposed method was tested on three datasets with different charge states. The high prediction performance achieved demonstrates the effectiveness and efficiency of the prediction model. Conclusions: Our proposed LS-SVR model can predict peptide drift time from sequence information with relatively high prediction accuracy, as shown by a test on a dataset of 595 peptides. This work can enhance the confidence of protein identification by combining with current protein searching techniques. 2013 Wang et al.; licensee BioMed Central Ltd.

  18. Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier

    Science.gov (United States)

    Wang, Leilei; Cheng, Jinyong

    2018-03-01

    Protein secondary structure prediction belongs to bioinformatics and is an important research area. In this paper, we propose a new method for protein secondary structure prediction using a Bayes classifier and an autoencoder network. Our experiments present the algorithms involved, including the construction of the model and the classification of parameters. The data set is the typical CB513 protein data set. Accuracy is evaluated with 3-fold cross-validation, from which we obtain the Q3 accuracy. The results illustrate that the autoencoder network improved the prediction accuracy of protein secondary structure.
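
    For readers unfamiliar with the metric, Q3 is the fraction of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The short sketch below computes it for two label strings; the function name and the example strings are hypothetical and not taken from the paper.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose three-state label (H, E or C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


# Hypothetical 8-residue example: one strand residue is predicted as coil.
print(q3_accuracy("HHHEECCC", "HHHEEECC"))  # 0.875
```

    In a 3-fold cross-validation, this value would be computed on each held-out fold and then averaged.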

  19. Fast and accurate protein substructure searching with simulated annealing and GPUs

    Directory of Open Access Journals (Sweden)

    Stivala Alex D

    2010-09-01

    Abstract Background: Searching a database of protein structures for matches to a query structure, or occurrences of a structural motif, is an important task in structural biology and bioinformatics. While there are many existing methods for structural similarity searching, faster and more accurate approaches are still required, and few current methods are capable of substructure (motif) searching. Results: We developed an improved heuristic for tableau-based protein structure and substructure searching using simulated annealing that is as fast as or faster than, and comparable in accuracy with, some widely used existing methods. Furthermore, we created a parallel implementation on a modern graphics processing unit (GPU). Conclusions: The GPU implementation achieves up to 34 times speedup over the CPU implementation of tableau-based structure search with simulated annealing, making it one of the fastest available methods. To the best of our knowledge, this is the first application of a GPU to the protein structural search problem.

  20. Efficacy of the theory of planned behavior in predicting breastfeeding: Meta-analysis and structural equation modeling.

    Science.gov (United States)

    Guo, J L; Wang, T F; Liao, J Y; Huang, C M

    2016-02-01

    This study assessed the applicability and efficacy of the theory of planned behavior (TPB) in predicting breastfeeding. The TPB assumes a rational approach for engaging in various behaviors, and has been used extensively for explaining health behavior. However, most studies have tested the effectiveness of TPB constructs in predicting how people perform actions for their own benefit rather than performing behaviors that are beneficial to others, such as breastfeeding infants. A meta-analysis approach could help clarify the breastfeeding practice to promote breastfeeding. This study used meta-analytic procedures. We searched for studies to include in our analysis, examining those published between January 1, 1990 and December 31, 2013 in PubMed, Medline, CINAHL, ProQuest, and Mosby's Index. We also reviewed journals with a history of publishing breastfeeding studies and searched reference lists for potential articles to include. Ten studies comprising a total of 2694 participants were selected for analysis. These studies yielded 10 effect sizes from the TPB, which ranged from 0.20 to 0.59. Structural equation model analysis using the pooled correlation matrix enabled us to determine the relative coefficients among TPB constructs. Attitude, subjective norms, and perceived behavioral control were all significant predictors of breastfeeding intention, whereas intention was a strong predictor of breastfeeding behavior. Perceived behavioral control reached a borderline level of significance to breastfeeding behavior. Theoretical and empirical implications are discussed from the perspective of evidence-based practice. Copyright © 2015 Elsevier Inc. All rights reserved.

  1. Prediction of CD8+ Epitopes in Leishmania braziliensis Proteins Using EPIBOT: In Silico Search and In Vivo Validation.

    Directory of Open Access Journals (Sweden)

    Angelo Duarte

    Leishmaniasis is caused by intracellular Leishmania parasites that induce a T-cell mediated response associated with recognition of CD4+ and CD8+ T cell epitopes. Identification of CD8+ antigenic determinants is crucial for vaccine and therapy development. Herein, we developed open-source software dedicated to searching and compiling data obtained from currently available online prediction algorithms. We developed a two-phase algorithm and implemented it in open-source software called EPIBOT, which consolidates the results obtained with single prediction algorithms, generating a final output in which epitopes are ranked. EPIBOT was initially trained using a set of 831 known epitopes from 397 proteins from IEDB. We then screened 63 Leishmania braziliensis vaccine candidates with the trained EPIBOT tool to search for CD8+ T cell epitopes. A proof-of-concept experiment was conducted with the top eight CD8+ epitopes selected by EPIBOT. To do this, the selected peptides were synthesized and validated for their in vivo cytotoxicity. Among the tested epitopes, three were able to induce lysis of pulsed-target cells. Our results show that EPIBOT can successfully search across existing prediction tools, generating a compiled list of candidate CD8+ epitopes. This software is a fast and simple search engine that can be customized to search over different MHC alleles or HLA haplotypes.

  2. Cost Forecasting of Substation Projects Based on Cuckoo Search Algorithm and Support Vector Machines

    Directory of Open Access Journals (Sweden)

    Dongxiao Niu

    2018-01-01

    Accurate prediction of substation project cost helps to improve investment management and sustainability. It is also directly related to the economy of the substation project. Ensemble Empirical Mode Decomposition (EEMD) can decompose variables with non-stationary sequence signals into components with significant regularity and periodicity, which helps to improve the accuracy of the prediction model. Adding a Gaussian perturbation to the traditional Cuckoo Search (CS) algorithm can improve the search vigor and precision of the CS algorithm. Thus, the parameters and kernel functions of the Support Vector Machines (SVM) model are optimized. Compared with the prediction results of other models, this model has higher prediction accuracy.

  3. Supporting inter-topic entity search for biomedical Linked Data based on heterogeneous relationships.

    Science.gov (United States)

    Zong, Nansu; Lee, Sungin; Ahn, Jinhyun; Kim, Hong-Gee

    2017-08-01

    The keyword-based entity search restricts the search space based on the preference of the search. When the given keywords and preferences are not related to the same biomedical topic, existing biomedical Linked Data search engines fail to deliver satisfactory results. This research aims to tackle this issue by supporting an inter-topic search, that is, improving search when the inputs (keywords and preferences) fall under different topics. This study developed an effective algorithm in which the relations between biomedical entities were used in tandem with a keyword-based entity search, Siren. The algorithm, PERank, which is an adaptation of Personalized PageRank (PPR), uses a pair of inputs: (1) search preferences, and (2) entities from a keyword-based entity search with a keyword query, to formalize the search results on-the-fly based on the index of the precomputed Individual Personalized PageRank Vectors (IPPVs). Our experiments were performed over ten linked life datasets for two query sets, one with keyword-preference topic correspondence (intra-topic search), and the other without (inter-topic search). The experiments showed that the proposed method achieved better search results, for example a 14% increase in precision for the inter-topic search over the baseline keyword-based search engine. The proposed method improved the keyword-based biomedical entity search by supporting the inter-topic search without affecting the intra-topic search, based on the relations between different entities. Copyright © 2017 Elsevier Ltd. All rights reserved.
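
    To make the Personalized PageRank component concrete, the sketch below runs a plain power iteration in which the random walk restarts at the preferred nodes rather than uniformly. It illustrates generic PPR only, not PERank or its IPPV index; the graph, the node names, the damping value and the handling of dangling nodes are all assumptions for this example.

```python
def personalized_pagerank(graph, preference, damping=0.85, iterations=100):
    """Power iteration for PPR on a dict {node: [successor, ...]}.

    `preference` maps nodes to non-negative restart weights. Every successor
    is assumed to appear as a key of `graph`.
    """
    nodes = list(graph)
    total = sum(preference.get(n, 0.0) for n in nodes)
    if total <= 0.0:
        raise ValueError("preference must give positive weight to some node")
    restart = {n: preference.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        # With probability (1 - damping) the walk jumps back to the preferences.
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            successors = graph[n]
            if successors:
                share = damping * rank[n] / len(successors)
                for m in successors:
                    nxt[m] += share
            else:
                # Dangling node: hand its mass back to the restart distribution.
                for m in nodes:
                    nxt[m] += damping * rank[n] * restart[m]
        rank = nxt
    return rank


# Hypothetical entity graph; the search preference is the single entity "a".
toy_graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
scores = personalized_pagerank(toy_graph, {"a": 1.0})
print(sorted(scores, key=scores.get, reverse=True))
```

    Because PPR is linear in the restart vector, scores for a set of preferences can be assembled from vectors computed per individual node, which is consistent with the abstract's use of a precomputed index.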

  4. Upper-Lower Bounds Candidate Sets Searching Algorithm for Bayesian Network Structure Learning

    Directory of Open Access Journals (Sweden)

    Guangyi Liu

    2014-01-01

    The Bayesian network is an important theoretical model in the field of artificial intelligence and a powerful tool for handling uncertainty. Considering the slow convergence of current Bayesian network structure learning algorithms, a fast hybrid learning method is proposed in this paper. We start with further analysis of the information provided by low-order conditional independence testing. Two methods are then given for constructing graph models of the network, which are theoretically proved to be upper and lower bounds of the structure space of the target network, so that candidate sets are obtained as a result. After that, a search-and-score algorithm is run on the candidate sets to find the final structure of the network. Simulation results show that the algorithm proposed in this paper is more efficient than similar algorithms with the same learning precision.

  5. AlzhCPI: A knowledge base for predicting chemical-protein interactions towards Alzheimer's disease.

    Directory of Open Access Journals (Sweden)

    Jiansong Fang

    Alzheimer's disease (AD) is a complicated progressive neurodegenerative disorder. To confront AD, scientists are searching for multi-target-directed ligands (MTDLs) to delay disease progression. The in silico prediction of chemical-protein interactions (CPI) can accelerate target identification and drug discovery. Previously, we developed 100 binary classifiers to predict the CPI for 25 key targets against AD using the multi-target quantitative structure-activity relationship (mt-QSAR) method. In this investigation, we aimed to apply the mt-QSAR method to enlarge the model library to predict CPI towards AD. Another 104 binary classifiers were further constructed to predict the CPI for 26 preclinical AD targets based on the naive Bayesian (NB) and recursive partitioning (RP) algorithms. The internal 5-fold cross-validation and external test set validation were applied to evaluate the performance of the training sets and test set, respectively. The area under the receiver operating characteristic (ROC) curve for the test sets ranged from 0.629 to 1.0, with an average of 0.903. In addition, we developed a web server named AlzhCPI to integrate the comprehensive information of approximately 204 binary classifiers, which has potential applications in network pharmacology and drug repositioning. AlzhCPI is available online at http://rcidm.org/AlzhCPI/index.html. To illustrate the applicability of AlzhCPI, the developed system was employed for the systems pharmacology-based investigation of shichangpu against AD to enhance the understanding of the mechanisms of action of shichangpu from a holistic perspective.

  6. Inference of expanded Lrp-like feast/famine transcription factor targets in a non-model organism using protein structure-based prediction.

    Science.gov (United States)

    Ashworth, Justin; Plaisier, Christopher L; Lo, Fang Yin; Reiss, David J; Baliga, Nitin S

    2014-01-01

    Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of non-model organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer.

  7. High-throughput computational search for strengthening precipitates in alloys

    International Nuclear Information System (INIS)

    Kirklin, S.; Saal, James E.; Hegde, Vinay I.; Wolverton, C.

    2016-01-01

    The search for high-strength alloys and precipitation hardened systems has largely been accomplished through Edisonian trial and error experimentation. Here, we present a novel strategy using high-throughput computational approaches to search for promising precipitate/alloy systems. We perform density functional theory (DFT) calculations of an extremely large space of ∼200,000 potential compounds in search of effective strengthening precipitates for a variety of different alloy matrices, e.g., Fe, Al, Mg, Ni, Co, and Ti. Our search strategy involves screening phases that are likely to produce coherent precipitates (based on small lattice mismatch) and are composed of relatively common alloying elements. When combined with the Open Quantum Materials Database (OQMD), we can computationally screen for precipitates that either have a stable two-phase equilibrium with the host matrix, or are likely to precipitate as metastable phases. Our search produces (for the structure types considered) nearly all currently known high-strength precipitates in a variety of fcc, bcc, and hcp matrices, thus giving us confidence in the strategy. In addition, we predict a number of new, currently-unknown precipitate systems that should be explored experimentally as promising high-strength alloy chemistries.

  8. Optimum Design of Braced Steel Space Frames including Soil-Structure Interaction via Teaching-Learning-Based Optimization and Harmony Search Algorithms

    Directory of Open Access Journals (Sweden)

    Ayse T. Daloglu

    2018-01-01

    Optimum design of braced steel space frames including soil-structure interaction is studied by using harmony search (HS) and teaching-learning-based optimization (TLBO) algorithms. A three-parameter elastic foundation model is used to incorporate the soil-structure interaction effect. A 10-storey braced steel space frame example taken from the literature is investigated according to four different bracing types for the cases with and without soil-structure interaction. X, V, Z, and eccentric V-shaped bracing types are considered in the study. Optimum solutions of the examples are obtained by a computer program coded in MATLAB interacting with SAP2000-OAPI for two-way data exchange. The stress constraints according to AISC-ASD (American Institute of Steel Construction-Allowable Stress Design), maximum lateral displacement constraints, interstorey drift constraints, and beam-to-column connection constraints are taken into consideration in the optimum design process. The parameters of the foundation model are calculated depending on soil surface displacements by using an iterative approach. The results obtained in the study show that bracing types and soil-structure interaction play very important roles in the optimum design of steel space frames. Finally, the techniques used in the optimum design seem to be quite suitable for practical applications.

  9. A fast and robust iterative algorithm for prediction of RNA pseudoknotted secondary structures

    Science.gov (United States)

    2014-01-01

    Background: Improving accuracy and efficiency of computational methods that predict pseudoknotted RNA secondary structures is an ongoing challenge. Existing methods based on free energy minimization tend to be very slow and are limited in the types of pseudoknots that they can predict. Incorporating known structural information can improve prediction accuracy; however, there are not many methods for prediction of pseudoknotted structures that can incorporate structural information as input. There is even less understanding of the relative robustness of these methods with respect to partial information. Results: We present a new method, Iterative HFold, for pseudoknotted RNA secondary structure prediction. Iterative HFold takes as input a pseudoknot-free structure, and produces a possibly pseudoknotted structure whose energy is at least as low as that of any (density-2) pseudoknotted structure containing the input structure. Iterative HFold leverages strengths of earlier methods, namely the fast running time of HFold, a method that is based on the hierarchical folding hypothesis, and the energy parameters of HotKnots V2.0. Our experimental evaluation on a large data set shows that Iterative HFold is robust with respect to partial information, with average accuracy on pseudoknotted structures steadily increasing from roughly 54% to 79% as the user provides up to 40% of the input structure. Iterative HFold is much faster than HotKnots V2.0, while having comparable accuracy. Iterative HFold also has significantly better accuracy than IPknot on our HK-PK and IP-pk168 data sets. Conclusions: Iterative HFold is a robust method for prediction of pseudoknotted RNA secondary structures, whose accuracy with more than 5% information about true pseudoknot-free structures is better than that of IPknot, and with about 35% information about true pseudoknot-free structures compares well with that of HotKnots V2.0 while being significantly faster.
Iterative HFold and all data used in

  10. Neuro-fuzzy GMDH based particle swarm optimization for prediction of scour depth at downstream of grade control structures

    Directory of Open Access Journals (Sweden)

    Mohammad Najafzadeh

    2015-03-01

    In the present study, the neuro-fuzzy-based group method of data handling (NF-GMDH), an adaptive learning network, was utilized to predict the maximum scour depth downstream of grade-control structures. The NF-GMDH network was developed using particle swarm optimization (PSO). Parameters affecting the scour depth include sediment size, geometry of the weir, and flow characteristics upstream and downstream of the structure. Training and testing were carried out using non-dimensional variables. The data were divided into three datasets (DS). The testing results were compared with gene-expression programming (GEP), the evolutionary polynomial regression (EPR) model, and conventional techniques. The NF-GMDH-PSO network produced lower scour depth prediction errors than the other models. In addition, the most effective input parameter for the maximum scour depth was determined through a sensitivity analysis.

  11. Predicting user click behaviour in search engine advertisements

    Science.gov (United States)

    Daryaie Zanjani, Mohammad; Khadivi, Shahram

    2015-10-01

    According to the specific requirements and interests of users, search engines select and display advertisements that match user needs and have higher probability of attracting users' attention based on their previous search history. New objects such as user, advertisement or query cause a deterioration of precision in targeted advertising due to their lack of history. This article surveys this challenge. In the case of new objects, we first extract similar observed objects to the new object and then we use their history as the history of new object. Similarity between objects is measured based on correlation, which is a relation between user and advertisement when the advertisement is displayed to the user. This method is used for all objects, so it has helped us to accurately select relevant advertisements for users' queries. In our proposed model, we assume that similar users behave in a similar manner. We find that users with few queries are similar to new users. We will show that correlation between users and advertisements' keywords is high. Thus, users who pay attention to advertisements' keywords, click similar advertisements. In addition, users who pay attention to specific brand names might have similar behaviours too.
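
    The cold-start idea in this record (borrowing the history of similar, already observed objects for a new object) can be sketched as a similarity-weighted click rate. This is only an illustration under stated assumptions: the similarity values are taken as given rather than derived from the correlation measure the abstract mentions, and the function name, the `top_k` cut-off and all numbers are hypothetical.

```python
def estimate_click_rate(similarities, history, top_k=2):
    """Estimate the click rate of a new object from its most similar observed objects.

    similarities: dict mapping observed object id -> similarity to the new object.
    history: dict mapping observed object id -> (clicks, impressions).
    Returns None when no usable neighbour exists.
    """
    # Keep the top_k most similar objects that actually have a history.
    neighbours = sorted(
        (obj for obj in similarities if obj in history and similarities[obj] > 0),
        key=lambda obj: similarities[obj],
        reverse=True,
    )[:top_k]
    weighted_clicks = sum(similarities[o] * history[o][0] for o in neighbours)
    weighted_views = sum(similarities[o] * history[o][1] for o in neighbours)
    if weighted_views == 0:
        return None
    return weighted_clicks / weighted_views


# Hypothetical history of three observed advertisements: (clicks, impressions).
history = {"ad1": (30, 1000), "ad2": (5, 500), "ad3": (90, 1000)}
# Assumed similarities between a new advertisement and the observed ones.
similarities = {"ad1": 0.9, "ad2": 0.1, "ad3": 0.5}
estimate = estimate_click_rate(similarities, history)
```

    With `top_k=2`, only `ad1` and `ad3` contribute, so the estimate is (0.9·30 + 0.5·90) / (0.9·1000 + 0.5·1000) = 72/1400.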

  12. Protein Structure Prediction by Protein Threading

    Science.gov (United States)

    Xu, Ying; Liu, Zhijie; Cai, Liming; Xu, Dong

    The seminal work of Bowie, Lüthy, and Eisenberg (Bowie et al., 1991) on "the inverse protein folding problem" laid the foundation of protein structure prediction by protein threading. By using simple measures for fitness of different amino acid types to local structural environments defined in terms of solvent accessibility and protein secondary structure, the authors derived a simple and yet profoundly novel approach to assessing if a protein sequence fits well with a given protein structural fold. Their follow-up work (Elofsson et al., 1996; Fischer and Eisenberg, 1996; Fischer et al., 1996a,b) and the work by Jones, Taylor, and Thornton (Jones et al., 1992) on protein fold recognition led to the development of a new brand of powerful tools for protein structure prediction, which we now term "protein threading." These computational tools have played a key role in extending the utility of all the experimentally solved structures by X-ray crystallography and nuclear magnetic resonance (NMR), providing structural models and functional predictions for many of the proteins encoded in the hundreds of genomes that have been sequenced up to now.

  13. Chemical Information in Scirus and BASE (Bielefeld Academic Search Engine)

    Science.gov (United States)

    Bendig, Regina B.

    2009-01-01

    The author sought to determine to what extent the two search engines, Scirus and BASE (Bielefeld Academic Search Engines), would be useful to first-year university students as the first point of searching for chemical information. Five topics were searched and the first ten records of each search result were evaluated with regard to the type of…

  14. CentroidFold: a web server for RNA secondary structure prediction

    OpenAIRE

    Sato, Kengo; Hamada, Michiaki; Asai, Kiyoshi; Mituyama, Toutai

    2009-01-01

    The CentroidFold web server (http://www.ncrna.org/centroidfold/) is a web application for RNA secondary structure prediction powered by one of the most accurate prediction engines. The server accepts two kinds of sequence data: a single RNA sequence and a multiple alignment of RNA sequences. It responds with a prediction result shown as a popular base-pair notation and a graph representation. A PDF version of the graph representation is also available. For a multiple alignment sequence, the ser...

  15. Structure-based functional annotation of putative conserved proteins having lyase activity from Haemophilus influenzae.

    Science.gov (United States)

    Shahbaaz, Mohd; Ahmad, Faizan; Imtaiyaz Hassan, Md

    2015-06-01

    Haemophilus influenzae is a small pleomorphic Gram-negative bacterium which causes several chronic diseases, including bacteremia, meningitis, cellulitis, epiglottitis, septic arthritis, pneumonia, and empyema. Here we extensively analyzed the sequenced genome of H. influenzae strain Rd KW20 using protein family databases, protein structure prediction, pathways and genome context methods to assign a precise function to proteins whose functions are unknown. These proteins are termed hypothetical proteins (HPs), for which no experimental information is available. Function prediction of these proteins would help to precisely understand the biochemical pathways and mechanism of pathogenesis of Haemophilus influenzae. During the extensive analysis of the H. influenzae genome, we found eight HPs showing lyase activity. Subsequently, we modeled and analyzed the three-dimensional structures of all these HPs to determine their functions more precisely. We found these HPs possess cystathionine-β-synthase, cyclase, carboxymuconolactone decarboxylase, pseudouridine synthase A and C, D-tagatose-1,6-bisphosphate aldolase and aminodeoxychorismate lyase-like features, indicating their corresponding functions in H. influenzae. Lyases are actively involved in the regulation of biosynthesis of various hormones, metabolic pathways, signal transduction, and DNA repair. Lyases are also considered key players in various biological processes. These enzymes are critically essential for the survival and pathogenesis of H. influenzae and, therefore, may be considered potential targets for structure-based rational drug design. Our structure-function relationship analysis will be useful to search and design potential lead molecules based on the structure of these lyases, for drug design and discovery.

  16. Energy Consumption Forecasting Using Semantic-Based Genetic Programming with Local Search Optimizer

    Directory of Open Access Journals (Sweden)

    Mauro Castelli

    2015-01-01

    Energy consumption forecasting (ECF) is an important policy issue in today’s economies. An accurate ECF has great benefits for electric utilities, and both negative and positive errors lead to increased operating costs. The paper proposes a semantic-based genetic programming framework to address the ECF problem. In particular, we propose a system that finds (quasi-)perfect solutions with high probability and that generates models able to produce near-optimal predictions on unseen data as well. The framework blends a recently developed version of genetic programming that integrates semantic genetic operators with a local search method. The main idea in combining semantic genetic programming and a local searcher is to couple the exploration ability of the former with the exploitation ability of the latter. Experimental results confirm the suitability of the proposed method for predicting energy consumption. In particular, the system produces a lower error than the existing state-of-the-art techniques used on the same dataset. More importantly, this case study has shown that including a local searcher in the geometric semantic genetic programming system can speed up the search process and can result in fitter models that are able to produce accurate forecasts on unseen data as well.

  17. Pretraining Cortical Thickness Predicts Subsequent Perceptual Learning Rate in a Visual Search Task.

    Science.gov (United States)

    Frank, Sebastian M; Reavis, Eric A; Greenlee, Mark W; Tse, Peter U

    2016-03-01

    We report that preexisting individual differences in the cortical thickness of brain areas involved in a perceptual learning task predict the subsequent perceptual learning rate. Participants trained in a motion-discrimination task involving visual search for a "V"-shaped target motion trajectory among inverted "V"-shaped distractor trajectories. Motion-sensitive area MT+ (V5) was functionally identified as critical to the task: after 3 weeks of training, activity increased in MT+ during task performance, as measured by functional magnetic resonance imaging. We computed the cortical thickness of MT+ from anatomical magnetic resonance imaging volumes collected before training started, and found that it significantly predicted subsequent perceptual learning rates in the visual search task. Participants with thicker neocortex in MT+ before training learned faster than those with thinner neocortex in that area. A similar association between cortical thickness and training success was also found in posterior parietal cortex (PPC). © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  18. Link Prediction in Evolving Networks Based on Popularity of Nodes.

    Science.gov (United States)

    Wang, Tong; He, Xing-Sheng; Zhou, Ming-Yang; Fu, Zhong-Qian

    2017-08-02

    Link prediction aims to uncover the underlying relationship behind networks, which could be utilized to predict missing edges or identify spurious edges. The key issue of link prediction is to estimate the likelihood of potential links in networks. Most classical static-structure-based methods ignore the temporal aspects of networks; limited by time-varying features, such approaches perform poorly in evolving networks. In this paper, we propose a hypothesis that the ability of each node to attract links depends not only on its structural importance, but also on its current popularity (activeness), since active nodes are much more likely to attract future links. Then a novel approach named popularity based structural perturbation method (PBSPM) and its fast algorithm are proposed to characterize the likelihood of an edge from both the existing connectivity structure and the current popularity of its two endpoints. Experiments on six evolving networks show that the proposed methods outperform state-of-the-art methods in accuracy and robustness. In addition, visual results and statistical analysis reveal that the proposed methods are inclined to predict future edges between active nodes, rather than edges between inactive nodes.

  19. A nucleobase-centered coarse-grained representation for structure prediction of RNA motifs.

    Science.gov (United States)

    Poblete, Simón; Bottaro, Sandro; Bussi, Giovanni

    2018-02-28

    We introduce the SPlit-and-conQueR (SPQR) model, a coarse-grained (CG) representation of RNA designed for structure prediction and refinement. In our approach, the representation of a nucleotide consists of a point particle for the phosphate group and an anisotropic particle for the nucleoside. The interactions are, in principle, knowledge-based potentials inspired by the $\mathcal{E}$SCORE function, a base-centered scoring function. However, a special treatment is given to base-pairing interactions and certain geometrical conformations which are lost in a raw knowledge-based model. This results in a representation able to describe planar canonical and non-canonical base pairs and base-phosphate interactions and to distinguish sugar puckers and glycosidic torsion conformations. The model is applied to the folding of several structures, including duplexes with internal loops of non-canonical base pairs, tetraloops, junctions and a pseudoknot. For the majority of these systems, experimental structures are correctly predicted at the level of individual contacts. We also propose a method for efficiently reintroducing atomistic detail from the CG representation.

  20. An Efficient Energy Constraint Based UAV Path Planning for Search and Coverage

    Directory of Open Access Journals (Sweden)

    German Gramajo

    2017-01-01

    A path planning strategy for a search and coverage mission for a small UAV that maximizes the area covered based on stored energy and maneuverability constraints is presented. The proposed formulation has a high level of autonomy, without requiring an exact choice of optimization parameters, and is appropriate for real-time implementation. The computed trajectory maximizes spatial coverage while closely satisfying terminal constraints on the position of the vehicle and minimizing the time of flight. Comparisons of this formulation with a path planning algorithm based on a time constraint show equivalent coverage performance but improvement in prediction of overall mission duration and accuracy of the terminal position of the vehicle.

  1. Searching for qualitative research for inclusion in systematic reviews: a structured methodological review.

    Science.gov (United States)

    Booth, Andrew

    2016-05-04

    Qualitative systematic reviews or qualitative evidence syntheses (QES) are increasingly recognised as a way to enhance the value of systematic reviews (SRs) of clinical trials. They can explain the mechanisms by which interventions, evaluated within trials, might achieve their effect. They can investigate differences in effects between different population groups. They can identify which outcomes are most important to patients, carers, health professionals and other stakeholders. QES can explore the impact of acceptance, feasibility, meaningfulness and implementation-related factors within a real world setting and thus contribute to the design and further refinement of future interventions. To produce valid, reliable and meaningful QES requires systematic identification of relevant qualitative evidence. Although the methodologies of QES, including methods for information retrieval, are well-documented, little empirical evidence exists to inform their conduct and reporting. This structured methodological overview examines papers on searching for qualitative research identified from the Cochrane Qualitative and Implementation Methods Group Methodology Register and from citation searches of 15 key papers. A single reviewer reviewed 1299 references. Papers reporting methodological guidance, use of innovative methodologies or empirical studies of retrieval methods were categorised under eight topical headings: overviews and methodological guidance, sampling, sources, structured questions, search procedures, search strategies and filters, supplementary strategies and standards. This structured overview presents a contemporaneous view of information retrieval for qualitative research and identifies a future research agenda. This review concludes that poor empirical evidence underpins current information practice in information retrieval of qualitative research. 
A trend towards improved transparency of search methods and further evaluation of key search procedures offers

  2. Content-based Music Search and Recommendation System

    Science.gov (United States)

    Takegawa, Kazuki; Hijikata, Yoshinori; Nishida, Shogo

    Recently, the volume of music data on the Internet has increased rapidly. This has increased the cost to users of finding music that suits their preferences in such a large data set. We propose a content-based music search and recommendation system. This system has an interface for searching and finding music data and an interface for editing a user profile, which is necessary for music recommendation. By exploiting the visualization of the feature space of music and the visualization of the user profile, the user can search music data and edit the user profile. Furthermore, by exploiting the information which can be acquired from each visualized object in a mutually complementary manner, we make it easier for the user to search music data and edit the user profile. Concretely, the system gives the user information obtained from the user profile when searching music data, and information obtained from the feature space of music when editing the user profile.

  3. Cascaded bidirectional recurrent neural networks for protein secondary structure prediction.

    Science.gov (United States)

    Chen, Jinmiao; Chaudhari, Narendra

    2007-01-01

    Protein secondary structure (PSS) prediction is an important topic in bioinformatics. Our study on a large set of non-homologous proteins shows that long-range interactions commonly exist and negatively affect PSS prediction. Besides, we also reveal strong correlations between secondary structure (SS) elements. In order to take into account the long-range interactions and SS-SS correlations, we propose a novel prediction system based on cascaded bidirectional recurrent neural network (BRNN). We compare the cascaded BRNN against another two BRNN architectures, namely the original BRNN architecture used for speech recognition as well as Pollastri's BRNN that was proposed for PSS prediction. Our cascaded BRNN achieves an overall three-state accuracy Q3 of 74.38%, and reaches a high Segment OVerlap (SOV) of 66.0455. It outperforms the original BRNN and Pollastri's BRNN in both Q3 and SOV. Specifically, it improves the SOV score by 4-6%.
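
    The three-state accuracy Q3 quoted in this record is the percentage of residues whose predicted state (helix, strand or coil) matches the observed one. A minimal sketch of that calculation follows; the function name and the two example strings are made up for illustration, and the SOV score is not covered here.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted three-state label matches the observed one.

    Both arguments are equal-length strings over the states H (helix),
    E (strand) and C (coil).
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up ten-residue example: positions 3 and 9 (0-based) are mispredicted.
observed = "HHHHEEECCC"
predicted = "HHHEEEECCH"
q3 = q3_accuracy(predicted, observed)
```

    Here 8 of 10 residues agree, so `q3` is 80.0.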

  4. Theoretical prediction of low-density hexagonal ZnO hollow structures

    Energy Technology Data Exchange (ETDEWEB)

    Tuoc, Vu Ngoc, E-mail: tuoc.vungoc@hust.edu.vn [Institute of Engineering Physics, Hanoi University of Science and Technology, 1 Dai Co Viet Road, Hanoi (Viet Nam); Huan, Tran Doan [Institute of Materials Science, University of Connecticut, Storrs, Connecticut 06269-3136 (United States); Thao, Nguyen Thi [Institute of Engineering Physics, Hanoi University of Science and Technology, 1 Dai Co Viet Road, Hanoi (Viet Nam); Hong Duc University, 307 Le Lai, Thanh Hoa City (Viet Nam); Tuan, Le Manh [Hong Duc University, 307 Le Lai, Thanh Hoa City (Viet Nam)

    2016-10-14

    Along with wurtzite and zinc blende, zinc oxide (ZnO) has been found in a large number of polymorphs with substantially different properties and, hence, applications. Therefore, predicting and synthesizing new classes of ZnO polymorphs are of great significance and have been gaining considerable interest. Herein, we perform a density functional theory based tight-binding study, predicting several new series of ZnO hollow structures using the bottom-up approach. The geometry of the building blocks allows for obtaining a variety of hexagonal, low-density nanoporous, and flexible ZnO hollow structures. Their stability is discussed by means of the free energy computed within the lattice-dynamics approach. Our calculations also indicate that all the reported hollow structures are wide band gap semiconductors in the same fashion with bulk ZnO. The electronic band structures of the ZnO hollow structures are finally examined in detail.

  5. Exploration of the search space of the in-core fuel management problem by knowledge-based techniques

    International Nuclear Information System (INIS)

    Galperin, A.

    1995-01-01

    The process of generating reload configuration patterns is presented as a search procedure. The search space of the problem is found to contain ∼10^12 possible problem states. If computational resources and execution time necessary to evaluate a single solution are taken into account, this problem may be described as a "large space search problem." Understanding of the structure of the search space, i.e., distribution of the optimal (or nearly optimal) solutions, is necessary to choose an appropriate search method and to utilize adequately domain heuristic knowledge. A worth function is developed based on two performance parameters: cycle length and power peaking factor. A series of numerical experiments was carried out; 300,000 patterns were generated in 40 sessions. All these patterns were analyzed by simulating the power production cycle and by evaluating the two performance parameters. The worth function was calculated and plotted. Analysis of the worth function reveals quite a complicated search space structure. The fine structure shows an extremely large number of local peaks: about one peak per hundred configurations. The direct implication of this discovery is that within a search space of 10^12 states, there are ∼10^10 local optima. Further consideration of the worth function shape shows that the distribution of the local optima forms a contour with much slower variations, where "better" or "worse" groups of patterns are spaced within a few thousand or tens of thousands of configurations, and finally very broad subregions of the whole space display variations of the worth function, where optimal regions include tens of thousands of patterns and are separated by hundreds of thousands and millions
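
    The local-optima estimate in this record follows from the two figures it quotes (about one local peak per hundred configurations, and a search space of roughly 10^12 states). Written out as a worked step, with the symbol names chosen here only for illustration:

```latex
N_{\text{local}} \approx \frac{N_{\text{states}}}{100}
                 = \frac{10^{12}}{10^{2}}
                 = 10^{10}
```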

  6. GeoSearcher: Location-Based Ranking of Search Engine Results.

    Science.gov (United States)

    Watters, Carolyn; Amoudi, Ghada

    2003-01-01

    Discussion of Web queries with geospatial dimensions focuses on an algorithm that assigns location coordinates dynamically to Web sites based on the URL. Describes a prototype search system that uses the algorithm to re-rank search engine results for queries with a geospatial dimension, thus providing an alternative ranking order for search engine…
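    The re-ranking idea described above can be sketched as follows. This is a minimal illustration, not the GeoSearcher algorithm itself: the suffix-to-coordinate table `SUFFIX_COORDS`, the helper names, and the use of great-circle distance as the ranking key are all assumptions made for the example.

```python
import math
from urllib.parse import urlparse

# Hypothetical lookup table (an assumption for this sketch):
# hostname suffix -> (latitude, longitude) in degrees.
SUFFIX_COORDS = {
    ".ca": (45.42, -75.70),   # Ottawa
    ".uk": (51.51, -0.13),    # London
    ".au": (-35.28, 149.13),  # Canberra
}

def locate(url):
    """Assign coordinates to a URL from its hostname suffix, or None."""
    host = urlparse(url).hostname or ""
    for suffix, coords in SUFFIX_COORDS.items():
        if host.endswith(suffix):
            return coords
    return None

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def rerank(urls, query_location):
    """Re-rank results by distance to the query location.

    Results without coordinates sort last; ties keep the original order.
    """
    def key(item):
        rank, url = item
        coords = locate(url)
        dist = haversine_km(coords, query_location) if coords else math.inf
        return (dist, rank)
    return [url for _, url in sorted(enumerate(urls), key=key)]
```

    For a query located near Halifax, `rerank(urls, (44.65, -63.57))` would move a `.ca` result ahead of a `.uk` one and leave results with no assigned location at the end.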

  7. Searching Choices: Quantifying Decision-Making Processes Using Search Engine Data.

    Science.gov (United States)

    Moat, Helen Susannah; Olivola, Christopher Y; Chater, Nick; Preis, Tobias

    2016-07-01

    When making a decision, humans consider two types of information: information they have acquired through their prior experience of the world, and further information they gather to support the decision in question. Here, we present evidence that data from search engines such as Google can help us model both sources of information. We show that statistics from search engines on the frequency of content on the Internet can help us estimate the statistical structure of prior experience; and, specifically, we outline how such statistics can inform psychological theories concerning the valuation of human lives, or choices involving delayed outcomes. Turning to information gathering, we show that search query data might help measure human information gathering, and it may predict subsequent decisions. Such data enable us to compare information gathered across nations, where analyses suggest, for example, a greater focus on the future in countries with a higher per capita GDP. We conclude that search engine data constitute a valuable new resource for cognitive scientists, offering a fascinating new tool for understanding the human decision-making process. Copyright © 2016 The Authors. Topics in Cognitive Science published by Wiley Periodicals, Inc. on behalf of Cognitive Science Society.

  8. A systematic review on popularity, application and characteristics of protein secondary structure prediction tools.

    Science.gov (United States)

    Kashani-Amin, Elaheh; Tabatabaei-Malazy, Ozra; Sakhteman, Amirhossein; Larijani, Bagher; Ebrahim-Habibi, Azadeh

    2018-02-27

    Prediction of proteins' secondary structure is one of the major steps in the generation of homology models. These models provide structural information which is used to design suitable ligands for potential medicinal targets. However, selecting a proper tool among multiple secondary structure prediction (SSP) options is challenging. The current study offers insight into currently favored methods and tools within various contexts. A systematic review was performed to gain comprehensive access to recent (2013-2016) studies which used or recommended protein SSP tools. Three databases, Web of Science, PubMed and Scopus, were systematically searched, and 99 out of 209 studies were finally found eligible for data extraction. Four categories of applications for the 59 retrieved SSP tools were: (I) prediction of structural features of a given sequence, (II) evaluation of a method, (III) providing input for a new SSP method and (IV) integrating an SSP tool as a component of a program. PSIPRED was found to be the most popular tool in all four categories. JPred and tools utilizing the PHD (Profile network from HeiDelberg) method occupied second and third places of popularity in categories I and II. JPred was only found in the first two categories, while PHD was present in three fields. This study provides comprehensive insight into the recent usage of SSP tools, which could be helpful for selecting a proper tool. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  9. Using Artificial Intelligence to Retrieve the Optimal Parameters and Structures of Adaptive Network-Based Fuzzy Inference System for Typhoon Precipitation Forecast Modeling

    Directory of Open Access Journals (Sweden)

    Chien-Lin Huang

    2015-01-01

    This study aims to construct a typhoon precipitation forecast model providing forecasts one to six hours in advance using optimal model parameters and structures retrieved from a combination of the adaptive network-based fuzzy inference system (ANFIS) and artificial intelligence. To enhance the accuracy of the precipitation forecast, two structures were then used to establish the precipitation forecast model for a specific lead-time: a single-model structure and a dual-model hybrid structure where the forecast models of higher and lower precipitation were integrated. In order to rapidly, automatically, and accurately retrieve the optimal parameters and structures of the ANFIS-based precipitation forecast model, a tabu search was applied to identify the adjacent radius in subtractive clustering when constructing the ANFIS structure. The coupled structure was also employed to establish a precipitation forecast model across short and long lead-times in order to improve the accuracy of long-term precipitation forecasts. The study area is the Shimen Reservoir, and the analyzed period is from 2001 to 2009. Results showed that the optimal initial ANFIS parameters selected by the tabu search, combined with the dual-model hybrid method and the coupled structure, provided computationally efficient and highly reliable typhoon precipitation forecasts across short to long lead-times.

  10. Exploring high-pressure FeB2: Structural and electronic properties predictions

    Energy Technology Data Exchange (ETDEWEB)

    Harran, Ismail [School of Physical Science and Technology, Key Laboratory of Advanced Technologies of Materials, Ministry of Education of China, Southwest Jiaotong University, Chengdu, 610031 (China); Al Fashir University (Sudan); Wang, Hongyan [School of Physical Science and Technology, Key Laboratory of Advanced Technologies of Materials, Ministry of Education of China, Southwest Jiaotong University, Chengdu, 610031 (China); Chen, Yuanzheng, E-mail: cyz@calypso.org.cn [School of Physical Science and Technology, Key Laboratory of Advanced Technologies of Materials, Ministry of Education of China, Southwest Jiaotong University, Chengdu, 610031 (China); Jia, Mingzhen [School of Physical Science and Technology, Key Laboratory of Advanced Technologies of Materials, Ministry of Education of China, Southwest Jiaotong University, Chengdu, 610031 (China); Wu, Nannan [School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science & Technology, Baotou, 014010 (China)

    2016-09-05

    The high pressure (HP) structural phase of the FeB2 compound is investigated by using first-principles crystal structure prediction based on the CALYPSO technique. A thermodynamically stable phase of FeB2 with space group Imma is predicted at pressures above 225 GPa, which is characterized by a layered orthorhombic structure containing puckered graphite-like boron layers. Its electronic and mechanical properties are identified and analyzed. The features of the band structure favor the occurrence of superconductivity, whereas the calculated Pugh's ratio reveals that the HP Imma structure is mechanically ductile. - Highlights: • The high pressure structural phase of the FeB2 compound is investigated for the first time by the CALYPSO technique. • A thermodynamically stable Imma phase of FeB2 is predicted at pressures above 225 GPa. • The Imma structure is characterized by a 2D boron network containing puckered graphite-like boron layers. • The band features of the Imma structure favor the occurrence of superconductivity. • The calculated Pugh's ratio suggests that the Imma structure is mechanically ductile.

  11. Does linear separability really matter? Complex visual search is explained by simple search

    Science.gov (United States)

    Vighneshvel, T.; Arun, S. P.

    2013-01-01

    Visual search in real life involves complex displays with a target among multiple types of distracters, but in the laboratory, it is often tested using simple displays with identical distracters. Can complex search be understood in terms of simple searches? This link may not be straightforward if complex search has emergent properties. One such property is linear separability, whereby search is hard when a target cannot be separated from its distracters using a single linear boundary. However, evidence in favor of linear separability is based on testing stimulus configurations in an external parametric space that need not be related to their true perceptual representation. We therefore set out to assess whether linear separability influences complex search at all. Our null hypothesis was that complex search performance depends only on classical factors such as target-distracter similarity and distracter homogeneity, which we measured using simple searches. Across three experiments involving a variety of artificial and natural objects, differences between linearly separable and nonseparable searches were explained using target-distracter similarity and distracter heterogeneity. Further, simple searches accurately predicted complex search regardless of linear separability (r = 0.91). Our results show that complex search is explained by simple search, refuting the widely held belief that linear separability influences visual search. PMID:24029822

  12. A Method to Predict the Structure and Stability of RNA/RNA Complexes.

    Science.gov (United States)

    Xu, Xiaojun; Chen, Shi-Jie

    2016-01-01

    RNA/RNA interactions are essential for genomic RNA dimerization and regulation of gene expression. Intermolecular loop-loop base pairing is a widespread and functionally important tertiary structure motif in RNA machinery. However, computational prediction of intermolecular loop-loop base pairing is challenged by the entropy and free energy calculation due to the conformational constraint and the intermolecular interactions. In this chapter, we describe a recently developed statistical mechanics-based method for the prediction of RNA/RNA complex structures and stabilities. The method is based on the virtual bond RNA folding model (Vfold). The main emphasis in the method is placed on the evaluation of the entropy and free energy for the loops, especially tertiary kissing loops. The method also uses recursive partition function calculations and two-step screening algorithm for large, complicated structures of RNA/RNA complexes. As case studies, we use the HIV-1 Mal dimer and the siRNA/HIV-1 mutant (T4) to illustrate the method.

  13. Knowledge-based personalized search engine for the Web-based Human Musculoskeletal System Resources (HMSR) in biomechanics.

    Science.gov (United States)

    Dao, Tien Tuan; Hoang, Tuan Nha; Ta, Xuan Hien; Tho, Marie Christine Ho Ba

    2013-02-01

    Human musculoskeletal system resources of the human body are valuable for learning and medical purposes. Internet-based information from conventional search engines such as Google or Yahoo cannot respond to the need for useful, accurate, reliable and good-quality human musculoskeletal resources related to medical processes, pathological knowledge and practical expertise. In this present work, an advanced knowledge-based personalized search engine was developed. Our search engine was based on a client-server multi-layer multi-agent architecture and the principle of semantic web services to dynamically acquire accurate and reliable HMSR information by a semantic processing and visualization approach. A security-enhanced mechanism was applied to protect the medical information. A multi-agent crawler was implemented to develop a content-based database of HMSR information. A new semantic-based PageRank score with related mathematical formulas was also defined and implemented. As a result, semantic web service descriptions were presented in OWL, WSDL and OWL-S formats. Operational scenarios with related web-based interfaces for personal computers and mobile devices were presented and analyzed. Functional comparison between our knowledge-based search engine, a conventional search engine and a semantic search engine showed the originality and the robustness of our knowledge-based personalized search engine. In fact, our knowledge-based personalized search engine allows different users, such as orthopedic patients and experts, healthcare system managers or medical students, to remotely access useful, accurate, reliable and good-quality HMSR information for their learning and medical purposes. Copyright © 2012 Elsevier Inc. All rights reserved.

  14. Predicting fluid responsiveness with transthoracic echocardiography is not yet evidence based

    DEFF Research Database (Denmark)

    Wetterslev, M; Haase, N; Johansen, R R

    2013-01-01

    an integrated tool in the intensive care unit, this systematic review examined studies evaluating the predictive value of TTE for fluid responsiveness. In October 2012, we searched Pubmed, EMBASE and Web of Science for studies evaluating the predictive value of TTE-derived variables for fluid responsiveness...... responsiveness. Of the 4294 evaluated citations, only one study fully met our inclusion criteria. In this study, the predictive value of variations in inferior vena cava diameter (> 16%) for fluid responsiveness was moderate with sensitivity of 71% [95% confidence interval (CI) 44-90], specificity of 100% (95......% CI 73-100) and an area under the receiver operating curve of 0.90 (95% CI 0.73-0.98). Only one study of TTE-based methods fulfilled the criteria for valid assessment of fluid responsiveness. Before recommending the use of TTE in predicting fluid responsiveness, proper evaluation including...

  15. MetaboSearch: tool for mass-based metabolite identification using multiple databases.

    Directory of Open Access Journals (Sweden)

    Bin Zhou

    Searching metabolites against databases according to their masses is often the first step in metabolite identification for a mass spectrometry-based untargeted metabolomics study. Major metabolite databases include Human Metabolome DataBase (HMDB), Madison Metabolomics Consortium Database (MMCD), Metlin, and LIPID MAPS. Since each one of these databases covers only a fraction of the metabolome, integration of the search results from these databases is expected to yield a more comprehensive coverage. However, the manual combination of multiple search results is generally difficult when identification of hundreds of metabolites is desired. We have implemented a web-based software tool that enables simultaneous mass-based search against the four major databases, and the integration of the results. In addition, more complete chemical identifier information for the metabolites is retrieved by cross-referencing multiple databases. The search results are merged based on IUPAC International Chemical Identifier (InChI) keys. Besides a simple list of m/z values, the software can accept the ion annotation information as input for enhanced metabolite identification. The performance of the software is demonstrated on mass spectrometry data acquired in both positive and negative ionization modes. Compared with search results from individual databases, MetaboSearch provides better coverage of the metabolome and more complete chemical identifier information. The software tool is available at http://omics.georgetown.edu/MetaboSearch.html.
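    The search-and-merge step described in this abstract can be illustrated with a small sketch. It is not the MetaboSearch implementation: the record layout, the ppm tolerance, the function names, and the toy databases (with placeholder keys rather than real InChIKeys) are assumptions, and the sketch compares neutral masses directly instead of handling ion annotations.

```python
# Toy databases with placeholder keys (hypothetical data, not real InChIKeys).
TOY_DATABASES = {
    "db_a": [
        {"inchikey": "KEY-GLC", "name": "glucose", "mass": 180.0634},
        {"inchikey": "KEY-CAF", "name": "caffeine", "mass": 194.0804},
    ],
    "db_b": [
        {"inchikey": "KEY-GLC", "name": "D-glucose", "mass": 180.0633},
        {"inchikey": "KEY-FRU", "name": "fructose", "mass": 180.0634},
    ],
}

def mass_search(db, query_mass, ppm=10.0):
    """Return records whose mass lies within +/- ppm of the query mass."""
    tol = query_mass * ppm / 1e6
    return [rec for rec in db if abs(rec["mass"] - query_mass) <= tol]

def merged_search(databases, query_mass, ppm=10.0):
    """Search every database and merge the hits by their InChIKey.

    Each merged entry collects the names seen for that key and the
    databases that reported it.
    """
    merged = {}
    for db_name, db in databases.items():
        for rec in mass_search(db, query_mass, ppm):
            entry = merged.setdefault(
                rec["inchikey"], {"names": set(), "sources": set()}
            )
            entry["names"].add(rec["name"])
            entry["sources"].add(db_name)
    return merged
```

    Calling `merged_search(TOY_DATABASES, 180.0634)` returns one entry per distinct key, so a compound reported by two databases under different names appears once with both sources listed, while isomers with the same mass remain separate entries.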

  16. Enhancing Islamic Students’ Reading Comprehension through Predict Organize Search Summarize Evaluate Strategy

    Directory of Open Access Journals (Sweden)

    Darmayenti Darmayenti

    2017-02-01

    This paper reports an experimental research project conducted in a reading comprehension course for first-year students of the Adab Faculty of the State Institute for Islamic Studies Imam Bonjol Padang, West Sumatera, Indonesia, during the academic year 2015/2016. The “Predict Organize Search Summarize Evaluate” (POSSE) strategy is one strategy that can enhance students’ reading comprehension. Two classes of Arabic and History students, chosen through cluster random sampling, served as the sample. Reading tests, given to both classes as a pre-test and a post-test, were used to collect the data. The results showed that implementing the POSSE strategy produced a significant difference in learning outcomes between students taught with the POSSE strategy and those taught with a traditional one. The findings also showed that teaching reading with the POSSE strategy had a significant effect on students’ reading comprehension, and that the strategy could improve the students’ ability to find the topic of a text. It can be concluded that the POSSE strategy improved these Indonesian students’ reading comprehension. English lecturers are recommended to use the POSSE strategy as one of their teaching strategies for reading comprehension.

  17. Search predicts and changes patience in intertemporal choice

    Science.gov (United States)

    Johnson, Eric J.

    2017-01-01

    Intertemporal choice impacts many important outcomes, such as decisions about health, education, wealth, and the environment. However, the psychological processes underlying decisions involving outcomes at different points in time remain unclear, limiting opportunities to intervene and improve people’s patience. This research examines information-search strategies used during intertemporal choice and their impact on decisions. In experiment 1, we demonstrate that search strategies vary substantially across individuals. We subsequently identify two distinct search strategies across individuals. Comparative searchers, who compare features across options, discount future options less and are more susceptible to acceleration versus delay framing than integrative searchers, who integrate the features of an option. Experiment 2 manipulates search using an unobtrusive method to establish a causal relationship between strategy and choice, randomly assigning participants to conditions promoting either comparative or integrative search. Again, comparative search promotes greater patience than integrative search. Additionally, when participants adopt a comparative search strategy, they also exhibit greater effects of acceleration versus delay framing. Although most participants reported that the manipulation did not change their behavior, promoting comparative search decreased discounting of future rewards substantially and speeded patient choices. These findings highlight the central role that heterogeneity in psychological processes plays in shaping intertemporal choice. Importantly, these results indicate that theories that ignore variability in search strategies may be inadvertently aggregating over different subpopulations that use very different processes. The findings also inform interventions in choice architecture to increase patience and improve consumer welfare. PMID:29078303

  18. Bitter or not? BitterPredict, a tool for predicting taste from chemical structure.

    Science.gov (United States)

    Dagan-Wiener, Ayana; Nissim, Ido; Ben Abu, Natalie; Borgonovo, Gigliola; Bassoli, Angela; Niv, Masha Y

    2017-09-21

    Bitter taste is an innately aversive taste modality that is considered to protect animals from consuming toxic compounds. Yet, bitterness is not always noxious and some bitter compounds have beneficial effects on health. Hundreds of bitter compounds were reported (and are accessible via the BitterDB http://bitterdb.agri.huji.ac.il/dbbitter.php ), but numerous additional bitter molecules are still unknown. The dramatic chemical diversity of bitterants makes bitterness prediction a difficult task. Here we present a machine learning classifier, BitterPredict, which predicts whether a compound is bitter or not, based on its chemical structure. BitterDB was used as the positive set, and non-bitter molecules were gathered from literature to create the negative set. Adaptive Boosting (AdaBoost), a machine-learning algorithm based on decision trees, was applied to molecules that were represented using physicochemical and ADME/Tox descriptors. BitterPredict correctly classifies over 80% of the compounds in the hold-out test set, and 70-90% of the compounds in three independent external sets and in sensory test validation, providing a quick and reliable tool for classifying large sets of compounds into bitter and non-bitter groups. BitterPredict suggests that about 40% of random molecules, and a large portion (66%) of clinical and experimental drugs, and of natural products (77%) are bitter.

  19. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-01-01

    operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching

  20. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction

    KAUST Repository

    Cui, Xuefeng

    2016-06-15

    Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. Method: We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence–structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. Results: We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM–HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods.

  1. Integrating chemical footprinting data into RNA secondary structure prediction.

    Directory of Open Access Journals (Sweden)

    Kourosh Zarringhalam

    Chemical and enzymatic footprinting experiments, such as SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension), yield important information about RNA secondary structure. Indeed, since the 2'-hydroxyl is reactive at flexible (loop) regions, but unreactive at base-paired regions, SHAPE yields quantitative data about which RNA nucleotides are base-paired. Recently, low error rates in secondary structure prediction have been reported for three RNAs of moderate size, by including base stacking pseudo-energy terms derived from SHAPE data into the computation of minimum free energy secondary structure. Here, we describe a novel method, RNAsc (RNA soft constraints), which includes pseudo-energy terms for each nucleotide position, rather than only for base stacking positions. We prove that RNAsc is self-consistent, in the sense that the nucleotide-specific probabilities of being unpaired in the low energy Boltzmann ensemble always become more closely correlated with the input SHAPE data after application of RNAsc. From this mathematical perspective, the secondary structure predicted by RNAsc should be 'correct', in as much as the SHAPE data is 'correct'. We benchmark RNAsc against the previously mentioned method for eight RNAs, for which both SHAPE data and native structures are known, to find the same accuracy in 7 out of 8 cases, and an improvement of 25% in one case. Furthermore, we present what appears to be the first direct comparison of SHAPE data and in-line probing data, by comparing yeast asp-tRNA SHAPE data from the literature with data from in-line probing experiments we have recently performed. With respect to several criteria, we find that SHAPE data appear to be more robust than in-line probing data, at least in the case of asp-tRNA.

  2. Data base on avian mortality on man-made structures

    Energy Technology Data Exchange (ETDEWEB)

    Dailey, N. S.

    1978-01-01

    A computerized data base concerning avian mortality on man-made structures is available for searching at the Ecological Sciences Information Center of the Information Center Complex, Information Division, Oak Ridge National Laboratory. This data base, which contains entries from the available literature, provides information on avian mortality from either collision into or electrocution on man-made structures. Primary emphasis has been placed on avian collision with obstacles such as television and radio towers, airport ceilometers, transmission lines, and cooling towers. Other structures included in the studies are fences, glass walls and windows, lighthouses, telegraph and telephone wires, buildings, monuments, smokestacks, and water towers.

  3. RNA FRABASE 2.0: an advanced web-accessible database with the capacity to search the three-dimensional fragments within RNA structures

    Directory of Open Access Journals (Sweden)

    Wasik Szymon

    2010-05-01

    Background: Recent discoveries concerning novel functions of RNA, such as RNA interference, have contributed towards the growing importance of the field. In this respect, a deeper knowledge of complex three-dimensional RNA structures is essential to understand their new biological functions. A number of bioinformatic tools have been proposed to explore two major structural databases (PDB, NDB) in order to analyze various aspects of RNA tertiary structures. One of these tools is RNA FRABASE 1.0, the first web-accessible database with an engine for automatic search of 3D fragments within PDB-derived RNA structures. This search is based upon the user-defined RNA secondary structure pattern. In this paper, we present and discuss RNA FRABASE 2.0. This second version of the system represents a major extension of this tool in terms of providing new data and a wide spectrum of novel functionalities. An intuitively operated web server platform enables very fast user-tailored search of three-dimensional RNA fragments, their multi-parameter conformational analysis and visualization. Description: RNA FRABASE 2.0 has stored information on 1565 PDB-deposited RNA structures, including all NMR models. The RNA FRABASE 2.0 search engine algorithms operate on the database of the RNA sequences and the new library of RNA secondary structures, coded in the dot-bracket format extended to hold multi-stranded structures and to cover residues whose coordinates are missing in the PDB files. The library of RNA secondary structures (and their graphics) is made available. A high level of efficiency of the 3D search has been achieved by introducing novel tools to formulate advanced searching patterns and to screen highly populated tertiary structure elements. RNA FRABASE 2.0 also stores data and conformational parameters in order to provide "on the spot" structural filters to explore the three-dimensional RNA structures. An instant visualization of the 3D RNA
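    For readers unfamiliar with the dot-bracket format mentioned in this record, the sketch below decodes the plain single-strand notation into base pairs. It covers only the standard form, in which "(" and ")" mark paired positions and "." an unpaired one; the extensions for multi-stranded structures and missing residues that RNA FRABASE 2.0 uses are not reproduced here, and the function name is an assumption.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of 0-based (i, j) base pairs.

    Uses a stack: each ")" pairs with the most recent unmatched "(".
    Raises ValueError on unbalanced brackets or unknown characters.
    """
    stack = []
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

    For a small hairpin written as `((..))`, the function pairs position 0 with 5 and position 1 with 4, leaving the two loop positions unpaired.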

  4. MCTBI: a web server for predicting metal ion effects in RNA structures.

    Science.gov (United States)

    Sun, Li-Zhen; Zhang, Jing-Xiang; Chen, Shi-Jie

    2017-08-01

    Metal ions play critical roles in RNA structure and function. However, web servers and software packages for predicting ion effects in RNA structures are notably scarce. Furthermore, the existing web servers and software packages mainly neglect ion correlation and fluctuation effects, which are potentially important for RNAs. We here report a new web server, the MCTBI server (http://rna.physics.missouri.edu/MCTBI), for the prediction of ion effects for RNA structures. This server is based on the recently developed MCTBI, a model that can account for ion correlation and fluctuation effects for nucleic acid structures and can provide improved predictions for the effects of metal ions, especially for multivalent ions such as Mg2+, as shown by extensive theory-experiment test results. The MCTBI web server predicts metal ion binding fractions, the most probable bound ion distribution, the electrostatic free energy of the system, and the free energy components. The results provide mechanistic insights into the role of metal ions in RNA structure formation and folding stability, which is important for understanding RNA functions and the rational design of RNA structures. © 2017 Sun et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  5. Research on perturbation based Monte Carlo reactor criticality search

    International Nuclear Information System (INIS)

    Li Zeguang; Wang Kan; Li Yangliu; Deng Jingkang

    2013-01-01

    Criticality search is a very important aspect of reactor physics analysis. Due to the advantages of the Monte Carlo method and the development of computer technologies, Monte Carlo criticality search is becoming more and more necessary and feasible. The traditional Monte Carlo criticality search method suffers from the large number of individual criticality runs it requires and from the uncertainty and fluctuation of Monte Carlo results. A new Monte Carlo criticality search method based on perturbation calculation is put forward in this paper to overcome the disadvantages of the traditional method. By using only one criticality run to get the initial k_eff and the differential coefficients of the parameter of interest, the polynomial estimator of k_eff as a function of that parameter is solved to get the critical value of the parameter. The feasibility of this method was tested. The results show that the accuracy and efficiency of the perturbation-based criticality search method are encouraging and that the method overcomes the disadvantages of the traditional one. (authors)
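    The final step described in this abstract, solving a polynomial estimate of k_eff for the critical value of a parameter, can be sketched as below. The sketch assumes the estimator is a Taylor polynomial built from the initial k_eff and its derivatives, and it uses bisection as the root finder; the function names, the choice of bisection, and the numbers in the usage note are illustrative assumptions, not taken from the paper.

```python
import math

def k_estimate(x, x0, k0, derivs):
    """Taylor-polynomial estimate of k_eff at parameter value x.

    k0 is k_eff at x0; derivs[n-1] is the n-th derivative of k_eff
    with respect to the parameter, evaluated at x0.
    """
    dx = x - x0
    return k0 + sum(
        d * dx ** n / math.factorial(n) for n, d in enumerate(derivs, start=1)
    )

def critical_value(x0, k0, derivs, lo, hi, tol=1e-10):
    """Find x in [lo, hi] where the estimated k_eff equals 1 (bisection)."""
    def f(x):
        return k_estimate(x, x0, k0, derivs) - 1.0

    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi > 0:
        raise ValueError("k_eff - 1 does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid              # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # root lies in the upper half
    return 0.5 * (lo + hi)
```

    With made-up inputs such as `critical_value(1000.0, 1.02, [-1e-4], 1000.0, 2000.0)`, the first-order estimate 1.02 - 1e-4 (x - 1000) reaches 1 at x = 1200, and higher-order derivatives can be appended to `derivs` without changing the solver.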

  6. Search predicts and changes patience in intertemporal choice.

    Science.gov (United States)

    Reeck, Crystal; Wall, Daniel; Johnson, Eric J

    2017-11-07

    Intertemporal choice impacts many important outcomes, such as decisions about health, education, wealth, and the environment. However, the psychological processes underlying decisions involving outcomes at different points in time remain unclear, limiting opportunities to intervene and improve people's patience. This research examines information-search strategies used during intertemporal choice and their impact on decisions. In experiment 1, we demonstrate that search strategies vary substantially across individuals. We subsequently identify two distinct search strategies across individuals. Comparative searchers, who compare features across options, discount future options less and are more susceptible to acceleration versus delay framing than integrative searchers, who integrate the features of an option. Experiment 2 manipulates search using an unobtrusive method to establish a causal relationship between strategy and choice, randomly assigning participants to conditions promoting either comparative or integrative search. Again, comparative search promotes greater patience than integrative search. Additionally, when participants adopt a comparative search strategy, they also exhibit greater effects of acceleration versus delay framing. Although most participants reported that the manipulation did not change their behavior, promoting comparative search decreased discounting of future rewards substantially and speeded patient choices. These findings highlight the central role that heterogeneity in psychological processes plays in shaping intertemporal choice. Importantly, these results indicate that theories that ignore variability in search strategies may be inadvertently aggregating over different subpopulations that use very different processes. The findings also inform interventions in choice architecture to increase patience and improve consumer welfare. Copyright © 2017 the Author(s). Published by PNAS.

  7. Composition-Based Prediction of Temperature-Dependent Thermophysical Food Properties: Reevaluating Component Groups and Prediction Models.

    Science.gov (United States)

    Phinney, David Martin; Frelka, John C; Heldman, Dennis Ray

    2017-01-01

    Prediction of temperature-dependent thermophysical properties (thermal conductivity, density, specific heat, and thermal diffusivity) is an important component of process design for food manufacturing. Current models for prediction of thermophysical properties of foods are based on the composition, specifically, fat, carbohydrate, protein, fiber, water, and ash contents, all of which change with temperature. The objectives of this investigation were to reevaluate and improve the prediction expressions for thermophysical properties. Previously published data were analyzed over the temperature range from 10 to 150 °C. These data were analyzed to create a series of relationships between the thermophysical properties and temperature for each food component, as well as to identify the dependence of the thermophysical properties on more specific structural properties of the fats, carbohydrates, and proteins. Results from this investigation revealed that the relationships between the thermophysical properties of the major constituents of foods and temperature can be statistically described by linear expressions, in contrast to the current polynomial models. Links between variability in thermophysical properties and structural properties were observed. Relationships for several thermophysical properties based on more specific constituents have been identified. Distinctions between simple sugars (fructose, glucose, and lactose) and complex carbohydrates (starch, pectin, and cellulose) have been proposed. The relationships between the thermophysical properties and proteins revealed a potential correlation with the molecular weight of the protein. The significance of relating variability in constituent thermophysical properties with structural properties--such as molecular mass--could significantly improve composition-based prediction models and, consequently, the effectiveness of process design. © 2016 Institute of Food Technologists®.

  8. Search and rescue in collapsed structures: engineering and social science aspects.

    Science.gov (United States)

    El-Tawil, Sherif; Aguirre, Benigno

    2010-10-01

    This paper discusses the social science and engineering dimensions of search and rescue (SAR) in collapsed buildings. First, existing information is presented on factors that influence the behaviour of trapped victims, particularly human, physical, socioeconomic and circumstantial factors. Trapped victims are most often discussed in the context of structural collapse and injuries sustained. Most studies in this area focus on earthquakes as the type of disaster that produces the most extensive structural damage. Second, information is set out on the engineering aspects of urban search and rescue (USAR) in the United States, including the role of structural engineers in USAR operations, training and certification of structural specialists, and safety and general procedures. The use of computational simulation to link the engineering and social science aspects of USAR is discussed. This could supplement training of local SAR groups and USAR teams, allowing them to understand better the collapse process and how voids form in a rubble pile. A preliminary simulation tool developed for this purpose is described. © 2010 The Author(s). Journal compilation © Overseas Development Institute, 2010.

  9. Ligand and structure-based classification models for Prediction of P-glycoprotein inhibitors

    DEFF Research Database (Denmark)

    Klepsch, Freya; Poongavanam, Vasanthanathan; Ecker, Gerhard Franz

    2014-01-01

    an algorithm based on Euclidean distance. Results show that random forest and SVM performed best for classification of P-gp inhibitors and non-inhibitors, correctly predicting 73/75 % of the external test set compounds. Classification based on the docking experiments using the scoring function Chem...

  10. Electronic structure prediction via data-mining the empirical pseudopotential method

    Energy Technology Data Exchange (ETDEWEB)

    Zenasni, H; Aourag, H [LEPM, URMER, Departement of Physics, University Abou Bakr Belkaid, Tlemcen 13000 (Algeria); Broderick, S R; Rajan, K [Department of Materials Science and Engineering, Iowa State University, Ames, Iowa 50011-2230 (United States)

    2010-01-15

    We introduce a new approach for accelerating the calculation of the electronic structure of new materials by utilizing the empirical pseudopotential method combined with data mining tools. Combining data mining with the empirical pseudopotential method allows us to convert an empirical approach to a predictive approach. Here we consider tetrahedrally bonded III-V Bi semiconductors, and through the prediction of form factors based on basic elemental properties we can model the band structure and charge density for these semiconductors, for which limited results exist. This work represents a unique approach to modeling the electronic structure of a material which may be used to identify new promising semiconductors and is one of the few efforts utilizing data mining at an electronic level. (Abstract Copyright [2010], Wiley Periodicals, Inc.)

  11. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    Science.gov (United States)

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications in sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information (total number of nucleotide bases and ATGC base contents with their respective percentages), and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available at http://www.cemb.edu.pk/sw.html. Abbreviations: RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.
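
    The Nussinov dynamic programming algorithm named in this abstract can be illustrated with a minimal sketch. It is not RDNAnalyzer's implementation (which is written in C# and extends the algorithm); the function name `nussinov`, the `min_loop` parameter, and the restriction to A-T and G-C pairs are assumptions made for illustration only.

```python
def nussinov(seq, min_loop=3):
    """Return (pair_count, dot_bracket) maximising nested A-T / G-C pairs.

    Illustrative sketch only: min_loop (minimum unpaired bases enclosed by a
    pair) and the allowed pairs are assumptions, not RDNAnalyzer settings.
    """
    can_pair = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    # dp[i][j] holds the maximum number of pairs within seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]

    def score(i, j):
        # Empty or out-of-range intervals contribute no pairs.
        return dp[i][j] if 0 <= i <= j < n else 0

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = score(i, j - 1)  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in can_pair:
                    best = max(best, score(i, k - 1) + 1 + score(k + 1, j - 1))
            dp[i][j] = best

    # Traceback with an explicit stack to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # interval too short to contain a pair
        if dp[i][j] == score(i, j - 1):
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in can_pair and \
                    dp[i][j] == score(i, k - 1) + 1 + score(k + 1, j - 1):
                structure[k], structure[j] = "(", ")"
                stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    count = dp[0][n - 1] if n else 0
    return count, "".join(structure)
```

    For example, `nussinov("GGGAAATCCC")` pairs the three G bases with the three C bases and leaves the A/T stretch unpaired, because the only A-T candidates would enclose fewer than `min_loop` bases.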

  12. Cooperative Multiagent System for Parking Availability Prediction Based on Time Varying Dynamic Markov Chains

    Directory of Open Access Journals (Sweden)

    Surafel Luleseged Tilahun

    2017-01-01

    Traffic congestion is one of the main issues in the study of transportation planning and management. It creates different problems, including environmental pollution and health problems, and incurs a cost which is increasing over the years. One-third of this congestion is created by cars searching for parking places. Drivers may be aware that parking places are fully occupied but will drive around hoping that a parking place may become vacant. Opportunistic services, involving learning, predicting, and exploiting Internet of Things scenarios, are able to adapt to dynamic unforeseen situations and have the potential to ease parking search issues. Hence, in this paper, a cooperative dynamic prediction mechanism between multiple agents for parking space availability in the neighborhood, integrating foreseen and unforeseen events and adapting for long-term changes, is proposed. An agent in each parking place will use a dynamic and time-varying Markov chain to predict the parking availability, and these agents will communicate to produce the parking availability prediction for the whole neighborhood. Furthermore, a learning approach is proposed where the system can adapt to different changes in the parking demand, including long-term changes. Simulation results, using synthesized data based on actual parking lot data from a shopping mall in Geneva, show that the proposed model is promising based on the learning accuracy with service adaptation and performance in different cases.
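
    As a rough illustration of what a time-varying Markov chain for a single parking agent might look like, the sketch below estimates one transition matrix per time slot from observed transitions and propagates a state distribution forward. The function names, the slot/state encoding, and the fallback for slots without data are all assumptions; the paper's cooperative multi-agent mechanism and learning scheme are not reproduced here.

```python
def fit_time_varying_chain(observations, n_states, n_slots):
    """Estimate one row-stochastic transition matrix per time slot.

    observations: iterable of (slot, state_now, state_next) tuples
    (a hypothetical encoding chosen for this sketch).
    """
    counts = [[[0.0] * n_states for _ in range(n_states)]
              for _ in range(n_slots)]
    for slot, s, t in observations:
        counts[slot][s][t] += 1.0

    matrices = []
    for slot in range(n_slots):
        rows = []
        for s in range(n_states):
            total = sum(counts[slot][s])
            if total == 0:
                # Assumption: with no data for this slot/state, the state persists.
                rows.append([1.0 if t == s else 0.0 for t in range(n_states)])
            else:
                rows.append([c / total for c in counts[slot][s]])
        matrices.append(rows)
    return matrices


def predict(matrices, start_slot, dist, steps):
    """Propagate a state distribution through slot-dependent matrices."""
    n = len(dist)
    for step in range(steps):
        P = matrices[(start_slot + step) % len(matrices)]
        dist = [sum(dist[s] * P[s][t] for s in range(n)) for t in range(n)]
    return dist
```

    With two states (say 0 = spaces available, 1 = full), `predict(matrices, start_slot, [1.0, 0.0], steps)` gives the probability of each state after the requested number of slots; the chain is time-varying because a different matrix is applied at each step.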

  13. Structure-based characterization of multiprotein complexes.

    Science.gov (United States)

    Wiederstein, Markus; Gruber, Markus; Frank, Karl; Melo, Francisco; Sippl, Manfred J

    2014-07-08

    Multiprotein complexes govern virtually all cellular processes. Their 3D structures provide important clues to their biological roles, especially through structural correlations among protein molecules and complexes. The detection of such correlations generally requires comprehensive searches in databases of known protein structures by means of appropriate structure-matching techniques. Here, we present a high-speed structure search engine capable of instantly matching large protein oligomers against the complete and up-to-date database of biologically functional assemblies of protein molecules. We use this tool to reveal unseen structural correlations on the level of protein quaternary structure and demonstrate its general usefulness for efficiently exploring complex structural relationships among known protein assemblies. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  14. A comprehensive comparison of comparative RNA structure prediction approaches

    DEFF Research Database (Denmark)

    Gardner, P. P.; Giegerich, R.

    2004-01-01

    Background An increasing number of researchers have released novel RNA structure analysis and prediction algorithms for comparative approaches to structure prediction. Yet, independent benchmarking of these algorithms is rarely performed as is now common practice for protein-folding, gene-finding and multiple-sequence-alignment algorithms. Results Here we evaluate a number of RNA folding algorithms using reliable RNA data-sets and compare their relative performance. Conclusions We conclude that comparative data can enhance structure prediction but structure-prediction-algorithms vary widely in terms...

  15. Automated Search-Based Robustness Testing for Autonomous Vehicle Software

    Directory of Open Access Journals (Sweden)

    Kevin M. Betts

    2016-01-01

    Autonomous systems must successfully operate in complex time-varying spatial environments even when dealing with system faults that may occur during a mission. Consequently, evaluating the robustness, or ability to operate correctly under unexpected conditions, of autonomous vehicle control software is an increasingly important issue in software testing. New methods to automatically generate test cases for robustness testing of autonomous vehicle control software in closed-loop simulation are needed. Search-based testing techniques were used to automatically generate test cases, consisting of initial conditions and fault sequences, intended to challenge the control software more than test cases generated using current methods. Two different search-based testing methods, genetic algorithms and surrogate-based optimization, were used to generate test cases for a simulated unmanned aerial vehicle attempting to fly through an entryway. The effectiveness of the search-based methods in generating challenging test cases was compared to both a truth reference (full combinatorial testing) and the method most commonly used today (Monte Carlo testing). The search-based testing techniques demonstrated better performance than Monte Carlo testing for both of the test case generation performance metrics: (1) finding the single most challenging test case and (2) finding the set of fifty test cases with the highest mean degree of challenge.

  16. A Systematic Understanding of Successful Web Searches in Information-Based Tasks

    Science.gov (United States)

    Zhou, Mingming

    2013-01-01

    The purpose of this study is to research how Chinese university students solve information-based problems. With the Search Performance Index as the measure of search success, participants were divided into high, medium and low-performing groups. Based on their web search logs, these three groups were compared along five dimensions of the search…

  17. Development of health information search engine based on metadata and ontology.

    Science.gov (United States)

    Song, Tae-Min; Park, Hyeoun-Ae; Jin, Dal-Lae

    2014-04-01

    The aim of the study was to develop a metadata and ontology-based health information search engine ensuring semantic interoperability to collect and provide health information using different application programs. Health information metadata ontology was developed using a distributed semantic Web content publishing model based on vocabularies used to index the contents generated by the information producers as well as those used to search the contents by the users. Vocabulary for health information ontology was mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and a list of about 1,500 terms was proposed. The metadata schema used in this study was developed by adding an element describing the target audience to the Dublin Core Metadata Element Set. A metadata schema and an ontology ensuring interoperability of health information available on the internet were developed. The metadata and ontology-based health information search engine developed in this study produced better search results than existing search engines. A health information search engine based on metadata and ontology will provide reliable health information to both information producers and information consumers.

  18. Structure-aided prediction of mammalian transcription factor complexes in conserved non-coding elements

    KAUST Repository

    Guturu, H.

    2013-11-11

    Mapping the DNA-binding preferences of transcription factor (TF) complexes is critical for deciphering the functions of cis-regulatory elements. Here, we developed a computational method that compares co-occurring motif spacings in conserved versus unconserved regions of the human genome to detect evolutionarily constrained binding sites of rigid TF complexes. Structural data were used to estimate TF complex physical plausibility, explore overlapping motif arrangements seldom tackled by non-structure-aware methods, and generate and analyse three-dimensional models of the predicted complexes bound to DNA. Using this approach, we predicted 422 physically realistic TF complex motifs at 18% false discovery rate, the majority of which (326, 77%) contain some sequence overlap between binding sites. The set of mostly novel complexes is enriched in known composite motifs, predictive of binding site configurations in TF-TF-DNA crystal structures, and supported by ChIP-seq datasets. Structural modelling revealed three cooperativity mechanisms: direct protein-protein interactions, potentially indirect interactions and 'through-DNA' interactions. Indeed, 38% of the predicted complexes were found to contain four or more bases in which TF pairs appear to synergize through overlapping binding to the same DNA base pairs in opposite grooves or strands. Our TF complex and associated binding site predictions are available as a web resource at http://bejerano.stanford.edu/complex.

  19. Structure-aided prediction of mammalian transcription factor complexes in conserved non-coding elements

    KAUST Repository

    Guturu, H.; Doxey, A. C.; Wenger, A. M.; Bejerano, G.

    2013-01-01

    Mapping the DNA-binding preferences of transcription factor (TF) complexes is critical for deciphering the functions of cis-regulatory elements. Here, we developed a computational method that compares co-occurring motif spacings in conserved versus unconserved regions of the human genome to detect evolutionarily constrained binding sites of rigid TF complexes. Structural data were used to estimate TF complex physical plausibility, explore overlapping motif arrangements seldom tackled by non-structure-aware methods, and generate and analyse three-dimensional models of the predicted complexes bound to DNA. Using this approach, we predicted 422 physically realistic TF complex motifs at 18% false discovery rate, the majority of which (326, 77%) contain some sequence overlap between binding sites. The set of mostly novel complexes is enriched in known composite motifs, predictive of binding site configurations in TF-TF-DNA crystal structures, and supported by ChIP-seq datasets. Structural modelling revealed three cooperativity mechanisms: direct protein-protein interactions, potentially indirect interactions and 'through-DNA' interactions. Indeed, 38% of the predicted complexes were found to contain four or more bases in which TF pairs appear to synergize through overlapping binding to the same DNA base pairs in opposite grooves or strands. Our TF complex and associated binding site predictions are available as a web resource at http://bejerano.stanford.edu/complex.

  20. Nucleic acid secondary structure prediction and display.

    OpenAIRE

    Stüber, K

    1986-01-01

    A set of programs has been developed for the prediction and display of nucleic acid secondary structures. Information from experimental data can be used to restrict or enforce secondary structural elements. The predictions can be displayed either on normal line printers or on graphic devices like plotters or graphic terminals.

  1. Global stability-based design optimization of truss structures using ...

    Indian Academy of Sciences (India)

    Furthermore, a pure Pareto-ranking based multi-objective optimization model is employed for the design optimization of the truss structure with multiple objectives. The computational performance of the optimization model is increased by implementing an island model into its evolutionary search mechanism. The proposed ...

  2. Pharmacophore searching: A potential solution for correcting unknown ligands (UNK) labelling errors in Protein Data Bank (PDB'S).

    Science.gov (United States)

    Ibrahim, Musadiq; Lapthorn, Adrian Jonathan; Ibrahim, Mohammad

    2017-08-01

    The Protein Data Bank (PDB) is the single most important repository of structural data for proteins and other biologically relevant molecules. Therefore, it is critically important to keep the PDB data as error-free as possible. In this study, we have critically examined PDB structures of 292 protein molecules which have been deposited in the repository along with potentially incorrect ligands labelled as unknown ligands (UNK). Pharmacophores were generated for all the protein structures by using Discovery Studio Visualizer (DSV) and Accelrys Catalyst®. The generated pharmacophores were used to search the database containing the reported ligand. Ligands obtained through pharmacophore searching were then checked for fit to the observed electron density map by using Coot®. The predicted ligands obtained via pharmacophore searching fitted the observed electron density map well, in comparison to the ligands reported in the PDB. Based on our study, we have learned that, as of May 2016, among 292 submitted structures in the PDB, at least 20 structures have ligands with a clear electron density but have been incorrectly labelled as unknown ligands (UNK). We have demonstrated that pharmacophore searching and Coot® can help to find suitable known ligands for these protein structures, the former for ligand search and the latter for electron density analysis. The use of these two techniques can facilitate the quick and reliable labelling of ligands where the electron density map serves as a reference. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. Supervised learning of tools for content-based search of image databases

    Science.gov (United States)

    Delanoy, Richard L.

    1996-03-01

    A computer environment, called the Toolkit for Image Mining (TIM), is being developed with the goal of enabling users with diverse interests and varied computer skills to create search tools for content-based image retrieval and other pattern matching tasks. Search tools are generated using a simple paradigm of supervised learning that is based on the user pointing at mistakes of classification made by the current search tool. As mistakes are identified, a learning algorithm uses the identified mistakes to build up a model of the user's intentions, construct a new search tool, apply the search tool to a test image, display the match results as feedback to the user, and accept new inputs from the user. Search tools are constructed in the form of functional templates, which are generalized matched filters capable of knowledge-based image processing. The ability of this system to learn the user's intentions from experience contrasts with other existing approaches to content-based image retrieval that base searches on the characteristics of a single input example or on a predefined and semantically-constrained textual query. Currently, TIM is capable of learning spectral and textural patterns, but should be adaptable to the learning of shapes, as well. Possible applications of TIM include not only content-based image retrieval, but also quantitative image analysis, the generation of metadata for annotating images, data prioritization or data reduction in bandwidth-limited situations, and the construction of components for larger, more complex computer vision algorithms.

  4. Polymer physics predicts the effects of structural variants on chromatin architecture.

    Science.gov (United States)

    Bianco, Simona; Lupiáñez, Darío G; Chiariello, Andrea M; Annunziatella, Carlo; Kraft, Katerina; Schöpflin, Robert; Wittler, Lars; Andrey, Guillaume; Vingron, Martin; Pombo, Ana; Mundlos, Stefan; Nicodemi, Mario

    2018-05-01

    Structural variants (SVs) can result in changes in gene expression due to abnormal chromatin folding and cause disease. However, the prediction of such effects remains a challenge. Here we present a polymer-physics-based approach (PRISMR) to model 3D chromatin folding and to predict enhancer-promoter contacts. PRISMR predicts higher-order chromatin structure from genome-wide chromosome conformation capture (Hi-C) data. Using the EPHA4 locus as a model, the effects of pathogenic SVs are predicted in silico and compared to Hi-C data generated from mouse limb buds and patient-derived fibroblasts. PRISMR deconvolves the folding complexity of the EPHA4 locus and identifies SV-induced ectopic contacts and alterations of 3D genome organization in homozygous or heterozygous states. We show that SVs can reconfigure topologically associating domains, thereby producing extensive rewiring of regulatory interactions and causing disease by gene misexpression. PRISMR can be used to predict interactions in silico, thereby providing a tool for analyzing the disease-causing potential of SVs.

  5. Study on MPGA-BP of Gravity Dam Deformation Prediction

    Directory of Open Access Journals (Sweden)

    Xiaoyu Wang

    2017-01-01

    Displacement is an important physical quantity in deformation monitoring of hydraulic structures, and its prediction accuracy is a prerequisite for ensuring safe operation. Most existing metaheuristic methods have three problems: (1) falling into local minima easily, (2) slow convergence, and (3) sensitivity to the initial value. Resolving these three problems and improving the prediction accuracy necessitate the application of a genetic algorithm-based backpropagation (GA-BP) neural network and a multiple population genetic algorithm (MPGA). A hybrid multiple population genetic algorithm backpropagation (MPGA-BP) neural network algorithm is put forward to optimize deformation prediction from periodic monitoring surveys of hydraulic structures. This hybrid model is employed for analyzing the displacement of a gravity dam in China. The results show the proposed model is superior to an ordinary BP neural network and a statistical regression model in terms of global search, convergence speed, and prediction accuracy.

  6. Perturbation based Monte Carlo criticality search in density, enrichment and concentration

    International Nuclear Information System (INIS)

    Li, Zeguang; Wang, Kan; Deng, Jingkang

    2015-01-01

    Highlights: • A new perturbation based Monte Carlo criticality search method is proposed. • The method could get accurate results with only one individual criticality run. • The method is used to solve density, enrichment and concentration search problems. • Results show the feasibility and good performance of this method. • The relationship between the accuracy of the results and the perturbation order is discussed. - Abstract: Criticality search is a very important aspect of reactor physics analysis. Due to the advantages of the Monte Carlo method and the development of computer technologies, Monte Carlo criticality search is becoming more and more necessary and feasible. Existing Monte Carlo criticality search methods need a large number of individual criticality runs and may have unstable results because of the uncertainties of criticality results. In this paper, a new perturbation based Monte Carlo criticality search method is proposed and discussed. This method needs only one individual criticality calculation with perturbation tallies to estimate the k_eff changing function using the initial k_eff and differential coefficient results, and solves polynomial equations to get the criticality search results. The new perturbation based Monte Carlo criticality search method is implemented in the Monte Carlo code RMC, and criticality search problems in density, enrichment and concentration are carried out. Results show that this method is encouraging in accuracy and efficiency, and has advantages compared with other criticality search methods.
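
    The idea of estimating a k_eff function from an initial value plus differential coefficients, then solving a polynomial equation for the critical parameter, can be sketched as follows. This is a hedged illustration, not the RMC implementation: it assumes the differential coefficients are ordinary derivatives of k_eff with respect to the search parameter, builds a Taylor polynomial from them, and uses Newton's method (a technique chosen here for brevity) to find the parameter change. The function name and all numbers in the usage note are hypothetical.

```python
import math


def criticality_search(k0, derivs, k_target=1.0, tol=1e-12, max_iter=50):
    """Return the parameter change dx such that the Taylor estimate of k_eff
    equals k_target.

    k0     : k_eff from the single reference criticality run
    derivs : [dk/dx, d2k/dx2, ...] at the reference point (assumed to be
             what the perturbation tallies provide)
    """
    def k_of(dx):
        # Taylor polynomial: k0 + sum_n derivs[n-1] * dx**n / n!
        return k0 + sum(d * dx ** n / math.factorial(n)
                        for n, d in enumerate(derivs, start=1))

    def dk_of(dx):
        # Derivative of the Taylor polynomial with respect to dx.
        return sum(d * dx ** (n - 1) / math.factorial(n - 1)
                   for n, d in enumerate(derivs, start=1))

    dx = 0.0
    for _ in range(max_iter):
        residual = k_of(dx) - k_target
        if abs(residual) < tol:
            return dx
        slope = dk_of(dx)
        if slope == 0.0:
            raise ValueError("zero slope; Newton step undefined")
        dx -= residual / slope
    raise RuntimeError("criticality search did not converge")
```

    With made-up inputs `k0 = 0.98` and `derivs = [0.05]`, the first-order estimate gives a parameter change of about 0.4; adding a second-order coefficient such as `-0.01` shifts the answer slightly, which mirrors the abstract's point that accuracy depends on the perturbation order.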

  7. Adaptive Neuro-Fuzzy Inference System Models for Force Prediction of a Mechatronic Flexible Structure

    DEFF Research Database (Denmark)

    Achiche, S.; Shlechtingen, M.; Raison, M.

    2016-01-01

    This paper presents the results obtained from a research work investigating the performance of different Adaptive Neuro-Fuzzy Inference System (ANFIS) models developed to predict excitation forces on a dynamically loaded flexible structure. For this purpose, a flexible structure is equipped...... obtained from applying a random excitation force on the flexible structure. The performance of the developed models is evaluated by analyzing the prediction capabilities based on a normalized prediction error. The frequency domain is considered to analyze the similarity of the frequencies in the predicted...... of the sampling frequency and sensor location on the model performance is investigated. The results obtained in this paper show that ANFIS models can be used to set up reliable force predictors for dynamical loaded flexible structures, when a certain degree of inaccuracy is accepted. Furthermore, the comparison...

  8. Impact of Glaucoma and Dry Eye on Text-Based Searching

    Science.gov (United States)

    Sun, Michelle J.; Rubin, Gary S.; Akpek, Esen K.; Ramulu, Pradeep Y.

    2017-01-01

    Purpose We determine if visual field loss from glaucoma and/or measures of dry eye severity are associated with difficulty searching, as judged by slower search times on a text-based search task. Methods Glaucoma patients with bilateral visual field (VF) loss, patients with clinically significant dry eye, and normally-sighted controls were enrolled from the Wilmer Eye Institute clinics. Subjects searched three Yellow Pages excerpts for a specific phone number, and search time was recorded. Results A total of 50 glaucoma subjects, 40 dry eye subjects, and 45 controls completed study procedures. On average, glaucoma patients exhibited 57% longer search times compared to controls (95% confidence interval [CI], 26%–96%, P Dry eye subjects demonstrated similar search times compared to controls, though worse Ocular Surface Disease Index (OSDI) vision-related subscores were associated with longer search times (P dry eye (P > 0.08 for Schirmer's testing without anesthesia, corneal fluorescein staining, and tear film breakup time). Conclusions Text-based visual search is slower for glaucoma patients with greater levels of VF loss and dry eye patients with greater self-reported visual difficulty, and these difficulties may contribute to decreased quality of life in these groups. Translational Relevance Visual search is impaired in glaucoma and dry eye groups compared to controls, highlighting the need for compensatory strategies and tools to assist individuals in overcoming their deficiencies. PMID:28670502

  9. Evidence-based librarianship: searching for the needed EBL evidence.

    Science.gov (United States)

    Eldredge, J D

    2000-01-01

    This paper discusses the challenges of finding evidence needed to implement Evidence-Based Librarianship (EBL). Focusing first on database coverage for three health sciences librarianship journals, the article examines the information contents of different databases. Strategies are needed to search for relevant evidence in the library literature via these databases, and the problems associated with searching the grey literature of librarianship. Database coverage, plausible search strategies, and the grey literature of library science all pose challenges to finding the needed research evidence for practicing EBL. Health sciences librarians need to ensure that systems are designed that can track and provide access to needed research evidence to support Evidence-Based Librarianship (EBL).

  10. Predicting protein folding pathways at the mesoscopic level based on native interactions between secondary structure elements

    Directory of Open Access Journals (Sweden)

    Sze Sing-Hoi

    2008-07-01

    Abstract Background Since experimental determination of protein folding pathways remains difficult, computational techniques are often used to simulate protein folding. Most current techniques to predict protein folding pathways are computationally intensive and are suitable only for small proteins. Results By assuming that the native structure of a protein is known and representing each intermediate conformation as a collection of fully folded structures in which each of them contains a set of interacting secondary structure elements, we show that it is possible to significantly reduce the conformation space while still being able to predict the most energetically favorable folding pathway of large proteins with hundreds of residues at the mesoscopic level, including the pig muscle phosphoglycerate kinase with 416 residues. The model is detailed enough to distinguish between different folding pathways of structurally very similar proteins, including the streptococcal protein G and the peptostreptococcal protein L. The model is also able to recognize the differences between the folding pathways of protein G and its two structurally similar variants NuG1 and NuG2, which are even harder to distinguish. We show that this strategy can produce accurate predictions on many other proteins with experimentally determined intermediate folding states. Conclusion Our technique is efficient enough to predict folding pathways for both large and small proteins at the mesoscopic level. Such a strategy is often the only feasible choice for large proteins. A software program implementing this strategy (SSFold) is available at http://faculty.cs.tamu.edu/shsze/ssfold.

  11. Protein structure refinement using a quantum mechanics-based chemical shielding predictor

    DEFF Research Database (Denmark)

    Bratholm, Lars Andersen; Jensen, Jan Halborg

    2017-01-01

    The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor...... of a protein backbone and CB chemical shifts (ProCS15, PeerJ, 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic...

  12. Computational predictions of zinc oxide hollow structures

    Science.gov (United States)

    Tuoc, Vu Ngoc; Huan, Tran Doan; Thao, Nguyen Thi

    2018-03-01

    Nanoporous materials are emerging as potential candidates for a wide range of technological applications in the environment, electronics, and optoelectronics, to name just a few. Within this active research area, experimental works are predominant, while theoretical/computational prediction and study of these materials face some intrinsic challenges, one of which is how to predict porous structures. We propose a computationally and technically feasible approach for predicting zinc oxide structures with hollows at the nano scale. The designed zinc oxide hollow structures are studied with computations using the density functional tight binding and conventional density functional theory methods, revealing a variety of promising mechanical and electronic properties, which can potentially find future realistic applications.

  13. Capturing alternative secondary structures of RNA by decomposition of base-pairing probabilities.

    Science.gov (United States)

    Hagio, Taichi; Sakuraba, Shun; Iwakiri, Junichi; Mori, Ryota; Asai, Kiyoshi

    2018-02-19

    It is known that functional RNAs often switch their functions by forming different secondary structures. Popular tools for RNA secondary structure prediction, however, predict the single 'best' structure, and do not produce alternative structures. There are bioinformatics tools to predict suboptimal structures, but it is difficult to detect which alternative secondary structures are essential. We proposed a new computational method to detect essential alternative secondary structures from RNA sequences by decomposing the base-pairing probability matrix. The decomposition is calculated by a newly implemented software tool, RintW, which efficiently computes the base-pairing probability distributions over the Hamming distance from arbitrary reference secondary structures. The proposed approach has been demonstrated on the ROSE element RNA thermometer sequence and the Lysine RNA ribo-switch, showing that it captures conformational changes in secondary structures. We have shown that alternative secondary structures are captured by decomposing base-pairing probabilities over Hamming distance. Source code is available from http://www.ncRNA.org/RintW .
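
    The Hamming distance between two secondary structures mentioned in this abstract can be illustrated with a minimal sketch. It assumes the distance is the number of base pairs present in exactly one of the two structures, and that structures are written in dot-bracket notation; neither detail is stated in the abstract, and the function names are hypothetical rather than part of RintW.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack = []
    pairs = set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def hamming_distance(structure_a, structure_b):
    """Count base pairs that occur in exactly one of the two structures.

    Assumed definition (not taken from the abstract): the size of the
    symmetric difference of the two base-pair sets.
    """
    if len(structure_a) != len(structure_b):
        raise ValueError("structures must have the same length")
    return len(base_pairs(structure_a) ^ base_pairs(structure_b))


if __name__ == "__main__":
    reference = "((..))"
    alternative = "(....)"
    print(hamming_distance(reference, alternative))
```

    Under this assumed definition, a structure at distance 0 from the reference is identical to it, and larger distances correspond to increasingly different folds.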

  14. Knowledge base, information search and intention to adopt innovation

    NARCIS (Netherlands)

    Rijnsoever, van F.J.; Castaldi, C.

    2008-01-01

    Innovation is a process that involves searching for new information. This paper builds upon theoretical insights on individual and organizational learning and proposes a knowledge based model of how actors search for information when confronted with innovation. The model takes into account different

  15. Development and Evaluation of Thesauri-Based Bibliographic Biomedical Search Engine

    Science.gov (United States)

    Alghoson, Abdullah

    2017-01-01

    Due to the large volume and exponential growth of biomedical documents (e.g., books, journal articles), it has become increasingly challenging for biomedical search engines to retrieve relevant documents based on users' search queries. Part of the challenge is the matching mechanism of free-text indexing that performs matching based on…

  16. Prediction of residues in discontinuous B-cell epitopes using protein 3D structures

    DEFF Research Database (Denmark)

    Andersen, P.H.; Nielsen, Morten; Lund, Ole

    2006-01-01

    . We show that the new structure-based method has a better performance for predicting residues of discontinuous epitopes than methods based solely on sequence information, and that it can successfully predict epitope residues that have been identified by different techniques. DiscoTope detects 15...... experimental epitope mapping in both rational vaccine design and development of diagnostic tools, and may lead to more efficient epitope identification....

  17. Cube search, revisited

    Science.gov (United States)

    Zhang, Xuetao; Huang, Jie; Yigit-Elliott, Serap; Rosenholtz, Ruth

    2015-01-01

    Observers can quickly search among shaded cubes for one lit from a unique direction. However, replace the cubes with similar 2-D patterns that do not appear to have a 3-D shape, and search difficulty increases. These results have challenged models of visual search and attention. We demonstrate that cube search displays differ from those with “equivalent” 2-D search items in terms of the informativeness of fairly low-level image statistics. This informativeness predicts peripheral discriminability of target-present from target-absent patches, which in turn predicts visual search performance, across a wide range of conditions. Comparing model performance on a number of classic search tasks, cube search does not appear unexpectedly easy. Easy cube search, per se, does not provide evidence for preattentive computation of 3-D scene properties. However, search asymmetries derived from rotating and/or flipping the cube search displays cannot be explained by the information in our current set of image statistics. This may merely suggest a need to modify the model's set of 2-D image statistics. Alternatively, it may be difficult cube search that provides evidence for preattentive computation of 3-D scene properties. By attributing 2-D luminance variations to a shaded 3-D shape, 3-D scene understanding may slow search for 2-D features of the target. PMID:25780063

  18. Data-Based Predictive Control with Multirate Prediction Step

    Science.gov (United States)

    Barlow, Jonathan S.

    2010-01-01

    Data-based predictive control is an emerging control method that stems from Model Predictive Control (MPC). MPC computes current control action based on a prediction of the system output a number of time steps into the future and is generally derived from a known model of the system. Data-based predictive control has the advantage of deriving predictive models and controller gains from input-output data. Thus, a controller can be designed from the outputs of complex simulation code or a physical system where no explicit model exists. If the output data happens to be corrupted by periodic disturbances, the designed controller will also have the built-in ability to reject these disturbances without the need to know them. When data-based predictive control is implemented online, it becomes a version of adaptive control. One challenge of MPC is computational requirements increasing with prediction horizon length. This paper develops a closed-loop dynamic output feedback controller that minimizes a multi-step-ahead receding-horizon cost function with multirate prediction step. One result is a reduced influence of prediction horizon and the number of system outputs on the computational requirements of the controller. Another result is an emphasis on portions of the prediction window that are sampled more frequently. A third result is the ability to include more outputs in the feedback path than in the cost function.

  19. Stochastic Extreme Load Predictions for Marine Structures

    DEFF Research Database (Denmark)

    Jensen, Jørgen Juncher

    1999-01-01

    Development of rational design criteria for marine structures requires reliable estimates for the maximum wave-induced loads the structure may encounter during its operational lifetime. The paper discusses various methods for extreme value predictions taking into account the non-linearity of the waves and the response. As an example, the wave-induced bending moment in the ship hull girder is considered.

  20. SciRide Finder: a citation-based paradigm in biomedical literature search.

    Science.gov (United States)

    Volanakis, Adam; Krawczyk, Konrad

    2018-04-18

    There are more than 26 million peer-reviewed biomedical research items according to Medline/PubMed. This breadth of information is indicative of the progress in biomedical sciences on one hand, but an overload for scientists performing literature searches on the other. A major portion of scientific literature search is to find statements, numbers and protocols that can be cited to build an evidence-based narrative for a new manuscript. Because science builds on prior knowledge, such information has likely been written out and cited in an older manuscript. Thus, Cited Statements, pieces of text from scientific literature supported by citing other peer-reviewed publications, carry a significant amount of condensed information on prior art. Based on this principle, we propose a literature search service, SciRide Finder (finder.sciride.org), which constrains the search corpus to such Cited Statements only. We demonstrate that Cited Statements can carry different information from that found in titles/abstracts and full text, giving access to literature search results that differ from those of traditional search engines. We further show how presenting search results as a list of Cited Statements allows researchers to easily find information to build an evidence-based narrative for their own manuscripts.

  1. Object-based target templates guide attention during visual search.

    Science.gov (United States)

    Berggren, Nick; Eimer, Martin

    2018-05-03

    During visual search, attention is believed to be controlled in a strictly feature-based fashion, without any guidance by object-based target representations. To challenge this received view, we measured electrophysiological markers of attentional selection (N2pc component) and working memory (sustained posterior contralateral negativity; SPCN) in search tasks where two possible targets were defined by feature conjunctions (e.g., blue circles and green squares). Critically, some search displays also contained nontargets with two target features (incorrect conjunction objects, e.g., blue squares). Because feature-based guidance cannot distinguish these objects from targets, any selective bias for targets will reflect object-based attentional control. In Experiment 1, where search displays always contained only one object with target-matching features, targets and incorrect conjunction objects elicited identical N2pc and SPCN components, demonstrating that attentional guidance was entirely feature-based. In Experiment 2, where targets and incorrect conjunction objects could appear in the same display, clear evidence for object-based attentional control was found. The target N2pc became larger than the N2pc to incorrect conjunction objects from 250 ms poststimulus, and only targets elicited SPCN components. This demonstrates that after an initial feature-based guidance phase, object-based templates are activated when they are required to distinguish target and nontarget objects. These templates modulate visual processing and control access to working memory, and their activation may coincide with the start of feature integration processes. Results also suggest that while multiple feature templates can be activated concurrently, only a single object-based target template can guide attention at any given time. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  2. Predictive modeling of neuroanatomic structures for brain atrophy detection

    Science.gov (United States)

    Hu, Xintao; Guo, Lei; Nie, Jingxin; Li, Kaiming; Liu, Tianming

    2010-03-01

    In this paper, we present an approach of predictive modeling of neuroanatomic structures for the detection of brain atrophy based on cross-sectional MRI image. The underlying premise of applying predictive modeling for atrophy detection is that brain atrophy is defined as significant deviation of part of the anatomy from what the remaining normal anatomy predicts for that part. The steps of predictive modeling are as follows. The central cortical surface under consideration is reconstructed from brain tissue map and Regions of Interests (ROI) on it are predicted from other reliable anatomies. The vertex pair-wise distance between the predicted vertex and the true one within the abnormal region is expected to be larger than that of the vertex in normal brain region. Change of white matter/gray matter ratio within a spherical region is used to identify the direction of vertex displacement. In this way, the severity of brain atrophy can be defined quantitatively by the displacements of those vertices. The proposed predictive modeling method has been evaluated by using both simulated atrophies and MRI images of Alzheimer's disease.

  3. Prediction of residential radon exposure of the whole Swiss population: comparison of model-based predictions with measurement-based predictions.

    Science.gov (United States)

    Hauri, D D; Huss, A; Zimmermann, F; Kuehni, C E; Röösli, M

    2013-10-01

    Radon plays an important role for human exposure to natural sources of ionizing radiation. The aim of this article is to compare two approaches to estimate mean radon exposure in the Swiss population: model-based predictions at individual level and measurement-based predictions based on measurements aggregated at municipality level. A nationwide model was used to predict radon levels in each household and for each individual based on the corresponding tectonic unit, building age, building type, soil texture, degree of urbanization, and floor. Measurement-based predictions were carried out within a health impact assessment on residential radon and lung cancer. Mean measured radon levels were corrected for the average floor distribution and weighted with population size of each municipality. Model-based predictions yielded a mean radon exposure of the Swiss population of 84.1 Bq/m³. Measurement-based predictions yielded an average exposure of 78 Bq/m³. This study demonstrates that the model- and the measurement-based predictions provided similar results. The advantage of the measurement-based approach is its simplicity, which is sufficient for assessing exposure distribution in a population. The model-based approach allows predicting radon levels at specific sites, which is needed in an epidemiological study, and the results do not depend on how the measurement sites have been selected. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
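
    The population weighting used in the measurement-based approach can be sketched as a weighted mean over municipalities. The example below is a simplified illustration only: the floor-distribution correction described in the abstract is omitted, and the function name and all numbers are hypothetical, not data from the study.

```python
def population_weighted_mean(mean_levels, populations):
    """Average municipality-level radon means, weighted by population size.

    mean_levels: mean radon level per municipality (e.g. in Bq/m^3)
    populations: number of inhabitants per municipality, in the same order
    """
    if len(mean_levels) != len(populations):
        raise ValueError("inputs must have the same length")
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(
        level * population
        for level, population in zip(mean_levels, populations)
    )
    return weighted_sum / total_population


if __name__ == "__main__":
    # Hypothetical municipalities: a small high-radon village and a large town.
    levels = [100.0, 50.0]
    inhabitants = [1000, 3000]
    print(population_weighted_mean(levels, inhabitants))
```

    Weighting by population means that a large municipality with moderate radon levels influences the national mean more than a small municipality with high levels.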

  4. Development of a special approach of the mineralization localization zones prediction based on the combination and the geoinformation analysis of heterogeneous geodata

    Science.gov (United States)

    Ivanova, Julia

    2014-05-01

    The complexity of solving any task, including tasks in the Earth sciences, depends on the completeness of the available information. Predicting the localization of mineralization zones is a task with incomplete information. Prediction is complicated because search data are difficult to formalize and because there is no single information structure for representing them. These facts complicate the structuring, processing and analysis of information. The geodata to be processed come in various formats: raster two-dimensional and three-dimensional fields, vector layers of polygons and lines, point markable layers, spectral and discrete, quantized and continuous, analog and digital forms, as well as chemical formalization. In this form, the data cannot be combined into superclasses. At the same time, the information content of geodata applied individually is very small, while the many low-information features that can be obtained when studying mineralization zones are usually redundant. As a result, the quality of knowledge that can be obtained from the search data decreases, and the technological cycle of information processing lengthens. Additionally, this leads to exploitation of datasets and production of large shared datasets [1]. To solve prediction tasks efficiently, it is necessary to combine heterogeneous search features, accumulated factual data, and a modern science-based mathematical apparatus for processing and analyzing the information. Young branches of human knowledge also help to solve this task: remote sensing, geoinformatics, Earth and Space Science Informatics [2], catastrophe theory and nonlinear dynamics, and game theory. The purpose of the suggested approach is to increase the informational content and reduce the redundancy of geodata in order to improve the accuracy of the prediction of mineralization zones.
The developed algorithm

  5. Teaching AI Search Algorithms in a Web-Based Educational System

    Science.gov (United States)

    Grivokostopoulou, Foteini; Hatzilygeroudis, Ioannis

    2013-01-01

    In this paper, we present a way of teaching AI search algorithms in a web-based adaptive educational system. Teaching is based on interactive examples and exercises. Interactive examples, which use visualized animations to present AI search algorithms in a step-by-step way with explanations, are used to make learning more attractive. Practice…

  6. Antibody modeling using the prediction of immunoglobulin structure (PIGS) web server [corrected].

    Science.gov (United States)

    Marcatili, Paolo; Olimpieri, Pier Paolo; Chailyan, Anna; Tramontano, Anna

    2014-12-01

    Antibodies (or immunoglobulins) are crucial for defending organisms from pathogens, but they are also key players in many medical, diagnostic and biotechnological applications. The ability to predict their structure and the specific residues involved in antigen recognition has several useful applications in all of these areas. Over the years, we have developed or collaborated in developing a strategy that enables researchers to predict the 3D structure of antibodies with a very satisfactory accuracy. The strategy is completely automated and extremely fast, requiring only a few minutes (∼10 min on average) to build a structural model of an antibody. It is based on the concept of canonical structures of antibody loops and on our understanding of the way light and heavy chains pack together.

  7. Working-memory capacity predicts the executive control of visual search among distractors: the influences of sustained and selective attention.

    Science.gov (United States)

    Poole, Bradley J; Kane, Michael J

    2009-07-01

    Variation in working-memory capacity (WMC) predicts individual differences in only some attention-control capabilities. Whereas higher WMC subjects outperform lower WMC subjects in tasks requiring the restraint of prepotent but inappropriate responses, and the constraint of attentional focus to target stimuli against distractors, they do not differ in prototypical visual-search tasks, even those that yield steep search slopes and engender top-down control. The present three experiments tested whether WMC, as measured by complex memory span tasks, would predict search latencies when the 1-8 target locations to be searched appeared alone, versus appearing among distractor locations to be ignored, with the latter requiring selective attentional focus. Subjects viewed target-location cues and then fixated on those locations over either long (1,500-1,550 ms) or short (300 ms) delays. Higher WMC subjects identified targets faster than did lower WMC subjects only in the presence of distractors and only over long fixation delays. WMC thus appears to affect subjects' ability to maintain a constrained attentional focus over time.

  8. Prediction of material damage in orthotropic metals for virtual structural testing

    OpenAIRE

    Ravindran, S.

    2010-01-01

    Models based on the Continuum Damage Mechanics principle are increasingly used for predicting the initiation and growth of damage in materials. The growing reliance on 3-D finite element (FE) virtual structural testing demands implementation and validation of robust material models that can predict the material behaviour accurately. The use of these models within numerical analyses requires suitable material data. EU aerospace companies along with Cranfield University and other similar resear...

  9. Prediction of protein-protein interaction sites in sequences and 3D structures by random forests.

    Directory of Open Access Journals (Sweden)

    Mile Sikić

    2009-01-01

    Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras-Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information.
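
    The F-measure figures quoted in this abstract can be checked with a short calculation. The sketch assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not name the variant, but the reported values are consistent with it.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Values reported in the abstract, expressed as fractions.
    sequence_only = f_measure(0.84, 0.26)
    with_structure = f_measure(0.76, 0.38)
    print(f"sequence only:  {sequence_only:.2f}")
    print(f"with structure: {with_structure:.2f}")
```

    Because the harmonic mean is dominated by the smaller of the two inputs, the sequence-only predictor scores lower overall despite its higher precision.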

  10. A quantitative prediction model of SCC rate for nuclear structure materials in high temperature water based on crack tip creep strain rate

    International Nuclear Information System (INIS)

    Yang, F.Q.; Xue, H.; Zhao, L.Y.; Fang, X.R.

    2014-01-01

    Highlights: • Creep is considered to be the primary mechanical factor of crack tip film degradation. • The prediction model of SCC rate is based on crack tip creep strain rate. • The SCC rate calculated at the secondary stage of creep is recommended. • The effect of stress intensity factor on SCC growth rate is discussed. - Abstract: The quantitative prediction of stress corrosion cracking (SCC) of structure materials is essential in the safety assessment of nuclear power plants. A new quantitative prediction model is proposed by combining the Ford–Andresen model, a crack tip creep model and an elastic–plastic finite element method. The creep at the crack tip is considered to be the primary mechanical factor of protective film degradation, and the creep strain rate at the crack tip is suggested as the primary mechanical factor in predicting the SCC rate. The SCC rates at the secondary stage of creep are recommended when using the approach introduced in this study to predict the SCC rates of materials in high temperature water. The proposed approach can be used to understand the SCC crack growth in structural materials of light water reactors.

  11. GeNemo: a search engine for web-based functional genomic data.

    Science.gov (United States)

    Zhang, Yongqing; Cao, Xiaoyi; Zhong, Sheng

    2016-07-08

    A set of new data types emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science. Here, we present GeNemo, a web-based search engine for functional genomic data. GeNemo searches user-input data against online functional genomic datasets, including the entire collection of ENCODE and mouse ENCODE datasets. Unlike text-based search engines, GeNemo's searches are based on pattern matching of functional genomic regions. This distinguishes GeNemo from text or DNA sequence searches. The user can input any complete or partial functional genomic dataset, for example, a binding intensity file (bigWig) or a peak file. GeNemo reports any genomic regions, ranging from hundred bases to hundred thousand bases, from any of the online ENCODE datasets that share similar functional (binding, modification, accessibility) patterns. This is enabled by a Markov Chain Monte Carlo-based maximization process, executed on up to 24 parallel computing threads. By clicking on a search result, the user can visually compare her/his data with the found datasets and navigate the identified genomic regions. GeNemo is available at www.genemo.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Prediction of shot success for basketball free throws: visual search strategy.

    Science.gov (United States)

    Uchida, Yusuke; Mizuguchi, Nobuaki; Honda, Masaaki; Kanosue, Kazuyuki

    2014-01-01

    In ball games, players have to pay close attention to visual information in order to predict the movements of both the opponents and the ball. Previous studies have indicated that players primarily utilise cues concerning the ball and opponents' body motion. The information acquired must be effective for observing players to select the subsequent action. The present study evaluated the effects of changes in the video replay speed on the spatial visual search strategy and ability to predict free throw success. We compared eye movements made while observing a basketball free throw by novices and experienced basketball players. Correct response rates were close to chance (50%) at all video speeds for the novices. The correct response rate of experienced players was significantly above chance (and significantly above that of the novices) at the normal speed, but was not different from chance at both slow and fast speeds. Experienced players gazed more on the lower part of the player's body when viewing a normal speed video than the novices. The players likely detected critical visual information to predict shot success by properly moving their gaze according to the shooter's movements. This pattern did not change when the video speed was decreased, but changed when it was increased. These findings suggest that temporal information is important for predicting action outcomes and that such outcomes are sensitive to video speed.

  13. Predicting and validating protein interactions using network structure.

    Directory of Open Access Journals (Sweden)

    Pao-Yang Chen

    2008-07-01

    Protein interactions play a vital part in the function of a cell. As experimental techniques for detection and validation of protein interactions are time consuming, there is a need for computational methods for this task. Protein interactions appear to form a network with a relatively high degree of local clustering. In this paper we exploit this clustering by suggesting a score based on triplets of observed protein interactions. The score utilises both protein characteristics and network properties. Our score based on triplets is shown to complement existing techniques for predicting protein interactions, outperforming them on data sets which display a high degree of clustering. The predicted interactions score highly against test measures for accuracy. Compared to a similar score derived from pairwise interactions only, the triplet score displays higher sensitivity and specificity. By looking at specific examples, we show how an experimental set of interactions can be enriched and validated. As part of this work we also examine the effect of different prior databases upon the accuracy of prediction and find that the interactions from the same kingdom give better results than from across kingdoms, suggesting that there may be fundamental differences between the networks. These results all emphasize that network structure is important and helps in the accurate prediction of protein interactions. The protein interaction data set and the program used in our analysis, and a list of predictions and validations, are available at http://www.stats.ox.ac.uk/bioinfo/resources/PredictingInteractions.
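
    The intuition behind exploiting local clustering can be sketched by counting the interaction partners that two proteins share: each shared partner would close a triangle if the two proteins also interacted. This is not the triplet score proposed in the paper, which also uses protein characteristics; it is a deliberately simplified stand-in, and the function names and protein labels are hypothetical.

```python
from collections import defaultdict


def build_adjacency(interactions):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adjacency = defaultdict(set)
    for a, b in interactions:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def shared_partner_count(adjacency, u, v):
    """Number of proteins observed to interact with both u and v.

    Simplified clustering-based evidence for a u-v interaction; an
    illustrative assumption, not the score defined by the authors.
    """
    neighbours_u = adjacency.get(u, set())
    neighbours_v = adjacency.get(v, set())
    return len((neighbours_u & neighbours_v) - {u, v})


if __name__ == "__main__":
    observed = [("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"), ("A", "E")]
    network = build_adjacency(observed)
    # A and B share the partners C and D, so an A-B interaction is plausible.
    print(shared_partner_count(network, "A", "B"))
```

    In a highly clustered network, candidate pairs with many shared partners are more likely to interact than pairs with none, which is the property a triplet-based score can exploit.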

  14. Structural Search for High Pressure CS2 and Xe-Cl Compounds

    Science.gov (United States)

    Zarifi, Niloofar; Tse, John S.

    2018-04-01

    The recent successful implementation of several methodologies for the prediction of crystal structures based on first-principles electronic structure has ushered in a new area of computational chemistry. In this study, the two most popular methods, namely genetic evolution and particle swarm optimization, were applied to the investigation of stable crystalline polymorphs of solid carbon disulfide and xenon halides at high pressure. It was found that both methods have their own merits. However, there are subtleties that need to be considered for the proper execution of the methods. We found a two-dimensional (2D) layered structure that may be responsible for the superconductivity in CS2. Except for XeCl2, no thermodynamically stable crystalline Xe halides were found under 60 GPa in the halide-rich region of the phase diagram.

  15. Web search queries can predict stock market volumes.

    Science.gov (United States)

    Bordino, Ilaria; Battiston, Stefano; Caldarelli, Guido; Cristelli, Matthieu; Ukkonen, Antti; Weber, Ingmar

    2012-01-01

    We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. Few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enables us to investigate also the user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.
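
    The claim that query volumes anticipate trading volumes by a day or more can be probed with a lagged correlation. The sketch below assumes a plain Pearson correlation between the query series and the trading series shifted by a fixed number of days; the abstract does not specify the authors' exact statistic, and the series shown are made-up illustrative values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(queries, trades, lag):
    """Correlate queries on day t with trades on day t + lag (lag >= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(queries, trades)
    return pearson(queries[:-lag], trades[lag:])


if __name__ == "__main__":
    # Hypothetical daily volumes in which trading follows queries by one day.
    query_volume = [1, 3, 2, 5, 4, 6]
    trading_volume = [0, 1, 3, 2, 5, 4]
    for lag in (0, 1):
        print(lag, round(lagged_correlation(query_volume, trading_volume, lag), 3))
```

    A correlation that is higher at a positive lag than at lag zero suggests that the query series leads the trading series, although it does not by itself establish causation.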

  16. Web search queries can predict stock market volumes.

    Directory of Open Access Journals (Sweden)

    Ilaria Bordino

    We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. Few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enables us to also investigate user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.

  17. Protein secondary structure prediction using modular reciprocal bidirectional recurrent neural networks.

    Science.gov (United States)

    Babaei, Sepideh; Geranmayeh, Amir; Seyyedsalehi, Seyyed Ali

    2010-12-01

    The supervised learning of recurrent neural networks well-suited for prediction of protein secondary structures from the underlying amino acid sequence is studied. Modular reciprocal recurrent neural networks (MRR-NN) are proposed to model the strong correlations between adjacent secondary structure elements. In addition, a multilayer bidirectional recurrent neural network (MBR-NN) is introduced to capture the long-range intramolecular interactions between amino acids in formation of the secondary structure. The final modular prediction system is devised based on the interactive integration of the MRR-NN and the MBR-NN structures to arbitrarily engage the neighboring effects of the secondary structure types concurrent with memorizing the sequential dependencies of amino acids along the protein chain. The advanced combined network augments the percentage accuracy (Q₃) to 79.36% and boosts the segment overlap (SOV) up to 70.09% when tested on the PSIPRED dataset in three-fold cross-validation. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
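
    The Q₃ figure quoted here is the percentage of residues assigned to the correct one of three states (helix, strand, coil). A minimal sketch of that calculation follows; the one-letter state codes H/E/C and the example strings are assumptions for illustration, not data from the paper.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose three-state label matches the reference.

    Both arguments are strings over the assumed alphabet H (helix),
    E (strand) and C (coil), one character per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    if set(predicted + observed) - set("HEC"):
        raise ValueError("labels must be H, E or C")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical eight-residue example: one strand residue is predicted as coil.
score = q3_accuracy("HHHEECCC", "HHHEEECC")  # 7 of 8 correct -> 87.5
```

    Q₃ counts residues independently, which is why the abstract also reports the segment overlap (SOV) score: SOV rewards getting whole secondary structure segments right rather than isolated positions.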

  18. Q-learning-based adjustable fixed-phase quantum Grover search algorithm

    International Nuclear Information System (INIS)

    Guo Ying; Shi Wensha; Wang Yijun; Hu, Jiankun

    2017-01-01

    We demonstrate that the rotation phase can be suitably chosen to increase the efficiency of the phase-based quantum search algorithm, leading to a dynamic balance between iterations and success probabilities of the fixed-phase quantum Grover search algorithm with Q-learning for a given number of solutions. In this search algorithm, the proposed Q-learning algorithm, which is in essence a model-free reinforcement learning strategy, is used for performing a matching algorithm based on the fraction of marked items λ and the rotation phase α. After establishing the policy function α = π(λ), we complete the fixed-phase Grover algorithm, where the phase parameter is selected via the learned policy. Simulation results show that the Q-learning-based Grover search algorithm (QLGA) requires fewer iterations and yields higher success probabilities. Compared with the conventional Grover algorithms, it avoids local optima, thereby enabling success probabilities to approach one. (author)
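
    For context on the trade-off between iterations and success probability, the conventional Grover algorithm (with the standard phase of π, not the fixed-phase variant or the learned policy studied in this record) has a closed-form success probability: after k iterations it is sin²((2k+1)θ), where sin θ = √λ and λ is the fraction of marked items. The sketch below evaluates that textbook formula; the function names are illustrative.

```python
import math


def grover_success_probability(fraction_marked, iterations):
    """Success probability of standard Grover search after `iterations` steps."""
    if not 0.0 < fraction_marked <= 1.0:
        raise ValueError("fraction_marked must lie in (0, 1]")
    theta = math.asin(math.sqrt(fraction_marked))
    return math.sin((2 * iterations + 1) * theta) ** 2


def near_optimal_iterations(fraction_marked):
    """Usual iteration count floor(pi / (4 * theta)) for standard Grover search."""
    theta = math.asin(math.sqrt(fraction_marked))
    return math.floor(math.pi / (4 * theta))


# With a quarter of the items marked, a single iteration already succeeds
# with probability one; further iterations overshoot and reduce it again.
k = near_optimal_iterations(0.25)
p = grover_success_probability(0.25, k)
p_overshoot = grover_success_probability(0.25, k + 1)
```

    The overshoot for other values of λ, where no integer k reaches probability one, is the behaviour that motivates tuning the rotation phase instead of using a fixed π.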

  19. Interplay of I-TASSER and QUARK for template-based and ab initio protein structure prediction in CASP10.

    Science.gov (United States)

    Zhang, Yang

    2014-02-01

    We develop and test a new pipeline in CASP10 to predict protein structures based on an interplay of I-TASSER and QUARK for both free-modeling (FM) and template-based modeling (TBM) targets. The most noteworthy observation is that sorting through the threading template pool using the QUARK-based ab initio models as probes allows the detection of distant-homology templates which might be ignored by the traditional sequence profile-based threading alignment algorithms. Further template assembly refinement by I-TASSER resulted in successful folding of two medium-sized FM targets with >150 residues. For TBM, the multiple threading alignments from LOMETS are, for the first time, incorporated into the ab initio QUARK simulations, which were further refined by I-TASSER assembly refinement. Compared with the traditional threading assembly refinement procedures, the inclusion of the threading-constrained ab initio folding models can consistently improve the quality of the full-length models as assessed by the GDT-HA and hydrogen-bonding scores. Despite the success, significant challenges still exist in domain boundary prediction and consistent folding of medium-size proteins (especially beta-proteins) for nonhomologous targets. Further developments of sensitive fold-recognition and ab initio folding methods are critical for solving these problems. Copyright © 2013 Wiley Periodicals, Inc.

  20. Hierarchical prediction of industrial water demand based on refined Laspeyres decomposition analysis.

    Science.gov (United States)

    Shang, Yizi; Lu, Shibao; Gong, Jiaguo; Shang, Ling; Li, Xiaofei; Wei, Yongping; Shi, Hongwang

    2017-12-01

    A recent study decomposed the changes in industrial water use into three hierarchies (output, technology, and structure) using a refined Laspeyres decomposition model, and found monotonous and exclusive trends in the output and technology hierarchies. Based on that research, this study proposes a hierarchical prediction approach to forecast future industrial water demand. Three water demand scenarios (high, medium, and low) were then established based on potential future industrial structural adjustments, and used to predict water demand for the structural hierarchy. The predictive results of this approach were compared with results from a grey prediction model (GPM (1, 1)). The comparison shows that the results of the two approaches were basically identical, differing by less than 10%. Taking Tianjin, China, as a case, and using data from 2003-2012, this study predicts that industrial water demand will continuously increase, reaching 580 million m³, 776.4 million m³, and approximately 1.09 billion m³ by the years 2015, 2020 and 2025 respectively. It is concluded that Tianjin will soon face another water crisis if no immediate measures are taken. This study recommends that Tianjin adjust its industrial structure with water savings as the main objective, and actively seek new sources of water to increase its supply.

  1. Parallel content-based sub-image retrieval using hierarchical searching.

    Science.gov (United States)

    Yang, Lin; Qi, Xin; Xing, Fuyong; Kurc, Tahsin; Saltz, Joel; Foran, David J

    2014-04-01

    The capacity to systematically search through large image collections and ensembles and detect regions exhibiting similar morphological characteristics is central to pathology diagnosis. Unfortunately, the primary methods used to search digitized, whole-slide histopathology specimens are slow and prone to inter- and intra-observer variability. The central objective of this research was to design, develop, and evaluate a content-based image retrieval system to assist doctors for quick and reliable content-based comparative search of similar prostate image patches. Given a representative image patch (sub-image), the algorithm will return a ranked ensemble of image patches throughout the entire whole-slide histology section which exhibits the most similar morphologic characteristics. This is accomplished by first performing hierarchical searching based on a newly developed hierarchical annular histogram (HAH). The set of candidates is then further refined in the second stage of processing by computing a color histogram from eight equally divided segments within each square annular bin defined in the original HAH. A demand-driven master-worker parallelization approach is employed to speed up the searching procedure. Using this strategy, the query patch is broadcasted to all worker processes. Each worker process is dynamically assigned an image by the master process to search for and return a ranked list of similar patches in the image. The algorithm was tested using digitized hematoxylin and eosin (H&E) stained prostate cancer specimens. We have achieved an excellent image retrieval performance. The recall rate within the first 40 rank retrieved image patches is ∼90%. Both the testing data and source code can be downloaded from http://pleiad.umdnj.edu/CBII/Bioinformatics/.

  2. HemeBIND: a novel method for heme binding residue prediction by combining structural and sequence information

    Directory of Open Access Journals (Sweden)

    Hu Jianjun

    2011-05-01

    Background: Accurate prediction of binding residues involved in the interactions between proteins and small ligands is one of the major challenges in structural bioinformatics. Heme is an essential and commonly used ligand that plays critical roles in electron transfer, catalysis, signal transduction and gene expression. Although much effort has been devoted to the development of various generic algorithms for ligand binding site prediction over the last decade, no algorithm has been specifically designed to complement experimental techniques for identification of heme binding residues. Consequently, an urgent need is to develop a computational method for recognizing these important residues. Results: Here we introduced an efficient algorithm HemeBIND for predicting heme binding residues by integrating structural and sequence information. We systematically investigated the characteristics of binding interfaces based on a non-redundant dataset of heme-protein complexes. It was found that several sequence and structural attributes such as evolutionary conservation, solvent accessibility, depth and protrusion clearly illustrate the differences between heme binding and non-binding residues. These features can then be separately used or combined to build the structure-based classifiers using support vector machine (SVM). The results showed that the information contained in these features is largely complementary and their combination achieved the best performance. To further improve the performance, an attempt has been made to develop a post-processing procedure to reduce the number of false positives. In addition, we built a sequence-based classifier based on SVM and sequence profile as an alternative when only sequence information can be used. Finally, we employed a voting method to combine the outputs of structure-based and sequence-based classifiers, which demonstrated remarkably better performance than the individual classifier alone

  3. The utility of comparative models and the local model quality for protein crystal structure determination by Molecular Replacement

    Directory of Open Access Journals (Sweden)

    Pawlowski Marcin

    2012-11-01

    Background: Computational models of protein structures have proved useful as search models in Molecular Replacement (MR), a common method to solve the phase problem faced by macromolecular crystallography. The success of MR depends on the accuracy of a search model. Unfortunately, this parameter remains unknown until the final structure of the target protein is determined. During the last few years, several Model Quality Assessment Programs (MQAPs) that predict the local accuracy of theoretical models have been developed. In this article, we analyze whether the application of MQAPs improves the utility of theoretical models in MR. Results: For our dataset of 615 search models, the real local accuracy of a model increases the MR success ratio by 101% compared to corresponding polyalanine templates. On the contrary, when local model quality is not utilized in MR, the computational models solved only 4.5% more MR searches than polyalanine templates. For the same dataset of the 615 models, a workflow combining MR with predicted local accuracy of a model found 45% more correct solutions than polyalanine templates. To predict such accuracy MetaMQAPclust, a “clustering MQAP”, was used. Conclusions: Using comparative models only marginally increases the MR success ratio in comparison to polyalanine structures of templates. However, the situation changes dramatically once comparative models are used together with their predicted local accuracy. A new functionality was added to the GeneSilico Fold Prediction Metaserver in order to build models that are more useful for MR searches. Additionally, we have developed a simple method, AmIgoMR (Am I good for MR?), to predict if an MR search with a template-based model for a given template is likely to find the correct solution.

  4. GalaxyHomomer: a web server for protein homo-oligomer structure prediction from a monomer sequence or structure.

    Science.gov (United States)

    Baek, Minkyung; Park, Taeyong; Heo, Lim; Park, Chiwook; Seok, Chaok

    2017-07-03

    Homo-oligomerization of proteins is abundant in nature, and is often intimately related with the physiological functions of proteins, such as in metabolism, signal transduction or immunity. Information on the homo-oligomer structure is therefore important to obtain a molecular-level understanding of protein functions and their regulation. Currently available web servers predict protein homo-oligomer structures either by template-based modeling using homo-oligomer templates selected from the protein structure database or by ab initio docking of monomer structures resolved by experiment or predicted by computation. The GalaxyHomomer server, freely accessible at http://galaxy.seoklab.org/homomer, carries out template-based modeling, ab initio docking or both depending on the availability of proper oligomer templates. It also incorporates recently developed model refinement methods that can consistently improve model quality. Moreover, the server provides additional options that can be chosen by the user depending on the availability of information on the monomer structure, oligomeric state and locations of unreliable/flexible loops or termini. The performance of the server was better than or comparable to that of other available methods when tested on benchmark sets and in a recent CASP performed in a blind fashion. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. Ab initio random structure search for 13-atom clusters of fcc elements

    International Nuclear Information System (INIS)

    Chou, J P; Hsing, C R; Wei, C M; Cheng, C; Chang, C M

    2013-01-01

    The 13-atom metal clusters of fcc elements (Al, Rh, Ir, Ni, Pd, Pt, Cu, Ag, Au) were studied by density functional theory calculations. The global minima were searched for by the ab initio random structure searching method. In addition to some new lowest-energy structures for Pd13 and Au13, we found that the effective coordination numbers of the lowest-energy clusters would increase with the ratio of the dimer-to-bulk bond length. This correlation, together with the electronic structures of the lowest-energy clusters, divides the 13-atom clusters of these fcc elements into two groups (except for Au13, which prefers a two-dimensional structure due to the relativistic effect). Compact-like clusters that are composed exclusively of triangular motifs are preferred for elements without d-electrons (Al) or with (nearly) filled d-band electrons (Ni, Pd, Cu, Ag). Non-compact clusters composed mainly of square motifs connected by some triangular motifs (Rh, Ir, Pt) are favored for elements with unfilled d-band electrons. (paper)

  6. Magnetophoresis of flexible DNA-based dumbbell structures

    Science.gov (United States)

    Babić, B.; Ghai, R.; Dimitrov, K.

    2008-02-01

    Controlled movement and manipulation of magnetic micro- and nanostructures using magnetic forces can give rise to important applications in biomedicine, diagnostics, and immunology. We report controlled magnetophoresis and stretching, in aqueous solution, of a DNA-based dumbbell structure containing magnetic and diamagnetic microspheres. The velocity and stretching of the dumbbell were experimentally measured and correlated with a theoretical model based on the forces acting on individual magnetic beads or the entire dumbbell structures. The results show that precise and predictable manipulation of dumbbell structures is achievable and can potentially be applied to immunomagnetic cell separators.

  7. A novel knowledge-based potential for RNA 3D structure evaluation

    Science.gov (United States)

    Yang, Yi; Gu, Qi; Zhang, Ben-Gong; Shi, Ya-Zhou; Shao, Zhi-Gang

    2018-03-01

    Ribonucleic acids (RNAs) play a vital role in biology, and knowledge of their three-dimensional (3D) structure is required to understand their biological functions. Recently structural prediction methods have been developed to address this issue, but a series of RNA 3D structures are generally predicted by most existing methods. Therefore, the evaluation of the predicted structures is generally indispensable. Although several methods have been proposed to assess RNA 3D structures, the existing methods are not precise enough. In this work, a new all-atom knowledge-based potential is developed for more accurately evaluating RNA 3D structures. The potential not only includes local and nonlocal interactions but also fully considers the specificity of each RNA by introducing a retraining mechanism. Based on extensive test sets generated from independent methods, the proposed potential correctly distinguished the native state and ranked near-native conformations to effectively select the best. Furthermore, the proposed potential precisely captured RNA structural features such as base-stacking and base-pairing. Comparisons with existing potential methods show that the proposed potential is very reliable and accurate in RNA 3D structure evaluation. Project supported by the National Science Foundation of China (Grants Nos. 11605125, 11105054, 11274124, and 11401448).

  8. Advancing viral RNA structure prediction: measuring the thermodynamics of pyrimidine-rich internal loops.

    Science.gov (United States)

    Phan, Andy; Mailey, Katherine; Saeki, Jessica; Gu, Xiaobo; Schroeder, Susan J

    2017-05-01

    Accurate thermodynamic parameters improve RNA structure predictions and thus accelerate understanding of RNA function and the identification of RNA drug binding sites. Many viral RNA structures, such as internal ribosome entry sites, have internal loops and bulges that are potential drug target sites. Current models used to predict internal loops are biased toward small, symmetric purine loops, and thus poorly predict asymmetric, pyrimidine-rich loops with >6 nucleotides (nt) that occur frequently in viral RNA. This article presents new thermodynamic data for 40 pyrimidine loops, many of which can form UU or protonated CC base pairs. Uracil and protonated cytosine base pairs stabilize asymmetric internal loops. Accurate prediction rules are presented that account for all thermodynamic measurements of RNA asymmetric internal loops. New loop initiation terms for loops with >6 nt are presented that do not follow previous assumptions that increasing asymmetry destabilizes loops. Since the last 2004 update, 126 new loops with asymmetry or sizes greater than 2 × 2 have been measured. These new measurements significantly deepen and diversify the thermodynamic database for RNA. These results will help better predict internal loops that are larger, pyrimidine-rich, and occur within viral structures such as internal ribosome entry sites. © 2017 Phan et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  9. Churn prediction based on text mining and CRM data analysis

    OpenAIRE

    Schatzmann, Anders; Heitz, Christoph; Münch, Thomas

    2014-01-01

    Within quantitative marketing, churn prediction on a single customer level has become a major issue. An extensive body of literature shows that, today, churn prediction is mainly based on structured CRM data. However, in the past years, more and more digitized customer text data has become available, originating from emails, surveys or scripts of phone calls. To date, this data source remains vastly untapped for churn prediction, and corresponding methods are rarely described in literature. ...

  10. Similarity-Based Prediction of Travel Times for Vehicles Traveling on Known Routes

    DEFF Research Database (Denmark)

    Tiesyte, Dalia; Jensen, Christian Søndergaard

    2008-01-01

    The use of centralized, real-time position tracking is proliferating in the areas of logistics and public transportation. Real-time positions can be used to provide up-to-date information to a variety of users, and they can also be accumulated for uses in subsequent data analyses. In particular, historical data in combination with real-time data may be used to predict the future travel times of vehicles more accurately, thus improving the experience of the users who rely on such information. We propose a Nearest-Neighbor Trajectory (NNT) technique that identifies the historical trajectory... of vehicles that travel along known routes. In empirical studies with real data from buses, we evaluate how well the proposed distance functions are capable of predicting future vehicle movements. Second, we propose a main-memory index structure that enables incremental similarity search and that is capable...
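
    The nearest-neighbor idea in this record can be sketched as follows: compare the segment travel times a vehicle has produced so far with the same segments of past trips, pick the closest past trip, and use its remaining segments as the prediction. This is only an illustration under stated assumptions. The record is truncated and does not give the authors' distance functions or index structure, so the sum-of-absolute-differences distance, the function name, and the sample numbers below are hypothetical.

```python
def predict_remaining(observed, history):
    """Predict remaining segment travel times from the most similar past trip.

    observed: travel times (seconds) for the route segments completed so far.
    history:  past trips on the same route, each a full list of segment times.
    The distance used here (sum of absolute differences over the completed
    segments) is an assumed stand-in for the paper's distance functions.
    """
    k = len(observed)
    candidates = [trip for trip in history if len(trip) > k]
    if not candidates:
        raise ValueError("no historical trip extends beyond the observed prefix")
    nearest = min(
        candidates,
        key=lambda trip: sum(abs(a - b) for a, b in zip(observed, trip)),
    )
    return nearest[k:]


# Hypothetical four-segment bus route with three past trips.
history = [
    [60, 90, 120, 80],
    [75, 110, 150, 95],
    [50, 85, 100, 70],
]
prediction = predict_remaining([72, 108], history)  # closest trip is the second
```

    A linear scan like this grows with the number of stored trips, which is presumably why the record also mentions an index structure for incremental similarity search.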

  11. An ensemble method for predicting subnuclear localizations from primary protein structures.

    Directory of Open Access Journals (Sweden)

    Guo Sheng Han

    BACKGROUND: Predicting protein subnuclear localization is a challenging problem. Some previous works based on non-sequence information including Gene Ontology annotations and kernel fusion have respective limitations. The aim of this work is twofold: one is to propose a novel individual feature extraction method; another is to develop an ensemble method to improve prediction performance using comprehensive information represented in the form of high dimensional feature vector obtained by 11 feature extraction methods. METHODOLOGY/PRINCIPAL FINDINGS: A novel two-stage multiclass support vector machine is proposed to predict protein subnuclear localizations. It only considers those feature extraction methods based on amino acid classifications and physicochemical properties. In order to speed up our system, an automatic search method for the kernel parameter is used. The prediction performance of our method is evaluated on four datasets: Lei dataset, multi-localization dataset, SNL9 dataset and a new independent dataset. The overall accuracy of prediction for 6 localizations on Lei dataset is 75.2% and that for 9 localizations on SNL9 dataset is 72.1% in the leave-one-out cross validation, 71.7% for the multi-localization dataset and 69.8% for the new independent dataset, respectively. Comparisons with those existing methods show that our method performs better for both single-localization and multi-localization proteins and achieves more balanced sensitivities and specificities on large-size and small-size subcellular localizations. The overall accuracy improvements are 4.0% and 4.7% for single-localization proteins and 6.5% for multi-localization proteins. The reliability and stability of our classification model are further confirmed by permutation analysis. CONCLUSIONS: It can be concluded that our method is effective and valuable for predicting protein subnuclear localizations. A web server has been designed to implement the proposed method

  12. Contrast model for three-dimensional vehicles in natural lighting and search performance analysis

    Science.gov (United States)

    Witus, Gary; Gerhart, Grant R.; Ellis, R. Darin

    2001-09-01

    Ground vehicles in natural lighting tend to have significant and systematic variation in luminance through the presented area. This arises, in large part, from the vehicle surfaces having different orientations and shadowing relative to the source of illumination and the position of the observer. These systematic differences create the appearance of a structured 3D object. The 3D appearance is an important factor in search, figure-ground segregation, and object recognition. We present a contrast metric to predict search and detection performance that accounts for the 3D structure. The approach first computes the contrast of the front (or rear), side, and top surfaces. The vehicle contrast metric is the area-weighted sum of the absolute values of the contrasts of the component surfaces. The 3D structure contrast metric, together with target height, account for more than 80% of the variance in probability of detection and 75% of the variance in search time. When false alarm effects are discounted, they account for 89% of the variance in probability of detection and 95% of the variance in search time. The predictive power of the signature metric, when calibrated to half the data and evaluated against the other half, is 90% of the explanatory power.
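
    The vehicle contrast metric described above is an area-weighted sum of the absolute contrasts of the component surfaces. A minimal sketch follows. The abstract does not say how each surface contrast is computed or whether the area weights are normalized, so this version takes the per-surface contrasts as given inputs and assumes the weights are area fractions; the numbers are made up for illustration.

```python
def vehicle_contrast(surfaces):
    """Area-weighted sum of absolute surface contrasts.

    surfaces: iterable of (presented_area, contrast) pairs, for example the
    front (or rear), side and top surfaces. Weights are assumed to be each
    surface's share of the total presented area.
    """
    surfaces = list(surfaces)
    total_area = sum(area for area, _ in surfaces)
    if total_area <= 0:
        raise ValueError("total presented area must be positive")
    return sum(area / total_area * abs(contrast) for area, contrast in surfaces)


# Hypothetical front, side and top surfaces: (area, signed contrast).
metric = vehicle_contrast([(2.0, 0.3), (5.0, -0.1), (3.0, 0.2)])
```

    Taking absolute values before summing keeps a bright surface and a dark surface from cancelling each other, which a contrast computed on the vehicle's mean luminance would allow.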

  13. Mapping monomeric threading to protein-protein structure prediction.

    Science.gov (United States)

    Guerler, Aysam; Govindarajoo, Brandon; Zhang, Yang

    2013-03-25

    The key step of template-based protein-protein structure prediction is the recognition of complexes from experimental structure libraries that have similar quaternary fold. Maintaining two monomer and dimer structure libraries is however laborious, and inappropriate library construction can degrade template recognition coverage. We propose a novel strategy SPRING to identify complexes by mapping monomeric threading alignments to protein-protein interactions based on the original oligomer entries in the PDB, which does not rely on library construction and increases the efficiency and quality of complex template recognitions. SPRING is tested on 1838 nonhomologous protein complexes which can recognize correct quaternary template structures with a TM score >0.5 in 1115 cases after excluding homologous proteins. The average TM score of the first model is 60% and 17% higher than that by HHsearch and COTH, respectively, while the number of targets with an interface RMSD benchmark proteins. Although the relative performance of SPRING and ZDOCK depends on the level of homology filters, a combination of the two methods can result in a significantly higher model quality than ZDOCK at all homology thresholds. These data demonstrate a new efficient approach to quaternary structure recognition that is ready to use for genome-scale modeling of protein-protein interactions due to the high speed and accuracy.

  14. Ab initio and template-based prediction of multi-class distance maps by two-dimensional recursive neural networks

    Directory of Open Access Journals (Sweden)

    Martin Alberto JM

    2009-01-01

    Background: Prediction of protein structures from their sequences is still one of the open grand challenges of computational biology. Some approaches to protein structure prediction, especially ab initio ones, rely to some extent on the prediction of residue contact maps. Residue contact map predictions have been assessed at the CASP competition for several years now. Although it has been shown that exact contact maps generally yield correct three-dimensional structures, this is true only at a relatively low resolution (3–4 Å from the native structure). Another known weakness of contact maps is that they are generally predicted ab initio, that is not exploiting information about potential homologues of known structure. Results: We introduce a new class of distance restraints for protein structures: multi-class distance maps. We show that Cα trace reconstructions based on 4-class native maps are significantly better than those from residue contact maps. We then build two predictors of 4-class maps based on recursive neural networks: one ab initio, or relying on the sequence and on evolutionary information; one template-based, or in which homology information to known structures is provided as a further input. We show that virtually any level of sequence similarity to structural templates (down to less than 10%) yields more accurate 4-class maps than the ab initio predictor. We show that template-based predictions by recursive neural networks are consistently better than the best template and than a number of combinations of the best available templates. We also extract binary residue contact maps at an 8 Å threshold (as per CASP assessment) from the 4-class predictors and show that the template-based version is also more accurate than the best template and consistently better than the ab initio one, down to very low levels of sequence identity to structural templates. Furthermore, we test both ab-initio and template-based 8
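
    To make the two kinds of map concrete, the sketch below builds a binary contact map at the 8 Å threshold mentioned in the abstract and a multi-class distance map from a list of Cα coordinates. It works from known coordinates, whereas the paper predicts the maps with neural networks. The strict "less than 8 Å" comparison and the class boundaries (8, 13, 19 Å) are assumptions for illustration; the abstract does not state them.

```python
import bisect
import math


def contact_map(coords, threshold=8.0):
    """Binary map: True where two residues lie closer than `threshold` angstroms."""
    n = len(coords)
    return [
        [math.dist(coords[i], coords[j]) < threshold for j in range(n)]
        for i in range(n)
    ]


def distance_class_map(coords, boundaries=(8.0, 13.0, 19.0)):
    """Multi-class map: index of the distance bin for every residue pair.

    Three boundaries give four classes (0..3). The boundary values are
    illustrative placeholders, not taken from the paper.
    """
    n = len(coords)
    return [
        [bisect.bisect_right(boundaries, math.dist(coords[i], coords[j]))
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical straight chain of four C-alpha atoms spaced 3.8 angstroms apart.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
contacts = contact_map(chain)
classes = distance_class_map(chain)
```

    The first and last residues are 11.4 Å apart, so the binary map records only "no contact" while the multi-class map still places the pair in the second bin, which is the extra information the abstract credits for better reconstructions.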

  15. Patient Similarity in Prediction Models Based on Health Data: A Scoping Review

    Science.gov (United States)

    Sharafoddini, Anis; Dubin, Joel A

    2017-01-01

    Background: Physicians and health policy makers are required to make predictions during their decision making in various medical problems. Many advances have been made in predictive modeling toward outcome prediction, but these innovations target an average patient and are insufficiently adjustable for individual patients. One developing idea in this field is individualized predictive analytics based on patient similarity. The goal of this approach is to identify patients who are similar to an index patient and derive insights from the records of similar patients to provide personalized predictions. Objective: The aim is to summarize and review published studies describing computer-based approaches for predicting patients’ future health status based on health data and patient similarity, identify gaps, and provide a starting point for related future research. Methods: The method involved (1) conducting the review by performing automated searches in Scopus, PubMed, and ISI Web of Science, selecting relevant studies by first screening titles and abstracts then analyzing full-texts, and (2) documenting by extracting publication details and information on context, predictors, missing data, modeling algorithm, outcome, and evaluation methods into a matrix table, synthesizing data, and reporting results. Results: After duplicate removal, 1339 articles were screened in abstracts and titles and 67 were selected for full-text review. In total, 22 articles met the inclusion criteria. Within included articles, hospitals were the main source of data (n=10). Cardiovascular disease (n=7) and diabetes (n=4) were the dominant patient diseases. Most studies (n=18) used neighborhood-based approaches in devising prediction models. Two studies showed that patient similarity-based modeling outperformed population-based predictive methods. Conclusions: Interest in patient similarity-based predictive modeling for diagnosis and prognosis has been growing. In addition to raw/coded health

  16. Screw Remaining Life Prediction Based on Quantum Genetic Algorithm and Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Xiaochen Zhang

    2017-01-01

    Full Text Available To predict the remaining life of a ball screw, a screw remaining life prediction method based on quantum genetic algorithm (QGA) and support vector machine (SVM) is proposed. A screw accelerated test bench is introduced. Accelerometers are installed to monitor the performance degradation of the ball screw. Using wavelet packet decomposition combined with isometric mapping (Isomap), the sensitive feature vectors are obtained and stored in a database. The sensitive feature vectors are then randomly chosen from the database to constitute training samples and testing samples. Next, the optimal kernel function parameter and penalty factor of the SVM are searched with QGA. Finally, the training samples are used to train the optimized SVM, while the testing samples are used to test the prediction accuracy of the trained SVM, so that the screw remaining life prediction model is obtained. The experimental results show that the screw remaining life prediction model can effectively predict screw remaining life.

  17. Real-time prediction of atmospheric Lagrangian coherent structures based on forecast data: An application and error analysis

    Science.gov (United States)

    BozorgMagham, Amir E.; Ross, Shane D.; Schmale, David G.

    2013-09-01

    The language of Lagrangian coherent structures (LCSs) provides a new means for studying transport and mixing of passive particles advected by an atmospheric flow field. Recent observations suggest that LCSs govern the large-scale atmospheric motion of airborne microorganisms, paving the way for more efficient models and management strategies for the spread of infectious diseases affecting plants, domestic animals, and humans. In addition, having reliable predictions of the timing of hyperbolic LCSs may contribute to improved aerobiological sampling of microorganisms with unmanned aerial vehicles and LCS-based early warning systems. Chaotic atmospheric dynamics lead to unavoidable forecasting errors in the wind velocity field, which compounds errors in LCS forecasting. In this study, we reveal the cumulative effects of errors of (short-term) wind field forecasts on the finite-time Lyapunov exponent (FTLE) fields and the associated LCSs when realistic forecast plans impose certain limits on the forecasting parameters. Objectives of this paper are to (a) quantify the accuracy of prediction of FTLE-LCS features and (b) determine the sensitivity of such predictions to forecasting parameters. Results indicate that forecasts of attracting LCSs exhibit less divergence from the archive-based LCSs than the repelling features. This result is important since attracting LCSs are the backbone of long-lived features in moving fluids. We also show under what circumstances one can trust the forecast results if one merely wants to know if an LCS passed over a region and does not need to precisely know the passage time.
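
    For reference, the FTLE mentioned above is conventionally defined from the flow map of the velocity field. The notation below (flow map \Phi, initial time t_0, integration time T) is the usual textbook form and is an assumption here, since the abstract does not state which formulation the authors used.

```latex
\sigma_{t_0}^{T}(\mathbf{x})
  = \frac{1}{|T|} \ln \sqrt{\lambda_{\max}\!\bigl( C(\mathbf{x}) \bigr)},
\qquad
C(\mathbf{x})
  = \bigl( \nabla \Phi_{t_0}^{t_0+T}(\mathbf{x}) \bigr)^{\top}
    \, \nabla \Phi_{t_0}^{t_0+T}(\mathbf{x})
```

    Here C is the right Cauchy-Green deformation tensor and \lambda_{\max} is its largest eigenvalue. In this convention, ridges of the forward-time field (T > 0) mark repelling LCSs and ridges of the backward-time field (T < 0) mark attracting LCSs.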

  18. EVA: continuous automatic evaluation of protein structure prediction servers.

    Science.gov (United States)

    Eyrich, V A; Martí-Renom, M A; Przybylski, D; Madhusudhan, M S; Fiser, A; Pazos, F; Valencia, A; Sali, A; Rost, B

    2001-12-01

    Evaluation of protein structure prediction methods is difficult and time-consuming. Here, we describe EVA, a web server for assessing protein structure prediction methods, in an automated, continuous and large-scale fashion. Currently, EVA evaluates the performance of a variety of prediction methods available through the internet. Every week, the sequences of the latest experimentally determined protein structures are sent to prediction servers, results are collected, performance is evaluated, and a summary is published on the web. EVA has so far collected data for more than 3000 protein chains. These results may provide valuable insight to both developers and users of prediction methods. http://cubic.bioc.columbia.edu/eva. eva@cubic.bioc.columbia.edu

  19. Genomic prediction in families of perennial ryegrass based on genotyping-by-sequencing

    DEFF Research Database (Denmark)

    Ashraf, Bilal

    In this thesis we investigate the potential for genomic prediction in perennial ryegrass using genotyping-by-sequencing (GBS) data. Association method based on family-based breeding systems was developed, genomic heritabilities, genomic prediction accuracies and effects of some key factors wer...... explored. Results show that low sequencing depth caused underestimation of allele substitution effects in GWAS and overestimation of genomic heritability in prediction studies. Other factors such as SNP marker density, population structure and size of training population influenced accuracy of genomic...... prediction. Overall, GBS allows for genomic prediction in breeding families of perennial ryegrass and holds good potential to expedite genetic gain and encourage the application of genomic prediction...

  20. Predicting HLA class I non-permissive amino acid residues substitutions.

    Directory of Open Access Journals (Sweden)

    T Andrew Binkowski

    Full Text Available Prediction of peptide binding to human leukocyte antigen (HLA) molecules is essential to a wide range of clinical entities from vaccine design to stem cell transplant compatibility. Here we present a new structure-based methodology that applies robust computational tools to model peptide-HLA (p-HLA) binding interactions. The method leverages the structural conservation observed in p-HLA complexes to significantly reduce the search space and calculate the system's binding free energy. This approach is benchmarked against existing p-HLA complexes and the prediction performance is measured against a library of experimentally validated peptides. The effect on binding activity across a large set of high-affinity peptides is used to investigate amino acid mismatches reported as high-risk factors in hematopoietic stem cell transplantation.

  1. AN OVERVIEW OF SEARCHING AND DISCOVERING WEB BASED INFORMATION RESOURCES

    Directory of Open Access Journals (Sweden)

    Cezar VASILESCU

    2010-01-01

    Full Text Available The Internet has become, for most of us, a tool used daily for professional or personal reasons. We do not even remember the times when a computer and a broadband connection were luxury items. More and more people rely on the complicated web network to find the information they need. This paper presents an overview of Internet search issues and search engines, and describes the parties and the basic mechanism involved in a search for web-based information resources. It also presents ways to increase the efficiency of web searches through a better understanding of what search engines ignore in website content.

  2. QuaBingo: A Prediction System for Protein Quaternary Structure Attributes Using Block Composition

    Directory of Open Access Journals (Sweden)

    Chi-Hua Tung

    2016-01-01

    Full Text Available Background. Quaternary structures of proteins are closely relevant to gene regulation, signal transduction, and many other biological functions of proteins. In the current study, a new method based on protein-conserved motif composition in block format for feature extraction is proposed, which is termed block composition. Results. The protein quaternary assembly states prediction system which combines blocks with functional domain composition, called QuaBingo, is constructed by three layers of classifiers that can categorize quaternary structural attributes of monomer, homooligomer, and heterooligomer. The first layer classifier uses support vector machines (SVM) based on blocks and functional domains of proteins, and the second layer SVM processes the outputs of the first layer. Finally, the result is determined by the Random Forest of the third layer. We compared the effectiveness of the combination of block composition, functional domain composition, and pseudoamino acid composition of the model. In the 11 kinds of functional protein families, QuaBingo is 23% higher in Matthews Correlation Coefficient (MCC) than the existing prediction system. The results also revealed the biological characterization of the top five block compositions. Conclusions. QuaBingo provides better predictive ability for predicting the quaternary structural attributes of proteins.

  3. Predicting deleterious nsSNPs: an analysis of sequence and structural attributes

    Directory of Open Access Journals (Sweden)

    Saqi Mansoor AS

    2006-04-01

    Full Text Available Abstract Background There has been an explosion in the number of single nucleotide polymorphisms (SNPs) within public databases. In this study we focused on non-synonymous protein coding single nucleotide polymorphisms (nsSNPs), some associated with disease and others which are thought to be neutral. We describe the distribution of both types of nsSNPs using structural and sequence based features and assess the relative value of these attributes as predictors of function using machine learning methods. We also address the common problem of balance within machine learning methods and show the effect of imbalance on nsSNP function prediction. We show that nsSNP function prediction can be significantly improved by 100% undersampling of the majority class. The learnt rules were then applied to make predictions of function on all nsSNPs within Ensembl. Results The measure of prediction success is greatly affected by the level of imbalance in the training dataset. We found the balanced dataset that included all attributes produced the best prediction. The performance as measured by the Matthews correlation coefficient (MCC) varied between 0.49 and 0.25 depending on the imbalance. As previously observed, the degree of sequence conservation at the nsSNP position is the single most useful attribute. In addition to conservation, structural predictions made using a balanced dataset can be of value. Conclusion The predictions for all nsSNPs within Ensembl, based on a balanced dataset using all attributes, are available as a DAS annotation. Instructions for adding the track to Ensembl are at http://www.brightstudy.ac.uk/das_help.html
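
    To make the two quantities in this abstract concrete, the sketch below computes the Matthews correlation coefficient from confusion-matrix counts and balances a training set by random undersampling. It assumes that "100% undersampling" means shrinking the majority class to the size of the minority class; the function names `mcc` and `undersample` are illustrative and do not come from the study.

```python
import math
import random


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def undersample(majority, minority, seed=0):
    """Randomly shrink the majority class to the minority class size.

    Assumption: this is what full (100%) undersampling refers to.
    """
    rng = random.Random(seed)
    return rng.sample(list(majority), len(minority)) + list(minority)


if __name__ == "__main__":
    neutral = [("neutral", i) for i in range(100)]   # majority class
    disease = [("disease", i) for i in range(10)]    # minority class
    balanced = undersample(neutral, disease)
    print(len(balanced))                 # 10 majority + 10 minority examples
    print(mcc(tp=50, tn=50, fp=0, fn=0))  # perfect classifier
```

    MCC uses all four cells of the confusion matrix, which is why it is a common choice when the classes are imbalanced and plain accuracy would be misleading.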

  4. A GIS-based Quantitative Approach for the Search of Clandestine Graves, Italy.

    Science.gov (United States)

    Somma, Roberta; Cascio, Maria; Silvestro, Massimiliano; Torre, Eliana

    2018-05-01

    Previous research on the RAG color-coded prioritization systems for the discovery of clandestine graves has not considered all the factors influencing the burial site choice within a GIS project. The goal of this technical note was to discuss a GIS-based quantitative approach for the search of clandestine graves. The method is based on cross-referenced RAG maps with cumulative suitability factors to host a burial, leading to the editing of different search scenarios for ground searches showing high-(Red), medium-(Amber), and low-(Green) priority areas. The application of this procedure allowed several outcomes to be determined: If the concealment occurs at night, then the "search scenario without the visibility" will be the most effective one; if the concealment occurs in daylight, then the "search scenario with the DSM-based visibility" will be most appropriate; the different search scenarios may be cross-referenced with offender's confessions and eyewitnesses' testimonies to verify the veracity of their statements. © 2017 American Academy of Forensic Sciences.

  5. De novo protein structure prediction by dynamic fragment assembly and conformational space annealing.

    Science.gov (United States)

    Lee, Juyong; Lee, Jinhyuk; Sasaki, Takeshi N; Sasai, Masaki; Seok, Chaok; Lee, Jooyoung

    2011-08-01

    Ab initio protein structure prediction is a challenging problem that requires both an accurate energetic representation of a protein structure and an efficient conformational sampling method for successful protein modeling. In this article, we present an ab initio structure prediction method which combines a recently suggested novel way of fragment assembly, dynamic fragment assembly (DFA) and conformational space annealing (CSA) algorithm. In DFA, model structures are scored by continuous functions constructed based on short- and long-range structural restraint information from a fragment library. Here, DFA is represented by the full-atom model by CHARMM with the addition of the empirical potential of DFIRE. The relative contributions between various energy terms are optimized using linear programming. The conformational sampling was carried out with CSA algorithm, which can find low energy conformations more efficiently than simulated annealing used in the existing DFA study. The newly introduced DFA energy function and CSA sampling algorithm are implemented into CHARMM. Test results on 30 small single-domain proteins and 13 template-free modeling targets of the 8th Critical Assessment of protein Structure Prediction show that the current method provides comparable and complementary prediction results to existing top methods. Copyright © 2011 Wiley-Liss, Inc.

  6. Bioinformatical approaches to RNA structure prediction & Sequencing of an ancient human genome

    DEFF Research Database (Denmark)

    Lindgreen, Stinus

    Stinus Lindgreen has been working in two different fields during his Ph.D. The first part has been focused on computational approaches to predict the structure of non-coding RNA molecules at the base pairing level. This has resulted in the analysis of various measures of the base pairing potentia...

  7. Predicting Consensus Structures for RNA Alignments Via Pseudo-Energy Minimization

    Directory of Open Access Journals (Sweden)

    Junilda Spirollari

    2009-01-01

    Full Text Available Thermodynamic processes with free energy parameters are often used in algorithms that solve the free energy minimization problem to predict secondary structures of single RNA sequences. While results from these algorithms are promising, an observation is that single sequence-based methods have moderate accuracy and more information is needed to improve on RNA secondary structure prediction, such as covariance scores obtained from multiple sequence alignments. We present in this paper a new approach to predicting the consensus secondary structure of a set of aligned RNA sequences via pseudo-energy minimization. Our tool, called RSpredict, takes into account sequence covariation and employs effective heuristics for accuracy improvement. RSpredict accepts, as input data, a multiple sequence alignment in FASTA or ClustalW format and outputs the consensus secondary structure of the input sequences in both the Vienna style Dot Bracket format and the Connectivity Table format. Our method was compared with some widely used tools including KNetFold, Pfold and RNAalifold. A comprehensive test on different datasets including Rfam sequence alignments and a multiple sequence alignment obtained from our study on the Drosophila X chromosome reveals that RSpredict is competitive with the existing tools on the tested datasets. RSpredict is freely available online as a web server and also as a jar file for download at http://datalab.njit.edu/biology/RSpredict.
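
    The Vienna-style dot-bracket output mentioned above encodes each base pair as a matching pair of parentheses and each unpaired base as a dot. The sketch below turns such a string into a list of paired positions, which is the information a connectivity table carries. It is not RSpredict code; the function name `dot_bracket_to_pairs` is hypothetical, and pseudoknot symbols such as `[` and `]` are deliberately not handled.

```python
def dot_bracket_to_pairs(structure):
    """Return sorted (i, j) base pairs, 1-indexed, from a dot-bracket string."""
    stack = []   # positions of currently unmatched "("
    pairs = []
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # A small hairpin: two stacked pairs closing a two-base loop.
    print(dot_bracket_to_pairs("((..))"))
```

    A stack suffices because nested structures pair the most recently opened position first, which is the same property that makes the notation unambiguous.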

  8. Search for leptoquarks at HERA

    Energy Technology Data Exchange (ETDEWEB)

    Huettmann, Antje

    2009-10-15

    A search for first generation leptoquarks was performed in polarized electron-proton collider data recorded with the ZEUS detector at HERA in the years 2003-2007. The data were analyzed for final states with an electron and jets or with missing transverse momentum and jets, and a search for resonance structures or other deviations from the Standard Model predictions in the spectra of the invariant mass of lepton and jets was performed. No evidence for leptoquark signals was found. The data were combined with the previously taken data at HERA corresponding to an integrated luminosity of 488 pb⁻¹ and limits were set on the Yukawa coupling λ as a function of the leptoquark mass for different leptoquark types within the Buchmueller-Rueckl-Wyler model. (orig.)

  9. Multi-step-prediction of chaotic time series based on co-evolutionary recurrent neural network

    International Nuclear Information System (INIS)

    Ma Qianli; Zheng Qilun; Peng Hong; Qin Jiangwei; Zhong Tanwei

    2008-01-01

    This paper proposes a co-evolutionary recurrent neural network (CERNN) for the multi-step-prediction of chaotic time series. It estimates the proper parameters of phase space reconstruction and optimizes the structure of recurrent neural networks by a co-evolutionary strategy. The search space is separated into two subspaces and the individuals are trained in a parallel computational procedure. The method can dynamically combine the embedding method with the capability of a recurrent neural network to incorporate past experience due to internal recurrence. The effectiveness of CERNN is evaluated on three benchmark chaotic time series data sets: the Lorenz series, the Mackey-Glass series and the real-world sunspot series. The simulation results show that CERNN improves the performance of multi-step-prediction of chaotic time series.

  10. Prediction of the Fundamental Period of Infilled RC Frame Structures Using Artificial Neural Networks

    Directory of Open Access Journals (Sweden)

    Panagiotis G. Asteris

    2016-01-01

    Full Text Available The fundamental period is one of the most critical parameters for the seismic design of structures. There are several literature approaches for its estimation which often conflict with each other, making their use questionable. Furthermore, the majority of these approaches do not take into account the presence of infill walls into the structure despite the fact that infill walls increase the stiffness and mass of structure leading to significant changes in the fundamental period. In the present paper, artificial neural networks (ANNs) are used to predict the fundamental period of infilled reinforced concrete (RC) structures. For the training and the validation of the ANN, a large data set is used based on a detailed investigation of the parameters that affect the fundamental period of RC structures. The comparison of the predicted values with analytical ones indicates the potential of using ANNs for the prediction of the fundamental period of infilled RC frame structures taking into account the crucial parameters that influence its value.

  11. Graph theoretical ordering of structures as a basis for systematic searches for regularities in molecular data

    International Nuclear Information System (INIS)

    Randic, M.; Wilkins, C.L.

    1979-01-01

    Selected molecular data on alkanes have been reexamined in a search for general regularities in isomeric variations. In contrast to the prevailing approaches concerned with fitting data by searching for optimal parameterization, the present work is primarily aimed at established trends, i.e., searching for relative magnitudes and their regularities among the isomers. Such an approach is complementary to curve fitting or correlation seeking procedures. It is particularly useful when there are incomplete data which allow trends to be recognized but no quantitative correlation to be established. One proceeds by first ordering structures. One way is to consider molecular graphs and enumerate paths of different length as the basic graph invariant. It can be shown that, for several thermodynamic molecular properties, the number of paths of length two (p_2) and length three (p_3) are critical. Hence, an ordering based on p_2 and p_3 indicates possible trends and behavior for many molecular properties, some of which relate to others, some which do not. By considering a grid graph derived by attributing to each isomer coordinates (p_2, p_3) and connecting points along the coordinate axis, one obtains a simple presentation useful for isomer structural interrelations. This skeletal frame is one upon which possible trends for different molecular properties may be conveniently represented. The significance of the results and their conceptual value is discussed. 16 figures, 3 tables
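
    As an illustration of the path counts described above, the sketch below enumerates simple paths of two and three bonds in the carbon skeletons of the three pentane isomers and reports each isomer's (p_2, p_3) coordinates. The helper names `skeleton` and `count_paths` and the atom numbering are choices made for this example, not taken from the paper.

```python
def skeleton(bonds):
    """Build an adjacency map for a hydrogen-suppressed carbon skeleton."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def count_paths(adj, length):
    """Count simple paths containing `length` edges in an undirected graph."""
    def extend(path, remaining):
        if remaining == 0:
            return 1
        return sum(extend(path + [nxt], remaining - 1)
                   for nxt in adj[path[-1]] if nxt not in path)
    # Every path is discovered twice, once from each endpoint.
    return sum(extend([start], length) for start in adj) // 2


ISOMERS = {
    "n-pentane": skeleton([(1, 2), (2, 3), (3, 4), (4, 5)]),
    "2-methylbutane": skeleton([(1, 2), (2, 3), (3, 4), (2, 5)]),
    "2,2-dimethylpropane": skeleton([(1, 2), (1, 3), (1, 4), (1, 5)]),
}

if __name__ == "__main__":
    for name, adj in ISOMERS.items():
        print(name, (count_paths(adj, 2), count_paths(adj, 3)))
    # n-pentane (3, 2); 2-methylbutane (4, 2); 2,2-dimethylpropane (6, 0)
```

    Branching raises p_2 while the most compact isomer loses all paths of length three, so the pair of counts already separates the three isomers on the grid the abstract describes.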

  12. Search for brown dwarfs in the IRAS data bases

    International Nuclear Information System (INIS)

    Low, F.J.

    1986-01-01

    A report is given on the initial searches for brown dwarf stars in the IRAS data bases. The paper was presented to the workshop on 'Astrophysics of brown dwarfs', Virginia, USA, 1985. To date no brown dwarfs have been discovered in the solar neighbourhood. Opportunities for future searches with greater sensitivity and different wavelengths are outlined. (U.K.)

  13. A prediction method based on wavelet transform and multiple models fusion for chaotic time series

    International Nuclear Information System (INIS)

    Zhongda, Tian; Shujiang, Li; Yanhong, Wang; Yi, Sha

    2017-01-01

    In order to improve the prediction accuracy of chaotic time series, a prediction method based on wavelet transform and multiple models fusion is proposed. The chaotic time series is decomposed and reconstructed by wavelet transform, and approximation components and detail components are obtained. According to the different characteristics of each component, a least squares support vector machine (LSSVM) is used as the predictive model for the approximation components. At the same time, an improved free search algorithm is used to optimize the parameters of the predictive model. An auto regressive integrated moving average model (ARIMA) is used as the predictive model for the detail components. The predictive values of the multiple models are fused by the Gauss–Markov algorithm; the error variance of the fused prediction is smaller than that of any single model, so the prediction accuracy is improved. The simulation results are compared on two typical chaotic time series, the Lorenz time series and the Mackey–Glass time series. The simulation results show that the prediction method in this paper yields better predictions.
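
    The claim that fusion lowers the error variance can be illustrated with the classical minimum-variance combination of unbiased, uncorrelated estimates, in which each prediction is weighted by the inverse of its error variance. Whether the paper's Gauss–Markov fusion step takes exactly this form is an assumption; the function name `fuse` and the numbers below are purely illustrative.

```python
def fuse(predictions, variances):
    """Inverse-variance weighted fusion of independent, unbiased predictions.

    Returns the fused value and the variance of the fused value.
    """
    if len(predictions) != len(variances) or not predictions:
        raise ValueError("need one positive variance per prediction")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fused = sum(w * p for w, p in zip(weights, predictions)) / total
    return fused, 1.0 / total


if __name__ == "__main__":
    # Two hypothetical model outputs for the same time step.
    value, variance = fuse([10.0, 20.0], [1.0, 4.0])
    print(value, variance)
```

    Because the fused variance is the reciprocal of a sum of positive terms, it is always smaller than the smallest individual variance, which matches the behaviour the abstract reports.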

  14. Structure and Sequence Search on Aptamer-Protein Docking

    Science.gov (United States)

    Xiao, Jiajie; Bonin, Keith; Guthold, Martin; Salsbury, Freddie

    2015-03-01

    Interactions between proteins and deoxyribonucleic acid (DNA) play a significant role in living systems, especially through gene regulation. However, short nucleic acid sequences (aptamers) with specific binding affinity to specific proteins exhibit clinical potential as therapeutics. Our capillary and gel electrophoresis selection experiments show that specific sequences of aptamers can be selected that bind specific proteins. Computationally, given the experimentally-determined structure and sequence of a thrombin-binding aptamer, we can successfully dock the aptamer onto thrombin in agreement with experimental structures of the complex. In order to further study the conformational flexibility of this thrombin-binding aptamer and to potentially develop a predictive computational model of aptamer-binding, we use GPU-enabled molecular dynamics simulations to both examine the conformational flexibility of the aptamer in the absence of binding to thrombin, and to determine our ability to fold an aptamer. This study should help further de-novo predictions of aptamer sequences by enabling the study of structural and sequence-dependent effects on aptamer-protein docking specificity.

  15. Predicting Click-Through Rates of New Advertisements Based on the Bayesian Network

    Directory of Open Access Journals (Sweden)

    Zhipeng Fang

    2014-01-01

    Full Text Available Most classical search engines choose and rank advertisements (ads) based on their click-through rates (CTRs). To predict an ad’s CTR, historical click information is frequently used. Accurately predicting the CTR of new ads is challenging and critical for real world applications, since we do not have plentiful historical data about these ads. Adopting the Bayesian network (BN) as an effective framework for representing and inferring dependencies and uncertainties among variables, in this paper we establish a BN-based model to predict the CTRs of new ads. First, we build a Bayesian network of the keywords that are used to describe the ads in a certain domain, called keyword BN and abbreviated as KBN. Second, we propose an algorithm for approximate inference on the KBN to find keywords similar to those that describe the new ads. Finally, based on the similar keywords, we obtain the similar ads and then calculate the CTR of the new ad from the CTRs of the ads that are similar to it. Experimental results show the efficiency and accuracy of our method.

  16. Transcription factor binding sites prediction based on modified nucleosomes.

    Directory of Open Access Journals (Sweden)

    Mohammad Talebzadeh

    Full Text Available In computational methods, position weight matrices (PWMs) are commonly applied for transcription factor binding site (TFBS) prediction. Although these matrices are more accurate than simple consensus sequences to predict actual binding sites, they usually produce a large number of false positive (FP) predictions and so are impoverished sources of information. Several studies have employed additional sources of information such as sequence conservation or the vicinity to transcription start sites to distinguish true binding regions from random ones. Recently, the spatial distribution of modified nucleosomes has been shown to be associated with different promoter architectures. These aligned patterns can facilitate DNA accessibility for transcription factors. We hypothesize that using data from these aligned and periodic patterns can improve the performance of binding region prediction. In this study, we propose two effective features, "modified nucleosomes neighboring" and "modified nucleosomes occupancy", to decrease FP in binding site discovery. Based on these features, we designed a logistic regression classifier which estimates the probability of a region as a TFBS. Our model learned each feature based on Sp1 binding sites on Chromosome 1 and was tested on the other chromosomes in human CD4+T cells. In this work, we investigated 21 histone modifications and found that only 8 out of 21 marks are strongly correlated with transcription factor binding regions. To prove that these features are not specific to Sp1, we combined the logistic regression classifier with the PWM, and created a new model to search TFBSs on the genome. We tested the model using transcription factors MAZ, PU.1 and ELF1 and compared the results to those using only the PWM. The results show that our model can predict transcription factor binding regions more successfully.
The relative simplicity of the model and capability of integrating other features make it a superior method
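
    The combination described in this abstract, a PWM score passed together with nucleosome-derived features into a logistic regression, can be sketched as follows. Everything concrete here is hypothetical: the three-column matrix `TOY_PWM`, the feature order, and the weights are invented for illustration and are not the values learned in the study.

```python
import math

# Hypothetical log-odds PWM with three motif positions (not a real Sp1 matrix).
TOY_PWM = [
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": -1.0, "C": 2.0, "G": -1.0, "T": -1.0},
]


def pwm_score(pwm, site):
    """Sum the per-position log-odds scores for a candidate site."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal PWM width")
    return sum(column[base] for column, base in zip(pwm, site))


def best_window(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max((pwm_score(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))


def tfbs_probability(features, weights, bias):
    """Logistic regression: probability that a region is a binding site.

    Assumed feature order: PWM score, nucleosome neighboring, nucleosome occupancy.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    score, start = best_window(TOY_PWM, "ATGGCA")
    # Illustrative weights only; a real model would fit them to labeled sites.
    print(start, score, tfbs_probability([score, 1.0, 0.5], [0.5, 1.0, 1.0], -2.0))
```

    Feeding the PWM score into the classifier as one feature among several is what lets the additional evidence suppress matches that score well on sequence alone.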

  17. Validation of Quantitative Structure-Activity Relationship (QSAR) Model for Photosensitizer Activity Prediction

    Directory of Open Access Journals (Sweden)

    Sharifuddin M. Zain

    2011-11-01

    Full Text Available Photodynamic therapy is a relatively new treatment method for cancer which utilizes a combination of oxygen, a photosensitizer and light to generate reactive singlet oxygen that eradicates tumors via direct cell-killing, vasculature damage and engagement of the immune system. Most photosensitizers that are in clinical and pre-clinical assessments, or those that are already approved for clinical use, are mainly based on cyclic tetrapyrroles. In an attempt to discover new effective photosensitizers, we report the use of the quantitative structure-activity relationship (QSAR) method to develop a model that could correlate the structural features of cyclic tetrapyrrole-based compounds with their photodynamic therapy (PDT) activity. In this study, a set of 36 porphyrin derivatives was used in the model development where 24 of these compounds were in the training set and the remaining 12 compounds were in the test set. The development of the QSAR model involved the use of the multiple linear regression analysis (MLRA) method. Based on the method, r², r² (CV) and r² (prediction) values of 0.87, 0.71 and 0.70 were obtained. The QSAR model was also employed to predict the experimental compounds in an external test set. This external test set comprises 20 porphyrin-based compounds with experimental IC50 values ranging from 0.39 µM to 7.04 µM. Thus the model showed good correlative and predictive ability, with a predictive correlation coefficient (r² prediction) for the external test set of 0.52. The developed QSAR model was used to discover some compounds as new lead photosensitizers from this external test set.

  18. Search for Hadronic Resonances in CMS

    CERN Multimedia

    CERN. Geneva

    2012-01-01

    Many models of new physics involve colored particles predicted to decay into hadronic final states. We present the results of searches for new heavy resonances in final states with up to 8 jets. Dedicated techniques have been developed to take advantage of the boosted topology and identify W and Z bosons. We also discuss a trigger strategy to extend the dijet search well below 1 TeV. These results are based on pp collision data collected with the CMS detector in 2011 and 2012.

  19. Protein structure refinement using a quantum mechanics-based chemical shielding predictor.

    Science.gov (United States)

    Bratholm, Lars A; Jensen, Jan H

    2017-03-01

    The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor of a protein backbone and CB chemical shifts (ProCS15, PeerJ, 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic chemical shielding values (ProCS15) can be used to refine protein structures using Markov Chain Monte Carlo (MCMC) simulations, relating the chemical shielding values to the experimental chemical shifts probabilistically. Two kinds of MCMC structural refinement simulations were performed using force field geometry optimized X-ray structures as starting points: simulated annealing of the starting structure and constant temperature MCMC simulation followed by simulated annealing of a representative ensemble structure. Annealing of the CHARMM structure changes the CA-RMSD by an average of 0.4 Å but lowers the chemical shift RMSD by 1.0 and 0.7 ppm for CA and N. Conformational averaging has a relatively small effect (0.1-0.2 ppm) on the overall agreement with carbon chemical shifts but lowers the error for nitrogen chemical shifts by 0.4 ppm. If an amino acid specific offset is included the ProCS15 predicted chemical shifts have RMSD values relative to experiments that are comparable to popular empirical chemical shift predictors. The annealed representative ensemble structures differ in CA-RMSD relative to the initial structures by an average of 2.0 Å, with >2.0 Å difference for six proteins. In four of the cases, the largest structural differences arise in structurally flexible regions of the protein as determined by NMR, and in the remaining two cases, the large structural

  20. Pathfinder: multiresolution region-based searching of pathology images using IRM.

    OpenAIRE

    Wang, J. Z.

    2000-01-01

    The fast growth of digitized pathology slides has created great challenges in research on image database retrieval. The prevalent retrieval technique involves human-supplied text annotations to describe slide contents. These pathology images typically have very high resolution, making it difficult to search based on image content. In this paper, we present Pathfinder, an efficient multiresolution region-based searching system for high-resolution pathology image libraries. The system uses wave...

  1. Keyword Search in Databases

    CERN Document Server

    Yu, Jeffrey Xu; Chang, Lijun

    2009-01-01

    It has become highly desirable to provide users with flexible ways to query/search information over databases as simple as keyword search like Google search. This book surveys the recent developments on keyword search over databases, and focuses on finding structural information among objects in a database using a set of keywords. Such structural information to be returned can be either trees or subgraphs representing how the objects, that contain the required keywords, are interconnected in a relational database or in an XML database. The structural keyword search is completely different from

  2. Predicting hot spots in protein interfaces based on protrusion index, pseudo hydrophobicity and electron-ion interaction pseudopotential features

    Science.gov (United States)

    Xia, Junfeng; Yue, Zhenyu; Di, Yunqiang; Zhu, Xiaolei; Zheng, Chun-Hou

    2016-01-01

    The identification of hot spots, a small subset of protein interfaces that accounts for the majority of binding free energy, is becoming more important for research on drug design and cancer development. Based on our previous methods (APIS and KFC2), we propose a novel hot spot prediction method. For each hot spot residue, we first constructed a wide variety of 108 sequence, structural, and neighborhood features to characterize potential hot spot residues, including conventional ones and a new one (pseudo hydrophobicity) exploited in this study. We then selected the 3 top-ranking features that contribute the most to the classification by a two-step feature selection process consisting of a minimal-redundancy-maximal-relevance algorithm and an exhaustive search method. We used support vector machines to build our final prediction model. When testing our model on an independent test set, our method showed the highest F1-score of 0.70 and MCC of 0.46 compared with the existing state-of-the-art hot spot prediction methods. Our results indicate that these features are more effective than the conventional features considered previously, and that the combination of our and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spots in protein interfaces. PMID:26934646
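    The abstract above reports an F1-score and a Matthews correlation coefficient (MCC) without defining them. As a reminder of how the two metrics are usually computed from a binary confusion matrix, here is a minimal sketch. The function names and the counts are hypothetical and are not taken from the paper; they are chosen only so the arithmetic is easy to follow.

```python
import math


def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to +1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts (NOT from the paper): hot spot = positive class.
tp, tn, fp, fn = 30, 50, 10, 10
print(f"F1  = {f1_score(tp, fp, fn):.3f}")
print(f"MCC = {mcc(tp, tn, fp, fn):.3f}")
```

    Unlike F1, MCC also uses the true-negative count, which is one reason both are often reported together on imbalanced data such as hot spot sets.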

  3. Orthogonal search-based rule extraction for modelling the decision to transfuse.

    Science.gov (United States)

    Etchells, T A; Harrison, M J

    2006-04-01

    Data from an audit relating to transfusion decisions during intermediate or major surgery were analysed to determine the strengths of certain factors in the decision making process. The analysis, using orthogonal search-based rule extraction (OSRE) from a trained neural network, demonstrated that the risk of tissue hypoxia (ROTH) assessed using a 100-mm visual analogue scale, the haemoglobin value (Hb) and the presence or absence of on-going haemorrhage (OGH) were able to reproduce the transfusion decisions with a joint specificity of 0.96 and sensitivity of 0.93 and a positive predictive value of 0.9. The rules indicating transfusion were: 1. ROTH > 32 mm and Hb 13 mm and Hb 38 mm, Hb < 102 g x l(-1) and OGH; 4. Hb < 78 g x l(-1).

  4. Two Search Techniques within a Human Pedigree Database

    OpenAIRE

    Gersting, J. M.; Conneally, P. M.; Rogers, K.

    1982-01-01

    This paper presents the basic features of two search techniques from MEGADATS-2 (MEdical Genetics Acquisition and DAta Transfer System), a system for collecting, storing, retrieving and plotting human family pedigrees. The individual search provides a quick method for locating an individual in the pedigree database. This search uses a modified soundex coding and an inverted file structure based on a composite key. The navigational search uses a set of pedigree traversal operations (individual...

  5. Multilevel Thresholding Segmentation Based on Harmony Search Optimization

    Directory of Open Access Journals (Sweden)

    Diego Oliva

    2013-01-01

    In this paper, a multilevel thresholding (MT) algorithm based on the harmony search algorithm (HSA) is introduced. HSA is an evolutionary method inspired by musicians improvising new harmonies while playing. Unlike other evolutionary algorithms, HSA exhibits interesting search capabilities while keeping a low computational overhead. The proposed algorithm encodes random samples from a feasible search space inside the image histogram as candidate solutions, and their quality is evaluated using the objective functions employed by Otsu's or Kapur's methods. Guided by these objective values, the set of candidate solutions is evolved through the HSA operators until an optimal solution is found. Experimental results demonstrate the high performance of the proposed method for the segmentation of digital images.
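    The procedure described in this abstract (candidate threshold sets drawn from the histogram range, scored by Otsu's objective, and evolved by harmony search operators) can be sketched as follows. This is a generic textbook-style harmony search under stated assumptions, not the authors' implementation: the parameter names `hmcr` (harmony memory consideration rate) and `par` (pitch adjustment rate), their default values, and the unit pitch step are all illustrative choices.

```python
import random


def between_class_variance(hist, thresholds):
    """Otsu objective: sum over classes of w_k * (mu_k - mu_total)**2."""
    total = sum(hist)
    mu_total = sum(i * h for i, h in enumerate(hist)) / total
    edges = [0] + sorted(thresholds) + [len(hist)]
    score = 0.0
    for lo, hi in zip(edges, edges[1:]):
        count = sum(hist[lo:hi])
        if count > 0:  # empty classes contribute nothing
            mu = sum(i * hist[i] for i in range(lo, hi)) / count
            score += (count / total) * (mu - mu_total) ** 2
    return score


def harmony_search(hist, k, memory_size=20, iterations=2000,
                   hmcr=0.9, par=0.3, seed=0):
    """Search for k thresholds that maximise the Otsu objective."""
    rng = random.Random(seed)
    top = len(hist) - 1  # valid thresholds are 1..top
    memory = [[rng.randint(1, top) for _ in range(k)]
              for _ in range(memory_size)]
    scores = [between_class_variance(hist, h) for h in memory]
    for _ in range(iterations):
        candidate = []
        for j in range(k):
            if rng.random() < hmcr:
                # Memory consideration: reuse a value stored in memory.
                value = rng.choice(memory)[j]
                if rng.random() < par:
                    # Pitch adjustment: move one grey level up or down.
                    value = min(top, max(1, value + rng.choice((-1, 1))))
            else:
                # Random re-initialisation inside the feasible range.
                value = rng.randint(1, top)
            candidate.append(value)
        score = between_class_variance(hist, candidate)
        worst = min(range(memory_size), key=scores.__getitem__)
        if score > scores[worst]:  # replace the worst harmony if improved
            memory[worst], scores[worst] = candidate, score
    best = max(range(memory_size), key=scores.__getitem__)
    return sorted(memory[best]), scores[best]


# Toy bimodal histogram with 8 grey levels (hypothetical data).
thresholds, score = harmony_search([5, 5, 0, 0, 0, 0, 5, 5], k=1)
print(thresholds, score)
```

    Swapping `between_class_variance` for an entropy-based score would give the Kapur variant mentioned in the abstract; the search loop itself would not need to change.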

  6. Continuous Automated Model EvaluatiOn (CAMEO) complementing the critical assessment of structure prediction in CASP12.

    Science.gov (United States)

    Haas, Jürgen; Barbato, Alessandro; Behringer, Dario; Studer, Gabriel; Roth, Steven; Bertoni, Martino; Mostaguir, Khaled; Gumienny, Rafal; Schwede, Torsten

    2018-03-01

    Every second year, the community experiment "Critical Assessment of Techniques for Structure Prediction" (CASP) conducts an independent blind assessment of structure prediction methods, providing a framework for comparing the performance of different approaches and discussing the latest developments in the field. Yet, developers of automated computational modeling methods clearly benefit from more frequent evaluations based on larger sets of data. The "Continuous Automated Model EvaluatiOn (CAMEO)" platform complements the CASP experiment by conducting fully automated blind prediction assessments based on the weekly pre-release of sequences of those structures that are going to be published in the next release of the Protein Data Bank (PDB). CAMEO publishes weekly benchmarking results based on models collected during a 4-day prediction window, on average assessing ca. 100 targets during a time frame of 5 weeks. CAMEO benchmarking data is generated consistently for all participating methods at the same point in time, enabling developers to benchmark and cross-validate their method's performance, and directly refer to the benchmarking results in publications. In order to facilitate server development and promote shorter release cycles, CAMEO sends weekly emails with submission statistics and low performance warnings. Many participants of CASP have successfully employed CAMEO when preparing their methods for upcoming community experiments. CAMEO offers a variety of scores to allow benchmarking diverse aspects of structure prediction methods. By introducing new scoring schemes, CAMEO facilitates new development in areas of active research, for example, modeling quaternary structure, complexes, or ligand binding sites. © 2017 Wiley Periodicals, Inc.

  7. Structural syntactic prediction measured with ELAN: evidence from ERPs.

    Science.gov (United States)

    Fonteneau, Elisabeth

    2013-02-08

    The current study used event-related potentials (ERPs) to investigate how and when argument structure information is used during the processing of sentences with a filler-gap dependency. We hypothesize that one specific property - animacy (living vs. non-living) - is used by the parser during the building of the syntactic structure. Participants heard sentences that were rated off-line as having an expected noun (Who did the Lion King chase the caravan with?) or an unexpected noun (Who did Lion King chase the animal with?). This prediction is based on the animacy properties relation between the wh-word and the noun in the object position. ERPs from the noun in the unexpected condition (animal) elicited a typical Early Left Anterior Negativity (ELAN)/P600 complex compared to the noun in the expected condition (caravan). Firstly, these results demonstrate that the ELAN reflects not only grammatical category violation but also animacy property expectations in filler-gap dependency. Secondly, our data suggests that the language comprehension system is able to make detailed predictions about aspects of the upcoming words to build up the syntactic structure. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  8. Active Search for Antecedents in Cataphoric Pronoun Resolution

    Directory of Open Access Journals (Sweden)

    Leticia Pablos

    2015-10-01

    Cataphoric dependencies where a pronoun precedes its antecedent appear to call on different mechanisms in language comprehension from forward dependencies where the antecedent precedes the pronoun. Previous research has shown that the resolution of cataphoric dependencies involves predictive processes such as the Active Search Mechanism, which hypothesizes the automatic search for an antecedent immediately after encountering a cataphoric pronoun. The current study employs gender mismatch to investigate whether the active search for an antecedent of a cataphoric pronoun is restricted only to grammatically licit positions. We present results from an event-related potential experiment on the reading comprehension of cataphoric dependencies in Dutch. Results show that gender mismatch gives rise to an anterior negativity at grammatically licit antecedent positions only. We hypothesize that this negativity reflects the prediction failure for an antecedent after encountering a pronoun, rather than a gender mismatch. We discuss the timing, topography and functionality of this negativity with respect to previous studies and how this relates to the ERPs elicited in the processing of structural constraints on pronoun resolution.

  9. Prediction of the Formulation Dependence of the Glass Transition Temperature for Amine-Epoxy Copolymers Using a Quantitative Structure-Property Relationship Based on the AM1 Method

    National Research Council Canada - National Science Library

    Morrill, Jason

    2004-01-01

    A designer Quantitative Structure-Property Relationship (QSPR) based upon molecular properties calculated using the AM1 semi-empirical quantum mechanical method was developed to predict the glass transition temperature (Tg...

  10. Bayesian prediction of bacterial growth temperature range based on genome sequences

    DEFF Research Database (Denmark)

    Jensen, Dan Børge; Vesth, Tammi Camilla; Hallin, Peter Fischer

    2012-01-01

    Background: The preferred habitat of a given bacterium can provide a hint of which types of enzymes of potential industrial interest it might produce. These might include enzymes that are stable and active at very high or very low temperatures. Being able to accurately predict this based...... on a genomic sequence, would thus allow for an efficient and targeted search for production organisms, reducing the need for culturing experiments. Results: This study found a total of 40 protein families useful for distinction between three thermophilicity classes (thermophiles, mesophiles and psychrophiles...... that protein families associated with specific thermophilicity classes can provide effective input data for thermophilicity prediction, and that the naive Bayesian approach is effective for such a task. The program created for this study is able to efficiently distinguish between thermophilic, mesophilic...

  11. Prediction of protein–protein interactions: unifying evolution and structure at protein interfaces

    International Nuclear Information System (INIS)

    Tuncbag, Nurcan; Gursoy, Attila; Keskin, Ozlem

    2011-01-01

    The vast majority of the chores in the living cell involve protein–protein interactions. Providing details of protein interactions at the residue level and incorporating them into protein interaction networks are crucial toward the elucidation of a dynamic picture of cells. Despite the rapid increase in the number of structurally known protein complexes, we are still far away from a complete network. Given experimental limitations, computational modeling of protein interactions is a prerequisite to proceed on the way to complete structural networks. In this work, we focus on the question 'how do proteins interact?' rather than 'which proteins interact?' and we review structure-based protein–protein interaction prediction approaches. As a sample approach for modeling protein interactions, PRISM is detailed which combines structural similarity and evolutionary conservation in protein interfaces to infer structures of complexes in the protein interaction network. This will ultimately help us to understand the role of protein interfaces in predicting bound conformations

  12. The sequential structure of brain activation predicts skill.

    Science.gov (United States)

    Anderson, John R; Bothell, Daniel; Fincham, Jon M; Moon, Jungaa

    2016-01-29

    In an fMRI study, participants were trained to play a complex video game. They were scanned early and then again after substantial practice. While better players showed greater activation in one region (right dorsal striatum) their relative skill was better diagnosed by considering the sequential structure of whole brain activation. Using a cognitive model that played this game, we extracted a characterization of the mental states that are involved in playing a game and the statistical structure of the transitions among these states. There was a strong correspondence between this measure of sequential structure and the skill of different players. Using multi-voxel pattern analysis, it was possible to recognize, with relatively high accuracy, the cognitive states participants were in during particular scans. We used the sequential structure of these activation-recognized states to predict the skill of individual players. These findings indicate that important features about information-processing strategies can be identified from a model-based analysis of the sequential structure of brain activation. Copyright © 2015 Elsevier Ltd. All rights reserved.

  13. Atomic-accuracy prediction of protein loop structures through an RNA-inspired Ansatz.

    Directory of Open Access Journals (Sweden)

    Rhiju Das

    Consistently predicting biopolymer structure at atomic resolution from sequence alone remains a difficult problem, even for small sub-segments of large proteins. Such loop prediction challenges, which arise frequently in comparative modeling and protein design, can become intractable as loop lengths exceed 10 residues and if surrounding side-chain conformations are erased. Current approaches, such as the protein local optimization protocol or kinematic inversion closure (KIC) Monte Carlo, involve stages that coarse-grain proteins, simplifying modeling but precluding a systematic search of all-atom configurations. This article introduces an alternative modeling strategy based on a 'stepwise ansatz', recently developed for RNA modeling, which posits that any realistic all-atom molecular conformation can be built up by residue-by-residue stepwise enumeration. When harnessed to a dynamic-programming-like recursion in the Rosetta framework, the resulting stepwise assembly (SWA) protocol enables enumerative sampling of a 12 residue loop at a significant but achievable cost of thousands of CPU-hours. In a previously established benchmark, SWA recovers crystallographic conformations with sub-Angstrom accuracy for 19 of 20 loops, compared to 14 of 20 by KIC modeling with a comparable expenditure of computational power. Furthermore, SWA gives high accuracy results on an additional set of 15 loops highlighted in the biological literature for their irregularity or unusual length. Successes include cis-Pro touch turns, loops that pass through tunnels of other side-chains, and loops of lengths up to 24 residues. Remaining problem cases are traced to inaccuracies in the Rosetta all-atom energy function. In five additional blind tests, SWA achieves sub-Angstrom accuracy models, including the first such success in a protein/RNA binding interface, the YbxF/kink-turn interaction in the fourth 'RNA-puzzle' competition. These results establish all-atom enumeration as

  14. Aromatic claw: A new fold with high aromatic content that evades structural prediction

    Energy Technology Data Exchange (ETDEWEB)

    Sachleben, Joseph R. [Biomolecular NMR Core Facility, University of Chicago, Chicago Illinois; Adhikari, Aashish N. [Department of Chemistry, University of Chicago, Chicago Illinois; Gawlak, Grzegorz [Department of Biochemistry and Molecular Biology, University of Chicago, Chicago Illinois; Hoey, Robert J. [Department of Biochemistry and Molecular Biology, University of Chicago, Chicago Illinois; Liu, Gaohua [Northeast Structural Genomics Consortium (NESG), Department of Molecular Biology and Biochemistry, School of Arts and Sciences, and Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, and Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway New Jersey; Joachimiak, Andrzej [Department of Biochemistry and Molecular Biology, University of Chicago, Chicago Illinois; Biological Sciences Division, Argonne National Laboratory, Argonne Illinois; Montelione, Gaetano T. [Northeast Structural Genomics Consortium (NESG), Department of Molecular Biology and Biochemistry, School of Arts and Sciences, and Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, and Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway New Jersey; Sosnick, Tobin R. [Department of Biochemistry and Molecular Biology, University of Chicago, Chicago Illinois; Koide, Shohei [Department of Biochemistry and Molecular Biology, University of Chicago, Chicago Illinois; Department of Biochemistry and Molecular Pharmacology and the Perlmutter Cancer Center, New York University School of Medicine, New York New York

    2016-11-10

    We determined the NMR structure of a highly aromatic (13%) protein of unknown function, Aq1974 from Aquifex aeolicus (PDB ID: 5SYQ). The unusual sequence of this protein has a tryptophan content five times the normal (six tryptophan residues of 114 or 5.2% while the average tryptophan content is 1.0%) with the tryptophans occurring in a WXW motif. It has no detectable sequence homology with known protein structures. Although its NMR spectrum suggested that the protein was rich in β-sheet, upon resonance assignment and solution structure determination, the protein was found to be primarily α-helical with a small two-stranded β-sheet with a novel fold that we have termed an Aromatic Claw. As this fold was previously unknown and the sequence unique, we submitted the sequence to CASP10 as a target for blind structural prediction. At the end of the competition, the sequence was classified a hard template based model; the structural relationship between the template and the experimental structure was small and the predictions all failed to predict the structure. CSRosetta was found to predict the secondary structure and its packing; however, it was found that there was little correlation between CSRosetta score and the RMSD between the CSRosetta structure and the NMR determined one. This work demonstrates that even in relatively small proteins, we do not yet have the capacity to accurately predict the fold for all primary sequences. The experimental discovery of new folds helps guide the improvement of structural prediction methods.

  15. RNA 3D modules in genome-wide predictions of RNA 2D structure

    DEFF Research Database (Denmark)

    Theis, Corinna; Zirbel, Craig L; Zu Siederdissen, Christian Höner

    2015-01-01

    . These modules can, for example, occur inside structural elements which in RNA 2D predictions appear as internal loops. Hence one question is if the use of such RNA 3D information can improve the prediction accuracy of RNA secondary structure at a genome-wide level. Here, we use RNAz in combination with 3D......Recent experimental and computational progress has revealed a large potential for RNA structure in the genome. This has been driven by computational strategies that exploit multiple genomes of related organisms to identify common sequences and secondary structures. However, these computational...... approaches have two main challenges: they are computationally expensive and they have a relatively high false discovery rate (FDR). Simultaneously, RNA 3D structure analysis has revealed modules composed of non-canonical base pairs which occur in non-homologous positions, apparently by independent evolution...

  16. Longitudinal connectome-based predictive modeling for REM sleep behavior disorder from structural brain connectivity

    Science.gov (United States)

    Giancardo, Luca; Ellmore, Timothy M.; Suescun, Jessika; Ocasio, Laura; Kamali, Arash; Riascos-Castaneda, Roy; Schiess, Mya C.

    2018-02-01

    Methods to identify neuroplasticity patterns in human brains are of the utmost importance in understanding and potentially treating neurodegenerative diseases. Parkinson disease (PD) research will greatly benefit and advance from the discovery of biomarkers to quantify brain changes in the early stages of the disease, a prodromal period when subjects show no obvious clinical symptoms. Diffusion tensor imaging (DTI) allows for an in-vivo estimation of the structural connectome inside the brain and may serve to quantify the degenerative process before the appearance of clinical symptoms. In this work, we introduce a novel strategy to compute longitudinal structural connectomes in the context of a whole-brain data-driven pipeline. In these initial tests, we show that our predictive models are able to distinguish controls from asymptomatic subjects at high risk of developing PD (REM sleep behavior disorder, RBD) with an area under the receiver operating characteristic curve of 0.90 (pParkinson's Progression Markers Initiative. By analyzing the brain connections most relevant for the predictive ability of the best performing model, we find connections that are biologically relevant to the disease.

  17. Quantum signature scheme based on a quantum search algorithm

    International Nuclear Information System (INIS)

    Yoon, Chun Seok; Kang, Min Sung; Lim, Jong In; Yang, Hyung Jin

    2015-01-01

    We present a quantum signature scheme based on a two-qubit quantum search algorithm. For secure transmission of signatures, we use a quantum search algorithm that has not been used in previous quantum signature schemes. A two-step protocol secures the quantum channel, and a trusted center guarantees non-repudiation that is similar to other quantum signature schemes. We discuss the security of our protocol. (paper)
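    The two-qubit case of quantum (Grover) search has a convenient property that a short classical simulation makes concrete: a single oracle-plus-diffusion iteration finds the marked item with certainty. The sketch below simulates only that textbook search step on four real amplitudes. It is not the signature protocol of the paper, and the function name is hypothetical.

```python
def grover_two_qubit(marked):
    """Simulate one Grover iteration on 2 qubits (4 basis states).

    Returns the measurement probability of each basis state.
    """
    n = 4
    amps = [0.5] * n                     # uniform superposition after Hadamards
    amps[marked] = -amps[marked]         # oracle: flip the sign of the marked state
    mean = sum(amps) / n
    amps = [2 * mean - a for a in amps]  # diffusion: inversion about the mean
    return [a * a for a in amps]


for target in range(4):
    print(target, grover_two_qubit(target))
```

    For larger search spaces the success probability after one iteration is below 1 and roughly sqrt(N) iterations are needed, which is why the four-state case is a common choice when a deterministic outcome is wanted.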

  18. Protein secondary structure: category assignment and predictability

    DEFF Research Database (Denmark)

    Andersen, Claus A.; Bohr, Henrik; Brunak, Søren

    2001-01-01

    In the last decade, the prediction of protein secondary structure has been optimized using essentially one and the same assignment scheme known as DSSP. We present here a different scheme, which is more predictable. This scheme predicts directly the hydrogen bonds, which stabilize the secondary......-forward neural network with one hidden layer on a data set identical to the one used in earlier work....

  19. Thermodynamic heuristics with case-based reasoning: combined insights for RNA pseudoknot secondary structure.

    Science.gov (United States)

    Al-Khatib, Ra'ed M; Rashid, Nur'Aini Abdul; Abdullah, Rosni

    2011-08-01

    The secondary structure of RNA pseudoknots has been extensively inferred and scrutinized by computational approaches. Experimental methods for determining RNA structure are time consuming and tedious; therefore, predictive computational approaches are required. Predicting the most accurate and energy-stable pseudoknot RNA secondary structure has been proven to be an NP-hard problem. In this paper, a new RNA folding approach, termed MSeeker, is presented; it includes KnotSeeker (a heuristic method) and Mfold (a thermodynamic algorithm). The global optimization of this thermodynamic heuristic approach was further enhanced by using a case-based reasoning technique as a local optimization method. MSeeker is a proposed algorithm for predicting RNA pseudoknot structure from individual sequences, especially long ones. This research demonstrates that MSeeker improves the sensitivity and specificity of existing RNA pseudoknot structure predictions. The performance and structural results from this proposed method were evaluated against seven other state-of-the-art pseudoknot prediction methods. The MSeeker method had better sensitivity than the DotKnot, FlexStem, HotKnots, pknotsRG, ILM, NUPACK and pknotsRE methods, with 79% of the predicted pseudoknot base-pairs being correct.

  20. Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction

    Directory of Open Access Journals (Sweden)

    Cobaugh Christian W

    2004-08-01

    Background: A detailed understanding of an RNA's correct secondary and tertiary structure is crucial to understanding its function and mechanism in the cell. Free energy minimization with energy parameters based on the nearest-neighbor model and comparative analysis are the primary methods for predicting an RNA's secondary structure from its sequence. Version 3.1 of Mfold has been available since 1999. This version contains an expanded sequence dependence of energy parameters and the ability to incorporate coaxial stacking into free energy calculations. We test Mfold 3.1 by performing the largest and most phylogenetically diverse comparison of rRNA and tRNA structures predicted by comparative analysis and Mfold, and we use the results of our tests on 16S and 23S rRNA sequences to assess the improvement between Mfold 2.3 and Mfold 3.1. Results: The average prediction accuracy for a 16S or 23S rRNA sequence with Mfold 3.1 is 41%, while the prediction accuracies for the majority of 16S and 23S rRNA structures tested are between 20% and 60%, with some having less than 20% prediction accuracy. The average prediction accuracy was 71% for 5S rRNA and 69% for tRNA. The majority of the 5S rRNA and tRNA sequences have prediction accuracies greater than 60%. The prediction accuracy of 16S rRNA base-pairs decreases exponentially as the number of nucleotides intervening between the 5' and 3' halves of the base-pair increases. Conclusion: Our analysis indicates that the current set of nearest-neighbor energy parameters in conjunction with the Mfold folding algorithm is unable to consistently and reliably predict an RNA's correct secondary structure. For 16S or 23S rRNA structure prediction, Mfold 3.1 offers little improvement over Mfold 2.3. However, the nearest-neighbor energy parameters do work well for shorter RNA sequences such as tRNA or 5S rRNA, or for larger rRNAs when the contact distance between the base-pairs is less than 100 nucleotides.

  1. A search for optimal parameters of resonance circuits ensuring damping of electroelastic structure vibrations based on the solution of natural vibration problem

    Science.gov (United States)

    Oshmarin, D.; Sevodina, N.; Iurlov, M.; Iurlova, N.

    2017-06-01

    In this paper, with the aim of providing passive control of structure vibrations, a new approach is proposed for selecting optimal parameters of external electric shunt circuits connected to piezoelectric elements located on the surface of the structure. The approach is based on the mathematical formulation of the natural vibration problem. The solutions of this problem are the complex eigenfrequencies, the real part of which represents the vibration frequency, while the imaginary part corresponds to the damping ratio, characterizing the rate of damping. A criterion for the search for optimal parameters of the external passive shunt circuits, which can provide the system with desired dissipative properties, has been derived based on the analysis of the responses of the real and imaginary parts of different complex eigenfrequencies to changes in the values of the parameters of the electric circuit. The efficiency of this approach has been verified in the context of the natural vibration problem of a rigidly clamped plate and a semi-cylindrical shell, which is solved for series-connected and parallel-connected external resonance R-L circuits (consisting of resistive and inductive elements). It has been shown that at lower (more energy-intensive) frequencies, a series-connected external circuit has the advantage of providing lower values of the circuit parameters, which renders it more attractive in terms of practical applications.
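    The role of the real and imaginary parts of a complex eigenfrequency can be illustrated with the simplest possible stand-in: a single damped mass-spring oscillator rather than the electroelastic structure with shunt circuits studied in the paper. The sketch assumes the time dependence x(t) = exp(i·ω·t), so that m·x'' + c·x' + k·x = 0 becomes −m·ω² + i·c·ω + k = 0; the function name and the numerical values of m, c and k are hypothetical.

```python
import cmath


def complex_eigenfrequencies(m, c, k):
    """Roots omega of -m*omega**2 + 1j*c*omega + k = 0 (x(t) = exp(1j*omega*t))."""
    disc = cmath.sqrt(4 * m * k - c * c)
    return ((1j * c + disc) / (2 * m), (1j * c - disc) / (2 * m))


# Hypothetical lightly damped oscillator (SI units assumed).
m, c, k = 1.0, 0.2, 4.0
omega = complex_eigenfrequencies(m, c, k)[0]
print(f"vibration frequency Re(omega) = {omega.real:.4f} rad/s")
print(f"decay rate          Im(omega) = {omega.imag:.4f} 1/s")
```

    With this sign convention the amplitude decays as exp(−Im(ω)·t), so tuning a parameter to maximise the imaginary part of a chosen eigenfrequency is one way to read the kind of optimality criterion the abstract describes.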

  2. Ternary alloy material prediction using genetic algorithm and cluster expansion

    Energy Technology Data Exchange (ETDEWEB)

    Chen, Chong [Iowa State Univ., Ames, IA (United States)

    2015-12-01

    This thesis summarizes our study on crystal structure prediction for the Fe-V-Si system using a genetic algorithm and cluster expansion. Our goal is to explore and look for new stable compounds. We started from the ten currently known experimental phases and calculated the formation energies of those compounds using a density functional theory (DFT) package, namely VASP. The convex hull was generated based on the DFT calculations of the experimentally known phases. We then performed a random search on some metal-rich (Fe and V) compositions and found that the lowest-energy structures had a body-centered cubic (bcc) underlying lattice, on which we carried out our systematic computational searches using the genetic algorithm and cluster expansion. Among hundreds of searched compositions, thirteen were selected and their DFT formation energies were obtained with VASP. The stability of those thirteen compounds was checked against the experimental convex hull. We found that the composition 24-8-16, i.e., Fe3VSi2, is a new stable phase, which may inspire future experiments.

  3. Novel citation-based search method for scientific literature: application to meta-analyses

    NARCIS (Netherlands)

    Janssens, A.C.J.W.; Gwinn, M.

    2015-01-01

    Background: Finding eligible studies for meta-analysis and systematic reviews relies on keyword-based searching as the gold standard, despite its inefficiency. Searching based on direct citations is not sufficiently comprehensive. We propose a novel strategy that ranks articles on their degree of

  4. DFT approach to (benzylthio)acetic acid: Conformational search, molecular (monomer and dimer) structure, vibrational spectroscopy and some electronic properties

    Science.gov (United States)

    Sienkiewicz-Gromiuk, Justyna

    2018-01-01

    The DFT studies were carried out with the B3LYP method utilizing the 6-31G and 6-311++G(d,p) basis sets depending on whether the aim of calculations was to gain the geometry at equilibrium, or to calculate the optimized molecular structure of (benzylthio)acetic acid (Hbta) in the forms of monomer and dimer. The minimum conformational energy search was followed by the potential energy surface (PES) scan of all rotary bonds existing in the acid molecule. The optimized geometrical monomeric and dimeric structures of the title compound were compared with the experimental structural data in the solid state. The detailed vibrational interpretation of experimental infrared and Raman bands was performed on the basis of theoretically simulated ESFF-scaled wavenumbers calculated for the monomer and dimer structures of Hbta. The electronic characteristics of Hbta are also presented in terms of Mulliken atomic charges, frontier molecular orbitals and global reactivity descriptors. Additionally, the MEP and ESP surfaces were computed to predict coordination sites for potential metal complex formation.

  5. Order Tracking Based on Robust Peak Search Instantaneous Frequency Estimation

    International Nuclear Information System (INIS)

    Gao, Y; Guo, Y; Chi, Y L; Qin, S R

    2006-01-01

    Order tracking plays an important role in non-stationary vibration analysis of rotating machinery, especially during run-up or coast-down. An order tracking method for rotating machinery based on instantaneous frequency estimation (IFE) is introduced, in which a peak search algorithm applied to the spectrogram from time-frequency analysis is used to obtain the IFE of the vibrations. An improvement to the peak search is proposed that prevents strong non-order components or noise from disturbing the peak search. Compared with traditional order tracking methods, IFE-based order tracking is simpler to apply and depends only on software. Tests verify the validity of the method. The method is an effective supplement to traditional methods, and its application to condition monitoring and diagnosis of rotating machinery is conceivable

  6. A class-based link prediction using Distance Dependent Chinese Restaurant Process

    Science.gov (United States)

    Andalib, Azam; Babamir, Seyed Morteza

    2016-08-01

    One of the important tasks in relational data analysis is link prediction, which has been successfully applied in many areas such as bioinformatics, information retrieval, etc. Link prediction is defined as predicting the existence or absence of edges between nodes of a network. In this paper, we propose a novel method for link prediction based on the Distance Dependent Chinese Restaurant Process (DDCRP) model, which enables us to utilize information on the topological structure of the network such as the shortest path and connectivity of the nodes. We also propose a new Gibbs sampling algorithm for computing the posterior distribution of the hidden variables based on the training data. Experimental results on three real-world datasets show the superiority of the proposed method over other probabilistic models for the link prediction problem.

  7. NEAT-FLEX: Predicting the conformational flexibility of amino acids using neuroevolution of augmenting topologies.

    Science.gov (United States)

    Grisci, Bruno; Dorn, Márcio

    2017-06-01

    The development of computational methods to accurately model three-dimensional protein structures from sequences of amino acid residues is becoming increasingly important to the structural biology field. This paper addresses the challenge of predicting the tertiary structure of a given amino acid sequence, which has been reported to belong to the NP-Complete class of problems. We present a new method, namely NEAT-FLEX, based on NeuroEvolution of Augmenting Topologies (NEAT) to extract structural features from (ABS) proteins that are determined experimentally. The proposed method manipulates structural information from the Protein Data Bank (PDB) and predicts the conformational flexibility (FLEX) of residues of a target amino acid sequence. This information may be used in three-dimensional structure prediction approaches as a way to reduce the conformational search space. The proposed method was tested with 24 different amino acid sequences. Evolving neural networks were compared against a traditional error back-propagation algorithm; results show that the proposed method is a powerful way to extract and represent structural information from protein molecules that are determined experimentally.

  8. Improving GPU-accelerated adaptive IDW interpolation algorithm using fast kNN search.

    Science.gov (United States)

    Mei, Gang; Xu, Nengxiong; Xu, Liangliang

    2016-01-01

    This paper presents an efficient parallel Adaptive Inverse Distance Weighting (AIDW) interpolation algorithm on a modern Graphics Processing Unit (GPU). The presented algorithm improves our previous GPU-accelerated AIDW algorithm by adopting a fast k-nearest neighbors (kNN) search. AIDW needs to find several nearest neighboring data points for each interpolated point in order to adaptively determine the power parameter; the desired prediction value of the interpolated point is then obtained by weighted interpolation using that power parameter. In this work, we develop a fast kNN search approach based on a space-partitioning data structure, the even grid, to improve the previous GPU-accelerated AIDW algorithm. The improved algorithm is composed of two stages: kNN search and weighted interpolation. To evaluate the performance of the improved algorithm, we perform five groups of experimental tests. The experimental results indicate: (1) the improved algorithm can achieve a speedup of up to 1017 over the corresponding serial algorithm; (2) the improved algorithm is at least two times faster than our previous GPU-accelerated AIDW algorithm; and (3) the utilization of fast kNN search can significantly improve the computational efficiency of the entire GPU-accelerated AIDW algorithm.
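
The two stages named in this abstract (kNN search, then weighted interpolation) can be illustrated with a minimal serial sketch. It is not the authors' GPU implementation: it assumes a brute-force neighbor search instead of the even grid, and a fixed `power` instead of the adaptively determined power parameter. The function name `idw_knn` and the sample data are hypothetical.

```python
import math

def idw_knn(points, values, query, k=3, power=2.0):
    """Inverse distance weighting using only the k nearest data points.

    Assumptions (not from the abstract): brute-force kNN search and a
    fixed power parameter; AIDW would choose `power` per query point.
    """
    # Stage 1: kNN search (brute force; sort all data points by distance).
    nearest = sorted((math.dist(p, query), v) for p, v in zip(points, values))[:k]
    # A query that coincides with a data point takes that point's value.
    if nearest[0][0] == 0.0:
        return nearest[0][1]
    # Stage 2: weighted interpolation with weights 1 / d**power.
    weights = [1.0 / d ** power for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

# Hypothetical sample data: three equidistant neighbors and one far outlier.
pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
vals = [1.0, 3.0, 5.0, 100.0]
print(idw_knn(pts, vals, (1.0, 1.0), k=3))
```

With `k=3` the distant fourth point is excluded, so the estimate at (1, 1) is the equal-weight mean of the three nearby values.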

  9. Rapid profiling of polymeric phenolic acids in Salvia miltiorrhiza by hybrid data-dependent/targeted multistage mass spectrometry acquisition based on expected compounds prediction and fragment ion searching.

    Science.gov (United States)

    Shen, Yao; Feng, Zijin; Yang, Min; Zhou, Zhe; Han, Sumei; Hou, Jinjun; Li, Zhenwei; Wu, Wanying; Guo, De-An

    2018-04-01

    Phenolic acids are the major water-soluble components in Salvia miltiorrhiza (>5%). According to previous studies, many of them contribute to the cardiovascular effects and antioxidant effects of S. miltiorrhiza. Polymeric phenolic acids can be considered as the tanshinol-derived metabolites, e.g., dimers, trimers, and tetramers. A strategy combining tanshinol-based expected compounds prediction, total ion chromatogram filtering, fragment ion searching, and parent list-based multistage mass spectrometry acquisition by linear trap quadrupole-orbitrap Velos mass spectrometry was proposed to rapidly profile polymeric phenolic acids in S. miltiorrhiza. More than 480 potential polymeric phenolic acids could be screened out by this strategy. Based on the fragment information obtained by parent list-activated data dependent multistage mass spectrometry acquisition, 190 polymeric phenolic acids were characterized by comparing their mass information with literature data, and 18 of them were detected in S. miltiorrhiza for the first time. Seven potential compounds were tentatively characterized as new polymeric phenolic acids from S. miltiorrhiza. This strategy facilitates identification of polymeric phenolic acids in complex matrices with both selectivity and sensitivity, and could be extended to the rapid discovery and identification of compounds from complex matrices. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Diverse effects of distance cutoff and residue interval on the performance of distance-dependent atom-pair potential in protein structure prediction.

    Science.gov (United States)

    Yao, Yuangen; Gui, Rong; Liu, Quan; Yi, Ming; Deng, Haiyou

    2017-12-08

    As one of the most successful knowledge-based energy functions, the distance-dependent atom-pair potential is widely used in all aspects of protein structure prediction, including conformational search, model refinement, and model assessment. During the last two decades, great efforts have been made to improve the reference state of the potential, while other factors that also strongly affect the performance of the potential have been relatively less investigated. Based on different distance cutoffs (from 5 to 22 Å) and residue intervals (from 0 to 15) as well as six different reference states, we constructed a series of distance-dependent atom-pair potentials and tested them on several groups of structural decoy sets collected from diverse sources. A comprehensive investigation has been performed to clarify the effects of distance cutoff and residue interval on the potential's performance. Our results provide a new perspective as well as a practical guidance for optimizing distance-dependent statistical potentials. The optimal distance cutoff and residue interval are highly related with the reference state that the potential is based on, the measurements of the potential's performance, and the decoy sets that the potential is applied to. The performance of distance-dependent statistical potential can be significantly improved when the best statistical parameters for the specific application environment are adopted.

  11. RxnFinder: biochemical reaction search engines using molecular structures, molecular fragments and reaction similarity.

    Science.gov (United States)

    Hu, Qian-Nan; Deng, Zhe; Hu, Huanan; Cao, Dong-Sheng; Liang, Yi-Zeng

    2011-09-01

    Biochemical reactions play a key role in sustaining life and allowing cells to grow. RxnFinder was developed to search for biochemical reactions in the KEGG reaction database using three search criteria: molecular structures, molecular fragments and reaction similarity. RxnFinder is helpful for obtaining reference reactions for biosynthesis and xenobiotics metabolism. RxnFinder is freely available via: http://sdd.whu.edu.cn/rxnfinder. qnhu@whu.edu.cn.

  12. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    KAUST Repository

    Chen, Peng

    2015-12-03

    Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most of the successful protein-ligand binding predictions were based on known structures. However, structural information is not largely available in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures

  13. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    KAUST Repository

    Chen, Peng; Hu, ShanShan; Zhang, Jun; Gao, Xin; Li, Jinyan; Xia, Junfeng; Wang, Bing

    2015-01-01

    Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most of the successful protein-ligand binding predictions were based on known structures. However, structural information is not largely available in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures

  14. Towards Prediction Of Crystal Structure Of Al-Rich Intermetallides Formed In Al-T-A Systems

    International Nuclear Information System (INIS)

    Bram, Avraham I.; Meshic, Louisa; Ilse Katz institute for nanotechnology, Ben Gurion University of the Negev; Venkert, Arie

    2014-01-01

    The crystal structure of a material contributes significantly to its properties. However, there is no universal model that can predict precisely the crystallographic structure of a stable material at a specific composition and temperature. Since the 1950s, various prediction approaches have been developed, yielding many different methods of computer simulation and innovative theories, which are summarized in the review of Woodley et al. These methods are based on complicated calculations of quantum sizes

  15. Effective Energy Methods for Global Optimization for Biopolymer Structure Prediction

    National Research Council Canada - National Science Library

    Shalloway, David

    1998-01-01

    .... Its main strength is that it uncovers and exploits the intrinsic "hidden structures" of biopolymer energy landscapes to efficiently perform global minimization using a hierarchical search procedure...

  16. A Copula Based Approach for Design of Multivariate Random Forests for Drug Sensitivity Prediction.

    Science.gov (United States)

    Haider, Saad; Rahman, Raziur; Ghosh, Souparno; Pal, Ranadip

    2015-01-01

    Modeling sensitivity to drugs based on genetic characterizations is a significant challenge in the area of systems medicine. Ensemble based approaches such as Random Forests have been shown to perform well in both individual sensitivity prediction studies and team science based prediction challenges. However, Random Forests generate a deterministic predictive model for each drug based on the genetic characterization of the cell lines and ignore the relationship between different drug sensitivities during model generation. This application motivates the need for multivariate ensemble learning techniques that can increase prediction accuracy and improve variable importance ranking by incorporating the relationships between different output responses. In this article, we propose a novel cost criterion that captures the dissimilarity in the output response structure between the training data and node samples as the difference in the two empirical copulas. We illustrate that copulas are suitable for capturing the multivariate structure of output responses independent of the marginal distributions and that the copula based multivariate random forest framework can provide higher accuracy prediction and improved variable selection. The proposed framework has been validated on the Genomics of Drug Sensitivity in Cancer and Cancer Cell Line Encyclopedia databases.

  17. Visibiome: an efficient microbiome search engine based on a scalable, distributed architecture.

    Science.gov (United States)

    Azman, Syafiq Kamarul; Anwar, Muhammad Zohaib; Henschel, Andreas

    2017-07-24

    Given the current influx of 16S rRNA profiles of microbiota samples, it is conceivable that large amounts of them eventually are available for search, comparison and contextualization with respect to novel samples. This process facilitates the identification of similar compositional features in microbiota elsewhere and therefore can help to understand driving factors for microbial community assembly. We present Visibiome, a microbiome search engine that can perform exhaustive, phylogeny based similarity search and contextualization of user-provided samples against a comprehensive dataset of 16S rRNA profiles environments, while tackling several computational challenges. In order to scale to high demands, we developed a distributed system that combines web framework technology, task queueing and scheduling, cloud computing and a dedicated database server. To further ensure speed and efficiency, we have deployed Nearest Neighbor search algorithms, capable of sublinear searches in high-dimensional metric spaces in combination with an optimized Earth Mover Distance based implementation of weighted UniFrac. The search also incorporates pairwise (adaptive) rarefaction and optionally, 16S rRNA copy number correction. The result of a query microbiome sample is the contextualization against a comprehensive database of microbiome samples from a diverse range of environments, visualized through a rich set of interactive figures and diagrams, including barchart-based compositional comparisons and ranking of the closest matches in the database. Visibiome is a convenient, scalable and efficient framework to search microbiomes against a comprehensive database of environmental samples. The search engine leverages a popular but computationally expensive, phylogeny based distance metric, while providing numerous advantages over the current state of the art tool.

  18. Update on CERN Search based on SharePoint 2013

    Science.gov (United States)

    Alvarez, E.; Fernandez, S.; Lossent, A.; Posada, I.; Silva, B.; Wagner, A.

    2017-10-01

    CERN’s enterprise Search solution “CERN Search” provides a central search solution for users and CERN service providers. A total of about 20 million public and protected documents from a wide range of document collections is indexed, including Indico, TWiki, Drupal, SharePoint, JACOW, E-group archives, EDMS, and CERN Web pages. In spring 2015, CERN Search was migrated to a new infrastructure based on SharePoint 2013. In the context of this upgrade, the document pre-processing and indexing process was redesigned and generalised. The new data feeding framework makes it possible to benefit from new functionality and facilitates the long-term maintenance of the system.

  19. Prediction of the strength of concrete radiation shielding based on LS-SVM

    International Nuclear Information System (INIS)

    Juncai, Xu; Qingwen, Ren; Zhenzhong, Shen

    2015-01-01

    Highlights:
    • LS-SVM was introduced for prediction of the strength of RSC.
    • A model for prediction of the strength of RSC was implemented.
    • The grid search algorithm was used to optimize the parameters of the LS-SVM.
    • The performance of LS-SVM in predicting the strength of RSC was evaluated.

    Abstract: Radiation-shielding concrete (RSC) and conventional concrete differ in strength because of their distinct constituents. Predicting the strength of RSC with different constituents plays a vital role in radiation shielding (RS) engineering design. In this study, a model to predict the strength of RSC is established using a least squares-support vector machine (LS-SVM) through grid search algorithm. The algorithm is used to optimize the parameters of the LS-SVM on the basis of traditional prediction methods for conventional concrete. The predicted results of the LS-SVM model are compared with the experimental data. The results of the prediction are stable and consistent with the experimental results. In addition, the studied parameters exhibit significant effects on the simulation results. Therefore, the proposed method can be applied in predicting the strength of RSC, and the predicted results can be adopted as an important reference for RS engineering design

  20. Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots.

    Science.gov (United States)

    Hajdin, Christine E; Bellaousov, Stanislav; Huggins, Wayne; Leonard, Christopher W; Mathews, David H; Weeks, Kevin M

    2013-04-02

    A pseudoknot forms in an RNA when nucleotides in a loop pair with a region outside the helices that close the loop. Pseudoknots occur relatively rarely in RNA but are highly overrepresented in functionally critical motifs in large catalytic RNAs, in riboswitches, and in regulatory elements of viruses. Pseudoknots are usually excluded from RNA structure prediction algorithms. When included, these pairings are difficult to model accurately, especially in large RNAs, because allowing this structure dramatically increases the number of possible incorrect folds and because it is difficult to search the fold space for an optimal structure. We have developed a concise secondary structure modeling approach that combines SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension) experimental chemical probing information and a simple, but robust, energy model for the entropic cost of single pseudoknot formation. Structures are predicted with iterative refinement, using a dynamic programming algorithm. This melded experimental and thermodynamic energy function predicted the secondary structures and the pseudoknots for a set of 21 challenging RNAs of known structure ranging in size from 34 to 530 nt. On average, 93% of known base pairs were predicted, and all pseudoknots in well-folded RNAs were identified.

  1. Comparison of PubMed and Google Scholar literature searches.

    Science.gov (United States)

    Anders, Michael E; Evans, Dennis P

    2010-05-01

    Literature searches are essential to evidence-based respiratory care. To conduct literature searches, respiratory therapists rely on search engines to retrieve information, but there is a dearth of literature on the comparative efficiencies of search engines for researching clinical questions in respiratory care. To compare PubMed and Google Scholar search results for clinical topics in respiratory care to that of a benchmark. We performed literature searches with PubMed and Google Scholar, on 3 clinical topics. In PubMed we used the Clinical Queries search filter. In Google Scholar we used the search filters in the Advanced Scholar Search option. We used the reference list of a related Cochrane Collaboration evidence-based systematic review as the benchmark for each of the search results. We calculated recall (sensitivity) and precision (positive predictive value) with 2 x 2 contingency tables. We compared the results with the chi-square test of independence and Fisher's exact test. PubMed and Google Scholar had similar recall for both overall search results (71% vs 69%) and full-text results (43% vs 51%). PubMed had better precision than Google Scholar for both overall search results (13% vs 0.07%, P PubMed searches with the Clinical Queries filter are more precise than with the Advanced Scholar Search in Google Scholar for respiratory care topics. PubMed appears to be more practical to conduct efficient, valid searches for informing evidence-based patient-care protocols, for guiding the care of individual patients, and for educational purposes.
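
The recall and precision measures used in this abstract follow directly from the 2 x 2 contingency table of retrieved versus benchmark articles. The sketch below is only an illustration of that calculation; the function name and the article identifiers are hypothetical and are not data from the study.

```python
def recall_precision(retrieved, relevant):
    """Recall (sensitivity) and precision (positive predictive value).

    `retrieved` is what the search engine returned; `relevant` is the
    benchmark reference list. Both are collections of article IDs.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)   # retrieved and in the benchmark
    fp = len(retrieved - relevant)   # retrieved but not in the benchmark
    fn = len(relevant - retrieved)   # in the benchmark but missed
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# Hypothetical example: 4 hits, 3 benchmark articles, 2 in common.
recall, precision = recall_precision({"a", "b", "c", "d"}, {"a", "b", "e"})
print(f"recall={recall:.2f} precision={precision:.2f}")
```

A search that returns many irrelevant hits can therefore keep a similar recall while its precision drops sharply, which is the pattern the abstract reports.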

  2. On the relevance of sophisticated structural annotations for disulfide connectivity pattern prediction.

    Directory of Open Access Journals (Sweden)

    Julien Becker

    Disulfide bridges strongly constrain the native structure of many proteins and predicting their formation is therefore a key sub-problem of protein structure and function inference. Most recently proposed approaches for this prediction problem adopt the following pipeline: first they enrich the primary sequence with structural annotations, second they apply a binary classifier to each candidate pair of cysteines to predict disulfide bonding probabilities and finally, they use a maximum weight graph matching algorithm to derive the predicted disulfide connectivity pattern of a protein. In this paper, we adopt this three-step pipeline and propose an extensive study of the relevance of various structural annotations and feature encodings. In particular, we consider five kinds of structural annotations, among which three are novel in the context of disulfide bridge prediction. So as to be usable by machine learning algorithms, these annotations must be encoded into features. For this purpose, we propose four different feature encodings based on local windows and on different kinds of histograms. The combination of structural annotations with these possible encodings leads to a large number of possible feature functions. In order to identify a minimal subset of relevant feature functions among those, we propose an efficient and interpretable feature function selection scheme, designed so as to avoid any form of overfitting. We apply this scheme on top of three supervised learning algorithms: k-nearest neighbors, support vector machines and extremely randomized trees. Our results indicate that the use of only the PSSM (position-specific scoring matrix) together with the CSP (cysteine separation profile) is sufficient to construct a high-performance disulfide pattern predictor and that extremely randomized trees reach a disulfide pattern prediction accuracy of [Formula: see text] on the benchmark dataset SPX[Formula: see text], which corresponds to
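
The last step of the pipeline described here, turning pairwise bonding probabilities into a connectivity pattern by maximum weight matching, can be sketched for a small protein with an exhaustive search. This is an assumption-laden illustration, not the authors' algorithm: practical implementations use a polynomial-time matching algorithm, and the probabilities below are invented.

```python
from functools import lru_cache

def best_matching(n, weight):
    """Maximum-weight matching over cysteines 0..n-1 by exhaustive search.

    `weight[(i, j)]` with i < j is the predicted bonding probability of a
    pair; a cysteine may stay unpaired. Only practical for small n.
    """
    @lru_cache(maxsize=None)
    def solve(free):
        # `free` is a tuple of unmatched cysteine indices, ascending.
        if len(free) < 2:
            return 0.0, ()
        first, rest = free[0], free[1:]
        best = solve(rest)  # option 1: leave `first` unpaired
        for k, other in enumerate(rest):  # option 2: pair it with `other`
            sub_weight, sub_pairs = solve(rest[:k] + rest[k + 1:])
            total = weight.get((first, other), 0.0) + sub_weight
            if total > best[0]:
                best = (total, ((first, other),) + sub_pairs)
        return best

    return solve(tuple(range(n)))

# Hypothetical classifier output for a protein with four cysteines.
probs = {(0, 1): 0.9, (0, 2): 0.2, (0, 3): 0.6,
         (1, 2): 0.7, (1, 3): 0.1, (2, 3): 0.8}
total, pattern = best_matching(4, probs)
print(pattern)
```

Here the pattern (0-1, 2-3) wins because its summed probability exceeds that of the two alternative pairings, even though pair (1, 2) scores well on its own.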

  3. QCD dipole predictions for DIS and diffractive structure functions

    International Nuclear Information System (INIS)

    Royon, C.

    1997-01-01

    The proton structure function $F_2$, the gluon density $F_G$, and the longitudinal structure function $F_L$ are derived in the QCD dipole picture of BFKL dynamics. We use a three parameter fit to describe the 1994 H1 proton structure function $F_2$ data in the low-$x$, moderate-$Q^2$ range. Without any additional parameter, the gluon density and the longitudinal structure functions are predicted. The diffractive dissociation processes are also discussed within the same framework, and a new prediction for the proton diffractive structure function is obtained

  4. Integration of QUARK and I-TASSER for Ab Initio Protein Structure Prediction in CASP11.

    Science.gov (United States)

    Zhang, Wenxuan; Yang, Jianyi; He, Baoji; Walker, Sara Elizabeth; Zhang, Hongjiu; Govindarajoo, Brandon; Virtanen, Jouko; Xue, Zhidong; Shen, Hong-Bin; Zhang, Yang

    2016-09-01

    We tested two pipelines developed for template-free protein structure prediction in the CASP11 experiment. First, the QUARK pipeline constructs structure models by reassembling fragments of continuously distributed lengths excised from unrelated proteins. Five free-modeling (FM) targets have the model successfully constructed by QUARK with a TM-score above 0.4, including the first model of T0837-D1, which has a TM-score = 0.736 and RMSD = 2.9 Å to the native. Detailed analysis showed that the success is partly attributed to the high-resolution contact map prediction derived from fragment-based distance-profiles, which are mainly located between regular secondary structure elements and loops/turns and help guide the orientation of secondary structure assembly. In the Zhang-Server pipeline, weakly scoring threading templates are re-ordered by the structural similarity to the ab initio folding models, which are then reassembled by I-TASSER based structure assembly simulations; 60% more domains with length up to 204 residues, compared to the QUARK pipeline, were successfully modeled by the I-TASSER pipeline with a TM-score above 0.4. The robustness of the I-TASSER pipeline can stem from the composite fragment-assembly simulations that combine structures from both ab initio folding and threading template refinements. Despite the promising cases, challenges still exist in long-range beta-strand folding, domain parsing, and the uncertainty of secondary structure prediction; the latter of which was found to affect nearly all aspects of FM structure predictions, from fragment identification, target classification, structure assembly, to final model selection. Significant efforts are needed to solve these problems before real progress on FM could be made. Proteins 2016; 84(Suppl 1):76-86. © 2015 Wiley Periodicals, Inc.

  5. Tales from the Field: Search Strategies Applied in Web Searching

    Directory of Open Access Journals (Sweden)

    Soohyung Joo

    2010-08-01

    In their web search processes users apply multiple types of search strategies, which consist of different search tactics. This paper identifies eight types of information search strategies with associated cases based on sequences of search tactics during the information search process. Thirty-one participants representing the general public were recruited for this study. Search logs and verbal protocols offered rich data for the identification of different types of search strategies. Based on the findings, the authors further discuss how to enhance web-based information retrieval (IR) systems to support each type of search strategy.

  6. URS DataBase: universe of RNA structures and their motifs.

    Science.gov (United States)

    Baulin, Eugene; Yacovlev, Victor; Khachko, Denis; Spirin, Sergei; Roytberg, Mikhail

    2016-01-01

    The Universe of RNA Structures DataBase (URSDB) stores information obtained from all RNA-containing PDB entries (2935 entries in October 2015). The content of the database is updated regularly. The database consists of 51 tables containing indexed data on various elements of the RNA structures. The database provides a web interface allowing the user to select a subset of structures with desired features and to obtain various statistical data for a selected subset of structures or for all structures. In particular, one can easily obtain statistics on geometric parameters of base pairs, on structural motifs (stems, loops, etc.) or on different types of pseudoknots. The user can also view and get information on an individual structure or its selected parts, e.g. RNA-protein hydrogen bonds. URSDB employs a new original definition of loops in RNA structures. That definition fits both pseudoknot-free and pseudoknotted secondary structures and coincides with the classical definition in the case of pseudoknot-free structures. To our knowledge, URSDB is the first database supporting searches based on topological classification of pseudoknots and on extended loop classification. Database URL: http://server3.lpm.org.ru/urs/. © The Author(s) 2016. Published by Oxford University Press.

  7. QCD dipole prediction for dis and diffractive structure functions

    International Nuclear Information System (INIS)

    Royon, CH.

    1996-01-01

    The $F_2$, $F_G$, and $R = F_L/F_T$ proton structure functions are derived in the QCD dipole picture of BFKL dynamics. We get a three parameter fit describing the 1994 H1 proton structure function $F_2$ data in the low-$x$, moderate-$Q^2$ range. Without any additional parameter, the gluon density and the longitudinal structure functions are predicted. The diffractive dissociation processes are also discussed, and a new prediction for the proton diffractive structure function is obtained. (author)

  8. Search for new physics with boosted Z bosons at CMS and development of an IP-based control protocol for the CMS upgrades

    CERN Document Server

    Williams, Thomas Stephen; Shepherd-Themistocleous, Claire

    2015-01-01

    From 2009 to 2012, the Large Hadron Collider has been colliding proton beams at centre-of-mass energies from 900 GeV to 8 TeV. The particles produced from these collisions have been recorded using the Compact Muon Solenoid (CMS) detector, with the purpose of investigating the predictions of the Standard Model of particle physics, and searching for phenomena predicted by theories of physics beyond the Standard Model. A search for new heavy particles whose decays involve boosted Z bosons is presented, based on the di-electron decay channel, using 19.7 fb$^{-1}$ of $\sqrt{s} = 8$ TeV pp collision data collected by the CMS detector in 2012. In the context of this search, the selection efficiency for highly-boosted Z bosons has been studied extensively in both data and simulation. No evidence for new particles is observed, and upper limits on the cross section for production of excited quarks are obtained. The analysis excludes, at 95 % confidence level, the production of excited quarks decaying to qZ for excited quark masses below ...

  9. Search Method Based on Figurative Indexation of Folksonomic Features of Graphic Files

    Directory of Open Access Journals (Sweden)

    Oleg V. Bisikalo

    2013-11-01

    This paper describes a search method based on figurative indexation of the folksonomic characteristics of graphic files. The method takes extralinguistic information into account and is based on a model of human figurative thinking. The paper presents the development of a method for searching image files based on their formal clues, including folksonomic ones.

  10. Clinician search behaviors may be influenced by search engine design.

    Science.gov (United States)

    Lau, Annie Y S; Coiera, Enrico; Zrimec, Tatjana; Compton, Paul

    2010-06-30

    Searching the Web for documents using information retrieval systems plays an important part in clinicians' practice of evidence-based medicine. While much research focuses on the design of methods to retrieve documents, there has been little examination of the way different search engine capabilities influence clinician search behaviors. Previous studies have shown that use of task-based search engines allows for faster searches with no loss of decision accuracy compared with resource-based engines. We hypothesized that changes in search behaviors may explain these differences. In all, 75 clinicians (44 doctors and 31 clinical nurse consultants) were randomized to use either a resource-based or a task-based version of a clinical information retrieval system to answer questions about 8 clinical scenarios in a controlled setting in a university computer laboratory. Clinicians using the resource-based system could select 1 of 6 resources, such as PubMed; clinicians using the task-based system could select 1 of 6 clinical tasks, such as diagnosis. Clinicians in both systems could reformulate search queries. System logs unobtrusively capturing clinicians' interactions with the systems were coded and analyzed for clinicians' search actions and query reformulation strategies. The most frequent search action of clinicians using the resource-based system was to explore a new resource with the same query, that is, these clinicians exhibited a "breadth-first" search behaviour. Of 1398 search actions, clinicians using the resource-based system conducted 401 (28.7%, 95% confidence interval [CI] 26.37-31.11) in this way. In contrast, the majority of clinicians using the task-based system exhibited a "depth-first" search behavior in which they reformulated query keywords while keeping to the same task profiles. Of 585 search actions conducted by clinicians using the task-based system, 379 (64.8%, 95% CI 60.83-68.55) were conducted in this way. This study provides evidence that
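
The two confidence intervals quoted in this abstract can be checked with a short calculation. The abstract does not say which interval method was used; the sketch below assumes a Wilson score interval with z = 1.96, which turns out to match both reported ranges to two decimals.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half

# Counts taken from the abstract: breadth-first and depth-first actions.
for label, k, n in [("resource-based", 401, 1398), ("task-based", 379, 585)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {100 * k / n:.1f}% (95% CI {100 * low:.2f}-{100 * high:.2f})")
```

The intervals for the two groups do not overlap, which is consistent with the abstract's contrast between breadth-first and depth-first search behavior.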

  11. Hybrid Optimization in the Design of Reciprocal Structures

    DEFF Research Database (Denmark)

    Parigi, Dario; Kirkegaard, Poul Henning; Sassone, Mario

    2012-01-01

    The paper presents a method to generate the geometry of reciprocal structures by means of a hybrid optimization procedure. The geometry of reciprocal structures, where elements sit on top of or beneath each other, is extremely difficult to predict because of the non... In this paper it is shown that the geometrically compatible position of the elements can be determined by a gradient-based (GB) local search algorithm. However, the control over which bar sits on top or beneath at each connection can be regarded as a topological problem and requires the use of algorithms that explore the global domain of solutions, such as genetic algorithms (GAs). The benchmark tests show that when control of the topology is required, the best result is obtained by a hybrid approach that combines the global search of the GA with the local search of a GB algorithm. The optimization method...

  12. Pedestrian Tracking Based on Camshift with Kalman Prediction for Autonomous Vehicles

    Directory of Open Access Journals (Sweden)

    Lie Guo

    2016-06-01

    Pedestrian detection and tracking is the key to autonomous vehicle navigation systems avoiding potentially dangerous situations. Firstly, the probability distribution of colour information is established after a pedestrian is located in an image. Then the detected results are utilized to initialize a Kalman filter to predict the possible position of the pedestrian centroid in the future frame. A Camshift tracking algorithm is used to track the pedestrian in the specific search window of the next frame based on the prediction results. The actual position of the pedestrian centroid is output from the Camshift tracking algorithm to update the gain and error covariance matrix of the Kalman filter. Experimental results in real traffic situations show that the proposed pedestrian tracking algorithm can achieve good performance even when pedestrians are partly occluded under inconsistent illumination.
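
    The predict/update cycle described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a constant-velocity motion model applied to one image axis at a time, the noise values `q` and `r` are arbitrary, and the measured centroid is passed in as a plain number standing in for the Camshift output.

```python
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis.

    State is [position, velocity]; only the position is measured.
    The model and the default noise values are assumptions for illustration.
    """

    def __init__(self, pos, q=1.0, r=4.0):
        self.x = [pos, 0.0]                   # state estimate
        self.P = [[10.0, 0.0], [0.0, 10.0]]   # error covariance
        self.q = q                            # process noise variance
        self.r = r                            # measurement noise variance

    def predict(self, dt=1.0):
        """Project the state one frame ahead; returns the predicted position."""
        p, v = self.x
        self.x = [p + dt * v, v]
        (a, b), (c, d) = self.P
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.x[0]

    def update(self, z):
        """Correct the state with a measured centroid coordinate z."""
        (a, b), (c, d) = self.P
        s = a + self.r                # innovation covariance, H = [1, 0]
        k0, k1 = a / s, c / s         # Kalman gain
        y = z - self.x[0]             # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


def track_axis(measurements, q=1.0, r=4.0):
    """Return the predicted position for every frame after the first."""
    kf = AxisKalman(measurements[0], q, r)
    predictions = []
    for z in measurements[1:]:
        predictions.append(kf.predict())  # centre of the next search window
        kf.update(z)                      # measured centroid (Camshift stand-in)
    return predictions
```

    In a full tracker, one such filter per axis would supply the centre of the search window, and the tracker's measured centroid would be fed back through `update`.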

  13. A Full-Text-Based Search Engine for Finding Highly Matched Documents Across Multiple Categories

    Science.gov (United States)

    Nguyen, Hung D.; Steele, Gynelle C.

    2016-01-01

    This report demonstrates the full-text-based search engine that works on any Web-based mobile application. The engine has the capability to search databases across multiple categories based on a user's queries and identify the most relevant or similar. The search results presented here were found using an Android (Google Co.) mobile device; however, it is also compatible with other mobile phones.

  14. A computational approach for the annotation of hydrogen-bonded base interactions in crystallographic structures of the ribozymes

    Energy Technology Data Exchange (ETDEWEB)

    Hamdani, Hazrina Yusof, E-mail: hazrina@mfrlab.org [School of Biosciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi (Malaysia); Advanced Medical and Dental Institute, Universiti Sains Malaysia, Bertam, Kepala Batas (Malaysia); Artymiuk, Peter J., E-mail: p.artymiuk@sheffield.ac.uk [Dept. of Molecular Biology and Biotechnology, Firth Court, University of Sheffield, S10 T2N Sheffield (United Kingdom); Firdaus-Raih, Mohd, E-mail: firdaus@mfrlab.org [School of Biosciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi (Malaysia)

    2015-09-25

    A fundamental understanding of the atomic level interactions in ribonucleic acid (RNA) and how they contribute towards RNA architecture is an important knowledge platform to develop through the discovery of motifs, from simple arrangements of base pairs to more complex arrangements such as triples and larger patterns involving non-standard interactions. The network of hydrogen bond interactions is important in connecting bases to form potential tertiary motifs. Therefore, there is an urgent need for the development of automated methods for annotating RNA 3D structures based on hydrogen bond interactions. COnnection tables Graphs for Nucleic ACids (COGNAC) is an automated annotation system using graph theoretical approaches that has been developed for the identification of RNA 3D motifs. This program searches for patterns in the unbroken networks of hydrogen bonds for RNA structures and is capable of annotating base pairs and higher-order base interactions, which range from triples to sextuples. COGNAC was able to discover 22 out of 32 quadruple occurrences in the Haloarcula marismortui large ribosomal subunit (PDB ID: 1FFK) and two out of three occurrences of quintuple interactions reported by the non-canonical interactions in RNA (NCIR) database. These and several other interactions of interest will be discussed in this paper. These examples demonstrate that the COGNAC program can serve as an automated annotation system that can be used to annotate conserved base-base interactions and could be added as additional information to established RNA secondary structure prediction methods.

  15. A computational approach for the annotation of hydrogen-bonded base interactions in crystallographic structures of the ribozymes

    International Nuclear Information System (INIS)

    Hamdani, Hazrina Yusof; Artymiuk, Peter J.; Firdaus-Raih, Mohd

    2015-01-01

    A fundamental understanding of the atomic level interactions in ribonucleic acid (RNA) and how they contribute towards RNA architecture is an important knowledge platform to develop through the discovery of motifs, from simple arrangements of base pairs to more complex arrangements such as triples and larger patterns involving non-standard interactions. The network of hydrogen bond interactions is important in connecting bases to form potential tertiary motifs. Therefore, there is an urgent need for the development of automated methods for annotating RNA 3D structures based on hydrogen bond interactions. COnnection tables Graphs for Nucleic ACids (COGNAC) is an automated annotation system using graph theoretical approaches that has been developed for the identification of RNA 3D motifs. This program searches for patterns in the unbroken networks of hydrogen bonds for RNA structures and is capable of annotating base pairs and higher-order base interactions, which range from triples to sextuples. COGNAC was able to discover 22 out of 32 quadruple occurrences in the Haloarcula marismortui large ribosomal subunit (PDB ID: 1FFK) and two out of three occurrences of quintuple interactions reported by the non-canonical interactions in RNA (NCIR) database. These and several other interactions of interest will be discussed in this paper. These examples demonstrate that the COGNAC program can serve as an automated annotation system that can be used to annotate conserved base-base interactions and could be added as additional information to established RNA secondary structure prediction methods.
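
    The idea of searching unbroken hydrogen-bond networks can be illustrated with a small graph sketch. This is not COGNAC's algorithm, which the abstract does not detail; it simply assumes that bases are nodes, hydrogen bonds are edges, and each connected group is labelled by its size. The function name, the input format and the residue labels are all hypothetical.

```python
from collections import defaultdict

# Size-to-name mapping for the interaction orders named in the abstract.
NAMES = {2: "pair", 3: "triple", 4: "quadruple", 5: "quintuple", 6: "sextuple"}


def annotate_networks(h_bonds):
    """Group bases into connected hydrogen-bond networks and label each.

    h_bonds: iterable of (base_a, base_b) tuples, one per hydrogen-bonded
    base-base contact (a hypothetical input format).
    """
    graph = defaultdict(set)
    for a, b in h_bonds:
        graph[a].add(b)
        graph[b].add(a)

    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first traversal collects one unbroken network.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append((NAMES.get(len(component), "other"), sorted(component)))
    return result


# Hypothetical residue labels, for illustration only.
print(annotate_networks([("A1", "U2"), ("G5", "C6"), ("C6", "A7")]))
```

    A real annotation system would also need to detect the hydrogen bonds from atomic coordinates and to distinguish interaction geometries, neither of which is attempted here.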

  16. A product feature-based user-centric product search model

    OpenAIRE

    Ben Jabeur, Lamjed; Soulier, Laure; Tamine, Lynda; Mousset, Paul

    2016-01-01

    During the online shopping process, users would search for interesting products and quickly access those that fit with their needs among a long tail of similar or closely related products. Our contribution addresses head queries that are frequently submitted on e-commerce Web sites. Head queries usually target featured products with several variations, accessories, and complementary products. We present in this paper a product feature-based user-centric model for product search involving in a...

  17. Correlating Structural Order with Structural Rearrangement in Dusty Plasma Liquids: Can Structural Rearrangement be Predicted by Static Structural Information?

    Science.gov (United States)

    Su, Yen-Shuo; Liu, Yu-Hsuan; I, Lin

    2012-11-01

    Whether static microstructural order information is strongly correlated with the subsequent structural rearrangement (SR), and its predictive power for SR, are investigated experimentally in the quenched dusty plasma liquid with microheterogeneities. Poor local structural order is found to be a good alarm for identifying soft spots and predicting short-term SR. For sites with good structural order, the persistence time for sustaining the structural memory until SR has a large mean value but a broad distribution. The deviation of the local structural order from that averaged over nearest neighbors serves as a good second alarm to further sort out the short-time SR sites. Its sorting power is similar to that obtained using the temporal fluctuation of the local structural order over a small time interval.

  18. A multicontroller structure for teaching and designing predictive control strategies

    International Nuclear Information System (INIS)

    Hodouin, D.; Desbiens, A.

    1999-01-01

    The paper deals with the unification of the existing linear control algorithms in order to facilitate their transfer to engineering students and to engineers in industry. The resulting control algorithm is the Global Predictive Control (GlobPC), which is now taught at the graduate and continuing education levels. GlobPC is based on an internal model framework where three independent control criteria are minimized: one for tracking, one for regulation and one for feedforward. This structure makes it possible to obtain the desired tracking, regulation and feedforward behaviors in an optimal way while keeping them perfectly separated. It also cleanly separates the deterministic and stochastic predictions of the process model output. (author)

  19. Soil-pipe interaction modeling for pipe behavior prediction with super learning based methods

    Science.gov (United States)

    Shi, Fang; Peng, Xiang; Liu, Huan; Hu, Yafei; Liu, Zheng; Li, Eric

    2018-03-01

    Underground pipelines are subject to severe distress from the surrounding expansive soil. To investigate the structural response of water mains to varying soil movements, field data, including pipe wall strains, in situ soil water content, soil pressure and temperature, were collected. Research on monitoring data analysis has been reported, but the relationship between soil properties and pipe deformation has not been well interpreted. To characterize this relationship, this paper presents a super learning based approach combined with feature selection algorithms to predict the structural behavior of water mains in different soil environments. Furthermore, an automatic variable selection method, i.e. the recursive feature elimination algorithm, was used to identify the critical predictors contributing to the pipe deformations. To investigate the adaptability of super learning to different predictive models, this research applied super learning based methods to three different datasets. The predictive performance was evaluated by R-squared, root-mean-square error and mean absolute error. Based on the prediction performance evaluation, the superiority of super learning was validated and demonstrated by predicting three types of pipe deformations accurately. In addition, a comprehensive understanding of the working environments of water mains becomes possible.
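
    The three evaluation measures named in this abstract have standard definitions, sketched below in plain Python. The function name and the sample values are illustrative and are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R-squared, root-mean-square error and mean absolute error."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Made-up observed and predicted values, for illustration only.
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

    R-squared is undefined when all observed values are equal, because the total sum of squares is then zero; a production version would need to guard against that case.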

  20. Geostatistical Spatio-Time model of crime in el Salvador: Structural and Predictive Analysis

    Directory of Open Access Journals (Sweden)

    Welman Rosa Alvarado

    2011-07-01

    Today, studying geospatial and spatio-temporal phenomena requires statistical tools that enable the analysis of dependencies in space, in time, and in their interactions. The science that studies such subjects is geostatistics, whose goal is to predict spatial phenomena. It is considered the basis for modeling phenomena that involve interactions between space and time. In the past 10 years, geostatistics has seen great development in areas such as geology, soils, remote sensing, epidemiology, agriculture, ecology, economy, etc. In this research, geostatistics has been applied to build a predictive map of crime in El Salvador; to that end, the variability of space and time together is studied to generate crime scenarios: crime hot spots are determined and crime-vulnerable groups are identified, in order to improve political decisions and inform decision makers about insecurity in the country.