WorldWideScience

Sample records for randomly selected sequences

  1. Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences.

    Science.gov (United States)

    Warris, Sven; Boymans, Sander; Muiser, Iwe; Noback, Michiel; Krijnen, Wim; Nap, Jan-Peter

    2014-01-13

    Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. It allows on-the-fly calculation of the normal distribution for any candidate sequence composition. The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this particular property alone will not be able to distinguish miRNAs from other sequences sufficiently discriminative, the MFE-based P-value should be added to the parameters of choice to be included in the selection of potential miRNA candidates for experimental verification.

  2. GuiTope: an application for mapping random-sequence peptides to protein sequences.

    Science.gov (United States)

    Halperin, Rebecca F; Stafford, Phillip; Emery, Jack S; Navalkar, Krupa Arun; Johnston, Stephen Albert

    2012-01-03

    Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.

  3. GuiTope: an application for mapping random-sequence peptides to protein sequences

    Directory of Open Access Journals (Sweden)

    Halperin Rebecca F

    2012-01-01

    Full Text Available Abstract Background Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. Results GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. Conclusions GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.

  4. Blocked Randomization with Randomly Selected Block Sizes

    Directory of Open Access Journals (Sweden)

    Jimmy Efird

    2010-12-01

    Full Text Available When planning a randomized clinical trial, careful consideration must be given to how participants are selected for various arms of a study. Selection and accidental bias may occur when participants are not assigned to study groups with equal probability. A simple random allocation scheme is a process by which each participant has equal likelihood of being assigned to treatment versus referent groups. However, by chance an unequal number of individuals may be assigned to each arm of the study and thus decrease the power to detect statistically significant differences between groups. Block randomization is a commonly used technique in clinical trial design to reduce bias and achieve balance in the allocation of participants to treatment arms, especially when the sample size is small. This method increases the probability that each arm will contain an equal number of individuals by sequencing participant assignments by block. Yet still, the allocation process may be predictable, for example, when the investigator is not blind and the block size is fixed. This paper provides an overview of blocked randomization and illustrates how to avoid selection bias by using random block sizes.

  5. Generation of pseudo-random sequences for spread spectrum systems

    Science.gov (United States)

    Moser, R.; Stover, J.

    1985-05-01

    The characteristics of pseudo random radio signal sequences (PRS) are explored. The randomness of the PSR is a matter of artificially altering the sequence of binary digits broadcast. Autocorrelations of the two sequences shifted in time, if high, determine if the signals are the same and thus allow for position identification. Cross-correlation can also be calculated between sequences. Correlations closest to zero are obtained with large volume of prime numbers in the sequences. Techniques for selecting optimal and maximal lengths for the sequences are reviewed. If the correlations are near zero in the sequences, then signal channels can accommodate multiple users. Finally, Gold codes are discussed as a technique for maximizing the code lengths.

  6. Selective retrieval of memory and concept sequences through neuro-windows

    OpenAIRE

    Kakeya, Hideki; Okabe, Yoichi

    1999-01-01

    This letter presents a crosscorrelational associative memory model which realizes selective retrieval of pattern sequences. When hierarchically correlated sequences are memorized, sequences of the correlational centers can be defined as the concept sequences. The authors propose a modified neuro-window method which enables selective retrieval of memory sequences and concept sequences. It is also shown that the proposed model realizes capacity expansion of the memory which stores random sequen...

  7. Permutation Entropy for Random Binary Sequences

    Directory of Open Access Journals (Sweden)

    Lingfeng Liu

    2015-12-01

    Full Text Available In this paper, we generalize the permutation entropy (PE measure to binary sequences, which is based on Shannon’s entropy, and theoretically analyze this measure for random binary sequences. We deduce the theoretical value of PE for random binary sequences, which can be used to measure the randomness of binary sequences. We also reveal the relationship between this PE measure with other randomness measures, such as Shannon’s entropy and Lempel–Ziv complexity. The results show that PE is consistent with these two measures. Furthermore, we use PE as one of the randomness measures to evaluate the randomness of chaotic binary sequences.

  8. RANDNA: a random DNA sequence generator.

    Science.gov (United States)

    Piva, Francesco; Principato, Giovanni

    2006-01-01

    Monte Carlo simulations are useful to verify the significance of data. Genomic regularities, such as the nucleotide correlations or the not uniform distribution of the motifs throughout genomic or mature mRNA sequences, exist and their significance can be checked by means of the Monte Carlo test. The test needs good quality random sequences in order to work, moreover they should have the same nucleotide distribution as the sequences in which the regularities have been found. Random DNA sequences are also useful to estimate the background score of an alignment, that is a threshold below which the resulting score is merely due to chance. We have developed RANDNA, a free software which allows to produce random DNA or RNA sequences setting both their length and the percentage of nucleotide composition. Sequences having the same nucleotide distribution of exonic, intronic or intergenic sequences can be generated. Its graphic interface makes it possible to easily set the parameters that characterize the sequences being produced and saved in a text format file. The pseudo-random number generator function of Borland Delphi 6 is used, since it guarantees a good randomness, a long cycle length and a high speed. We have checked the quality of sequences generated by the software, by means of well-known tests, both by themselves and versus genuine random sequences. We show the good quality of the generated sequences. The software, complete with examples and documentation, is freely available to users from: http://www.introni.it/en/software.

  9. LPTAU, Quasi Random Sequence Generator

    International Nuclear Information System (INIS)

    Sobol, Ilya M.

    1993-01-01

    1 - Description of program or function: LPTAU generates quasi random sequences. These are uniformly distributed sets of L=M N points in the N-dimensional unit cube: I N =[0,1]x...x[0,1]. These sequences are used as nodes for multidimensional integration; as searching points in global optimization; as trial points in multi-criteria decision making; as quasi-random points for quasi Monte Carlo algorithms. 2 - Method of solution: Uses LP-TAU sequence generation (see references). 3 - Restrictions on the complexity of the problem: The number of points that can be generated is L 30 . The dimension of the space cannot exceed 51

  10. Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection

    Directory of Open Access Journals (Sweden)

    Xin Ma

    2015-01-01

    Full Text Available The prediction of RNA-binding proteins is one of the most challenging problems in computation biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR method, followed by incremental feature selection (IFS. We incorporated features of conjoint triad features and three novel features: binding propensity (BP, nonbinding propensity (NBP, and evolutionary information combined with physicochemical properties (EIPP. The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient. High prediction accuracy and successful prediction performance suggested that our method can be a useful approach to identify RNA-binding proteins from sequence information.

  11. Design of Long Period Pseudo-Random Sequences from the Addition of -Sequences over

    Directory of Open Access Journals (Sweden)

    Ren Jian

    2004-01-01

    Full Text Available Pseudo-random sequence with good correlation property and large linear span is widely used in code division multiple access (CDMA communication systems and cryptology for reliable and secure information transmission. In this paper, sequences with long period, large complexity, balance statistics, and low cross-correlation property are constructed from the addition of -sequences with pairwise-prime linear spans (AMPLS. Using -sequences as building blocks, the proposed method proved to be an efficient and flexible approach to construct long period pseudo-random sequences with desirable properties from short period sequences. Applying the proposed method to , a signal set is constructed.

  12. Statistical distributions of optimal global alignment scores of random protein sequences

    Directory of Open Access Journals (Sweden)

    Tang Jiaowei

    2005-10-01

    Full Text Available Abstract Background The inference of homology from statistically significant sequence similarity is a central issue in sequence alignments. So far the statistical distribution function underlying the optimal global alignments has not been completely determined. Results In this study, random and real but unrelated sequences prepared in six different ways were selected as reference datasets to obtain their respective statistical distributions of global alignment scores. All alignments were carried out with the Needleman-Wunsch algorithm and optimal scores were fitted to the Gumbel, normal and gamma distributions respectively. The three-parameter gamma distribution performs the best as the theoretical distribution function of global alignment scores, as it agrees perfectly well with the distribution of alignment scores. The normal distribution also agrees well with the score distribution frequencies when the shape parameter of the gamma distribution is sufficiently large, for this is the scenario when the normal distribution can be viewed as an approximation of the gamma distribution. Conclusion We have shown that the optimal global alignment scores of random protein sequences fit the three-parameter gamma distribution function. This would be useful for the inference of homology between sequences whose relationship is unknown, through the evaluation of gamma distribution significance between sequences.

  13. Image encryption using random sequence generated from generalized information domain

    International Nuclear Information System (INIS)

    Zhang Xia-Yan; Wu Jie-Hua; Zhang Guo-Ji; Li Xuan; Ren Ya-Zhou

    2016-01-01

    A novel image encryption method based on the random sequence generated from the generalized information domain and permutation–diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image while random sequences are treated as keystreams. A new factor called drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method an approximately one-time pad. Experimental results show that the random sequences pass the NIST statistical test with a high ratio and extensive analysis demonstrates that the new encryption scheme has superior security. (paper)

  14. Nonlinear deterministic structures and the randomness of protein sequences

    CERN Document Server

    Huang Yan Zhao

    2003-01-01

    To clarify the randomness of protein sequences, we make a detailed analysis of a set of typical protein sequences representing each structural classes by using nonlinear prediction method. No deterministic structures are found in these protein sequences and this implies that they behave as random sequences. We also give an explanation to the controversial results obtained in previous investigations.

  15. Computational analysis of sequence selection mechanisms.

    Science.gov (United States)

    Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron

    2004-04-01

    Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.

  16. Bunches of random cross-correlated sequences

    International Nuclear Information System (INIS)

    Maystrenko, A A; Melnik, S S; Pritula, G M; Usatenko, O V

    2013-01-01

    The statistical properties of random cross-correlated sequences constructed by the convolution method (likewise referred to as the Rice or the inverse Fourier transformation) are examined. We clarify the meaning of the filtering function—the kernel of the convolution operator—and show that it is the value of the cross-correlation function which describes correlations between the initial white noise and constructed correlated sequences. The matrix generalization of this method for constructing a bunch of N cross-correlated sequences is presented. Algorithms for their generation are reduced to solving the problem of decomposition of the Fourier transform of the correlation matrix into a product of two mutually conjugate matrices. Different decompositions are considered. The limits of weak and strong correlations for the one-point probability and pair correlation functions of sequences generated by the method under consideration are studied. Special cases of heavy-tailed distributions of the generated sequences are analyzed. We show that, if the filtering function is rather smooth, the distribution function of generated variables has the Gaussian or Lévy form depending on the analytical properties of the distribution (or characteristic) functions of the initial white noise. Anisotropic properties of statistically homogeneous random sequences related to the asymmetry of a filtering function are revealed and studied. These asymmetry properties are expressed in terms of the third- or fourth-order correlation functions. Several examples of the construction of correlated chains with a predefined correlation matrix are given. (paper)

  17. Golden Ratio Versus Pi as Random Sequence Sources for Monte Carlo Integration

    Science.gov (United States)

    Sen, S. K.; Agarwal, Ravi P.; Shaykhian, Gholam Ali

    2007-01-01

    We discuss here the relative merits of these numbers as possible random sequence sources. The quality of these sequences is not judged directly based on the outcome of all known tests for the randomness of a sequence. Instead, it is determined implicitly by the accuracy of the Monte Carlo integration in a statistical sense. Since our main motive of using a random sequence is to solve real world problems, it is more desirable if we compare the quality of the sequences based on their performances for these problems in terms of quality/accuracy of the output. We also compare these sources against those generated by a popular pseudo-random generator, viz., the Matlab rand and the quasi-random generator ha/ton both in terms of error and time complexity. Our study demonstrates that consecutive blocks of digits of each of these numbers produce a good random sequence source. It is observed that randomly chosen blocks of digits do not have any remarkable advantage over consecutive blocks for the accuracy of the Monte Carlo integration. Also, it reveals that pi is a better source of a random sequence than theta when the accuracy of the integration is concerned.

  18. A method for selecting cis-acting regulatory sequences that respond to small molecule effectors

    Directory of Open Access Journals (Sweden)

    Allas Ülar

    2010-08-01

    Full Text Available Abstract Background Several cis-acting regulatory sequences functioning at the level of mRNA or nascent peptide and specifically influencing transcription or translation have been described. These regulatory elements often respond to specific chemicals. Results We have developed a method that allows us to select cis-acting regulatory sequences that respond to diverse chemicals. The method is based on the β-lactamase gene containing a random sequence inserted into the beginning of the ORF. Several rounds of selection are used to isolate sequences that suppress β-lactamase expression in response to the compound under study. We have isolated sequences that respond to erythromycin, troleandomycin, chloramphenicol, meta-toluate and homoserine lactone. By introducing synonymous and non-synonymous mutations we have shown that at least in the case of erythromycin the sequences act at the peptide level. We have also tested the cross-activities of the constructs and found that in most cases the sequences respond most strongly to the compound on which they were isolated. Conclusions Several selected peptides showed ligand-specific changes in amino acid frequencies, but no consensus motif could be identified. This is consistent with previous observations on natural cis-acting peptides, showing that it is often impossible to demonstrate a consensus. Applying the currently developed method on a larger scale, by selecting and comparing an extended set of sequences, might allow the sequence rules underlying the activity of cis-acting regulatory peptides to be identified.

  19. Sequence selection by dynamical symmetry breaking in an autocatalytic binary polymer model

    DEFF Research Database (Denmark)

    Fellermann, Harold; Tanaka, Shinpei; Rasmussen, Steen

    2017-01-01

    Template-directed replication of nucleic acids is at the essence of all living beings and a major milestone for any origin of life scenario. We present an idealized model of prebiotic sequence replication, where binary polymers act as templates for their autocatalytic replication, thereby serving...... as each others reactants and products in an intertwined molecular ecology. Our model demonstrates how autocatalysis alters the qualitative and quantitative system dynamics in counterintuitive ways. Most notably, numerical simulations reveal a very strong intrinsic selection mechanism that favors...... the appearance of a few population structures with highly ordered and repetitive sequence patterns when starting from a pool of monomers. We demonstrate both analytically and through simulation how this "selection of the dullest" is caused by continued symmetry breaking through random fluctuations...

  20. Pseudo-random Trees: Multiple Independent Sequence Generators for Parallel and Branching Computations

    Science.gov (United States)

    Halton, John H.

    1989-09-01

    A class of families of linear congruential pseudo-random sequences is defined, for which it is possible to branch at any event without changing the sequence of random numbers used in the original random walk and for which the sequences in different branches show properties analogous to mutual statistical independence. This is a hitherto unavailable, and computationally desirable, tool.

  1. Design of Long Period Pseudo-Random Sequences from the Addition of m -Sequences over 𝔽 p

    Directory of Open Access Journals (Sweden)

    Ren Jian

    2004-01-01

    Full Text Available Pseudo-random sequence with good correlation property and large linear span is widely used in code division multiple access (CDMA communication systems and cryptology for reliable and secure information transmission. In this paper, sequences with long period, large complexity, balance statistics, and low cross-correlation property are constructed from the addition of m -sequences with pairwise-prime linear spans (AMPLS. Using m -sequences as building blocks, the proposed method proved to be an efficient and flexible approach to construct long period pseudo-random sequences with desirable properties from short period sequences. Applying the proposed method to 𝔽 2 , a signal set ( ( 2 n − 1 ( 2 m − 1 , ( 2 n + 1 ( 2 m + 1 , ( 2 ( n + 1 / 2 + 1 ( 2 ( m + 1 / 2 + 1 is constructed.

  2. A high-speed on-chip pseudo-random binary sequence generator for multi-tone phase calibration

    Science.gov (United States)

    Gommé, Liesbeth; Vandersteen, Gerd; Rolain, Yves

    2011-07-01

    An on-chip reference generator is conceived by adopting the technique of decimating a pseudo-random binary sequence (PRBS) signal in parallel sequences. This is of great benefit when high-speed generation of PRBS and PRBS-derived signals is the objective. The design implemented standard CMOS logic is available in commercial libraries to provide the logic functions for the generator. The design allows the user to select the periodicity of the PRBS and the PRBS-derived signals. The characterization of the on-chip generator marks its performance and reveals promising specifications.

  3. A high-speed on-chip pseudo-random binary sequence generator for multi-tone phase calibration

    International Nuclear Information System (INIS)

    Gommé, Liesbeth; Vandersteen, Gerd; Rolain, Yves

    2011-01-01

    An on-chip reference generator is conceived by adopting the technique of decimating a pseudo-random binary sequence (PRBS) signal in parallel sequences. This is of great benefit when high-speed generation of PRBS and PRBS-derived signals is the objective. The design implemented standard CMOS logic is available in commercial libraries to provide the logic functions for the generator. The design allows the user to select the periodicity of the PRBS and the PRBS-derived signals. The characterization of the on-chip generator marks its performance and reveals promising specifications

  4. Parallel Mitogenome Sequencing Alleviates Random Rooting Effect in Phylogeography.

    Science.gov (United States)

    Hirase, Shotaro; Takeshima, Hirohiko; Nishida, Mutsumi; Iwasaki, Wataru

    2016-04-28

    Reliably rooted phylogenetic trees play irreplaceable roles in clarifying diversification in the patterns of species and populations. However, such trees are often unavailable in phylogeographic studies, particularly when the focus is on rapidly expanded populations that exhibit star-like trees. A fundamental bottleneck is known as the random rooting effect, where a distant outgroup tends to root an unrooted tree "randomly." We investigated whether parallel mitochondrial genome (mitogenome) sequencing alleviates this effect in phylogeography using a case study on the Sea of Japan lineage of the intertidal goby Chaenogobius annularis Eighty-three C. annularis individuals were collected and their mitogenomes were determined by high-throughput and low-cost parallel sequencing. Phylogenetic analysis of these mitogenome sequences was conducted to root the Sea of Japan lineage, which has a star-like phylogeny and had not been reliably rooted. The topologies of the bootstrap trees were investigated to determine whether the use of mitogenomes alleviated the random rooting effect. The mitogenome data successfully rooted the Sea of Japan lineage by alleviating the effect, which hindered phylogenetic analysis that used specific gene sequences. The reliable rooting of the lineage led to the discovery of a novel, northern lineage that expanded during an interglacial period with high bootstrap support. Furthermore, the finding of this lineage suggested the existence of additional glacial refugia and provided a new recent calibration point that revised the divergence time estimation between the Sea of Japan and Pacific Ocean lineages. This study illustrates the effectiveness of parallel mitogenome sequencing for solving the random rooting problem in phylogeographic studies. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  5. A symbolic dynamics approach for the complexity analysis of chaotic pseudo-random sequences

    International Nuclear Information System (INIS)

    Xiao Fanghong

    2004-01-01

    By considering a chaotic pseudo-random sequence as a symbolic sequence, authors present a symbolic dynamics approach for the complexity analysis of chaotic pseudo-random sequences. The method is applied to the cases of Logistic map and one-way coupled map lattice to demonstrate how it works, and a comparison is made between it and the approximate entropy method. The results show that this method is applicable to distinguish the complexities of different chaotic pseudo-random sequences, and it is superior to the approximate entropy method

  6. 47 CFR 1.1602 - Designation for random selection.

    Science.gov (United States)

    2010-10-01

    ... 47 Telecommunication 1 2010-10-01 2010-10-01 false Designation for random selection. 1.1602 Section 1.1602 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL PRACTICE AND PROCEDURE Random Selection Procedures for Mass Media Services General Procedures § 1.1602 Designation for random selection...

  7. 47 CFR 1.1603 - Conduct of random selection.

    Science.gov (United States)

    2010-10-01

    ... 47 Telecommunication 1 2010-10-01 2010-10-01 false Conduct of random selection. 1.1603 Section 1.1603 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL PRACTICE AND PROCEDURE Random Selection Procedures for Mass Media Services General Procedures § 1.1603 Conduct of random selection. The...

  8. Correlations of pseudo-random numbers of multiplicative sequence

    International Nuclear Information System (INIS)

    Bukin, A.D.

    1989-01-01

    An algorithm is suggested for searching with a computer in unit n-dimensional cube the sets of planes where all the points fall whose coordinates are composed of n successive pseudo-random numbers of multiplicative sequence. This effect should be taken into account in Monte-Carlo calculations with definite constructive dimension. The parameters of these planes are obtained for three random number generators. 2 refs.; 2 tabs

  9. A Method to Select Software Test Cases in Consideration of Past Input Sequence

    International Nuclear Information System (INIS)

    Kim, Hee Eun; Kim, Bo Gyung; Kang, Hyun Gook

    2015-01-01

    In the Korea Nuclear I and C Systems (KNICS) project, the software for the fully-digitalized reactor protection system (RPS) was developed under a strict procedure. Even though the behavior of the software is deterministic, the randomness of input sequence produces probabilistic behavior of software. A software failure occurs when some inputs to the software occur and interact with the internal state of the digital system to trigger a fault that was introduced into the software during the software lifecycle. In this paper, the method to select test set for software failure probability estimation is suggested. This test set reflects past input sequence of software, which covers all possible cases. In this study, the method to select test cases for software failure probability quantification was suggested. To obtain profile of paired state variables, relationships of the variables need to be considered. The effect of input from human operator also have to be considered. As an example, test set of PZR-PR-Lo-Trip logic was examined. This method provides framework for selecting test cases of safety-critical software

  10. Perceptions of randomness in binary sequences: Normative, heuristic, or both?

    Science.gov (United States)

    Reimers, Stian; Donkin, Chris; Le Pelley, Mike E

    2018-03-01

    When people consider a series of random binary events, such as tossing an unbiased coin and recording the sequence of heads (H) and tails (T), they tend to erroneously rate sequences with less internal structure or order (such as HTTHT) as more probable than sequences containing more structure or order (such as HHHHH). This is traditionally explained as a local representativeness effect: Participants assume that the properties of long sequences of random outcomes-such as an equal proportion of heads and tails, and little internal structure-should also apply to short sequences. However, recent theoretical work has noted that the probability of a particular sequence of say, heads and tails of length n, occurring within a larger (>n) sequence of coin flips actually differs by sequence, so P(HHHHH) rational norms based on limited experience. We test these accounts. Participants in Experiment 1 rated the likelihood of occurrence for all possible strings of 4, 5, and 6 observations in a sequence of coin flips. Judgments were better explained by representativeness in alternation rate, relative proportion of heads and tails, and sequence complexity, than by objective probabilities. Experiments 2 and 3 gave similar results using incentivized binary choice procedures. Overall the evidence suggests that participants are not sensitive to variation in objective probabilities of a sub-sequence occurring; they appear to use heuristics based on several distinct forms of representativeness. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Computationally Efficient Chaotic Spreading Sequence Selection for Asynchronous DS-CDMA

    Directory of Open Access Journals (Sweden)

    Litviņenko Anna

    2017-12-01

    Full Text Available The choice of the spreading sequence for asynchronous direct-sequence code-division multiple-access (DS-CDMA systems plays a crucial role for the mitigation of multiple-access interference. Considering the rich dynamics of chaotic sequences, their use for spreading allows overcoming the limitations of the classical spreading sequences. However, to ensure low cross-correlation between the sequences, careful selection must be performed. This paper presents a novel exhaustive search algorithm, which allows finding sets of chaotic spreading sequences of required length with a particularly low mutual cross-correlation. The efficiency of the search is verified by simulations, which show a significant advantage compared to non-selected chaotic sequences. Moreover, the impact of sequence length on the efficiency of the selection is studied.

  12. Affinity selection of Nipah and Hendra virus-related vaccine candidates from a complex random peptide library displayed on bacteriophage virus-like particles

    Energy Technology Data Exchange (ETDEWEB)

    Peabody, David S.; Chackerian, Bryce; Ashley, Carlee; Carnes, Eric; Negrete, Oscar

    2017-01-24

    The invention relates to virus-like particles of bacteriophage MS2 (MS2 VLPs) displaying peptide epitopes or peptide mimics of epitopes of Nipah Virus envelope glycoprotein that elicit an immune response against Nipah Virus upon vaccination of humans or animals. Affinity selection on Nipah Virus-neutralizing monoclonal antibodies using random sequence peptide libraries on MS2 VLPs selected peptides with sequence similarity to peptide sequences found within the envelope glycoprotein of Nipah itself, thus identifying the epitopes the antibodies recognize. The selected peptide sequences themselves are not necessarily identical in all respects to a sequence within Nipah Virus glycoprotein, and therefore may be referred to as epitope mimics VLPs displaying these epitope mimics can serve as vaccine. On the other hand, display of the corresponding wild-type sequence derived from Nipah Virus and corresponding to the epitope mapped by affinity selection, may also be used as a vaccine.

  13. Cryptographic pseudo-random sequence from the spatial chaotic map

    International Nuclear Information System (INIS)

    Sun Fuyan; Liu Shutang

    2009-01-01

    A scheme for pseudo-random binary sequence generation based on the spatial chaotic map is proposed. In order to face the challenge of using the proposed PRBS in cryptography, the proposed PRBS is subjected to statistical tests which are the well-known FIPS-140-1 in the area of cryptography, and correlation properties of the proposed sequences are investigated. The proposed PRBS successfully passes all these tests. Results of statistical testing of the sequences are found encouraging. The results of statistical tests suggest strong candidature for cryptographic applications.

  14. Humans can consciously generate random number sequences: a possible test for artificial intelligence.

    Science.gov (United States)

    Persaud, Navindra

    2005-01-01

    Computer algorithms can only produce seemingly random or pseudorandom numbers whereas certain natural phenomena, such as the decay of radioactive particles, can be utilized to produce truly random numbers. In this study, the ability of humans to generate random numbers was tested in healthy adults. Subjects were simply asked to generate and dictate random numbers. Generated numbers were tested for uniformity, independence and information density. The results suggest that humans can generate random numbers that are uniformly distributed, independent of one another and unpredictable. If humans can generate sequences of random numbers then neural networks or forms of artificial intelligence, which are purported to function in ways essentially the same as the human brain, should also be able to generate sequences of random numbers. Elucidating the precise mechanism by which humans generate random number sequences and the underlying neural substrates may have implications in the cognitive science of decision-making. It is possible that humans use their random-generating neural machinery to make difficult decisions in which all expected outcomes are similar. It is also possible that certain people, perhaps those with neurological or psychiatric impairments, are less able or unable to generate random numbers. If the random-generating neural machinery is employed in decision making its impairment would have profound implications in matters of agency and free will.

  15. Primitive polynomials selection method for pseudo-random number generator

    Science.gov (United States)

    Anikin, I. V.; Alnajjar, Kh

    2018-01-01

    In this paper we suggested the method for primitive polynomials selection of special type. This kind of polynomials can be efficiently used as a characteristic polynomials for linear feedback shift registers in pseudo-random number generators. The proposed method consists of two basic steps: finding minimum-cost irreducible polynomials of the desired degree and applying primitivity tests to get the primitive ones. Finally two primitive polynomials, which was found by the proposed method, used in pseudorandom number generator based on fuzzy logic (FRNG) which had been suggested before by the authors. The sequences generated by new version of FRNG have low correlation magnitude, high linear complexity, less power consumption, is more balanced and have better statistical properties.

  16. Magnetic nanoparticle imaging by random and maximum length sequences of inhomogeneous activation fields.

    Science.gov (United States)

    Baumgarten, Daniel; Eichardt, Roland; Crevecoeur, Guillaume; Supriyanto, Eko; Haueisen, Jens

    2013-01-01

    Biomedical applications of magnetic nanoparticles require a precise knowledge of their biodistribution. From multi-channel magnetorelaxometry measurements, this distribution can be determined by means of inverse methods. It was recently shown that the combination of sequential inhomogeneous excitation fields in these measurements is favorable regarding the reconstruction accuracy when compared to homogeneous activation . In this paper, approaches for the determination of activation sequences for these measurements are investigated. Therefor, consecutive activation of single coils, random activation patterns and families of m-sequences are examined in computer simulations involving a sample measurement setup and compared with respect to the relative condition number of the system matrix. We obtain that the values of this condition number decrease with larger number of measurement samples for all approaches. Random sequences and m-sequences reveal similar results with a significant reduction of the required number of samples. We conclude that the application of pseudo-random sequences for sequential activation in the magnetorelaxometry imaging of magnetic nanoparticles considerably reduces the number of required sequences while preserving the relevant measurement information.

  17. Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics

    Science.gov (United States)

    2012-01-01

    Background The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence

  18. Chemical rationale for selection of isolates for genome sequencing

    DEFF Research Database (Denmark)

    Rank, Christian; Larsen, Thomas Ostenfeld; Frisvad, Jens Christian

    The advances in gene sequencing will in the near future enable researchers to affordably acquire the full genomes of handpicked isolates. We here present a method to evaluate the chemical potential of an entire species and select representatives for genome sequencing. The selection criteria for new...... strains to be sequenced can be manifold, but for studying the functional phenotype, using a metabolome based approach offers a cheap and rapid assessment of critical strains to cover the chemical diversity. We have applied this methodology on the complex A. flavus/A. oryzae group. Though these two species...... are in principal identical, they represent two different phenotypes. This is clearly presented through a correspondence analysis of selected extrolites, in which the subtle chemical differences are visually dispersed. The results points to a handful of strains, which, if sequenced, will likely enhance our...

  19. The genealogy of sequences containing multiple sites subject to strong selection in a subdivided population.

    Science.gov (United States)

    Nordborg, Magnus; Innan, Hideki

    2003-03-01

    A stochastic model for the genealogy of a sample of recombining sequences containing one or more sites subject to selection in a subdivided population is described. Selection is incorporated by dividing the population into allelic classes and then conditioning on the past sizes of these classes. The past allele frequencies at the selected sites are thus treated as parameters rather than as random variables. The purpose of the model is not to investigate the dynamics of selection, but to investigate effects of linkage to the selected sites on the genealogy of the surrounding chromosomal region. This approach is useful for modeling strong selection, when it is natural to parameterize the past allele frequencies at the selected sites. Several models of strong balancing selection are used as examples, and the effects on the pattern of neutral polymorphism in the chromosomal region are discussed. We focus in particular on the statistical power to detect balancing selection when it is present.

  20. Prediction of Protein Hotspots from Whole Protein Sequences by a Random Projection Ensemble System

    Directory of Open Access Journals (Sweden)

    Jinjian Jiang

    2017-07-01

    Full Text Available Hotspot residues are important in the determination of protein-protein interactions, and they always perform specific functions in biological processes. The determination of hotspot residues is by the commonly-used method of alanine scanning mutagenesis experiments, which is always costly and time consuming. To address this issue, computational methods have been developed. Most of them are structure based, i.e., using the information of solved protein structures. However, the number of solved protein structures is extremely less than that of sequences. Moreover, almost all of the predictors identified hotspots from the interfaces of protein complexes, seldom from the whole protein sequences. Therefore, determining hotspots from whole protein sequences by sequence information alone is urgent. To address the issue of hotspot predictions from the whole sequences of proteins, we proposed an ensemble system with random projections using statistical physicochemical properties of amino acids. First, an encoding scheme involving sequence profiles of residues and physicochemical properties from the AAindex1 dataset is developed. Then, the random projection technique was adopted to project the encoding instances into a reduced space. Then, several better random projections were obtained by training an IBk classifier based on the training dataset, which were thus applied to the test dataset. The ensemble of random projection classifiers is therefore obtained. Experimental results showed that although the performance of our method is not good enough for real applications of hotspots, it is very promising in the determination of hotspot residues from whole sequences.

  1. Selection of mRNA 5'-untranslated region sequence with high translation efficiency through ribosome display

    International Nuclear Information System (INIS)

    Mie, Masayasu; Shimizu, Shun; Takahashi, Fumio; Kobatake, Eiry

    2008-01-01

    The 5'-untranslated region (5'-UTR) of mRNAs functions as a translation enhancer, promoting translation efficiency. Many in vitro translation systems exhibit a reduced efficiency in protein translation due to decreased translation initiation. The use of a 5'-UTR sequence with high translation efficiency greatly enhances protein production in these systems. In this study, we have developed an in vitro selection system that favors 5'-UTRs with high translation efficiency using a ribosome display technique. A 5'-UTR random library, comprised of 5'-UTRs tagged with a His-tag and Renilla luciferase (R-luc) fusion, were in vitro translated in rabbit reticulocytes. By limiting the translation period, only mRNAs with high translation efficiency were translated. During translation, mRNA, ribosome and translated R-luc with His-tag formed ternary complexes. They were collected with translated His-tag using Ni-particles. Extracted mRNA from ternary complex was amplified using RT-PCR and sequenced. Finally, 5'-UTR with high translation efficiency was obtained from random 5'-UTR library

  2. Sequence Selection and Performance in DS/CDMA Systems

    Directory of Open Access Journals (Sweden)

    Jefferson Santos Ambrosio

    2016-03-01

    Full Text Available In this work key concepts on coding division multiple access (CDMA communication systems have been discussed. The sequence selection impact on the performance and capacity of direct sequence CDMA (DS/CDMA systems under AWGN and increasing system loading, as well as under multiple antennas channels was investigated.

  3. The RANDOM computer program: A linear congruential random number generator

    Science.gov (United States)

    Miles, R. F., Jr.

    1986-01-01

    The RANDOM Computer Program is a FORTRAN program for generating random number sequences and testing linear congruential random number generators (LCGs). The linear congruential form of random number generator is discussed, and the selection of parameters of an LCG for a microcomputer described. This document describes the following: (1) The RANDOM Computer Program; (2) RANDOM.MOD, the computer code needed to implement an LCG in a FORTRAN program; and (3) The RANCYCLE and the ARITH Computer Programs that provide computational assistance in the selection of parameters for an LCG. The RANDOM, RANCYCLE, and ARITH Computer Programs are written in Microsoft FORTRAN for the IBM PC microcomputer and its compatibles. With only minor modifications, the RANDOM Computer Program and its LCG can be run on most micromputers or mainframe computers.

  4. Selectivity and sparseness in randomly connected balanced networks.

    Directory of Open Access Journals (Sweden)

    Cengiz Pehlevan

    Full Text Available Neurons in sensory cortex show stimulus selectivity and sparse population response, even in cases where no strong functionally specific structure in connectivity can be detected. This raises the question whether selectivity and sparseness can be generated and maintained in randomly connected networks. We consider a recurrent network of excitatory and inhibitory spiking neurons with random connectivity, driven by random projections from an input layer of stimulus selective neurons. In this architecture, the stimulus-to-stimulus and neuron-to-neuron modulation of total synaptic input is weak compared to the mean input. Surprisingly, we show that in the balanced state the network can still support high stimulus selectivity and sparse population response. In the balanced state, strong synapses amplify the variation in synaptic input and recurrent inhibition cancels the mean. Functional specificity in connectivity emerges due to the inhomogeneity caused by the generative statistical rule used to build the network. We further elucidate the mechanism behind and evaluate the effects of model parameters on population sparseness and stimulus selectivity. Network response to mixtures of stimuli is investigated. It is shown that a balanced state with unselective inhibition can be achieved with densely connected input to inhibitory population. Balanced networks exhibit the "paradoxical" effect: an increase in excitatory drive to inhibition leads to decreased inhibitory population firing rate. We compare and contrast selectivity and sparseness generated by the balanced network to randomly connected unbalanced networks. Finally, we discuss our results in light of experiments.

  5. Random Sequence for Optimal Low-Power Laser Generated Ultrasound

    Science.gov (United States)

    Vangi, D.; Virga, A.; Gulino, M. S.

    2017-08-01

    Low-power laser generated ultrasounds are lately gaining importance in the research world, thanks to the possibility of investigating a mechanical component structural integrity through a non-contact and Non-Destructive Testing (NDT) procedure. The ultrasounds are, however, very low in amplitude, making it necessary to use pre-processing and post-processing operations on the signals to detect them. The cross-correlation technique is used in this work, meaning that a random signal must be used as laser input. For this purpose, a highly random and simple-to-create code called T sequence, capable of enhancing the ultrasound detectability, is introduced (not previously available at the state of the art). Several important parameters which characterize the T sequence can influence the process: the number of pulses Npulses , the pulse duration δ and the distance between pulses dpulses . A Finite Element FE model of a 3 mm steel disk has been initially developed to analytically study the longitudinal ultrasound generation mechanism and the obtainable outputs. Later, experimental tests have shown that the T sequence is highly flexible for ultrasound detection purposes, making it optimal to use high Npulses and δ but low dpulses . In the end, apart from describing all phenomena that arise in the low-power laser generation process, the results of this study are also important for setting up an effective NDT procedure using this technology.

  6. Implications of structural genomics target selection strategies: Pfam5000, whole genome, and random approaches

    Energy Technology Data Exchange (ETDEWEB)

    Chandonia, John-Marc; Brenner, Steven E.

    2004-07-14

    The structural genomics project is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy which is medically and biologically relevant, of good value, and tractable. As an option to consider, we present the Pfam5000 strategy, which involves selecting the 5000 most important families from the Pfam database as sources for targets. We compare the Pfam5000 strategy to several other proposed strategies that would require similar numbers of targets. These include including complete solution of several small to moderately sized bacterial proteomes, partial coverage of the human proteome, and random selection of approximately 5000 targets from sequenced genomes. We measure the impact that successful implementation of these strategies would have upon structural interpretation of the proteins in Swiss-Prot, TrEMBL, and 131 complete proteomes (including 10 of eukaryotes) from the Proteome Analysis database at EBI. Solving the structures of proteins from the 5000 largest Pfam families would allow accurate fold assignment for approximately 68 percent of all prokaryotic proteins (covering 59 percent of residues) and 61 percent of eukaryotic proteins (40 percent of residues). More fine-grained coverage which would allow accurate modeling of these proteins would require an order of magnitude more targets. The Pfam5000 strategy may be modified in several ways, for example to focus on larger families, bacterial sequences, or eukaryotic sequences; as long as secondary consideration is given to large families within Pfam, coverage results vary only slightly. In contrast, focusing structural genomics on a single tractable genome would have only a limited impact in structural knowledge of other proteomes: a significant fraction (about 30-40 percent of the proteins, and 40-60 percent of the residues) of each proteome is classified in small

  7. Testing, Selection, and Implementation of Random Number Generators

    National Research Council Canada - National Science Library

    Collins, Joseph C

    2008-01-01

    An exhaustive evaluation of state-of-the-art random number generators with several well-known suites of tests provides the basis for selection of suitable random number generators for use in stochastic simulations...

  8. Automatic generation of randomized trial sequences for priming experiments.

    Science.gov (United States)

    Ihrke, Matthias; Behrendt, Jörg

    2011-01-01

    In most psychological experiments, a randomized presentation of successive displays is crucial for the validity of the results. For some paradigms, this is not a trivial issue because trials are interdependent, e.g., priming paradigms. We present a software that automatically generates optimized trial sequences for (negative-) priming experiments. Our implementation is based on an optimization heuristic known as genetic algorithms that allows for an intuitive interpretation due to its similarity to natural evolution. The program features a graphical user interface that allows the user to generate trial sequences and to interactively improve them. The software is based on freely available software and is released under the GNU General Public License.

  9. GenRGenS: Software for Generating Random Genomic Sequences and Structures

    OpenAIRE

    Ponty , Yann; Termier , Michel; Denise , Alain

    2006-01-01

    International audience; GenRGenS is a software tool dedicated to randomly generating genomic sequences and structures. It handles several classes of models useful for sequence analysis, such as Markov chains, hidden Markov models, weighted context-free grammars, regular expressions and PROSITE expressions. GenRGenS is the only program that can handle weighted context-free grammars, thus allowing the user to model and to generate structured objects (such as RNA secondary structures) of any giv...

  10. RANDOMNUMBERS, Random Number Sequence Generated from Gas Ionisation Chamber Data

    International Nuclear Information System (INIS)

    Frigerio, N.A.; Sanathanan, L.P.; Morley, M.; Tyler, S.A.; Clark, N.A.; Wang, J.

    1989-01-01

    1 - Description of problem or function: RANDOM NUMBERS is a data collection of almost 2.7 million 31-bit random numbers generated by using a high resolution gas ionization detector chamber in conjunction with a 4096-channel multichannel analyzer to record the radioactive decay of alpha particles from a U-235 source. The signals from the decaying alpha particles were fed to the 4096-channel analyzer, and for each channel the frequency of signals registered in a 20,000-microsecond interval was recorded. The parity bits of these frequency counts, 0 for an even count and 1 for and odd count, were then assembled in sequence to form 31-bit random numbers and transcribed onto magnetic tape. This cycle was repeated to obtain the random numbers. 2 - Method of solution: The frequency distribution of counts from the device conforms to the Brockwell-Moyal distribution which takes into account the dead time of the counter. The count data were analyzed and tests for randomness on a sample indicate that the device is a highly reliable source of truly random numbers. 3 - Restrictions on the complexity of the problem: The RANDOM NUMBERS tape contains 2,669,568 31-bit numbers

  11. Long period pseudo random number sequence generator

    Science.gov (United States)

    Wang, Charles C. (Inventor)

    1989-01-01

    A circuit for generating a sequence of pseudo random numbers, (A sub K). There is an exponentiator in GF(2 sup m) for the normal basis representation of elements in a finite field GF(2 sup m) each represented by m binary digits and having two inputs and an output from which the sequence (A sub K). Of pseudo random numbers is taken. One of the two inputs is connected to receive the outputs (E sub K) of maximal length shift register of n stages. There is a switch having a pair of inputs and an output. The switch outputs is connected to the other of the two inputs of the exponentiator. One of the switch inputs is connected for initially receiving a primitive element (A sub O) in GF(2 sup m). Finally, there is a delay circuit having an input and an output. The delay circuit output is connected to the other of the switch inputs and the delay circuit input is connected to the output of the exponentiator. Whereby after the exponentiator initially receives the primitive element (A sub O) in GF(2 sup m) through the switch, the switch can be switched to cause the exponentiator to receive as its input a delayed output A(K-1) from the exponentiator thereby generating (A sub K) continuously at the output of the exponentiator. The exponentiator in GF(2 sup m) is novel and comprises a cyclic-shift circuit; a Massey-Omura multiplier; and, a control logic circuit all operably connected together to perform the function U(sub i) = 92(sup i) (for n(sub i) = 1 or 1 (for n(subi) = 0).

  12. Experimental Rugged Fitness Landscape in Protein Sequence Space

    OpenAIRE

    HAYASHI, Yuuki; 相田, 拓洋; TOYOTA, Hitoshi; 伏見, 譲; URABE, Itaru; YOMO, Tetsuya

    2006-01-01

    The fitness landscape in sequence space determines the process of biomolecular evolution. To plot the fitness landscape of protein function, we carried out in vitro molecular evolution beginning with a defective fd phage carrying a random polypeptide of 139 amino acids in place of the g3p minor coat protein D2 domain, which is essential for phage infection. After 20 cycles of random substitution at sites 12-130 of the initial random polypeptide and selection for infectivity, the selected phag...

  13. Effective Feature Selection for Classification of Promoter Sequences.

    Directory of Open Access Journals (Sweden)

    Kouser K

    Full Text Available Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine, KNN (K Nearest Neighbor and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.

  14. Application of random effects to the study of resource selection by animals.

    Science.gov (United States)

    Gillies, Cameron S; Hebblewhite, Mark; Nielsen, Scott E; Krawchuk, Meg A; Aldridge, Cameron L; Frair, Jacqueline L; Saher, D Joanne; Stevens, Cameron E; Jerde, Christopher L

    2006-07-01

    1. Resource selection estimated by logistic regression is used increasingly in studies to identify critical resources for animal populations and to predict species occurrence. 2. Most frequently, individual animals are monitored and pooled to estimate population-level effects without regard to group or individual-level variation. Pooling assumes that both observations and their errors are independent, and resource selection is constant given individual variation in resource availability. 3. Although researchers have identified ways to minimize autocorrelation, variation between individuals caused by differences in selection or available resources, including functional responses in resource selection, have not been well addressed. 4. Here we review random-effects models and their application to resource selection modelling to overcome these common limitations. We present a simple case study of an analysis of resource selection by grizzly bears in the foothills of the Canadian Rocky Mountains with and without random effects. 5. Both categorical and continuous variables in the grizzly bear model differed in interpretation, both in statistical significance and coefficient sign, depending on how a random effect was included. We used a simulation approach to clarify the application of random effects under three common situations for telemetry studies: (a) discrepancies in sample sizes among individuals; (b) differences among individuals in selection where availability is constant; and (c) differences in availability with and without a functional response in resource selection. 6. We found that random intercepts accounted for unbalanced sample designs, and models with random intercepts and coefficients improved model fit given the variation in selection among individuals and functional responses in selection. Our empirical example and simulations demonstrate how including random effects in resource selection models can aid interpretation and address difficult assumptions

  15. Local randomization in neighbor selection improves PRM roadmap quality

    KAUST Repository

    McMahon, Troy; Jacobs, Sam; Boyd, Bryan; Tapia, Lydia; Amato, Nancy M.

    2012-01-01

    Probabilistic Roadmap Methods (PRMs) are one of the most used classes of motion planning methods. These sampling-based methods generate robot configurations (nodes) and then connect them to form a graph (roadmap) containing representative feasible pathways. A key step in PRM roadmap construction involves identifying a set of candidate neighbors for each node. Traditionally, these candidates are chosen to be the k-closest nodes based on a given distance metric. In this paper, we propose a new neighbor selection policy called LocalRand(k,K'), that first computes the K' closest nodes to a specified node and then selects k of those nodes at random. Intuitively, LocalRand attempts to benefit from random sampling while maintaining the higher levels of local planner success inherent to selecting more local neighbors. We provide a methodology for selecting the parameters k and K'. We perform an experimental comparison which shows that for both rigid and articulated robots, LocalRand results in roadmaps that are better connected than the traditional k-closest policy or a purely random neighbor selection policy. The cost required to achieve these results is shown to be comparable to k-closest. © 2012 IEEE.

  16. Local randomization in neighbor selection improves PRM roadmap quality

    KAUST Repository

    McMahon, Troy

    2012-10-01

    Probabilistic Roadmap Methods (PRMs) are one of the most used classes of motion planning methods. These sampling-based methods generate robot configurations (nodes) and then connect them to form a graph (roadmap) containing representative feasible pathways. A key step in PRM roadmap construction involves identifying a set of candidate neighbors for each node. Traditionally, these candidates are chosen to be the k-closest nodes based on a given distance metric. In this paper, we propose a new neighbor selection policy called LocalRand(k,K\\'), that first computes the K\\' closest nodes to a specified node and then selects k of those nodes at random. Intuitively, LocalRand attempts to benefit from random sampling while maintaining the higher levels of local planner success inherent to selecting more local neighbors. We provide a methodology for selecting the parameters k and K\\'. We perform an experimental comparison which shows that for both rigid and articulated robots, LocalRand results in roadmaps that are better connected than the traditional k-closest policy or a purely random neighbor selection policy. The cost required to achieve these results is shown to be comparable to k-closest. © 2012 IEEE.

  17. Sequence protein identification by randomized sequence database and transcriptome mass spectrometry (SPIDER-TMS): from manual to automatic application of a 'de novo sequencing' approach.

    Science.gov (United States)

    Pascale, Raffaella; Grossi, Gerarda; Cruciani, Gabriele; Mecca, Giansalvatore; Santoro, Donatello; Sarli Calace, Renzo; Falabella, Patrizia; Bianco, Giuliana

    Sequence protein identification by a randomized sequence database and transcriptome mass spectrometry software package has been developed at the University of Basilicata in Potenza (Italy) and designed to facilitate the determination of the amino acid sequence of a peptide as well as an unequivocal identification of proteins in a high-throughput manner with enormous advantages of time, economical resource and expertise. The software package is a valid tool for the automation of a de novo sequencing approach, overcoming the main limits and a versatile platform useful in the proteomic field for an unequivocal identification of proteins, starting from tandem mass spectrometry data. The strength of this software is that it is a user-friendly and non-statistical approach, so protein identification can be considered unambiguous.

  18. Chirality- and sequence-selective successive self-sorting via specific homo- and complementary-duplex formations

    Science.gov (United States)

    Makiguchi, Wataru; Tanabe, Junki; Yamada, Hidekazu; Iida, Hiroki; Taura, Daisuke; Ousaka, Naoki; Yashima, Eiji

    2015-01-01

    Self-recognition and self-discrimination within complex mixtures are of fundamental importance in biological systems, which entirely rely on the preprogrammed monomer sequences and homochirality of biological macromolecules. Here we report artificial chirality- and sequence-selective successive self-sorting of chiral dimeric strands bearing carboxylic acid or amidine groups joined by chiral amide linkers with different sequences through homo- and complementary-duplex formations. A mixture of carboxylic acid dimers linked by racemic-1,2-cyclohexane bis-amides with different amide sequences (NHCO or CONH) self-associate to form homoduplexes in a completely sequence-selective way, the structures of which are different from each other depending on the linker amide sequences. The further addition of an enantiopure amide-linked amidine dimer to a mixture of the racemic carboxylic acid dimers resulted in the formation of a single optically pure complementary duplex with a 100% diastereoselectivity and complete sequence specificity stabilized by the amidinium–carboxylate salt bridges, leading to the perfect chirality- and sequence-selective duplex formation. PMID:26051291

  19. The algorithm of random length sequences synthesis for frame synchronization of digital television systems

    Directory of Open Access Journals (Sweden)

    Аndriy V. Sadchenko

    2015-12-01

    Full Text Available Digital television systems need to ensure that all digital signals processing operations are performed simultaneously and consistently. Frame synchronization dictated by the need to match phases of transmitter and receiver so that it would be possible to identify the start of a frame. As a frame synchronization signals are often used long length binary sequence with good aperiodic autocorrelation function. Aim: This work is dedicated to the development of the algorithm of random length sequences synthesis. Materials and Methods: The paper provides a comparative analysis of the known sequences, which can be used at present as synchronization ones, revealed their advantages and disadvantages. This work proposes the algorithm for the synthesis of binary synchronization sequences of random length with good autocorrelation properties based on noise generator with a uniform distribution law of probabilities. A "white noise" semiconductor generator is proposed to use as the initial material for the synthesis of binary sequences with desired properties. Results: The statistical analysis of the initial implementations of the "white noise" and synthesized sequences for frame synchronization of digital television is conducted. The comparative analysis of the synthesized sequences with known ones was carried out. The results show the benefits of obtained sequences in compare with known ones. The performed simulations confirm the obtained results. Conclusions: Thus, the search algorithm of binary synchronization sequences with desired autocorrelation properties received. According to this algorithm, the sequence can be longer in length and without length limitations. The received sync sequence can be used for frame synchronization in modern digital communication systems that will increase their efficiency and noise immunity.

  20. Randomizer for High Data Rates

    Science.gov (United States)

    Garon, Howard; Sank, Victor J.

    2018-01-01

    NASA as well as a number of other space agencies now recognize that the current recommended CCSDS randomizer used for telemetry (TM) is too short. When multiple applications of the PN8 Maximal Length Sequence (MLS) are required in order to fully cover a channel access data unit (CADU), spectral problems in the form of elevated spurious discretes (spurs) appear. Originally the randomizer was called a bit transition generator (BTG) precisely because it was thought that its primary value was to insure sufficient bit transitions to allow the bit/symbol synchronizer to lock and remain locked. We, NASA, have shown that the old BTG concept is a limited view of the real value of the randomizer sequence and that the randomizer also aids in signal acquisition as well as minimizing the potential for false decoder lock. Under the guidelines we considered here there are multiple maximal length sequences under GF(2) which appear attractive in this application. Although there may be mitigating reasons why another MLS sequence could be selected, one sequence in particular possesses a combination of desired properties which offsets it from the others.

  1. Random sequences are an abundant source of bioactive RNAs or peptides

    DEFF Research Database (Denmark)

    Neme, Rafik; Amador, Cristina; Yildirim, Burcin

    2017-01-01

    It is generally assumed that new genes arise through duplication and/or recombination of existing genes. The probability that a new functional gene could arise out of random non-coding DNA is so far considered to be negligible, as it seems unlikely that such an RNA or protein sequence could have ...

  2. Identification of (R)-selective ω-aminotransferases by exploring evolutionary sequence space.

    Science.gov (United States)

    Kim, Eun-Mi; Park, Joon Ho; Kim, Byung-Gee; Seo, Joo-Hyun

    2018-03-01

    Several (R)-selective ω-aminotransferases (R-ωATs) have been reported. The existence of additional R-ωATs having different sequence characteristics from previous ones is highly expected. In addition, it is generally accepted that R-ωATs are variants of aminotransferase group III. Based on these backgrounds, sequences in RefSeq database were scored using family profiles of branched-chain amino acid aminotransferase (BCAT) and d-alanine aminotransferase (DAT) to predict and identify putative R-ωATs. Sequences with two profile analysis scores were plotted on two-dimensional score space. Candidates with relatively similar scores in both BCAT and DAT profiles (i.e., profile analysis score using BCAT profile was similar to profile analysis score using DAT profile) were selected. Experimental results for selected candidates showed that putative R-ωATs from Saccharopolyspora erythraea (R-ωAT_Sery), Bacillus cellulosilyticus (R-ωAT_Bcel), and Bacillus thuringiensis (R-ωAT_Bthu) had R-ωAT activity. Additional experiments revealed that R-ωAT_Sery also possessed DAT activity while R-ωAT_Bcel and R-ωAT_Bthu had BCAT activity. Selecting putative R-ωATs from regions with similar profile analysis scores identified potential R-ωATs. Therefore, R-ωATs could be efficiently identified by using simple family profile analysis and exploring evolutionary sequence space. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. Short sequence motifs, overrepresented in mammalian conservednon-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov,Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNAsequences of multicellular eukaryotes is under selective constraint. Inparticular, ~;5 percent of the human genome consists of conservednon-coding sequences (CNSs). CNSs differ from other genomic sequences intheir nucleotide composition and must play important functional roles,which mostly remain obscure.Results: We investigated relative abundancesof short sequence motifs in all human CNSs present in the human/mousewhole-genome alignments vs. three background sets of sequences: (i)weakly conserved or unconserved non-coding sequences (non-CNSs); (ii)near-promoter sequences (located between nucleotides -500 and -1500,relative to a start of transcription); and (iii) random sequences withthe same nucleotide composition as that of CNSs. When compared tonon-CNSs and near-promoter sequences, CNSs possess an excess of AT-richmotifs, often containing runs of identical nucleotides. In contrast, whencompared to random sequences, CNSs contain an excess of GC-rich motifswhich, however, lack CpG dinucleotides. Thus, abundance of short sequencemotifs in human CNSs, taken as a whole, is mostly determined by theiroverall compositional properties and not by overrepresentation of anyspecific short motifs. These properties are: (i) high AT-content of CNSs,(ii) a tendency, probably due to context-dependent mutation, of A's andT's to clump, (iii) presence of short GC-rich regions, and (iv) avoidanceof CpG contexts, due to their hypermutability. Only a small number ofshort motifs, overrepresented in all human CNSs are similar to bindingsites of transcription factors from the FOX family.Conclusion: Human CNSsas a whole appear to be too broad a class of sequences to possess strongfootprints of any short sequence-specific functions. Such footprintsshould be studied at the level of functional subclasses of CNSs, such asthose which flank genes with a particular pattern of expression. Overallproperties of CNSs are affected by

  4. Sequence-selective single-molecule alkylation with a pyrrole-imidazole polyamide visualized in a DNA nanoscaffold.

    Science.gov (United States)

    Yoshidome, Tomofumi; Endo, Masayuki; Kashiwazaki, Gengo; Hidaka, Kumi; Bando, Toshikazu; Sugiyama, Hiroshi

    2012-03-14

    We demonstrate a novel strategy for visualizing sequence-selective alkylation of target double-stranded DNA (dsDNA) using a synthetic pyrrole-imidazole (PI) polyamide in a designed DNA origami scaffold. Doubly functionalized PI polyamide was designed by introduction of an alkylating agent 1-(chloromethyl)-5-hydroxy-1,2-dihydro-3H-benz[e]indole (seco-CBI) and biotin for sequence-selective alkylation at the target sequence and subsequent streptavidin labeling, respectively. Selective alkylation of the target site in the substrate DNA was observed by analysis using sequencing gel electrophoresis. For the single-molecule observation of the alkylation by functionalized PI polyamide using atomic force microscopy (AFM), the target position in the dsDNA (∼200 base pairs) was alkylated and then visualized by labeling with streptavidin. Newly designed DNA origami scaffold named "five-well DNA frame" carrying five different dsDNA sequences in its cavities was used for the detailed analysis of the sequence-selectivity and alkylation. The 64-mer dsDNAs were introduced to five individual wells, in which target sequence AGTXCCA/TGGYACT (XY = AT, TA, GC, CG) was employed as fully matched (X = G) and one-base mismatched (X = A, T, C) sequences. The fully matched sequence was alkylated with 88% selectivity over other mismatched sequences. In addition, the PI polyamide failed to attach to the target sequence lacking the alkylation site after washing and streptavidin treatment. Therefore, the PI polyamide discriminated the one mismatched nucleotide at the single-molecule level, and alkylation anchored the PI polyamide to the target dsDNA.

  5. Supplementation of Nucleosides During Selection can Reduce Sequence Variant Levels in CHO Cells Using GS/MSX Selection System.

    Science.gov (United States)

    Tang, Danming; Lam, Cynthia; Louie, Salina; Hoi, Kam Hon; Shaw, David; Yim, Mandy; Snedecor, Brad; Misaghi, Shahram

    2018-01-01

    In the process of generating stable monoclonal antibody (mAb) producing cell lines, reagents such as methotrexate (MTX) or methionine sulfoximine (MSX) are often used. However, using such selection reagent(s) increases the possibility of having higher occurrence of sequence variants in the expressed antibody molecules due to the effects of MTX or MSX on de novo nucleotide synthesis. Since MSX inhibits glutamine synthase (GS) and results in both amino acid and nucleoside starvation, it is questioned whether supplementing nucleosides into the media could lower sequence variant levels without affecting titer. The results show that the supplementation of nucleosides to the media during MSX selection decreased genomic DNA mutagenesis rates in the selected cells, probably by reducing nucleotide mis-incorporation into the DNA. Furthermore, addition of nucleosides enhance clone recovery post selection and does not affect antibody expression. It is further observed that nucleoside supplements lowered DNA mutagenesis rates only at the initial stage of the clone selection and do not have any effect on DNA mutagenesis rates after stable cell lines are established. Therefore, the data suggests that addition of nucleosides during early stages of MSX selection can lower sequence variant levels without affecting titer or clone stability in antibody expression. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    Directory of Open Access Journals (Sweden)

    Wadim L. Matochko

    2013-01-01

    Full Text Available Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using convenient linear vector and operator framework. We describe a phage library as N×1 frequency vector n=ni, where ni is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of a N×N matrix and a stochastic sampling operator (Sa. The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq. Sequencing without any bias and errors is Seq=Sa IN, where IN is a N×N unity matrix. Any bias in sequencing changes IN to a nonunity matrix. We identified a diagonal censorship matrix (CEN, which describes elimination or statistically significant downsampling, of specific reads during the sequencing process.

  7. PRIMITIVE MATRICES AND GENERATORS OF PSEUDO RANDOM SEQUENCES OF GALOIS

    Directory of Open Access Journals (Sweden)

    A. Beletsky

    2014-04-01

    Full Text Available In theory and practice of information cryptographic protection one of the key problems is the forming a binary pseudo-random sequences (PRS with a maximum length with acceptable statistical characteristics. PRS generators are usually implemented by linear shift register (LSR of maximum period with linear feedback [1]. In this paper we extend the concept of LSR, assuming that each of its rank (memory cell can be in one of the following condition. Let’s call such registers “generalized linear shift register.” The research goal is to develop algorithms for constructing Galois and Fibonacci generalized matrix of n-order over the field , which uniquely determined both the structure of corresponding generalized of n-order LSR maximal period, and formed on their basis Galois PRS generators of maximum length. Thus the article presents the questions of formation the primitive generalized Fibonacci and Galois arbitrary order matrix over the prime field . The synthesis of matrices is based on the use of irreducible polynomials of degree and primitive elements of the extended field generated by polynomial. The constructing methods of Galois and Fibonacci conjugated primitive matrices are suggested. The using possibilities of such matrices in solving the problem of constructing generalized generators of Galois pseudo-random sequences are discussed.

  8. Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naïve Bayes.

    Directory of Open Access Journals (Sweden)

    Wangchao Lou

    Full Text Available Developing an efficient method for determination of the DNA-binding proteins, due to their vital roles in gene regulation, is becoming highly desired since it would be invaluable to advance our understanding of protein functions. In this study, we proposed a new method for the prediction of the DNA-binding proteins, by performing the feature rank using random forest and the wrapper-based feature selection using forward best-first search strategy. The features comprise information from primary sequence, predicted secondary structure, predicted relative solvent accessibility, and position specific scoring matrix. The proposed method, called DBPPred, used Gaussian naïve Bayes as the underlying classifier since it outperformed five other classifiers, including decision tree, logistic regression, k-nearest neighbor, support vector machine with polynomial kernel, and support vector machine with radial basis function. As a result, the proposed DBPPred yields the highest average accuracy of 0.791 and average MCC of 0.583 according to the five-fold cross validation with ten runs on the training benchmark dataset PDB594. Subsequently, blind tests on the independent dataset PDB186 by the proposed model trained on the entire PDB594 dataset and by other five existing methods (including iDNA-Prot, DNA-Prot, DNAbinder, DNABIND and DBD-Threader were performed, resulting in that the proposed DBPPred yielded the highest accuracy of 0.769, MCC of 0.538, and AUC of 0.790. The independent tests performed by the proposed DBPPred on completely a large non-DNA binding protein dataset and two RNA binding protein datasets also showed improved or comparable quality when compared with the relevant prediction methods. Moreover, we observed that majority of the selected features by the proposed method are statistically significantly different between the mean feature values of the DNA-binding and the non DNA-binding proteins. All of the experimental results indicate that

  9. On theoretical models of gene expression evolution with random genetic drift and natural selection.

    Directory of Open Access Journals (Sweden)

    Osamu Ogasawara

    2009-11-01

    Full Text Available The relative contributions of natural selection and random genetic drift are a major source of debate in the study of gene expression evolution, which is hypothesized to serve as a bridge from molecular to phenotypic evolution. It has been suggested that the conflict between views is caused by the lack of a definite model of the neutral hypothesis, which can describe the long-run behavior of evolutionary change in mRNA abundance. Therefore previous studies have used inadequate analogies with the neutral prediction of other phenomena, such as amino acid or nucleotide sequence evolution, as the null hypothesis of their statistical inference.In this study, we introduced two novel theoretical models, one based on neutral drift and the other assuming natural selection, by focusing on a common property of the distribution of mRNA abundance among a variety of eukaryotic cells, which reflects the result of long-term evolution. Our results demonstrated that (1 our models can reproduce two independently found phenomena simultaneously: the time development of gene expression divergence and Zipf's law of the transcriptome; (2 cytological constraints can be explicitly formulated to describe long-term evolution; (3 the model assuming that natural selection optimized relative mRNA abundance was more consistent with previously published observations than the model of optimized absolute mRNA abundances.The models introduced in this study give a formulation of evolutionary change in the mRNA abundance of each gene as a stochastic process, on the basis of previously published observations. This model provides a foundation for interpreting observed data in studies of gene expression evolution, including identifying an adequate time scale for discriminating the effect of natural selection from that of random genetic drift of selectively neutral variations.

  10. Reduced randomness in quantum cryptography with sequences of qubits encoded in the same basis

    International Nuclear Information System (INIS)

    Lamoureux, L.-P.; Cerf, N. J.; Bechmann-Pasquinucci, H.; Gisin, N.; Macchiavello, C.

    2006-01-01

    We consider the cloning of sequences of qubits prepared in the states used in the BB84 or six-state quantum cryptography protocol, and show that the single-qubit fidelity is unaffected even if entire sequences of qubits are prepared in the same basis. This result is only valid provided that the sequences are much shorter than the total key. It is of great importance for practical quantum cryptosystems because it reduces the need for high-speed random number generation without impairing on the security against finite-size cloning attacks

  11. Pseudo-Random Sequences Generated by a Class of One-Dimensional Smooth Map

    Science.gov (United States)

    Wang, Xing-Yuan; Qin, Xue; Xie, Yi-Xin

    2011-08-01

    We extend a class of a one-dimensional smooth map. We make sure that for each desired interval of the parameter the map's Lyapunov exponent is positive. Then we propose a novel parameter perturbation method based on the good property of the extended one-dimensional smooth map. We perturb the parameter r in each iteration by the real number xi generated by the iteration. The auto-correlation function and NIST statistical test suite are taken to illustrate the method's randomness finally. We provide an application of this method in image encryption. Experiments show that the pseudo-random sequences are suitable for this application.

  12. Cardiorespiratory Kinetics Determined by Pseudo-Random Binary Sequences - Comparisons between Walking and Cycling.

    Science.gov (United States)

    Koschate, J; Drescher, U; Thieschäfer, L; Heine, O; Baum, K; Hoffmann, U

    2016-12-01

    This study aims to compare cardiorespiratory kinetics as a response to a standardised work rate protocol with pseudo-random binary sequences between cycling and walking in young healthy subjects. Muscular and pulmonary oxygen uptake (V̇O 2 ) kinetics as well as heart rate kinetics were expected to be similar for walking and cycling. Cardiac data and V̇O 2 of 23 healthy young subjects were measured in response to pseudo-random binary sequences. Kinetics were assessed applying time series analysis. Higher maxima of cross-correlation functions between work rate and the respective parameter indicate faster kinetics responses. Muscular V̇O 2 kinetics were estimated from heart rate and pulmonary V̇O 2 using a circulatory model. Muscular (walking vs. cycling [mean±SD in arbitrary units]: 0.40±0.08 vs. 0.41±0.08) and pulmonary V̇O 2 kinetics (0.35±0.06 vs. 0.35±0.06) were not different, although the time courses of the cross-correlation functions of pulmonary V̇O 2 showed unexpected biphasic responses. Heart rate kinetics (0.50±0.14 vs. 0.40±0.14; P=0.017) was faster for walking. Regarding the biphasic cross-correlation functions of pulmonary V̇O 2 during walking, the assessment of muscular V̇O 2 kinetics via pseudo-random binary sequences requires a circulatory model to account for cardio-dynamic distortions. Faster heart rate kinetics for walking should be considered by comparing results from cycle and treadmill ergometry. © Georg Thieme Verlag KG Stuttgart · New York.

  13. Multiple ECG Fiducial Points-Based Random Binary Sequence Generation for Securing Wireless Body Area Networks.

    Science.gov (United States)

    Zheng, Guanglou; Fang, Gengfa; Shankaran, Rajan; Orgun, Mehmet A; Zhou, Jie; Qiao, Li; Saleem, Kashif

    2017-05-01

    Generating random binary sequences (BSes) is a fundamental requirement in cryptography. A BS is a sequence of N bits, and each bit has a value of 0 or 1. For securing sensors within wireless body area networks (WBANs), electrocardiogram (ECG)-based BS generation methods have been widely investigated in which interpulse intervals (IPIs) from each heartbeat cycle are processed to produce BSes. Using these IPI-based methods to generate a 128-bit BS in real time normally takes around half a minute. In order to improve the time efficiency of such methods, this paper presents an ECG multiple fiducial-points based binary sequence generation (MFBSG) algorithm. The technique of discrete wavelet transforms is employed to detect arrival time of these fiducial points, such as P, Q, R, S, and T peaks. Time intervals between them, including RR, RQ, RS, RP, and RT intervals, are then calculated based on this arrival time, and are used as ECG features to generate random BSes with low latency. According to our analysis on real ECG data, these ECG feature values exhibit the property of randomness and, thus, can be utilized to generate random BSes. Compared with the schemes that solely rely on IPIs to generate BSes, this MFBSG algorithm uses five feature values from one heart beat cycle, and can be up to five times faster than the solely IPI-based methods. So, it achieves a design goal of low latency. According to our analysis, the complexity of the algorithm is comparable to that of fast Fourier transforms. These randomly generated ECG BSes can be used as security keys for encryption or authentication in a WBAN system.

  14. Interference Suppression Performance of Automotive UWB Radars Using Pseudo Random Sequences

    Directory of Open Access Journals (Sweden)

    I. Pasya

    2015-12-01

    Full Text Available Ultra wideband (UWB automotive radars have attracted attention from the viewpoint of reducing traffic accidents. The performance of automotive radars may be degraded by interference from nearby radars using the same frequency. In this study, a scenario where two cars pass each other on a road was considered. Considering the utilization of cross-polarization, the desired-to-undesired signal power ratio (DUR was found to vary approximately from -10 to 30 dB. Different pseudo random sequences were employed for spectrum spreading the different radar signals to mitigate the interference effects. This paper evaluates the interference suppression provided by maximum length sequence (MLS and Gold sequence (GS through numerical simulations of the radar’s performance in terms of probability of false alarm and probability of detection. It was found that MLS and GS yielded nearly the same performance when the DUR is -10 dB (worst case; for example when fixing the probability of false alarm to 0.0001, the probabilities of detection were 0.964 and 0.946 respectively. The GS are more advantageous than MLS due to larger number of different sequences having the same length in GS than in MLS.

  15. Partial summations of stationary sequences of non-Gaussian random variables

    DEFF Research Database (Denmark)

    Mohr, Gunnar; Ditlevsen, Ove Dalager

    1996-01-01

    The distribution of the sum of a finite number of identically distributed random variables is in many cases easily determined given that the variables are independent. The moments of any order of the sum can always be expressed by the moments of the single term without computational problems...... of convergence of the distribution of a sum (or an integral) of mutually dependent random variables to the Gaussian distribution. The paper is closely related to the work in Ditlevsen el al. [Ditlevsen, O., Mohr, G. & Hoffmeyer, P. Integration of non-Gaussian fields. Prob. Engng Mech 11 (1996) 15-23](2)....... lognormal variables or polynomials of standard Gaussian variables. The dependency structure is induced by specifying the autocorrelation structure of the sequence of standard Gaussian variables. Particularly useful polynomials are the Winterstein approximations that distributionally fit with non...

  16. Rhipicephalus microplus dataset of nonredundant raw sequence reads from 454 GS FLX sequencing of Cot-selected (Cot = 660) genomic DNA

    Science.gov (United States)

    A reassociation kinetics-based approach was used to reduce the complexity of genomic DNA from the Deutsch laboratory strain of the cattle tick, Rhipicephalus microplus, to facilitate genome sequencing. Selected genomic DNA (Cot value = 660) was sequenced using 454 GS FLX technology, resulting in 356...

  17. Genotyping-by-sequencing (GBS), an ultimate marker-assisted selection (MAS) tool to accelerate plant breeding

    OpenAIRE

    He, Jiangfeng; Zhao, Xiaoqing; Laroche, André; Lu, Zhen-Xiang; Liu, HongKui; Li, Ziqin

    2014-01-01

    Marker-assisted selection (MAS) refers to the use of molecular markers to assist phenotypic selections in crop improvement. Several types of molecular markers, such as single nucleotide polymorphism (SNP), have been identified and effectively used in plant breeding. The application of next-generation sequencing (NGS) technologies has led to remarkable advances in whole genome sequencing, which provides ultra-throughput sequences to revolutionize plant genotyping and breeding. To further broad...

  18. How random are random numbers generated using photons?

    International Nuclear Information System (INIS)

    Solis, Aldo; Angulo Martínez, Alí M; Ramírez Alarcón, Roberto; Cruz Ramírez, Hector; U’Ren, Alfred B; Hirsch, Jorge G

    2015-01-01

    Randomness is fundamental in quantum theory, with many philosophical and practical implications. In this paper we discuss the concept of algorithmic randomness, which provides a quantitative method to assess the Borel normality of a given sequence of numbers, a necessary condition for it to be considered random. We use Borel normality as a tool to investigate the randomness of ten sequences of bits generated from the differences between detection times of photon pairs generated by spontaneous parametric downconversion. These sequences are shown to fulfil the randomness criteria without difficulties. As deviations from Borel normality for photon-generated random number sequences have been reported in previous work, a strategy to understand these diverging findings is outlined. (paper)

  19. Pseudo-Random Sequences Generated by a Class of One-Dimensional Smooth Map

    International Nuclear Information System (INIS)

    Wang Xing-Yuan; Qin Xue; Xie Yi-Xin

    2011-01-01

    We extend a class of a one-dimensional smooth map. We make sure that for each desired interval of the parameter the map's Lyapunov exponent is positive. Then we propose a novel parameter perturbation method based on the good property of the extended one-dimensional smooth map. We perturb the parameter r in each iteration by the real number x i generated by the iteration. The auto-correlation function and NIST statistical test suite are taken to illustrate the method's randomness finally. We provide an application of this method in image encryption. Experiments show that the pseudo-random sequences are suitable for this application. (general)

  20. DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues.

    Science.gov (United States)

    Ma, Xin; Guo, Jing; Sun, Xiao

    2016-01-01

    DNA-binding proteins are fundamentally important in cellular processes. Several computational-based methods have been developed to improve the prediction of DNA-binding proteins in previous years. However, insufficient work has been done on the prediction of DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using the random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features, which reflect information about the conservation of physicochemical properties of the amino acids, and the binding propensity of DNA-binding residues and non-binding propensities of non-binding residues. The comparisons with each feature demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during the model construction. The results showed that the DNABP model could achieve 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. High prediction accuracy and performance comparisons with previous research suggested that DNABP could be a useful approach to identify DNA-binding proteins from sequence information. The DNABP web server system is freely available at http://www.cbi.seu.edu.cn/DNABP/.

  1. Automation Selection and Sequencing of Traps for Vibratory Feeders

    DEFF Research Database (Denmark)

    Mathiesen, Simon; Ellekilde, Lars-Peter

    2017-01-01

    Vibratory parts feeders with mechanical orienting devices are used extensively in the assembly automation industry. Even so, the design process is based on trial-and-error approaches and is largely manual. In this paper, a methodology is presented for automatic design of this type of feeder....... The approach uses dynamic simulation for generating the necessary data for configuring a feeder with a sequence of mechanical orienting devices called traps, with the goal of reorienting all parts from a random to fixed orientation. Then, a fast algorithm for facilitating this configuration task automatically...

  2. Configurational statistics of a polymer chain with random sequence of elements

    International Nuclear Information System (INIS)

    Obukhov, S.P.

    1984-10-01

    It is shown that for a disordered polymer chain the upper critical dimension is d c =3. At d≤3 the effect of randomness increases on large scales due to the space correlations of attractive and repulsive monomers, but it can also be screened by repulsive two- or three-body interaction. The renorm group equations indicate that near the theta point it can be the large dispersion of sizes of polymers which differ only in sequences of elements. (orig.)

  3. Alignment efficiency and discomfort of three orthodontic archwire sequences: a randomized clinical trial.

    Science.gov (United States)

    Ong, Emily; Ho, Christopher; Miles, Peter

    2011-03-01

    To compare the efficiency of orthodontic archwire sequences produced by three manufacturers. Prospective, randomized clinical trial with three parallel groups. Private orthodontic practice in Caloundra, QLD, Australia. One hundred and thirty-two consecutive patients were randomized to one of three archwire sequence groups: (i) 3M Unitek, 0·014 inch Nitinol, 0·017 inch × 0·017 inch heat activated Ni-Ti; (ii) GAC international, 0·014 inch Sentalloy, 0·016 × 0·022 inch Bioforce; and (iii) Ormco corporation, 0·014 inch Damon Copper Ni-Ti, 0·014 × 0·025 inch Damon Copper Ni-Ti. All patients received 0·018 × 0·025 inch slot Victory Series™ brackets. Mandibular impressions were taken before the insertion of each archwire. Patients completed discomfort surveys according to a seven-point Likert Scale at 4 h, 24 h, 3 days and 7 days after the insertion of each archwire. Efficiency was measured by time required to reach the working archwire, mandibular anterior alignment and level of discomfort. No significant differences were found in the reduction of irregularity between the archwire sequences at any time-point (T1: P = 0·12; T2: P = 0·06; T3: P = 0·21) or in the time to reach the working archwire (P = 0·28). No significant differences were found in the overall discomfort scores between the archwire sequences (4 h: P = 0·30; 24 h: P = 0·18; 3 days: P = 0·53; 7 days: P = 0·47). When the time-points were analysed individually, the 3M Unitek archwire sequence induced significantly less discomfort than GAC and Ormco archwires 24 h after the insertion of the third archwire (P = 0·02). This could possibly be attributed to the progression in archwire material and archform. The archwire sequences were similar in alignment efficiency and overall discomfort. Progression in archwire dimension and archform may contribute to discomfort levels. This study provides clinical justification for three common archwire sequences in 0·018 × 0·025 inch slot brackets.

  4. Pulsed magnetization transfer contrast MRI by a sequence with water selective excitation

    Energy Technology Data Exchange (ETDEWEB)

    Schick, F. [Univ. of Tuebingen (Germany)

    1996-01-01

    A water selective SE imaging sequence was developed providing suitable properties for the assessment of magnetization transfer (MT) effects in tissues with considerable amounts of fat. The sequence with water selective excitation and slice selective refocusing combines the following features: The RIF exposure on the macromolecular protons is relatively low for single slice imaging without MT prepulses, since no additional pulses for fat saturation are necessary. Water selection by frequency selective excitation diminishes faults in the subtraction of images recorded with and without MT prepulses (which might arise from movements). High differences in the signal amplitudes from hyaline cartilage and muscle tissue were obtained comparing images recorded with irradiation of the series of prepulses for MT and those lacking MT prepulses. Utilizations of the described water selective approach for the assessment of MT effects in lesions of cartilage and bone are demonstrated. MT saturation was also examined in muscles with fatty degeneration of patients suffering from progressive muscular dystrophy. The described technique allows determination of MT effects with good precision in a single slice, especially in regions with dominating fat signals. 22 refs., 5 figs.

  5. A Systematic Evaluation of Feature Selection and Classification Algorithms Using Simulated and Real miRNA Sequencing Data

    Directory of Open Access Journals (Sweden)

    Sheng Yang

    2015-01-01

    Full Text Available Sequencing is widely used to discover associations between microRNAs (miRNAs and diseases. However, the negative binomial distribution (NB and high dimensionality of data obtained using sequencing can lead to low-power results and low reproducibility. Several statistical learning algorithms have been proposed to address sequencing data, and although evaluation of these methods is essential, such studies are relatively rare. The performance of seven feature selection (FS algorithms, including baySeq, DESeq, edgeR, the rank sum test, lasso, particle swarm optimistic decision tree, and random forest (RF, was compared by simulation under different conditions based on the difference of the mean, the dispersion parameter of the NB, and the signal to noise ratio. Real data were used to evaluate the performance of RF, logistic regression, and support vector machine. Based on the simulation and real data, we discuss the behaviour of the FS and classification algorithms. The Apriori algorithm identified frequent item sets (mir-133a, mir-133b, mir-183, mir-937, and mir-96 from among the deregulated miRNAs of six datasets from The Cancer Genomics Atlas. Taking these findings altogether and considering computational memory requirements, we propose a strategy that combines edgeR and DESeq for large sample sizes.

  6. Sequence-selective targeting of duplex DNA by peptide nucleic acids

    DEFF Research Database (Denmark)

    Nielsen, Peter E

    2010-01-01

    Sequence-selective gene targeting constitutes an attractive drug-discovery approach for genetic therapy, with the aim of reducing or enhancing the activity of specific genes at the transcriptional level, or as part of a methodology for targeted gene repair. The pseudopeptide DNA mimic peptide...

  7. DNA-based random number generation in security circuitry.

    Science.gov (United States)

    Gearheart, Christy M; Arazi, Benjamin; Rouchka, Eric C

    2010-06-01

    DNA-based circuit design is an area of research in which traditional silicon-based technologies are replaced by naturally occurring phenomena taken from biochemistry and molecular biology. This research focuses on further developing DNA-based methodologies to mimic digital data manipulation. While exhibiting fundamental principles, this work was done in conjunction with the vision that DNA-based circuitry, when the technology matures, will form the basis for a tamper-proof security module, revolutionizing the meaning and concept of tamper-proofing and possibly preventing it altogether based on accurate scientific observations. A paramount part of such a solution would be self-generation of random numbers. A novel prototype schema employs solid phase synthesis of oligonucleotides for random construction of DNA sequences; temporary storage and retrieval is achieved through plasmid vectors. A discussion of how to evaluate sequence randomness is included, as well as how these techniques are applied to a simulation of the random number generation circuitry. Simulation results show generated sequences successfully pass three selected NIST random number generation tests specified for security applications.

  8. Application of Ammonium Persulfate for Selective Oxidation of Guanines for Nucleic Acid Sequencing

    Directory of Open Access Journals (Sweden)

    Yafen Wang

    2017-07-01

    Full Text Available Nucleic acids can be sequenced by a chemical procedure that partially damages the nucleotide positions at their base repetition. Many methods have been reported for the selective recognition of guanine. The accurate identification of guanine in both single and double regions of DNA and RNA remains a challenging task. Herein, we present a new, non-toxic and simple method for the selective recognition of guanine in both DNA and RNA sequences via ammonium persulfate modification. This strategy can be further successfully applied to the detection of 5-methylcytosine by using PCR.

  9. 40 CFR 761.308 - Sample selection by random number generation on any two-dimensional square grid.

    Science.gov (United States)

    2010-07-01

    ... 40 Protection of Environment 30 2010-07-01 2010-07-01 false Sample selection by random number... § 761.79(b)(3) § 761.308 Sample selection by random number generation on any two-dimensional square... area created in accordance with paragraph (a) of this section, select two random numbers: one each for...

  10. Quasi-Coherent Noise Jamming to LFM Radar Based on Pseudo-random Sequence Phase-modulation

    Directory of Open Access Journals (Sweden)

    N. Tai

    2015-12-01

    Full Text Available A novel quasi-coherent noise jamming method is proposed against linear frequency modulation (LFM signal and pulse compression radar. Based on the structure of digital radio frequency memory (DRFM, the jamming signal is acquired by the pseudo-random sequence phase-modulation of sampled radar signal. The characteristic of jamming signal in time domain and frequency domain is analyzed in detail. Results of ambiguity function indicate that the blanket jamming effect along the range direction will be formed when jamming signal passes through the matched filter. By flexible controlling the parameters of interrupted-sampling pulse and pseudo-random sequence, different covering distances and jamming effects will be achieved. When the jamming power is equivalent, this jamming obtains higher process gain compared with non-coherent jamming. The jamming signal enhances the detection threshold and the real target avoids being detected. Simulation results and circuit engineering implementation validate that the jamming signal covers real target effectively.

  11. Applications of random forest feature selection for fine-scale genetic population assignment.

    Science.gov (United States)

    Sylvester, Emma V A; Bentzen, Paul; Bradbury, Ian R; Clément, Marie; Pearce, Jon; Horne, John; Beiko, Robert G

    2018-02-01

    Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine-learning algorithms (random forest, regularized random forest and guided regularized random forest) compared with F ST ranking for selection of single nucleotide polymorphisms (SNP) for fine-scale population assignment. We applied these methods to an unpublished SNP data set for Atlantic salmon ( Salmo salar ) and a published SNP data set for Alaskan Chinook salmon ( Oncorhynchus tshawytscha ). In each species, we identified the minimum panel size required to obtain a self-assignment accuracy of at least 90% using each method to create panels of 50-700 markers Panels of SNPs identified using random forest-based methods performed up to 7.8 and 11.2 percentage points better than F ST -selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self-assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for each data set, respectively, a level of accuracy never reached for these species using F ST -selected panels. Our results demonstrate a role for machine-learning approaches in marker selection across large genomic data sets to improve assignment for management and conservation of exploited populations.

  12. The sequence relay selection strategy based on stochastic dynamic programming

    Science.gov (United States)

    Zhu, Rui; Chen, Xihao; Huang, Yangchao

    2017-07-01

    Relay-assisted (RA) network with relay node selection is a kind of effective method to improve the channel capacity and convergence performance. However, most of the existing researches about the relay selection did not consider the statically channel state information and the selection cost. This shortage limited the performance and application of RA network in practical scenarios. In order to overcome this drawback, a sequence relay selection strategy (SRSS) was proposed. And the performance upper bound of SRSS was also analyzed in this paper. Furthermore, in order to make SRSS more practical, a novel threshold determination algorithm based on the stochastic dynamic program (SDP) was given to work with SRSS. Numerical results are also presented to exhibit the performance of SRSS with SDP.

  13. Escherichia coli promoter sequences predict in vitro RNA polymerase selectivity.

    Science.gov (United States)

    Mulligan, M E; Hawley, D K; Entriken, R; McClure, W R

    1984-01-11

    We describe a simple algorithm for computing a homology score for Escherichia coli promoters based on DNA sequence alone. The homology score was related to 31 values, measured in vitro, of RNA polymerase selectivity, which we define as the product KBk2, the apparent second order rate constant for open complex formation. We found that promoter strength could be predicted to within a factor of +/-4.1 in KBk2 over a range of 10(4) in the same parameter. The quantitative evaluation was linked to an automated (Apple II) procedure for searching and evaluating possible promoters in DNA sequence files.

  14. Interference-aware random beam selection for spectrum sharing systems

    KAUST Repository

    Abdallah, Mohamed M.

    2012-09-01

    Spectrum sharing systems have been introduced to alleviate the problem of spectrum scarcity by allowing secondary unlicensed networks to share the spectrum with primary licensed networks under acceptable interference levels to the primary users. In this paper, we develop interference-aware random beam selection schemes that provide enhanced throughput for the secondary link under the condition that the interference observed at the primary link is within a predetermined acceptable value. For a secondary transmitter equipped with multiple antennas, our schemes select a random beam, among a set of power- optimized orthogonal random beams, that maximizes the capacity of the secondary link while satisfying the interference constraint at the primary receiver for different levels of feedback information describing the interference level at the primary receiver. For the proposed schemes, we develop a statistical analysis for the signal-to-noise and interference ratio (SINR) statistics as well as the capacity of the secondary link. Finally, we present numerical results that study the effect of system parameters including number of beams and the maximum transmission power on the capacity of the secondary link attained using the proposed schemes. © 2012 IEEE.

  15. Genotyping by sequencing (GBS, an ultimate marker-assisted selection (MAS tool to accelerate plant breeding

    Directory of Open Access Journals (Sweden)

    Jiangfeng eHe

    2014-09-01

    Full Text Available Marker-assisted selection (MAS refers to the use of molecular markers to assist phenotypic selections in crop improvement. Several types of molecular markers, such as single nucleotide polymorphism (SNP, have been identified and effectively used in plant breeding. The application of next-generation sequencing (NGS technologies has led to remarkable advances in whole genome sequencing, which provides ultra-throughput sequences to revolutionize plant genotyping and breeding. To further broaden NGS usages to large crop genomes such as maize and wheat, genotyping by sequencing (GBS has been developed and applied in sequencing multiplexed samples that combine molecular marker discovery and genotyping. GBS is a novel application of NGS protocols for discovering and genotyping SNPs in crop genomes and populations. The GBS approach includes the digestion of genomic DNA with restriction enzymes followed by the ligation of barcode adapter, PCR amplification and sequencing of the amplified DNA pool on a single lane of flow cells. Bioinformatic pipelines are needed to analyze and interpret GBS datasets. As an ultimate MAS tool and a cost-effective technique, GBS has been successfully used in implementing genome-wide association study (GWAS, genomic diversity study, genetic linkage analysis, molecular marker discovery and genomic selection (GS under a large scale of plant breeding programs.

  16. Genotyping-by-sequencing (GBS), an ultimate marker-assisted selection (MAS) tool to accelerate plant breeding.

    Science.gov (United States)

    He, Jiangfeng; Zhao, Xiaoqing; Laroche, André; Lu, Zhen-Xiang; Liu, HongKui; Li, Ziqin

    2014-01-01

    Marker-assisted selection (MAS) refers to the use of molecular markers to assist phenotypic selections in crop improvement. Several types of molecular markers, such as single nucleotide polymorphism (SNP), have been identified and effectively used in plant breeding. The application of next-generation sequencing (NGS) technologies has led to remarkable advances in whole genome sequencing, which provides ultra-throughput sequences to revolutionize plant genotyping and breeding. To further broaden NGS usages to large crop genomes such as maize and wheat, genotyping-by-sequencing (GBS) has been developed and applied in sequencing multiplexed samples that combine molecular marker discovery and genotyping. GBS is a novel application of NGS protocols for discovering and genotyping SNPs in crop genomes and populations. The GBS approach includes the digestion of genomic DNA with restriction enzymes followed by the ligation of barcode adapter, PCR amplification and sequencing of the amplified DNA pool on a single lane of flow cells. Bioinformatic pipelines are needed to analyze and interpret GBS datasets. As an ultimate MAS tool and a cost-effective technique, GBS has been successfully used in implementing genome-wide association study (GWAS), genomic diversity study, genetic linkage analysis, molecular marker discovery and genomic selection under a large scale of plant breeding programs.

  17. Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees

    Directory of Open Access Journals (Sweden)

    von Reumont Björn M

    2010-03-01

    Full Text Available Abstract Background Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstructions, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well defined methods to identify randomness in sequence alignments has prevented a routine application of alignment masking. In this study, we compared the effects on tree reconstructions of the most commonly used profiling method (GBLOCKS which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold which choice is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective. Results ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. Testing performance on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. Concurrently, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the most decrease in conflict. Conclusions Alignment masking improves signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. Given the robust performance of alignment

  18. Interference-aware random beam selection for spectrum sharing systems

    KAUST Repository

    Abdallah, Mohamed M.; Sayed, Mostafa M.; Alouini, Mohamed-Slim; Qaraqe, Khalid A.

    2012-01-01

    . In this paper, we develop interference-aware random beam selection schemes that provide enhanced throughput for the secondary link under the condition that the interference observed at the primary link is within a predetermined acceptable value. For a secondary

  19. Random amino acid mutations and protein misfolding lead to Shannon limit in sequence-structure communication.

    Directory of Open Access Journals (Sweden)

    Andreas Martin Lisewski

    2008-09-01

    Full Text Available The transmission of genomic information from coding sequence to protein structure during protein synthesis is subject to stochastic errors. To analyze transmission limits in the presence of spurious errors, Shannon's noisy channel theorem is applied to a communication channel between amino acid sequences and their structures established from a large-scale statistical analysis of protein atomic coordinates. While Shannon's theorem confirms that in close to native conformations information is transmitted with limited error probability, additional random errors in sequence (amino acid substitutions and in structure (structural defects trigger a decrease in communication capacity toward a Shannon limit at 0.010 bits per amino acid symbol at which communication breaks down. In several controls, simulated error rates above a critical threshold and models of unfolded structures always produce capacities below this limiting value. Thus an essential biological system can be realistically modeled as a digital communication channel that is (a sensitive to random errors and (b restricted by a Shannon error limit. This forms a novel basis for predictions consistent with observed rates of defective ribosomal products during protein synthesis, and with the estimated excess of mutual information in protein contact potentials.

  20. Sequence- vs. chip-assisted genomic selection: accurate biological information is advised.

    Science.gov (United States)

    Pérez-Enciso, Miguel; Rincón, Juan C; Legarra, Andrés

    2015-05-09

    The development of next-generation sequencing technologies (NGS) has made the use of whole-genome sequence data for routine genetic evaluations possible, which has triggered a considerable interest in animal and plant breeding fields. Here, we investigated whether complete or partial sequence data can improve upon existing SNP (single nucleotide polymorphism) array-based selection strategies by simulation using a mixed coalescence - gene-dropping approach. We simulated 20 or 100 causal mutations (quantitative trait nucleotides, QTN) within 65 predefined 'gene' regions, each 10 kb long, within a genome composed of ten 3-Mb chromosomes. We compared prediction accuracy by cross-validation using a medium-density chip (7.5 k SNPs), a high-density (HD, 17 k) and sequence data (335 k). Genetic evaluation was based on a GBLUP method. The simulations showed: (1) a law of diminishing returns with increasing number of SNPs; (2) a modest effect of SNP ascertainment bias in arrays; (3) a small advantage of using whole-genome sequence data vs. HD arrays i.e. ~4%; (4) a minor effect of NGS errors except when imputation error rates are high (≥20%); and (5) if QTN were known, prediction accuracy approached 1. Since this is obviously unrealistic, we explored milder assumptions. We showed that, if all SNPs within causal genes were included in the prediction model, accuracy could also dramatically increase by ~40%. However, this criterion was highly sensitive to either misspecification (including wrong genes) or to the use of an incomplete gene list; in these cases, accuracy fell rapidly towards that reached when all SNPs from sequence data were blindly included in the model. Our study shows that, unless an accurate prior estimate on the functionality of SNPs can be included in the predictor, there is a law of diminishing returns with increasing SNP density. As a result, use of whole-genome sequence data may not result in a highly increased selection response over high

  1. Combinatorial Pooling Enables Selective Sequencing of the Barley Gene Space

    Science.gov (United States)

    Lonardi, Stefano; Duma, Denisa; Alpert, Matthew; Cordero, Francesca; Beccuti, Marco; Bhat, Prasanna R.; Wu, Yonghui; Ciardo, Gianfranco; Alsaihati, Burair; Ma, Yaqin; Wanamaker, Steve; Resnik, Josh; Bozdag, Serdar; Luo, Ming-Cheng; Close, Timothy J.

    2013-01-01

    For the vast majority of species – including many economically or ecologically important organisms, progress in biological research is hampered due to the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and now are in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach denovo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as in this case gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundred millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate, and the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding. PMID:23592960

  2. Combinatorial pooling enables selective sequencing of the barley gene space.

    Directory of Open Access Journals (Sweden)

    Stefano Lonardi

    2013-04-01

    Full Text Available For the vast majority of species - including many economically or ecologically important organisms, progress in biological research is hampered due to the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and now are in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach denovo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as in this case gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundred millions of short reads and assign them to the correct BAC clones (deconvolution so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate, and the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding.

  3. Combinatorial pooling enables selective sequencing of the barley gene space.

    Science.gov (United States)

    Lonardi, Stefano; Duma, Denisa; Alpert, Matthew; Cordero, Francesca; Beccuti, Marco; Bhat, Prasanna R; Wu, Yonghui; Ciardo, Gianfranco; Alsaihati, Burair; Ma, Yaqin; Wanamaker, Steve; Resnik, Josh; Bozdag, Serdar; Luo, Ming-Cheng; Close, Timothy J

    2013-04-01

    For the vast majority of species - including many economically or ecologically important organisms, progress in biological research is hampered due to the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and now are in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach denovo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as in this case gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundred millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate, and the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding.

  4. High-Tg Polynorbornene-Based Block and Random Copolymers for Butanol Pervaporation Membranes

    Science.gov (United States)

    Register, Richard A.; Kim, Dong-Gyun; Takigawa, Tamami; Kashino, Tomomasa; Burtovyy, Oleksandr; Bell, Andrew

    Vinyl addition polymers of substituted norbornene (NB) monomers possess desirably high glass transition temperatures (Tg); however, until very recently, the lack of an applicable living polymerization chemistry has precluded the synthesis of such polymers with controlled architecture, or copolymers with controlled sequence distribution. We have recently synthesized block and random copolymers of NB monomers bearing hydroxyhexafluoroisopropyl and n-butyl substituents (HFANB and BuNB) via living vinyl addition polymerization with Pd-based catalysts. Both series of polymers were cast into the selective skin layers of thin film composite (TFC) membranes, and these organophilic membranes investigated for the isolation of n-butanol from dilute aqueous solution (model fermentation broth) via pervaporation. The block copolymers show well-defined microphase-separated morphologies, both in bulk and as the selective skin layers on TFC membranes, while the random copolymers are homogeneous. Both block and random vinyl addition copolymers are effective as n-butanol pervaporation membranes, with the block copolymers showing a better flux-selectivity balance. While polyHFANB has much higher permeability and n-butanol selectivity than polyBuNB, incorporating BuNB units into the polymer (in either a block or random sequence) limits the swelling of the polyHFANB and thereby improves the n-butanol pervaporation selectivity.

  5. Random number generation and creativity.

    Science.gov (United States)

    Bains, William

    2008-01-01

    A previous paper suggested that humans can generate genuinely random numbers. I tested this hypothesis by repeating the experiment with a larger number of highly numerate subjects, asking them to call out a sequence of digits selected from 0 through 9. The resulting sequences were substantially non-random, with an excess of sequential pairs of numbers and a deficit of repeats of the same number, in line with previous literature. However, the previous literature suggests that humans generate random numbers with substantial conscious effort, and distractions which reduce that effort reduce the randomness of the numbers. I reduced my subjects' concentration by asking them to call out in another language, and with alcohol - neither affected the randomness of their responses. This suggests that the ability to generate random numbers is a 'basic' function of the human mind, even if those numbers are not mathematically 'random'. I hypothesise that there is a 'creativity' mechanism, while not truly random, provides novelty as part of the mind's defence against closed programming loops, and that testing for the effects seen here in people more or less familiar with numbers or with spontaneous creativity could identify more features of this process. It is possible that training to perform better at simple random generation tasks could help to increase creativity, through training people to reduce the conscious mind's suppression of the 'spontaneous', creative response to new questions.

  6. Selection, Recombination and History in a Parasitic Flatworm (Echinococcus Inferred from Nucleotide Sequences

    Directory of Open Access Journals (Sweden)

    Haag KL

    1998-01-01

    Full Text Available Three species of flatworms from the genus Echinococcus (E. granulosus, E. multilocularis and E. vogeli and four strains of E. granulosus (cattle, horse, pig and sheep strains were analysed by the PCR-SSCP method followed by sequencing, using as targets two non-coding and two coding (one nuclear and one mitochondrial genomic regions. The sequencing data was used to evaluate hypothesis about the parasite breeding system and the causes of genetic diversification. The calculated recombination parameters suggested that cross-fertilisation was rare in the history of the group. However, the relative rates of substitution in the coding sequences showed that positive selection (instead of purifying selection drove the evolution of an elastase and neutrophil chemotaxis inhibitor gene (AgB/1. The phylogenetic analyses revealed several ambiguities, indicating that the taxonomic status of the E. granulosus horse strain should be revised

  7. Conversion of the random amplified polymorphic DNA (RAPD ...

    African Journals Online (AJOL)

    Conversion of the random amplified polymorphic DNA (RAPD) marker UBC#116 linked to Fusarium crown and root rot resistance gene (Frl) into a co-dominant sequence characterized amplified region (SCAR) marker for marker-assisted selection of tomato.

  8. The signature of positive selection at randomly chosen loci.

    OpenAIRE

    Przeworski, Molly

    2002-01-01

    In Drosophila and humans, there are accumulating examples of loci with a significant excess of high-frequency-derived alleles or high levels of linkage disequilibrium, relative to a neutral model of a random-mating population of constant size. These are features expected after a recent selective sweep. Their prevalence suggests that positive directional selection may be widespread in both species. However, as I show here, these features do not persist long after the sweep ends: The high-frequ...

  9. Simulated Performance Evaluation of a Selective Tracker Through Random Scenario Generation

    DEFF Research Database (Denmark)

    Hussain, Dil Muhammad Akbar

    2006-01-01

    performance assessment. Therefore, a random target motion scenario is adopted. Its implementation in particular for testing the proposed selective track splitting algorithm using Kalman filters is investigated through a number of performance parameters which gives the activity profile of the tracking scenario......  The paper presents a simulation study on the performance of a target tracker using selective track splitting filter algorithm through a random scenario implemented on a digital signal processor.  In a typical track splitting filter all the observation which fall inside a likelihood ellipse...... are used for update, however, in our proposed selective track splitting filter less number of observations are used for track update.  Much of the previous performance work [1] has been done on specific (deterministic) scenarios. One of the reasons for considering the specific scenarios, which were...

  10. Minimization over randomly selected lines

    Directory of Open Access Journals (Sweden)

    Ismet Sahin

    2013-07-01

    Full Text Available This paper presents a population-based evolutionary optimization method for minimizing a given cost function. The mutation operator of this method selects randomly oriented lines in the cost function domain, constructs quadratic functions interpolating the cost function at three different points over each line, and uses extrema of the quadratics as mutated points. The crossover operator modifies each mutated point based on components of two points in population, instead of one point as is usually performed in other evolutionary algorithms. The stopping criterion of this method depends on the number of almost degenerate quadratics. We demonstrate that the proposed method with these mutation and crossover operations achieves faster and more robust convergence than the well-known Differential Evolution and Particle Swarm algorithms.

  11. Experimental Rugged Fitness Landscape in Protein Sequence Space

    Science.gov (United States)

    Hayashi, Yuuki; Aita, Takuyo; Toyota, Hitoshi; Husimi, Yuzuru; Urabe, Itaru; Yomo, Tetsuya

    2006-01-01

    The fitness landscape in sequence space determines the process of biomolecular evolution. To plot the fitness landscape of protein function, we carried out in vitro molecular evolution beginning with a defective fd phage carrying a random polypeptide of 139 amino acids in place of the g3p minor coat protein D2 domain, which is essential for phage infection. After 20 cycles of random substitution at sites 12–130 of the initial random polypeptide and selection for infectivity, the selected phage showed a 1.7×104-fold increase in infectivity, defined as the number of infected cells per ml of phage suspension. Fitness was defined as the logarithm of infectivity, and we analyzed (1) the dependence of stationary fitness on library size, which increased gradually, and (2) the time course of changes in fitness in transitional phases, based on an original theory regarding the evolutionary dynamics in Kauffman's n-k fitness landscape model. In the landscape model, single mutations at single sites among n sites affect the contribution of k other sites to fitness. Based on the results of these analyses, k was estimated to be 18–24. According to the estimated parameters, the landscape was plotted as a smooth surface up to a relative fitness of 0.4 of the global peak, whereas the landscape had a highly rugged surface with many local peaks above this relative fitness value. Based on the landscapes of these two different surfaces, it appears possible for adaptive walks with only random substitutions to climb with relative ease up to the middle region of the fitness landscape from any primordial or random sequence, whereas an enormous range of sequence diversity is required to climb further up the rugged surface above the middle region. PMID:17183728

  12. Experimental rugged fitness landscape in protein sequence space.

    Science.gov (United States)

    Hayashi, Yuuki; Aita, Takuyo; Toyota, Hitoshi; Husimi, Yuzuru; Urabe, Itaru; Yomo, Tetsuya

    2006-12-20

    The fitness landscape in sequence space determines the process of biomolecular evolution. To plot the fitness landscape of protein function, we carried out in vitro molecular evolution beginning with a defective fd phage carrying a random polypeptide of 139 amino acids in place of the g3p minor coat protein D2 domain, which is essential for phage infection. After 20 cycles of random substitution at sites 12-130 of the initial random polypeptide and selection for infectivity, the selected phage showed a 1.7x10(4)-fold increase in infectivity, defined as the number of infected cells per ml of phage suspension. Fitness was defined as the logarithm of infectivity, and we analyzed (1) the dependence of stationary fitness on library size, which increased gradually, and (2) the time course of changes in fitness in transitional phases, based on an original theory regarding the evolutionary dynamics in Kauffman's n-k fitness landscape model. In the landscape model, single mutations at single sites among n sites affect the contribution of k other sites to fitness. Based on the results of these analyses, k was estimated to be 18-24. According to the estimated parameters, the landscape was plotted as a smooth surface up to a relative fitness of 0.4 of the global peak, whereas the landscape had a highly rugged surface with many local peaks above this relative fitness value. Based on the landscapes of these two different surfaces, it appears possible for adaptive walks with only random substitutions to climb with relative ease up to the middle region of the fitness landscape from any primordial or random sequence, whereas an enormous range of sequence diversity is required to climb further up the rugged surface above the middle region.

  13. Experimental rugged fitness landscape in protein sequence space.

    Directory of Open Access Journals (Sweden)

    Yuuki Hayashi

    Full Text Available The fitness landscape in sequence space determines the process of biomolecular evolution. To plot the fitness landscape of protein function, we carried out in vitro molecular evolution beginning with a defective fd phage carrying a random polypeptide of 139 amino acids in place of the g3p minor coat protein D2 domain, which is essential for phage infection. After 20 cycles of random substitution at sites 12-130 of the initial random polypeptide and selection for infectivity, the selected phage showed a 1.7x10(4-fold increase in infectivity, defined as the number of infected cells per ml of phage suspension. Fitness was defined as the logarithm of infectivity, and we analyzed (1 the dependence of stationary fitness on library size, which increased gradually, and (2 the time course of changes in fitness in transitional phases, based on an original theory regarding the evolutionary dynamics in Kauffman's n-k fitness landscape model. In the landscape model, single mutations at single sites among n sites affect the contribution of k other sites to fitness. Based on the results of these analyses, k was estimated to be 18-24. According to the estimated parameters, the landscape was plotted as a smooth surface up to a relative fitness of 0.4 of the global peak, whereas the landscape had a highly rugged surface with many local peaks above this relative fitness value. Based on the landscapes of these two different surfaces, it appears possible for adaptive walks with only random substitutions to climb with relative ease up to the middle region of the fitness landscape from any primordial or random sequence, whereas an enormous range of sequence diversity is required to climb further up the rugged surface above the middle region.

  14. Selection for altruism through random drift in variable size populations

    Directory of Open Access Journals (Sweden)

    Houchmandzadeh Bahram

    2012-05-01

    Full Text Available Abstract Background Altruistic behavior is defined as helping others at a cost to oneself and a lowered fitness. The lower fitness implies that altruists should be selected against, which is in contradiction with their widespread presence is nature. Present models of selection for altruism (kin or multilevel show that altruistic behaviors can have ‘hidden’ advantages if the ‘common good’ produced by altruists is restricted to some related or unrelated groups. These models are mostly deterministic, or assume a frequency dependent fitness. Results Evolutionary dynamics is a competition between deterministic selection pressure and stochastic events due to random sampling from one generation to the next. We show here that an altruistic allele extending the carrying capacity of the habitat can win by increasing the random drift of “selfish” alleles. In other terms, the fixation probability of altruistic genes can be higher than those of a selfish ones, even though altruists have a smaller fitness. Moreover when populations are geographically structured, the altruists advantage can be highly amplified and the fixation probability of selfish genes can tend toward zero. The above results are obtained both by numerical and analytical calculations. Analytical results are obtained in the limit of large populations. Conclusions The theory we present does not involve kin or multilevel selection, but is based on the existence of random drift in variable size populations. The model is a generalization of the original Fisher-Wright and Moran models where the carrying capacity depends on the number of altruists.

  15. Selection bias and subject refusal in a cluster-randomized controlled trial

    Directory of Open Access Journals (Sweden)

    Rochelle Yang

    2017-07-01

    Full Text Available Abstract Background Selection bias and non-participation bias are major methodological concerns which impact external validity. Cluster-randomized controlled trials are especially prone to selection bias as it is impractical to blind clusters to their allocation into intervention or control. This study assessed the impact of selection bias in a large cluster-randomized controlled trial. Methods The Improved Cardiovascular Risk Reduction to Enhance Rural Primary Care (ICARE study examined the impact of a remote pharmacist-led intervention in twelve medical offices. To assess eligibility, a standardized form containing patient demographics and medical information was completed for each screened patient. Eligible patients were approached by the study coordinator for recruitment. Both the study coordinator and the patient were aware of the site’s allocation prior to consent. Patients who consented or declined to participate were compared across control and intervention arms for differing characteristics. Statistical significance was determined using a two-tailed, equal variance t-test and a chi-square test with adjusted Bonferroni p-values. Results were adjusted for random cluster variation. Results There were 2749 completed screening forms returned to research staff with 461 subjects who had either consented or declined participation. Patients with poorly controlled diabetes were found to be significantly more likely to decline participation in intervention sites compared to those in control sites. A higher mean diastolic blood pressure was seen in patients with uncontrolled hypertension who declined in the control sites compared to those who declined in the intervention sites. However, these findings were no longer significant after adjustment for random variation among the sites. After this adjustment, females were now found to be significantly more likely to consent than males (odds ratio = 1.41; 95% confidence interval = 1.03, 1

  16. Detection of optic nerve lesions in optic neuritis using frequency-selective fat-saturation sequences

    International Nuclear Information System (INIS)

    Miller, D.H.; MacManus, D.G.; Bartlett, P.A.; Kapoor, R.; Morrissey, S.P.; Moseley, I.F.

    1993-01-01

    MRI was performed on seven patients with acute optic neuritis, using two sequences which suppress the signal from orbital fat: frequency-selective fat-saturation and inversion recovery with a short inversion time. Lesions were seen on both sequences in all the symptomatic optic nerves studied. (orig.)

  17. Evaluation and Selection of Best Priority Sequencing Rule in Job Shop Scheduling using Hybrid MCDM Technique

    Science.gov (United States)

    Kiran Kumar, Kalla; Nagaraju, Dega; Gayathri, S.; Narayanan, S.

    2017-05-01

    Priority Sequencing Rules provide the guidance for the order in which the jobs are to be processed at a workstation. The application of different priority rules in job shop scheduling gives different order of scheduling. More experimentation needs to be conducted before a final choice is made to know the best priority sequencing rule. Hence, a comprehensive method of selecting the right choice is essential in managerial decision making perspective. This paper considers seven different priority sequencing rules in job shop scheduling. For evaluation and selection of the best priority sequencing rule, a set of eight criteria are considered. The aim of this work is to demonstrate the methodology of evaluating and selecting the best priority sequencing rule by using hybrid multi criteria decision making technique (MCDM), i.e., analytical hierarchy process (AHP) with technique for order preference by similarity to ideal solution (TOPSIS). The criteria weights are calculated by using AHP whereas the relative closeness values of all priority sequencing rules are computed based on TOPSIS with the help of data acquired from the shop floor of a manufacturing firm. Finally, from the findings of this work, the priority sequencing rules are ranked from most important to least important. The comprehensive methodology presented in this paper is very much essential for the management of a workstation to choose the best priority sequencing rule among the available alternatives for processing the jobs with maximum benefit.

  18. Natural selection and algorithmic design of mRNA.

    Science.gov (United States)

    Cohen, Barry; Skiena, Steven

    2003-01-01

    Messenger RNA (mRNA) sequences serve as templates for proteins according to the triplet code, in which each of the 4(3) = 64 different codons (sequences of three consecutive nucleotide bases) in RNA either terminate transcription or map to one of the 20 different amino acids (or residues) which build up proteins. Because there are more codons than residues, there is inherent redundancy in the coding. Certain residues (e.g., tryptophan) have only a single corresponding codon, while other residues (e.g., arginine) have as many as six corresponding codons. This freedom implies that the number of possible RNA sequences coding for a given protein grows exponentially in the length of the protein. Thus nature has wide latitude to select among mRNA sequences which are informationally equivalent, but structurally and energetically divergent. In this paper, we explore how nature takes advantage of this freedom and how to algorithmically design structures more energetically favorable than have been built through natural selection. In particular: (1) Natural Selection--we perform the first large-scale computational experiment comparing the stability of mRNA sequences from a variety of organisms to random synonymous sequences which respect the codon preferences of the organism. This experiment was conducted on over 27,000 sequences from 34 microbial species with 36 genomic structures. We provide evidence that in all genomic structures highly stable sequences are disproportionately abundant, and in 19 of 36 cases highly unstable sequences are disproportionately abundant. This suggests that the stability of mRNA sequences is subject to natural selection. (2) Artificial Selection--motivated by these biological results, we examine the algorithmic problem of designing the most stable and unstable mRNA sequences which code for a target protein. We give a polynomial-time dynamic programming solution to the most stable sequence problem (MSSP), which is asymptotically no more complex

  19. Prediction of plant promoters based on hexamers and random triplet pair analysis

    Directory of Open Access Journals (Sweden)

    Noman Nasimul

    2011-06-01

    Full Text Available Abstract Background With an increasing number of plant genome sequences, it has become important to develop a robust computational method for detecting plant promoters. Although a wide variety of programs are currently available, prediction accuracy of these still requires further improvement. The limitations of these methods can be addressed by selecting appropriate features for distinguishing promoters and non-promoters. Methods In this study, we proposed two feature selection approaches based on hexamer sequences: the Frequency Distribution Analyzed Feature Selection Algorithm (FDAFSA and the Random Triplet Pair Feature Selecting Genetic Algorithm (RTPFSGA. In FDAFSA, adjacent triplet-pairs (hexamer sequences were selected based on the difference in the frequency of hexamers between promoters and non-promoters. In RTPFSGA, random triplet-pairs (RTPs were selected by exploiting a genetic algorithm that distinguishes frequencies of non-adjacent triplet pairs between promoters and non-promoters. Then, a support vector machine (SVM, a nonlinear machine-learning algorithm, was used to classify promoters and non-promoters by combining these two feature selection approaches. We referred to this novel algorithm as PromoBot. Results Promoter sequences were collected from the PlantProm database. Non-promoter sequences were collected from plant mRNA, rRNA, and tRNA of PlantGDB and plant miRNA of miRBase. Then, in order to validate the proposed algorithm, we applied a 5-fold cross validation test. Training data sets were used to select features based on FDAFSA and RTPFSGA, and these features were used to train the SVM. We achieved 89% sensitivity and 86% specificity. Conclusions We compared our PromoBot algorithm to five other algorithms. It was found that the sensitivity and specificity of PromoBot performed well (or even better with the algorithms tested. These results show that the two proposed feature selection methods based on hexamer frequencies

  20. Interference-aware random beam selection schemes for spectrum sharing systems

    KAUST Repository

    Abdallah, Mohamed; Qaraqe, Khalid; Alouini, Mohamed-Slim

    2012-01-01

    users. In this work, we develop interference-aware random beam selection schemes that provide enhanced performance for the secondary network under the condition that the interference observed by the receivers of the primary network is below a

  1. A Novel Motion Compensation Method for Random Stepped Frequency Radar with M-sequence

    Science.gov (United States)

    Liao, Zhikun; Hu, Jiemin; Lu, Dawei; Zhang, Jun

    2018-01-01

    The random stepped frequency radar is a new kind of synthetic wideband radar. In the research, it has been found that it possesses a thumbtack-like ambiguity function which is considered to be the ideal one. This also means that only a precise motion compensation could result in the correct high resolution range profile. In this paper, we will introduce the random stepped frequency radar coded by M-sequence firstly and briefly analyse the effect of relative motion between target and radar on the distance imaging, which is called defocusing problem. Then, a novel motion compensation method, named complementary code cancellation, will be put forward to solve this problem. Finally, the simulated experiments will demonstrate its validity and the computational analysis will show up its efficiency.

  2. Transcriptional blockages in a cell-free system by sequence-selective DNA alkylating agents.

    Science.gov (United States)

    Ferguson, L R; Liu, A P; Denny, W A; Cullinane, C; Talarico, T; Phillips, D R

    2000-04-14

    There is considerable interest in DNA sequence-selective DNA-binding drugs as potential inhibitors of gene expression. Five compounds with distinctly different base pair specificities were compared in their effects on the formation and elongation of the transcription complex from the lac UV5 promoter in a cell-free system. All were tested at drug levels which killed 90% of cells in a clonogenic survival assay. Cisplatin, a selective alkylator at purine residues, inhibited transcription, decreasing the full-length transcript, and causing blockage at a number of GG or AG sequences, making it probable that intrastrand crosslinks are the blocking lesions. A cyclopropylindoline known to be an A-specific alkylator also inhibited transcription, with blocks at adenines. The aniline mustard chlorambucil, that targets primarily G but also A sequences, was also effective in blocking the formation of full-length transcripts. It produced transcription blocks either at, or one base prior to, AA or GG sequences, suggesting that intrastrand crosslinks could again be involved. The non-alkylating DNA minor groove binder Hoechst 33342 (a bisbenzimidazole) blocked formation of the full-length transcript, but without creating specific blockage sites. A bisbenzimidazole-linked aniline mustard analogue was a more effective transcription inhibitor than either chlorambucil or Hoechst 33342, with different blockage sites occurring immediately as compared with 2 h after incubation. The blockages were either immediately prior to AA or GG residues, or four to five base pairs prior to such sites, a pattern not predicted from in vitro DNA-binding studies. Minor groove DNA-binding ligands are of particular interest as inhibitors of gene expression, since they have the potential ability to bind selectively to long sequences of DNA. The results suggest that the bisbenzimidazole-linked mustard does cause alkylation and transcription blockage at novel DNA sites. in addition to sites characteristic of

  3. HLA DNA sequence variation among human populations: molecular signatures of demographic and selective events.

    Directory of Open Access Journals (Sweden)

    Stéphane Buhler

    2011-02-01

    Full Text Available Molecular differences between HLA alleles vary up to 57 nucleotides within the peptide binding coding region of human Major Histocompatibility Complex (MHC genes, but it is still unclear whether this variation results from a stochastic process or from selective constraints related to functional differences among HLA molecules. Although HLA alleles are generally treated as equidistant molecular units in population genetic studies, DNA sequence diversity among populations is also crucial to interpret the observed HLA polymorphism. In this study, we used a large dataset of 2,062 DNA sequences defined for the different HLA alleles to analyze nucleotide diversity of seven HLA genes in 23,500 individuals of about 200 populations spread worldwide. We first analyzed the HLA molecular structure and diversity of these populations in relation to geographic variation and we further investigated possible departures from selective neutrality through Tajima's tests and mismatch distributions. All results were compared to those obtained by classical approaches applied to HLA allele frequencies.Our study shows that the global patterns of HLA nucleotide diversity among populations are significantly correlated to geography, although in some specific cases the molecular information reveals unexpected genetic relationships. At all loci except HLA-DPB1, populations have accumulated a high proportion of very divergent alleles, suggesting an advantage of heterozygotes expressing molecularly distant HLA molecules (asymmetric overdominant selection model. However, both different intensities of selection and unequal levels of gene conversion may explain the heterogeneous mismatch distributions observed among the loci. Also, distinctive patterns of sequence divergence observed at the HLA-DPB1 locus suggest current neutrality but old selective pressures on this gene. We conclude that HLA DNA sequences advantageously complement HLA allele frequencies as a source of data used

  4. Growth rate for the expected value of a generalized random Fibonacci sequence

    International Nuclear Information System (INIS)

    Janvresse, Elise; De la Rue, Thierry; Rittaud, BenoIt

    2009-01-01

    We study the behaviour of generalized random Fibonacci sequences defined by the relation g n = |λg n-1 ± g n-2 |, where the ± sign is given by tossing an unbalanced coin, giving probability p to the + sign. We prove that the expected value of g n grows exponentially fast for any 0 (2 - λ)/4 when λ is of the form 2cos(π/k) for some fixed integer k ≥ 3. In both cases, we give an algebraic expression for the growth rate

  5. Mapping sequences by parts

    Directory of Open Access Journals (Sweden)

    Guziolowski Carito

    2007-09-01

    Full Text Available Abstract Background: We present the N-map method, a pairwise and asymmetrical approach which allows us to compare sequences by taking into account evolutionary events that produce shuffled, reversed or repeated elements. Basically, the optimal N-map of a sequence s over a sequence t is the best way of partitioning the first sequence into N parts and placing them, possibly complementary reversed, over the second sequence in order to maximize the sum of their gapless alignment scores. Results: We introduce an algorithm computing an optimal N-map with time complexity O (|s| × |t| × N using O (|s| × |t| × N memory space. Among all the numbers of parts taken in a reasonable range, we select the value N for which the optimal N-map has the most significant score. To evaluate this significance, we study the empirical distributions of the scores of optimal N-maps and show that they can be approximated by normal distributions with a reasonable accuracy. We test the functionality of the approach over random sequences on which we apply artificial evolutionary events. Practical Application: The method is illustrated with four case studies of pairs of sequences involving non-standard evolutionary events.

  6. Detection of genomic variation by selection of a 9 mb DNA region and high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sergey I Nikolaev

    Full Text Available Detection of the rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb and 7 (1.1 Mb from an individual from the International HapMap Project (NA12872. We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage > or = 4-fold, and 97.9% concordant in regions with coverage > or = 15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.

  7. Continuous-Time Mean-Variance Portfolio Selection with Random Horizon

    International Nuclear Information System (INIS)

    Yu, Zhiyong

    2013-01-01

    This paper examines the continuous-time mean-variance optimal portfolio selection problem with random market parameters and random time horizon. Treating this problem as a linearly constrained stochastic linear-quadratic optimal control problem, I explicitly derive the efficient portfolios and efficient frontier in closed forms based on the solutions of two backward stochastic differential equations. Some related issues such as a minimum variance portfolio and a mutual fund theorem are also addressed. All the results are markedly different from those in the problem with deterministic exit time. A key part of my analysis involves proving the global solvability of a stochastic Riccati equation, which is interesting in its own right

  8. Continuous-Time Mean-Variance Portfolio Selection with Random Horizon

    Energy Technology Data Exchange (ETDEWEB)

    Yu, Zhiyong, E-mail: yuzhiyong@sdu.edu.cn [Shandong University, School of Mathematics (China)

    2013-12-15

    This paper examines the continuous-time mean-variance optimal portfolio selection problem with random market parameters and random time horizon. Treating this problem as a linearly constrained stochastic linear-quadratic optimal control problem, I explicitly derive the efficient portfolios and efficient frontier in closed forms based on the solutions of two backward stochastic differential equations. Some related issues such as a minimum variance portfolio and a mutual fund theorem are also addressed. All the results are markedly different from those in the problem with deterministic exit time. A key part of my analysis involves proving the global solvability of a stochastic Riccati equation, which is interesting in its own right.

  9. Pseudo-random number generator based on asymptotic deterministic randomness

    Science.gov (United States)

    Wang, Kai; Pei, Wenjiang; Xia, Haishan; Cheung, Yiu-ming

    2008-06-01

    A novel approach to generate the pseudorandom-bit sequence from the asymptotic deterministic randomness system is proposed in this Letter. We study the characteristic of multi-value correspondence of the asymptotic deterministic randomness constructed by the piecewise linear map and the noninvertible nonlinearity transform, and then give the discretized systems in the finite digitized state space. The statistic characteristics of the asymptotic deterministic randomness are investigated numerically, such as stationary probability density function and random-like behavior. Furthermore, we analyze the dynamics of the symbolic sequence. Both theoretical and experimental results show that the symbolic sequence of the asymptotic deterministic randomness possesses very good cryptographic properties, which improve the security of chaos based PRBGs and increase the resistance against entropy attacks and symbolic dynamics attacks.

  10. Pseudo-random number generator based on asymptotic deterministic randomness

    International Nuclear Information System (INIS)

    Wang Kai; Pei Wenjiang; Xia Haishan; Cheung Yiuming

    2008-01-01

    A novel approach to generate the pseudorandom-bit sequence from the asymptotic deterministic randomness system is proposed in this Letter. We study the characteristic of multi-value correspondence of the asymptotic deterministic randomness constructed by the piecewise linear map and the noninvertible nonlinearity transform, and then give the discretized systems in the finite digitized state space. The statistic characteristics of the asymptotic deterministic randomness are investigated numerically, such as stationary probability density function and random-like behavior. Furthermore, we analyze the dynamics of the symbolic sequence. Both theoretical and experimental results show that the symbolic sequence of the asymptotic deterministic randomness possesses very good cryptographic properties, which improve the security of chaos based PRBGs and increase the resistance against entropy attacks and symbolic dynamics attacks

  11. Augmented brain function by coordinated reset stimulation with slowly varying sequences

    Directory of Open Access Journals (Sweden)

    Magteld eZeitler

    2015-03-01

    Full Text Available Several brain disorders are characterized by abnormally strong neuronal synchrony. Coordinated Reset (CR stimulation was developed to selectively counteract abnormal neuronal synchrony by desynchronization. For this, phase resetting stimuli are delivered to different subpopulations in a timely coordinated way. In neural networks with spike timing-dependent plasticity CR stimulation may eventually lead to an anti-kindling, i.e. an unlearning of abnormal synaptic connectivity and abnormal synchrony. The spatiotemporal sequence by which all stimulation sites are stimulated exactly once is called the stimulation site sequence, or briefly sequence. So far, in simulations, pre-clinical and clinical applications CR was applied either with fixed sequences or rapidly varying sequences (RVS. In this computational study we show that appropriate repetition of the sequence with occasional random switching to the next sequence may significantly improve the anti-kindling effect of CR. To this end, a sequence is applied many times before randomly switching to the next sequence. This new method is called SVS CR stimulation, i.e. CR with slowly varying sequences. In a neuronal network with strong short-range excitatory and weak long-range inhibitory dynamic couplings SVS CR stimulation turns out to be superior to CR stimulation with fixed sequences or RVS.

  12. Augmented brain function by coordinated reset stimulation with slowly varying sequences.

    Science.gov (United States)

    Zeitler, Magteld; Tass, Peter A

    2015-01-01

    Several brain disorders are characterized by abnormally strong neuronal synchrony. Coordinated Reset (CR) stimulation was developed to selectively counteract abnormal neuronal synchrony by desynchronization. For this, phase resetting stimuli are delivered to different subpopulations in a timely coordinated way. In neural networks with spike timing-dependent plasticity CR stimulation may eventually lead to an anti-kindling, i.e., an unlearning of abnormal synaptic connectivity and abnormal synchrony. The spatiotemporal sequence by which all stimulation sites are stimulated exactly once is called the stimulation site sequence, or briefly sequence. So far, in simulations, pre-clinical and clinical applications CR was applied either with fixed sequences or rapidly varying sequences (RVS). In this computational study we show that appropriate repetition of the sequence with occasional random switching to the next sequence may significantly improve the anti-kindling effect of CR. To this end, a sequence is applied many times before randomly switching to the next sequence. This new method is called SVS CR stimulation, i.e., CR with slowly varying sequences. In a neuronal network with strong short-range excitatory and weak long-range inhibitory dynamic couplings SVS CR stimulation turns out to be superior to CR stimulation with fixed sequences or RVS.

  13. TEHRAN AIR POLLUTANTS PREDICTION BASED ON RANDOM FOREST FEATURE SELECTION METHOD

    Directory of Open Access Journals (Sweden)

    A. Shamsoddini

    2017-09-01

    Full Text Available Air pollution as one of the most serious forms of environmental pollutions poses huge threat to human life. Air pollution leads to environmental instability, and has harmful and undesirable effects on the environment. Modern prediction methods of the pollutant concentration are able to improve decision making and provide appropriate solutions. This study examines the performance of the Random Forest feature selection in combination with multiple-linear regression and Multilayer Perceptron Artificial Neural Networks methods, in order to achieve an efficient model to estimate carbon monoxide and nitrogen dioxide, sulfur dioxide and PM2.5 contents in the air. The results indicated that Artificial Neural Networks fed by the attributes selected by Random Forest feature selection method performed more accurate than other models for the modeling of all pollutants. The estimation accuracy of sulfur dioxide emissions was lower than the other air contaminants whereas the nitrogen dioxide was predicted more accurate than the other pollutants.

  14. Tehran Air Pollutants Prediction Based on Random Forest Feature Selection Method

    Science.gov (United States)

    Shamsoddini, A.; Aboodi, M. R.; Karami, J.

    2017-09-01

    Air pollution as one of the most serious forms of environmental pollutions poses huge threat to human life. Air pollution leads to environmental instability, and has harmful and undesirable effects on the environment. Modern prediction methods of the pollutant concentration are able to improve decision making and provide appropriate solutions. This study examines the performance of the Random Forest feature selection in combination with multiple-linear regression and Multilayer Perceptron Artificial Neural Networks methods, in order to achieve an efficient model to estimate carbon monoxide and nitrogen dioxide, sulfur dioxide and PM2.5 contents in the air. The results indicated that Artificial Neural Networks fed by the attributes selected by Random Forest feature selection method performed more accurate than other models for the modeling of all pollutants. The estimation accuracy of sulfur dioxide emissions was lower than the other air contaminants whereas the nitrogen dioxide was predicted more accurate than the other pollutants.

  15. RSARF: Prediction of residue solvent accessibility from protein sequence using random forest method

    KAUST Repository

    Ganesan, Pugalenthi; Kandaswamy, Krishna Kumar Umar; Chou -, Kuochen; Vivekanandan, Saravanan; Kolatkar, Prasanna R.

    2012-01-01

    Prediction of protein structure from its amino acid sequence is still a challenging problem. The complete physicochemical understanding of protein folding is essential for the accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights into protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. The training and testing was performed using 120 proteins containing 22006 residues. For each residue, buried and exposed state was computed using five thresholds (0%, 5%, 10%, 25%, and 50%). The prediction accuracy for 0%, 5%, 10%, 25%, and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57% and 72.07% respectively. Further, comparison of RSARF with other methods using a benchmark dataset containing 20 proteins shows that our approach is useful for prediction of residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/. - See more at: http://www.eurekaselect.com/89216/article#sthash.pwVGFUjq.dpuf

  16. Hebbian Learning in a Random Network Captures Selectivity Properties of the Prefrontal Cortex

    Science.gov (United States)

    Lindsay, Grace W.

    2017-01-01

    Complex cognitive behaviors, such as context-switching and rule-following, are thought to be supported by the prefrontal cortex (PFC). Neural activity in the PFC must thus be specialized to specific tasks while retaining flexibility. Nonlinear “mixed” selectivity is an important neurophysiological trait for enabling complex and context-dependent behaviors. Here we investigate (1) the extent to which the PFC exhibits computationally relevant properties, such as mixed selectivity, and (2) how such properties could arise via circuit mechanisms. We show that PFC cells recorded from male and female rhesus macaques during a complex task show a moderate level of specialization and structure that is not replicated by a model wherein cells receive random feedforward inputs. While random connectivity can be effective at generating mixed selectivity, the data show significantly more mixed selectivity than predicted by a model with otherwise matched parameters. A simple Hebbian learning rule applied to the random connectivity, however, increases mixed selectivity and enables the model to match the data more accurately. To explain how learning achieves this, we provide analysis along with a clear geometric interpretation of the impact of learning on selectivity. After learning, the model also matches the data on measures of noise, response density, clustering, and the distribution of selectivities. Of two styles of Hebbian learning tested, the simpler and more biologically plausible option better matches the data. These modeling results provide clues about how neural properties important for cognition can arise in a circuit and make clear experimental predictions regarding how various measures of selectivity would evolve during animal training. SIGNIFICANCE STATEMENT The prefrontal cortex is a brain region believed to support the ability of animals to engage in complex behavior. How neurons in this area respond to stimuli—and in particular, to combinations of stimuli (

  17. Performance Evaluation of User Selection Protocols in Random Networks with Energy Harvesting and Hardware Impairments

    Directory of Open Access Journals (Sweden)

    Tan Nhat Nguyen

    2016-01-01

    Full Text Available In this paper, we evaluate performances of various user selection protocols under impact of hardware impairments. In the considered protocols, a Base Station (BS selects one of available Users (US to serve, while the remaining USs harvest the energy from the Radio Frequency (RF transmitted by the BS. We assume that all of the US randomly appear around the BS. In the Random Selection Protocol (RAN, the BS randomly selects a US to transmit the data. In the second proposed protocol, named Minimum Distance Protocol (MIND, the US that is nearest to the BS will be chosen. In the Optimal Selection Protocol (OPT, the US providing the highest channel gain between itself and the BS will be served. For performance evaluation, we derive exact and asymptotic closed-form expressions of average Outage Probability (OP over Rayleigh fading channels. We also consider average harvested energy per a US. Finally, Monte-Carlo simulations are then performed to verify the theoretical results.

  18. Genetic selection and DNA sequences of 4.5S RNA homologs

    DEFF Research Database (Denmark)

    Brown, S; Thon, G; Tolentino, E

    1989-01-01

    A general strategy for cloning the functional homologs of an Escherichia coli gene was used to clone homologs of 4.5S RNA from other bacteria. The genes encoding these homologs were selected by their ability to complement a deletion of the gene for 4.5S RNA. DNA sequences of the regions encoding...

  19. Neutral evolution of proteins: The superfunnel in sequence space and its relation to mutational robustness

    Science.gov (United States)

    Noirel, Josselin; Simonson, Thomas

    2008-11-01

    Following Kimura's neutral theory of molecular evolution [M. Kimura, The Neutral Theory of Molecular Evolution (Cambridge University Press, Cambridge, 1983) (reprinted in 1986)], it has become common to assume that the vast majority of viable mutations of a gene confer little or no functional advantage. Yet, in silico models of protein evolution have shown that mutational robustness of sequences could be selected for, even in the context of neutral evolution. The evolution of a biological population can be seen as a diffusion on the network of viable sequences. This network is called a "neutral network." Depending on the mutation rate μ and the population size N, the biological population can evolve purely randomly (μN ≪1) or it can evolve in such a way as to select for sequences of higher mutational robustness (μN ≫1). The stringency of the selection depends not only on the product μN but also on the exact topology of the neutral network, the special arrangement of which was named "superfunnel." Even though the relation between mutation rate, population size, and selection was thoroughly investigated, a study of the salient topological features of the superfunnel that could affect the strength of the selection was wanting. This question is addressed in this study. We use two different models of proteins: on lattice and off lattice. We compare neutral networks computed using these models to random networks. From this, we identify two important factors of the topology that determine the stringency of the selection for mutationally robust sequences. First, the presence of highly connected nodes ("hubs") in the network increases the selection for mutationally robust sequences. Second, the stringency of the selection increases when the correlation between a sequence's mutational robustness and its neighbors' increases. The latter finding relates a global characteristic of the neutral network to a local one, which is attainable through experiments or molecular

  20. Novel Zn2+-chelating peptides selected from a fimbria-displayed random peptide library

    DEFF Research Database (Denmark)

    Kjærgaard, Kristian; Schembri, Mark; Klemm, Per

    2001-01-01

    The display of peptide sequences on the surface of bacteria is a technology that offers exciting applications in biotechnology and medical research. Type 1 fimbriae are surface organelles of Escherichia coli which mediate D-mannose-sensitive binding to different host surfaces by virtue of the Fim......H adhesin. FimH is a component of the fimbrial organelle that can accommodate and display a diverse range of peptide sequences on the E. coli cell surface. In this study we have constructed a random peptide library in FimH. The library, consisting of similar to 40 million individual clones, was screened...

  1. A new phase modulated binomial-like selective-inversion sequence for solvent signal suppression in NMR.

    Science.gov (United States)

    Chen, Johnny; Zheng, Gang; Price, William S

    2017-02-01

    A new 8-pulse Phase Modulated binomial-like selective inversion pulse sequence, dubbed '8PM', was developed by optimizing the nutation and phase angles of the constituent radio-frequency pulses so that the inversion profile resembled a target profile. Suppression profiles were obtained for both the 8PM and W5 based excitation sculpting sequences with equal inter-pulse delays. Significant distortions were observed in both profiles because of the offset effect of the radio frequency pulses. These distortions were successfully reduced by adjusting the inter-pulse delays. With adjusted inter-pulse delays, the 8PM and W5 based excitation sculpting sequences were tested on an aqueous lysozyme solution. The 8 PM based sequence provided higher suppression selectivity than the W5 based sequence. Two-dimensional nuclear Overhauser effect spectroscopy experiments were also performed on the lysozyme sample with 8PM and W5 based water signal suppression. The 8PM based suppression provided a spectrum with significantly increased (~ doubled) cross-peak intensity around the suppressed water resonance compared to the W5 based suppression. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  2. The reliability of randomly selected final year pharmacy students in ...

    African Journals Online (AJOL)

    Employing ANOVA, factorial experimental analysis, and the theory of error, reliability studies were conducted on the assessment of the drug product chloroquine phosphate tablets. The G–Study employed equal numbers of the factors for uniform control, and involved three analysts (randomly selected final year Pharmacy ...

  3. Positive Selection or Free to Vary? Assessing the Functional Significance of Sequence Change Using Molecular Dynamics.

    Directory of Open Access Journals (Sweden)

    Jane R Allison

    Full Text Available Evolutionary arms races between pathogens and their hosts may be manifested as selection for rapid evolutionary change of key genes, and are sometimes detectable through sequence-level analyses. In the case of protein-coding genes, such analyses frequently predict that specific codons are under positive selection. However, detecting positive selection can be non-trivial, and false positive predictions are a common concern in such analyses. It is therefore helpful to place such predictions within a structural and functional context. Here, we focus on the p19 protein from tombusviruses. P19 is a homodimer that sequesters siRNAs, thereby preventing the host RNAi machinery from shutting down viral infection. Sequence analysis of the p19 gene is complicated by the fact that it is constrained at the sequence level by overprinting of a viral movement protein gene. Using homology modeling, in silico mutation and molecular dynamics simulations, we assess how non-synonymous changes to two residues involved in forming the dimer interface-one invariant, and one predicted to be under positive selection-impact molecular function. Interestingly, we find that both observed variation and potential variation (where a non-synonymous change to p19 would be synonymous for the overprinted movement protein does not significantly impact protein structure or RNA binding. Consequently, while several methods identify residues at the dimer interface as being under positive selection, MD results suggest they are functionally indistinguishable from a site that is free to vary. Our analyses serve as a caveat to using sequence-level analyses in isolation to detect and assess positive selection, and emphasize the importance of also accounting for how non-synonymous changes impact structure and function.

  4. Wide brick tunnel randomization - an unequal allocation procedure that limits the imbalance in treatment totals.

    Science.gov (United States)

    Kuznetsova, Olga M; Tymofyeyev, Yevgen

    2014-04-30

    In open-label studies, partial predictability of permuted block randomization provides potential for selection bias. To lessen the selection bias in two-arm studies with equal allocation, a number of allocation procedures that limit the imbalance in treatment totals at a pre-specified level but do not require the exact balance at the ends of the blocks were developed. In studies with unequal allocation, however, the task of designing a randomization procedure that sets a pre-specified limit on imbalance in group totals is not resolved. Existing allocation procedures either do not preserve the allocation ratio at every allocation or do not include all allocation sequences that comply with the pre-specified imbalance threshold. Kuznetsova and Tymofyeyev described the brick tunnel randomization for studies with unequal allocation that preserves the allocation ratio at every step and, in the two-arm case, includes all sequences that satisfy the smallest possible imbalance threshold. This article introduces wide brick tunnel randomization for studies with unequal allocation that allows all allocation sequences with imbalance not exceeding any pre-specified threshold while preserving the allocation ratio at every step. In open-label studies, allowing a larger imbalance in treatment totals lowers selection bias because of the predictability of treatment assignments. The applications of the technique in two-arm and multi-arm open-label studies with unequal allocation are described. Copyright © 2013 John Wiley & Sons, Ltd.

  5. Random selection of items. Selection of n1 samples among N items composing a stratum

    International Nuclear Information System (INIS)

    Jaech, J.L.; Lemaire, R.J.

    1987-02-01

    STR-224 provides generalized procedures to determine required sample sizes, for instance in the course of a Physical Inventory Verification at Bulk Handling Facilities. The present report describes procedures to generate random numbers and select groups of items to be verified in a given stratum through each of the measurement methods involved in the verification. (author). 3 refs

  6. Expectation violations in sensorimotor sequences: shifting from LTM-based attentional selection to visual search.

    Science.gov (United States)

    Foerster, Rebecca M; Schneider, Werner X

    2015-03-01

    Long-term memory (LTM) delivers important control signals for attentional selection. LTM expectations have an important role in guiding the task-driven sequence of covert attention and gaze shifts, especially in well-practiced multistep sensorimotor actions. What happens when LTM expectations are disconfirmed? Does a sensory-based visual-search mode of attentional selection replace the LTM-based mode? What happens when prior LTM expectations become valid again? We investigated these questions in a computerized version of the number-connection test. Participants clicked on spatially distributed numbered shapes in ascending order while gaze was recorded. Sixty trials were performed with a constant spatial arrangement. In 20 consecutive trials, either numbers, shapes, both, or no features switched position. In 20 reversion trials, participants worked on the original arrangement. Only the sequence-affecting number switches elicited slower clicking, visual search-like scanning, and lower eye-hand synchrony. The effects were neither limited to the exchanged numbers nor to the corresponding actions. Thus, expectation violations in a well-learned sensorimotor sequence cause a regression from LTM-based attentional selection to visual search beyond deviant-related actions and locations. Effects lasted for several trials and reappeared during reversion. © 2015 New York Academy of Sciences.

  7. Bias in random forest variable importance measures: Illustrations, sources and a solution

    Directory of Open Access Journals (Sweden)

    Hothorn Torsten

    2007-01-01

    Full Text Available Abstract Background Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. This is particularly important in genomics and computational biology, where predictors often include variables of different types, for example when predictors include both sequence data and continuous variables such as folding energy, or when amino acid sequence data show different numbers of categories. Results Simulation studies are presented illustrating that, when random forest variable importance measures are used with data of varying types, the results are misleading because suboptimal predictor variables may be artificially preferred in variable selection. The two mechanisms underlying this deficiency are biased variable selection in the individual classification trees used to build the random forest on one hand, and effects induced by bootstrap sampling with replacement on the other hand. Conclusion We propose to employ an alternative implementation of random forests, that provides unbiased variable selection in the individual classification trees. When this method is applied using subsampling without replacement, the resulting variable importance measures can be used reliably for variable selection even in situations where the potential predictor variables vary in their scale of measurement or their number of categories. The usage of both random forest algorithms and their variable importance measures in the R system for statistical computing is illustrated and

  8. Identifying Genetic Signatures of Natural Selection Using Pooled Population Sequencing in Picea abies.

    Science.gov (United States)

    Chen, Jun; Källman, Thomas; Ma, Xiao-Fei; Zaina, Giusi; Morgante, Michele; Lascoux, Martin

    2016-07-07

    The joint inference of selection and past demography remain a costly and demanding task. We used next generation sequencing of two pools of 48 Norway spruce mother trees, one corresponding to the Fennoscandian domain, and the other to the Alpine domain, to assess nucleotide polymorphism at 88 nuclear genes. These genes are candidate genes for phenological traits, and most belong to the photoperiod pathway. Estimates of population genetic summary statistics from the pooled data are similar to previous estimates, suggesting that pooled sequencing is reliable. The nonsynonymous SNPs tended to have both lower frequency differences and lower FST values between the two domains than silent ones. These results suggest the presence of purifying selection. The divergence between the two domains based on synonymous changes was around 5 million yr, a time similar to a recent phylogenetic estimate of 6 million yr, but much larger than earlier estimates based on isozymes. Two approaches, one of them novel and that considers both FST and difference in allele frequencies between the two domains, were used to identify SNPs potentially under diversifying selection. SNPs from around 20 genes were detected, including genes previously identified as main target for selection, such as PaPRR3 and PaGI. Copyright © 2016 Chen et al.

  9. The mathematics of random mutation and natural selection for multiple simultaneous selection pressures and the evolution of antimicrobial drug resistance.

    Science.gov (United States)

    Kleinman, Alan

    2016-12-20

    The random mutation and natural selection phenomenon act in a mathematically predictable behavior, which when understood leads to approaches to reduce and prevent the failure of the use of these selection pressures when treating infections and cancers. The underlying principle to impair the random mutation and natural selection phenomenon is to use combination therapy, which forces the population to evolve to multiple selection pressures simultaneously that invoke the multiplication rule of probabilities simultaneously as well. Recently, it has been seen that combination therapy for the treatment of malaria has failed to prevent the emergence of drug-resistant variants. Using this empirical example and the principles of probability theory, the derivation of the equations describing this treatment failure is carried out. These equations give guidance as to how to use combination therapy for the treatment of cancers and infectious diseases and prevent the emergence of drug resistance. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  10. Detecting Positive Selection of Korean Native Goat Populations Using Next-Generation Sequencing

    Science.gov (United States)

    Lee, Wonseok; Ahn, Sojin; Taye, Mengistie; Sung, Samsun; Lee, Hyun-Jeong; Cho, Seoae; Kim, Heebal

    2016-01-01

    Goats (Capra hircus) are one of the oldest species of domesticated animals. Native Korean goats are a particularly interesting group, as they are indigenous to the area and were raised in the Korean peninsula almost 2,000 years ago. Although they have a small body size and produce low volumes of milk and meat, they are quite resistant to lumbar paralysis. Our study aimed to reveal the distinct genetic features and patterns of selection in native Korean goats by comparing the genomes of native Korean goat and crossbred goat populations. We sequenced the whole genome of 15 native Korean goats and 11 crossbred goats using next-generation sequencing (Illumina platform) to compare the genomes of the two populations. We found decreased nucleotide diversity in the native Korean goats compared to the crossbred goats. Genetic structural analysis demonstrated that the native Korean goat and crossbred goat populations shared a common ancestry, but were clearly distinct. Finally, to reveal the native Korean goat’s selective sweep region, selective sweep signals were identified in the native Korean goat genome using cross-population extended haplotype homozygosity (XP-EHH) and a cross-population composite likelihood ratio test (XP-CLR). As a result, we were able to identify candidate genes for recent selection, such as the CCR3 gene, which is related to lumbar paralysis resistance. Combined with future studies and recent goat genome information, this study will contribute to a thorough understanding of the native Korean goat genome. PMID:27989103

  11. Detecting Positive Selection of Korean Native Goat Populations Using Next-Generation Sequencing.

    Science.gov (United States)

    Lee, Wonseok; Ahn, Sojin; Taye, Mengistie; Sung, Samsun; Lee, Hyun-Jeong; Cho, Seoae; Kim, Heebal

    2016-12-01

    Goats ( Capra hircus ) are one of the oldest species of domesticated animals. Native Korean goats are a particularly interesting group, as they are indigenous to the area and were raised in the Korean peninsula almost 2,000 years ago. Although they have a small body size and produce low volumes of milk and meat, they are quite resistant to lumbar paralysis. Our study aimed to reveal the distinct genetic features and patterns of selection in native Korean goats by comparing the genomes of native Korean goat and crossbred goat populations. We sequenced the whole genome of 15 native Korean goats and 11 crossbred goats using next-generation sequencing (Illumina platform) to compare the genomes of the two populations. We found decreased nucleotide diversity in the native Korean goats compared to the crossbred goats. Genetic structural analysis demonstrated that the native Korean goat and crossbred goat populations shared a common ancestry, but were clearly distinct. Finally, to reveal the native Korean goat's selective sweep region, selective sweep signals were identified in the native Korean goat genome using cross-population extended haplotype homozygosity (XP-EHH) and a cross-population composite likelihood ratio test (XP-CLR). As a result, we were able to identify candidate genes for recent selection, such as the CCR3 gene, which is related to lumbar paralysis resistance. Combined with future studies and recent goat genome information, this study will contribute to a thorough understanding of the native Korean goat genome.

  12. Classic selective sweeps revealed by massive sequencing in cattle.

    Directory of Open Access Journals (Sweden)

    Saber Qanbari

    2014-02-01

    Full Text Available Human driven selection during domestication and subsequent breed formation has likely left detectable signatures within the genome of modern cattle. The elucidation of these signatures of selection is of interest from the perspective of evolutionary biology, and for identifying domestication-related genes that ultimately may help to further genetically improve this economically important animal. To this end, we employed a panel of more than 15 million autosomal SNPs identified from re-sequencing of 43 Fleckvieh animals. We mainly applied two somewhat complementary statistics, the integrated Haplotype Homozygosity Score (iHS reflecting primarily ongoing selection, and the Composite of Likelihood Ratio (CLR having the most power to detect completed selection after fixation of the advantageous allele. We find 106 candidate selection regions, many of which are harboring genes related to phenotypes relevant in domestication, such as coat coloring pattern, neurobehavioral functioning and sensory perception including KIT, MITF, MC1R, NRG4, Erbb4, TMEM132D and TAS2R16, among others. To further investigate the relationship between genes with signatures of selection and genes identified in QTL mapping studies, we use a sample of 3062 animals to perform four genome-wide association analyses using appearance traits, body size and somatic cell count. We show that regions associated with coat coloring significantly (P<0.0001 overlap with the candidate selection regions, suggesting that the selection signals we identify are associated with traits known to be affected by selection during domestication. Results also provide further evidence regarding the complexity of the genetics underlying coat coloring in cattle. This study illustrates the potential of population genetic approaches for identifying genomic regions affecting domestication-related phenotypes and further helps to identify specific regions targeted by selection during speciation, domestication and

  13. Prediction of protein-protein interaction sites in sequences and 3D structures by random forests.

    Directory of Open Access Journals (Sweden)

    Mile Sikić

    2009-01-01

    Full Text Available Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i a combination of sequence- and structure-derived parameters and (ii sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras-Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information.

  14. Using a Calendar and Explanatory Instructions to Aid Within-Household Selection in Mail Surveys

    Science.gov (United States)

    Stange, Mathew; Smyth, Jolene D.; Olson, Kristen

    2016-01-01

    Although researchers can easily select probability samples of addresses using the U.S. Postal Service's Delivery Sequence File, randomly selecting respondents within households for surveys remains challenging. Researchers often place within-household selection instructions, such as the next or last birthday methods, in survey cover letters to…

  15. The MedSeq Project: a randomized trial of integrating whole genome sequencing into clinical medicine.

    Science.gov (United States)

    Vassy, Jason L; Lautenbach, Denise M; McLaughlin, Heather M; Kong, Sek Won; Christensen, Kurt D; Krier, Joel; Kohane, Isaac S; Feuerman, Lindsay Z; Blumenthal-Barby, Jennifer; Roberts, J Scott; Lehmann, Lisa Soleymani; Ho, Carolyn Y; Ubel, Peter A; MacRae, Calum A; Seidman, Christine E; Murray, Michael F; McGuire, Amy L; Rehm, Heidi L; Green, Robert C

    2014-03-20

    Whole genome sequencing (WGS) is already being used in certain clinical and research settings, but its impact on patient well-being, health-care utilization, and clinical decision-making remains largely unstudied. It is also unknown how best to communicate sequencing results to physicians and patients to improve health. We describe the design of the MedSeq Project: the first randomized trials of WGS in clinical care. This pair of randomized controlled trials compares WGS to standard of care in two clinical contexts: (a) disease-specific genomic medicine in a cardiomyopathy clinic and (b) general genomic medicine in primary care. We are recruiting 8 to 12 cardiologists, 8 to 12 primary care physicians, and approximately 200 of their patients. Patient participants in both the cardiology and primary care trials are randomly assigned to receive a family history assessment with or without WGS. Our laboratory delivers a genome report to physician participants that balances the needs to enhance understandability of genomic information and to convey its complexity. We provide an educational curriculum for physician participants and offer them a hotline to genetics professionals for guidance in interpreting and managing their patients' genome reports. Using varied data sources, including surveys, semi-structured interviews, and review of clinical data, we measure the attitudes, behaviors and outcomes of physician and patient participants at multiple time points before and after the disclosure of these results. The impact of emerging sequencing technologies on patient care is unclear. We have designed a process of interpreting WGS results and delivering them to physicians in a way that anticipates how we envision genomic medicine will evolve in the near future. That is, our WGS report provides clinically relevant information while communicating the complexity and uncertainty of WGS results to physicians and, through physicians, to their patients. This project will not only

  16. A Bayesian random effects discrete-choice model for resource selection: Population-level selection inference

    Science.gov (United States)

    Thomas, D.L.; Johnson, D.; Griffith, B.

    2006-01-01

    Modeling the probability of use of land units characterized by discrete and continuous measures, we present a Bayesian random-effects model to assess resource selection. This model provides simultaneous estimation of both individual- and population-level selection. Deviance information criterion (DIC), a Bayesian alternative to AIC that is sample-size specific, is used for model selection. Aerial radiolocation data from 76 adult female caribou (Rangifer tarandus) and calf pairs during 1 year on an Arctic coastal plain calving ground were used to illustrate models and assess population-level selection of landscape attributes, as well as individual heterogeneity of selection. Landscape attributes included elevation, NDVI (a measure of forage greenness), and land cover-type classification. Results from the first of a 2-stage model-selection procedure indicated that there is substantial heterogeneity among cow-calf pairs with respect to selection of the landscape attributes. In the second stage, selection of models with heterogeneity included indicated that at the population-level, NDVI and land cover class were significant attributes for selection of different landscapes by pairs on the calving ground. Population-level selection coefficients indicate that the pairs generally select landscapes with higher levels of NDVI, but the relationship is quadratic. The highest rate of selection occurs at values of NDVI less than the maximum observed. Results for land cover-class selections coefficients indicate that wet sedge, moist sedge, herbaceous tussock tundra, and shrub tussock tundra are selected at approximately the same rate, while alpine and sparsely vegetated landscapes are selected at a lower rate. Furthermore, the variability in selection by individual caribou for moist sedge and sparsely vegetated landscapes is large relative to the variability in selection of other land cover types. The example analysis illustrates that, while sometimes computationally intense, a

  17. Improving probe set selection for microbial community analysis by leveraging taxonomic information of training sequences

    Directory of Open Access Journals (Sweden)

    Jiang Tao

    2011-10-01

    Full Text Available Abstract Background Population levels of microbial phylotypes can be examined using a hybridization-based method that utilizes a small set of computationally-designed DNA probes targeted to a gene common to all. Our previous algorithm attempts to select a set of probes such that each training sequence manifests a unique theoretical hybridization pattern (a binary fingerprint to a probe set. It does so without taking into account similarity between training gene sequences or their putative taxonomic classifications, however. We present an improved algorithm for probe set selection that utilizes the available taxonomic information of training gene sequences and attempts to choose probes such that the resultant binary fingerprints cluster into real taxonomic groups. Results Gene sequences manifesting identical fingerprints with probes chosen by the new algorithm are more likely to be from the same taxonomic group than probes chosen by the previous algorithm. In cases where they are from different taxonomic groups, underlying DNA sequences of identical fingerprints are more similar to each other in probe sets made with the new versus the previous algorithm. Complete removal of large taxonomic groups from training data does not greatly decrease the ability of probe sets to distinguish those groups. Conclusions Probe sets made from the new algorithm create fingerprints that more reliably cluster into biologically meaningful groups. The method can readily distinguish microbial phylotypes that were excluded from the training sequences, suggesting novel microbes can also be detected.

  18. Improving probe set selection for microbial community analysis by leveraging taxonomic information of training sequences.

    Science.gov (United States)

    Ruegger, Paul M; Della Vedova, Gianluca; Jiang, Tao; Borneman, James

    2011-10-10

    Population levels of microbial phylotypes can be examined using a hybridization-based method that utilizes a small set of computationally-designed DNA probes targeted to a gene common to all. Our previous algorithm attempts to select a set of probes such that each training sequence manifests a unique theoretical hybridization pattern (a binary fingerprint) to a probe set. It does so without taking into account similarity between training gene sequences or their putative taxonomic classifications, however. We present an improved algorithm for probe set selection that utilizes the available taxonomic information of training gene sequences and attempts to choose probes such that the resultant binary fingerprints cluster into real taxonomic groups. Gene sequences manifesting identical fingerprints with probes chosen by the new algorithm are more likely to be from the same taxonomic group than probes chosen by the previous algorithm. In cases where they are from different taxonomic groups, underlying DNA sequences of identical fingerprints are more similar to each other in probe sets made with the new versus the previous algorithm. Complete removal of large taxonomic groups from training data does not greatly decrease the ability of probe sets to distinguish those groups. Probe sets made from the new algorithm create fingerprints that more reliably cluster into biologically meaningful groups. The method can readily distinguish microbial phylotypes that were excluded from the training sequences, suggesting novel microbes can also be detected.

  19. Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics

    Science.gov (United States)

    Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas

    2014-01-01

    Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics were to store and manage the biological data, and develop and analyze computational tools to enhance their understanding. The size of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for the experimental methods. To reduce the gap between newly sequenced protein and proteins with known functions, many computational techniques involving classification and clustering algorithms were proposed in the past. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of large amount of newly discovered proteins. The existing classification results are unsatisfactory due to a huge size of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique has been proposed in order to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance measure metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth. PMID:25045727

  20. Private selective sweeps identified from next-generation pool-sequencing reveal convergent pathways under selection in two inbred Schistosoma mansoni strains.

    Directory of Open Access Journals (Sweden)

    Julie A J Clément

    Full Text Available BACKGROUND: The trematode flatworms of the genus Schistosoma, the causative agents of schistosomiasis, are among the most prevalent parasites in humans, affecting more than 200 million people worldwide. In this study, we focused on two well-characterized strains of S. mansoni, to explore signatures of selection. Both strains are highly inbred and exhibit differences in life history traits, in particular in their compatibility with the intermediate host Biomphalaria glabrata. METHODOLOGY/PRINCIPAL FINDINGS: We performed high throughput sequencing of DNA from pools of individuals of each strain using Illumina technology and identified single nucleotide polymorphisms (SNP and copy number variations (CNV. In total, 708,898 SNPs were identified and roughly 2,000 CNVs. The SNPs revealed low nucleotide diversity (π = 2 × 10(-4 within each strain and a high differentiation level (Fst = 0.73 between them. Based on a recently developed in-silico approach, we further detected 12 and 19 private (i.e. specific non-overlapping selective sweeps among the 121 and 151 sweeps found in total for each strain. CONCLUSIONS/SIGNIFICANCE: Functional annotation of transcripts lying in the private selective sweeps revealed specific selection for functions related to parasitic interaction (e.g. cell-cell adhesion or redox reactions. Despite high differentiation between strains, we identified evolutionary convergence of genes related to proteolysis, known as a key virulence factor and a potential target of drug and vaccine development. Our data show that pool-sequencing can be used for the detection of selective sweeps in parasite populations and enables one to identify biological functions under selection.

  1. Evaluation of two main RNA-seq approaches for gene quantification in clinical RNA sequencing: polyA+ selection versus rRNA depletion.

    Science.gov (United States)

    Zhao, Shanrong; Zhang, Ying; Gamini, Ramya; Zhang, Baohong; von Schack, David

    2018-03-19

    To allow efficient transcript/gene detection, highly abundant ribosomal RNAs (rRNA) are generally removed from total RNA either by positive polyA+ selection or by rRNA depletion (negative selection) before sequencing. Comparisons between the two methods have been carried out by various groups, but the assessments have relied largely on non-clinical samples. In this study, we evaluated these two RNA sequencing approaches using human blood and colon tissue samples. Our analyses showed that rRNA depletion captured more unique transcriptome features, whereas polyA+ selection outperformed rRNA depletion with higher exonic coverage and better accuracy of gene quantification. For blood- and colon-derived RNAs, we found that 220% and 50% more reads, respectively, would have to be sequenced to achieve the same level of exonic coverage in the rRNA depletion method compared with the polyA+ selection method. Therefore, in most cases we strongly recommend polyA+ selection over rRNA depletion for gene quantification in clinical RNA sequencing. Our evaluation revealed that a small number of lncRNAs and small RNAs made up a large fraction of the reads in the rRNA depletion RNA sequencing data. Thus, we recommend that these RNAs are specifically depleted to improve the sequencing depth of the remaining RNAs.

  2. The signature of positive selection at randomly chosen loci.

    Science.gov (United States)

    Przeworski, Molly

    2002-03-01

    In Drosophila and humans, there are accumulating examples of loci with a significant excess of high-frequency-derived alleles or high levels of linkage disequilibrium, relative to a neutral model of a random-mating population of constant size. These are features expected after a recent selective sweep. Their prevalence suggests that positive directional selection may be widespread in both species. However, as I show here, these features do not persist long after the sweep ends: The high-frequency alleles drift to fixation and no longer contribute to polymorphism, while linkage disequilibrium is broken down by recombination. As a result, loci chosen without independent evidence of recent selection are not expected to exhibit either of these features, even if they have been affected by numerous sweeps in their genealogical history. How then can we explain the patterns in the data? One possibility is population structure, with unequal sampling from different subpopulations. Alternatively, positive selection may not operate as is commonly modeled. In particular, the rate of fixation of advantageous mutations may have increased in the recent past.

  3. Variable depth recursion algorithm for leaf sequencing

    International Nuclear Information System (INIS)

    Siochi, R. Alfredo C.

    2007-01-01

    The processes of extraction and sweep are basic segmentation steps that are used in leaf sequencing algorithms. A modified version of a commercial leaf sequencer changed the way that the extracts are selected and expanded the search space, but the modification maintained the basic search paradigm of evaluating multiple solutions, each one consisting of up to 12 extracts and a sweep sequence. While it generated the best solutions compared to other published algorithms, it used more computation time. A new, faster algorithm selects one extract at a time but calls itself as an evaluation function a user-specified number of times, after which it uses the bidirectional sweeping window algorithm as the final evaluation function. To achieve a performance comparable to that of the modified commercial leaf sequencer, 2-3 calls were needed, and in all test cases, there were only slight improvements beyond two calls. For the 13 clinical test maps, computation speeds improved by a factor between 12 and 43, depending on the constraints, namely the ability to interdigitate and the avoidance of the tongue-and-groove under dose. The new algorithm was compared to the original and modified versions of the commercial leaf sequencer. It was also compared to other published algorithms for 1400, random, 15x15, test maps with 3-16 intensity levels. In every single case the new algorithm provided the best solution

  4. Natural selection in a population of Drosophila melanogaster explained by changes in gene expression caused by sequence variation in core promoter regions.

    Science.gov (United States)

    Sato, Mitsuhiko P; Makino, Takashi; Kawata, Masakado

    2016-02-09

    Understanding the evolutionary forces that influence variation in gene regulatory regions in natural populations is an important challenge for evolutionary biology because natural selection for such variations could promote adaptive phenotypic evolution. Recently, whole-genome sequence analyses have identified regulatory regions subject to natural selection. However, these studies could not identify the relationship between sequence variation in the detected regions and change in gene expression levels. We analyzed sequence variations in core promoter regions, which are critical regions for gene regulation in higher eukaryotes, in a natural population of Drosophila melanogaster, and identified core promoter sequence variations associated with differences in gene expression levels subjected to natural selection. Among the core promoter regions whose sequence variation could change transcription factor binding sites and explain differences in expression levels, three core promoter regions were detected as candidates associated with purifying selection or selective sweep and seven as candidates associated with balancing selection, excluding the possibility of linkage between these regions and core promoter regions. CHKov1, which confers resistance to the sigma virus and related insecticides, was identified as core promoter regions that has been subject to selective sweep, although it could not be denied that selection for variation in core promoter regions was due to linked single nucleotide polymorphisms in the regulatory region outside core promoter regions. Nucleotide changes in core promoter regions of CHKov1 caused the loss of two basal transcription factor binding sites and acquisition of one transcription factor binding site, resulting in decreased gene expression levels. Of nine core promoter regions regions associated with balancing selection, brat, and CG9044 are associated with neuromuscular junction development, and Nmda1 are associated with learning

  5. Escherichia coli promoter sequences predict in vitro RNA polymerase selectivity.

    OpenAIRE

    Mulligan, M E; Hawley, D K; Entriken, R; McClure, W R

    1984-01-01

    We describe a simple algorithm for computing a homology score for Escherichia coli promoters based on DNA sequence alone. The homology score was related to 31 values, measured in vitro, of RNA polymerase selectivity, which we define as the product KBk2, the apparent second order rate constant for open complex formation. We found that promoter strength could be predicted to within a factor of +/-4.1 in KBk2 over a range of 10(4) in the same parameter. The quantitative evaluation was linked to ...

  6. Differential privacy-based evaporative cooling feature selection and classification with relief-F and random forests.

    Science.gov (United States)

    Le, Trang T; Simmons, W Kyle; Misaki, Masaya; Bodurka, Jerzy; White, Bill C; Savitz, Jonathan; McKinney, Brett A

    2017-09-15

    Classification of individuals into disease or clinical categories from high-dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross-validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, for domains such as bioinformatics, where the number of features is much larger than the number of observations p≫n , these differential privacy methods are susceptible to overfitting. We introduce private Evaporative Cooling, a stochastic privacy-preserving machine learning algorithm that uses Relief-F for feature selection and random forest for privacy preserving classification that also prevents overfitting. We relate the privacy-preserving threshold mechanism to a thermodynamic Maxwell-Boltzmann distribution, where the temperature represents the privacy threshold. We use the thermal statistical physics concept of Evaporative Cooling of atomic gases to perform backward stepwise privacy-preserving feature selection. On simulated data with main effects and statistical interactions, we compare accuracies on holdout and validation sets for three privacy-preserving methods: the reusable holdout, reusable holdout with random forest, and private Evaporative Cooling, which uses Relief-F feature selection and random forest classification. In simulations where interactions exist between attributes, private Evaporative Cooling provides higher classification accuracy without overfitting based on an independent validation set. In simulations without interactions, thresholdout with random forest and private Evaporative Cooling give comparable accuracies. We also apply these privacy methods to human brain resting-state fMRI data from a study of major depressive disorder. Code

  7. Multiplexed enrichment of rare DNA variants via sequence-selective and temperature-robust amplification

    Science.gov (United States)

    Wu, Lucia R.; Chen, Sherry X.; Wu, Yalei; Patel, Abhijit A.; Zhang, David Yu

    2018-01-01

    Rare DNA-sequence variants hold important clinical and biological information, but existing detection techniques are expensive, complex, allele-specific, or don’t allow for significant multiplexing. Here, we report a temperature-robust polymerase-chain-reaction method, which we term blocker displacement amplification (BDA), that selectively amplifies all sequence variants, including single-nucleotide variants (SNVs), within a roughly 20-nucleotide window by 1,000-fold over wild-type sequences. This allows for easy detection and quantitation of hundreds of potential variants originally at ≤0.1% in allele frequency. BDA is compatible with inexpensive thermocycler instrumentation and employs a rationally designed competitive hybridization reaction to achieve comparable enrichment performance across annealing temperatures ranging from 56 °C to 64 °C. To show the sequence generality of BDA, we demonstrate enrichment of 156 SNVs and the reliable detection of single-digit copies. We also show that the BDA detection of rare driver mutations in cell-free DNA samples extracted from the blood plasma of lung-cancer patients is highly consistent with deep sequencing using molecular lineage tags, with a receiver operator characteristic accuracy of 95%. PMID:29805844

  8. Comparative analysis of idiom selection and sequencing 5 in Estonian basic school EFL coursebooks

    Directory of Open Access Journals (Sweden)

    Rita Anita Forssten

    2017-05-01

    Full Text Available The article investigates the selection and sequencing of the idioms encountered in two locally-produced and international coursebook series currently employed in Estonian basic schools. It is hypothesized that there exists a positive correlation between idioms’ difficulty and coursebooks’ language proficiency level. The hypothesis is tested through a statistical analysis of the idioms found which are categorized in terms of their analysability into three categories where category 1 includes analysable semi-literal idioms, category 2 comprises analysable semi-transparent idioms, and category 3 encompasses non-analysable opaque idioms, and then analysed through an online language corpus (British National Corpus. The results of the study reveal that the coursebook authors under discussion have disregarded idioms’ frequency as a criterion for selection or sequencing, whereas the factor utilized to some extent is the degree of analysability.

  9. Interference-aware random beam selection schemes for spectrum sharing systems

    KAUST Repository

    Abdallah, Mohamed

    2012-10-19

    Spectrum sharing systems have been recently introduced to alleviate the problem of spectrum scarcity by allowing secondary unlicensed networks to share the spectrum with primary licensed networks under acceptable interference levels to the primary users. In this work, we develop interference-aware random beam selection schemes that provide enhanced performance for the secondary network under the condition that the interference observed by the receivers of the primary network is below a predetermined/acceptable value. We consider a secondary link composed of a transmitter equipped with multiple antennas and a single-antenna receiver sharing the same spectrum with a primary link composed of a single-antenna transmitter and a single-antenna receiver. The proposed schemes select a beam, among a set of power-optimized random beams, that maximizes the signal-to-interference-plus-noise ratio (SINR) of the secondary link while satisfying the primary interference constraint for different levels of feedback information describing the interference level at the primary receiver. For the proposed schemes, we develop a statistical analysis for the SINR statistics as well as the capacity and bit error rate (BER) of the secondary link.

  10. Effective automated feature construction and selection for classification of biological sequences.

    Directory of Open Access Journals (Sweden)

    Uday Kamath

    Full Text Available Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features.We present an algorithmic framework (EFFECT for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences which state-of-the-art work in machine learning shows to be challenging and involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select a most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not.To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retainment or modification

  11. Topology-selective jamming of fully-connected, code-division random-access networks

    Science.gov (United States)

    Polydoros, Andreas; Cheng, Unjeng

    1990-01-01

    The purpose is to introduce certain models of topology selective stochastic jamming and examine its impact on a class of fully-connected, spread-spectrum, slotted ALOHA-type random access networks. The theory covers dedicated as well as half-duplex units. The dominant role of the spatial duty factor is established, and connections with the dual concept of time selective jamming are discussed. The optimal choices of coding rate and link access parameters (from the users' side) and the jamming spatial fraction are numerically established for DS and FH spreading.

  12. Succinylcholine versus rocuronium for rapid sequence intubation in intensive care: a prospective, randomized controlled trial

    Science.gov (United States)

    2011-01-01

    Introduction Succinylcholine and rocuronium are widely used to facilitate rapid sequence induction (RSI) intubation in intensive care. Concerns relate to the side effects of succinylcholine and to slower onset and inferior intubation conditions associated with rocuronium. So far, succinylcholine and rocuronium have not been compared in an adequately powered randomized trial in intensive care. Accordingly, the aim of the present study was to compare the incidence of hypoxemia after rocuronium or succinylcholine in critically ill patients requiring an emergent RSI. Methods This was a prospective randomized controlled single-blind trial conducted from 2006 to 2010 at the University Hospital of Basel. Participants were 401 critically ill patients requiring emergent RSI. Patients were randomized to receive 1 mg/kg succinylcholine or 0.6 mg/kg rocuronium for neuromuscular blockade. The primary outcome was the incidence of oxygen desaturations defined as a decrease in oxygen saturation ≥ 5%, assessed by continuous pulse oxymetry, at any time between the start of the induction sequence and two minutes after the completion of the intubation. A severe oxygen desaturation was defined as a decrease in oxygen saturation ≥ 5% leading to a saturation value of ≤ 80%. Results There was no difference between succinylcholine and rocuronium regarding oxygen desaturations (succinylcholine 73/196; rocuronium 66/195; P = 0.67); severe oxygen desaturations (succinylcholine 20/196; rocuronium 20/195; P = 1.0); and extent of oxygen desaturations (succinylcholine -14 ± 12%; rocuronium -16 ± 13%; P = 0.77). The duration of the intubation sequence was shorter after succinycholine than after rocuronium (81 ± 38 sec versus 95 ± 48 sec; P = 0.002). Intubation conditions (succinylcholine 8.3 ± 0.8; rocuronium 8.2 ± 0.9; P = 0.7) and failed first intubation attempts (succinylcholine 32/200; rocuronium 36/201; P = 1.0) did not differ between the groups. Conclusions In critically ill

  13. Memory and learning with rapid audiovisual sequences

    Science.gov (United States)

    Keller, Arielle S.; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed. PMID:26575193

  14. Memory and learning with rapid audiovisual sequences.

    Science.gov (United States)

    Keller, Arielle S; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed.

  15. High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA.

    Science.gov (United States)

    Chandrananda, Dineika; Thorne, Natalie P; Bahlo, Melanie

    2015-06-17

    High-throughput sequencing of cell-free DNA fragments found in human plasma has been used to non-invasively detect fetal aneuploidy, monitor organ transplants and investigate tumor DNA. However, many biological properties of this extracellular genetic material remain unknown. Research that further characterizes circulating DNA could substantially increase its diagnostic value by allowing the application of more sophisticated bioinformatics tools that lead to an improved signal to noise ratio in the sequencing data. In this study, we investigate various features of cell-free DNA in plasma using deep-sequencing data from two pregnant women (>70X, >50X) and compare them with matched cellular DNA. We utilize a descriptive approach to examine how the biological cleavage of cell-free DNA affects different sequence signatures such as fragment lengths, sequence motifs at fragment ends and the distribution of cleavage sites along the genome. We show that the size distributions of these cell-free DNA molecules are dependent on their autosomal and mitochondrial origin as well as the genomic location within chromosomes. DNA mapping to particular microsatellites and alpha repeat elements display unique size signatures. We show how cell-free fragments occur in clusters along the genome, localizing to nucleosomal arrays and are preferentially cleaved at linker regions by correlating the mapping locations of these fragments with ENCODE annotation of chromatin organization. Our work further demonstrates that cell-free autosomal DNA cleavage is sequence dependent. The region spanning up to 10 positions on either side of the DNA cleavage site show a consistent pattern of preference for specific nucleotides. This sequence motif is present in cleavage sites localized to nucleosomal cores and linker regions but is absent in nucleosome-free mitochondrial DNA. These background signals in cell-free DNA sequencing data stem from the non-random biological cleavage of these fragments. This

  16. Screening the sequence selectivity of DNA-binding molecules using a gold nanoparticle-based colorimetric approach.

    Science.gov (United States)

    Hurst, Sarah J; Han, Min Su; Lytton-Jean, Abigail K R; Mirkin, Chad A

    2007-09-15

    We have developed a novel competition assay that uses a gold nanoparticle (Au NP)-based, high-throughput colorimetric approach to screen the sequence selectivity of DNA-binding molecules. This assay hinges on the observation that the melting behavior of DNA-functionalized Au NP aggregates is sensitive to the concentration of the DNA-binding molecule in solution. When short, oligomeric hairpin DNA sequences were added to a reaction solution consisting of DNA-functionalized Au NP aggregates and DNA-binding molecules, these molecules may either bind to the Au NP aggregate interconnects or the hairpin stems based on their relative affinity for each. This relative affinity can be measured as a change in the melting temperature (Tm) of the DNA-modified Au NP aggregates in solution. As a proof of concept, we evaluated the selectivity of 4',6-diamidino-2-phenylindone (an AT-specific binder), ethidium bromide (a nonspecific binder), and chromomycin A (a GC-specific binder) for six sequences of hairpin DNA having different numbers of AT pairs in a five-base pair variable stem region. Our assay accurately and easily confirmed the known trends in selectivity for the DNA binders in question without the use of complicated instrumentation. This novel assay will be useful in assessing large libraries of potential drug candidates that work by binding DNA to form a drug/DNA complex.

  17. No evidence that mRNAs have lower folding free energies than random sequences with the same dinucleotide distribution

    DEFF Research Database (Denmark)

    Workman, Christopher; Krogh, Anders Stærmose

    1999-01-01

    This work investigates whether mRNA has a lower estimated folding free energy than random sequences. The free energy estimates are calculated by the mfold program for prediction of RNA secondary structures. For a set of 46 mRNAs it is shown that the predicted free energy is not significantly diff...

  18. Peculiarities of the statistics of spectrally selected fluorescence radiation in laser-pumped dye-doped random media

    Science.gov (United States)

    Yuvchenko, S. A.; Ushakova, E. V.; Pavlova, M. V.; Alonova, M. V.; Zimnyakov, D. A.

    2018-04-01

    We consider the practical realization of a new optical probe method of the random media which is defined as the reference-free path length interferometry with the intensity moments analysis. A peculiarity in the statistics of the spectrally selected fluorescence radiation in laser-pumped dye-doped random medium is discussed. Previously established correlations between the second- and the third-order moments of the intensity fluctuations in the random interference patterns, the coherence function of the probe radiation, and the path difference probability density for the interfering partial waves in the medium are confirmed. The correlations were verified using the statistical analysis of the spectrally selected fluorescence radiation emitted by a laser-pumped dye-doped random medium. Water solution of Rhodamine 6G was applied as the doping fluorescent agent for the ensembles of the densely packed silica grains, which were pumped by the 532 nm radiation of a solid state laser. The spectrum of the mean path length for a random medium was reconstructed.

  19. PredPPCrys: accurate prediction of sequence cloning, protein production, purification and crystallization propensity from protein sequences using multi-step heterogeneous feature fusion and selection.

    Directory of Open Access Journals (Sweden)

    Huilin Wang

    Full Text Available X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed 'PredPPCrys' using the support vector machine (SVM. Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I. Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II, which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. In addition, the predicted crystallization

  20. Key Aspects of Nucleic Acid Library Design for in Vitro Selection

    Science.gov (United States)

    Vorobyeva, Maria A.; Davydova, Anna S.; Vorobjev, Pavel E.; Pyshnyi, Dmitrii V.; Venyaminova, Alya G.

    2018-01-01

    Nucleic acid aptamers capable of selectively recognizing their target molecules have nowadays been established as powerful and tunable tools for biospecific applications, be it therapeutics, drug delivery systems or biosensors. It is now generally acknowledged that in vitro selection enables one to generate aptamers to almost any target of interest. However, the success of selection and the affinity of the resulting aptamers depend to a large extent on the nature and design of an initial random nucleic acid library. In this review, we summarize and discuss the most important features of the design of nucleic acid libraries for in vitro selection such as the nature of the library (DNA, RNA or modified nucleotides), the length of a randomized region and the presence of fixed sequences. We also compare and contrast different randomization strategies and consider computer methods of library design and some other aspects. PMID:29401748

  1. The genealogy of samples in models with selection.

    Science.gov (United States)

    Neuhauser, C; Krone, S M

    1997-02-01

    We introduce the genealogy of a random sample of genes taken from a large haploid population that evolves according to random reproduction with selection and mutation. Without selection, the genealogy is described by Kingman's well-known coalescent process. In the selective case, the genealogy of the sample is embedded in a graph with a coalescing and branching structure. We describe this graph, called the ancestral selection graph, and point out differences and similarities with Kingman's coalescent. We present simulations for a two-allele model with symmetric mutation in which one of the alleles has a selective advantage over the other. We find that when the allele frequencies in the population are already in equilibrium, then the genealogy does not differ much from the neutral case. This is supported by rigorous results. Furthermore, we describe the ancestral selection graph for other selective models with finitely many selection classes, such as the K-allele models, infinitely-many-alleles models. DNA sequence models, and infinitely-many-sites models, and briefly discuss the diploid case.

  2. Random matrices and random difference equations

    International Nuclear Information System (INIS)

    Uppuluri, V.R.R.

    1975-01-01

    Mathematical models leading to products of random matrices and random difference equations are discussed. A one-compartment model with random behavior is introduced, and it is shown how the average concentration in the discrete time model converges to the exponential function. This is of relevance to understanding how radioactivity gets trapped in bone structure in blood--bone systems. The ideas are then generalized to two-compartment models and mammillary systems, where products of random matrices appear in a natural way. The appearance of products of random matrices in applications in demography and control theory is considered. Then random sequences motivated from the following problems are studied: constant pulsing and random decay models, random pulsing and constant decay models, and random pulsing and random decay models

  3. In vitro site selection of a consensus binding site for the Drosophila melanogaster Tbx20 homolog midline.

    Directory of Open Access Journals (Sweden)

    Nima Najand

    Full Text Available We employed in vitro site selection to identify a consensus binding sequence for the Drosophila melanogaster Tbx20 T-box transcription factor homolog Midline. We purified a bacterially expressed T-box DNA binding domain of Midline, and used it in four rounds of precipitation and polymerase-chain-reaction based amplification. We cloned and sequenced 54 random oligonucleotides selected by Midline. Electromobility shift-assays confirmed that 27 of these could bind the Midline T-box. Sequence alignment of these 27 clones suggests that Midline binds as a monomer to a consensus sequence that contains an AGGTGT core. Thus, the Midline consensus binding site we define in this study is similar to that defined for vertebrate Tbx20, but differs from a previously reported Midline binding sequence derived through site selection.

  4. Randomized Prediction Games for Adversarial Machine Learning.

    Science.gov (United States)

    Rota Bulo, Samuel; Biggio, Battista; Pillai, Ignazio; Pelillo, Marcello; Roli, Fabio

    In spam and malware detection, attackers exploit randomization to obfuscate malicious data and increase their chances of evading detection at test time, e.g., malware code is typically obfuscated using random strings or byte sequences to hide known exploits. Interestingly, randomization has also been proposed to improve security of learning algorithms against evasion attacks, as it results in hiding information about the classifier to the attacker. Recent work has proposed game-theoretical formulations to learn secure classifiers, by simulating different evasion attacks and modifying the classification function accordingly. However, both the classification function and the simulated data manipulations have been modeled in a deterministic manner, without accounting for any form of randomization. In this paper, we overcome this limitation by proposing a randomized prediction game, namely, a noncooperative game-theoretic formulation in which the classifier and the attacker make randomized strategy selections according to some probability distribution defined over the respective strategy set. We show that our approach allows one to improve the tradeoff between attack detection and false alarms with respect to the state-of-the-art secure classifiers, even against attacks that are different from those hypothesized during design, on application examples including handwritten digit recognition, spam, and malware detection.In spam and malware detection, attackers exploit randomization to obfuscate malicious data and increase their chances of evading detection at test time, e.g., malware code is typically obfuscated using random strings or byte sequences to hide known exploits. Interestingly, randomization has also been proposed to improve security of learning algorithms against evasion attacks, as it results in hiding information about the classifier to the attacker. Recent work has proposed game-theoretical formulations to learn secure classifiers, by simulating different

  5. Evolutionary dynamics on networks of selectively neutral genotypes: Effects of topology and sequence stability

    Science.gov (United States)

    Aguirre, Jacobo; Buldú, Javier M.; Manrubia, Susanna C.

    2009-12-01

    Networks of selectively neutral genotypes underlie the evolution of populations of replicators in constant environments. Previous theoretical analysis predicted that such populations will evolve toward highly connected regions of the genome space. We first study the evolution of populations of replicators on simple networks and quantify how the transient time to equilibrium depends on the initial distribution of sequences on the neutral network, on the topological properties of the latter, and on the mutation rate. Second, network neutrality is broken through the introduction of an energy for each sequence. This allows to study the competition between two features (neutrality and energetic stability) relevant for survival and subjected to different selective pressures. In cases where the two features are negatively correlated, the population experiences sudden migrations in the genome space for values of the relevant parameters that we calculate. The numerical study of larger networks indicates that the qualitative behavior to be expected in more realistic cases is already seen in representative examples of small networks.

  6. Evolutionary dynamics on networks of selectively neutral genotypes: effects of topology and sequence stability.

    Science.gov (United States)

    Aguirre, Jacobo; Buldú, Javier M; Manrubia, Susanna C

    2009-12-01

    Networks of selectively neutral genotypes underlie the evolution of populations of replicators in constant environments. Previous theoretical analysis predicted that such populations will evolve toward highly connected regions of the genome space. We first study the evolution of populations of replicators on simple networks and quantify how the transient time to equilibrium depends on the initial distribution of sequences on the neutral network, on the topological properties of the latter, and on the mutation rate. Second, network neutrality is broken through the introduction of an energy for each sequence. This allows to study the competition between two features (neutrality and energetic stability) relevant for survival and subjected to different selective pressures. In cases where the two features are negatively correlated, the population experiences sudden migrations in the genome space for values of the relevant parameters that we calculate. The numerical study of larger networks indicates that the qualitative behavior to be expected in more realistic cases is already seen in representative examples of small networks.

  7. QUASI-RANDOM TESTING OF COMPUTER SYSTEMS

    Directory of Open Access Journals (Sweden)

    S. V. Yarmolik

    2013-01-01

    Full Text Available Various modified random testing approaches have been proposed for computer system testing in the black box environment. Their effectiveness has been evaluated on the typical failure patterns by employing three measures, namely, P-measure, E-measure and F-measure. A quasi-random testing, being a modified version of the random testing, has been proposed and analyzed. The quasi-random Sobol sequences and modified Sobol sequences are used as the test patterns. Some new methods for Sobol sequence generation have been proposed and analyzed.

  8. Integrated analysis of RNA-binding protein complexes using in vitro selection and high-throughput sequencing and sequence specificity landscapes (SEQRS).

    Science.gov (United States)

    Lou, Tzu-Fang; Weidmann, Chase A; Killingsworth, Jordan; Tanaka Hall, Traci M; Goldstrohm, Aaron C; Campbell, Zachary T

    2017-04-15

    RNA-binding proteins (RBPs) collaborate to control virtually every aspect of RNA function. Tremendous progress has been made in the area of global assessment of RBP specificity using next-generation sequencing approaches both in vivo and in vitro. Understanding how protein-protein interactions enable precise combinatorial regulation of RNA remains a significant problem. Addressing this challenge requires tools that can quantitatively determine the specificities of both individual proteins and multimeric complexes in an unbiased and comprehensive way. One approach utilizes in vitro selection, high-throughput sequencing, and sequence-specificity landscapes (SEQRS). We outline a SEQRS experiment focused on obtaining the specificity of a multi-protein complex between Drosophila RBPs Pumilio (Pum) and Nanos (Nos). We discuss the necessary controls in this type of experiment and examine how the resulting data can be complemented with structural and cell-based reporter assays. Additionally, SEQRS data can be integrated with functional genomics data to uncover biological function. Finally, we propose extensions of the technique that will enhance our understanding of multi-protein regulatory complexes assembled onto RNA. Copyright © 2016 Elsevier Inc. All rights reserved.

  9. A Developmental and Sequenced One-to-One Educational Intervention for Autism Spectrum Disorder: A Randomized Single-Blind Controlled Trial.

    Science.gov (United States)

    Tanet, Antoine; Hubert-Barthelemy, Annik; Crespin, Graciela C; Bodeau, Nicolas; Cohen, David; Saint-Georges, Catherine

    2016-01-01

    Individuals with autism spectrum disorder (ASD) who also exhibit severe-to-moderate ranges of intellectual disability (ID) still face many challenges (i.e., less evidence-based trials, less inclusion in school with peers). We implemented a novel model called the "Developmental and Sequenced One-to-One Educational Intervention" (DS1-EI) in 5- to 9-year-old children with co-occurring ASD and ID. The treatment protocol was adapted for school implementation by designing it using an educational agenda. The intervention was based on intensity, regular assessments, updating objectives, encouraging spontaneous communication, promoting skills through play with peers, supporting positive behaviors, providing supervision, capitalizing on teachers' unique skills, and providing developmental and sequenced learning. Developmental learning implies that the focus of training is what is close to the developmental expectations given a child's development in a specific domain. Sequenced learning means that the teacher changes the learning activities every 10-15 min to maintain the child's attention in the context of an anticipated time agenda. We selected 11 French institutions in which we implemented the model in small classrooms. Each institution recruited participants per dyads matched by age, sex, and developmental quotient. Patients from each dyad were then randomized to a DS1-EI group or a Treatment as usual (TAU) group for 36 months. The primary variables - the Childhood Autism Rating scale (CARS) and the psychoeducational profile (PEP-3) - will be blindly assessed by independent raters at the 18-month and 36-month follow-up. We enrolled 75 participants: 38 were randomized to the DS1-EI and 37 to the TAU groups. At enrollment, we found no significant differences in participants' characteristics between groups. As expected, exposure to school was the only significant difference [9.4 (±4.1) h/week in the DS1-EI group vs. 3.4 (±4.5) h/week in the TAU group, Student's t

  10. QDD: a user-friendly program to select microsatellite markers and design primers from large sequencing projects.

    Science.gov (United States)

    Meglécz, Emese; Costedoat, Caroline; Dubut, Vincent; Gilles, André; Malausa, Thibaut; Pech, Nicolas; Martin, Jean-François

    2010-02-01

    QDD is an open access program providing a user-friendly tool for microsatellite detection and primer design from large sets of DNA sequences. The program is designed to deal with all steps of treatment of raw sequences obtained from pyrosequencing of enriched DNA libraries, but it is also applicable to data obtained through other sequencing methods, using FASTA files as input. The following tasks are completed by QDD: tag sorting, adapter/vector removal, elimination of redundant sequences, detection of possible genomic multicopies (duplicated loci or transposable elements), stringent selection of target microsatellites and customizable primer design. It can treat up to one million sequences of a few hundred base pairs in the tag-sorting step, and up to 50,000 sequences in a single input file for the steps involving estimation of sequence similarity. QDD is freely available under the GPL licence for Windows and Linux from the following web site: http://www.univ-provence.fr/gsite/Local/egee/dir/meglecz/QDD.html. Supplementary data are available at Bioinformatics online.

  11. Blind Measurement Selection: A Random Matrix Theory Approach

    KAUST Repository

    Elkhalil, Khalil

    2016-12-14

    This paper considers the problem of selecting a set of $k$ measurements from $n$ available sensor observations. The selected measurements should minimize a certain error function assessing the error in estimating a certain $m$ dimensional parameter vector. The exhaustive search inspecting each of the $n\\\\choose k$ possible choices would require a very high computational complexity and as such is not practical for large $n$ and $k$. Alternative methods with low complexity have recently been investigated but their main drawbacks are that 1) they require perfect knowledge of the measurement matrix and 2) they need to be applied at the pace of change of the measurement matrix. To overcome these issues, we consider the asymptotic regime in which $k$, $n$ and $m$ grow large at the same pace. Tools from random matrix theory are then used to approximate in closed-form the most important error measures that are commonly used. The asymptotic approximations are then leveraged to select properly $k$ measurements exhibiting low values for the asymptotic error measures. Two heuristic algorithms are proposed: the first one merely consists in applying the convex optimization artifice to the asymptotic error measure. The second algorithm is a low-complexity greedy algorithm that attempts to look for a sufficiently good solution for the original minimization problem. The greedy algorithm can be applied to both the exact and the asymptotic error measures and can be thus implemented in blind and channel-aware fashions. We present two potential applications where the proposed algorithms can be used, namely antenna selection for uplink transmissions in large scale multi-user systems and sensor selection for wireless sensor networks. Numerical results are also presented and sustain the efficiency of the proposed blind methods in reaching the performances of channel-aware algorithms.

  12. Random Tagging Genotyping by Sequencing (rtGBS, an Unbiased Approach to Locate Restriction Enzyme Sites across the Target Genome.

    Directory of Open Access Journals (Sweden)

    Elena Hilario

    Full Text Available Genotyping by sequencing (GBS is a restriction enzyme based targeted approach developed to reduce the genome complexity and discover genetic markers when a priori sequence information is unavailable. Sufficient coverage at each locus is essential to distinguish heterozygous from homozygous sites accurately. The number of GBS samples able to be pooled in one sequencing lane is limited by the number of restriction sites present in the genome and the read depth required at each site per sample for accurate calling of single-nucleotide polymorphisms. Loci bias was observed using a slight modification of the Elshire et al.some restriction enzyme sites were represented in higher proportions while others were poorly represented or absent. This bias could be due to the quality of genomic DNA, the endonuclease and ligase reaction efficiency, the distance between restriction sites, the preferential amplification of small library restriction fragments, or bias towards cluster formation of small amplicons during the sequencing process. To overcome these issues, we have developed a GBS method based on randomly tagging genomic DNA (rtGBS. By randomly landing on the genome, we can, with less bias, find restriction sites that are far apart, and undetected by the standard GBS (stdGBS method. The study comprises two types of biological replicates: six different kiwifruit plants and two independent DNA extractions per plant; and three types of technical replicates: four samples of each DNA extraction, stdGBS vs. rtGBS methods, and two independent library amplifications, each sequenced in separate lanes. A statistically significant unbiased distribution of restriction fragment size by rtGBS showed that this method targeted 49% (39,145 of BamH I sites shared with the reference genome, compared to only 14% (11,513 by stdGBS.

  13. A new feedback image encryption scheme based on perturbation with dynamical compound chaotic sequence cipher generator

    Science.gov (United States)

    Tong, Xiaojun; Cui, Minggen; Wang, Zhu

    2009-07-01

    The design of the new compound two-dimensional chaotic function is presented by exploiting two one-dimensional chaotic functions which switch randomly, and the design is used as a chaotic sequence generator which is proved by Devaney's definition proof of chaos. The properties of compound chaotic functions are also proved rigorously. In order to improve the robustness against difference cryptanalysis and produce avalanche effect, a new feedback image encryption scheme is proposed using the new compound chaos by selecting one of the two one-dimensional chaotic functions randomly and a new image pixels method of permutation and substitution is designed in detail by array row and column random controlling based on the compound chaos. The results from entropy analysis, difference analysis, statistical analysis, sequence randomness analysis, cipher sensitivity analysis depending on key and plaintext have proven that the compound chaotic sequence cipher can resist cryptanalytic, statistical and brute-force attacks, and especially it accelerates encryption speed, and achieves higher level of security. By the dynamical compound chaos and perturbation technology, the paper solves the problem of computer low precision of one-dimensional chaotic function.

  14. Sequence analysis of chromosome 1 revealed different selection patterns between Chinese wild mice and laboratory strains.

    Science.gov (United States)

    Xu, Fuyi; Hu, Shixian; Chao, Tianzhu; Wang, Maochun; Li, Kai; Zhou, Yuxun; Xu, Hongyan; Xiao, Junhua

    2017-10-01

    Both natural and artificial selection play a critical role in animals' adaptation to the environment. Detection of the signature of selection in genomic regions can provide insights for understanding the function of specific phenotypes. It is generally assumed that laboratory mice may experience intense artificial selection while wild mice more natural selection. However, the differences of selection signature in the mouse genome and underlying genes between wild and laboratory mice remain unclear. In this study, we used two mouse populations: chromosome 1 (Chr 1) substitution lines (C1SLs) derived from Chinese wild mice and mouse genome project (MGP) sequenced inbred strains and two selection detection statistics: Fst and Tajima's D to identify the signature of selection footprint on Chr 1. For the differentiation between the C1SLs and MGP, 110 candidate selection regions containing 47 protein coding genes were detected. A total of 149 selection regions which encompass 7.215 Mb were identified in the C1SLs by Tajima's D approach. While for the MGP, we identified nearly twice selection regions (243) compared with the C1SLs which accounted for 13.27 Mb Chr 1 sequence. Through functional annotation, we identified several biological processes with significant enrichment including seven genes in the olfactory transduction pathway. In addition, we searched the phenotypes associated with the 47 candidate selection genes identified by Fst. These genes were involved in behavior, growth or body weight, mortality or aging, and immune systems which align well with the phenotypic differences between wild and laboratory mice. Therefore, the findings would be helpful for our understanding of the phenotypic differences between wild and laboratory mice and applications for using this new mouse resource (C1SLs) for further genetics studies.

  15. The role of upstream sequences in selecting the reading frame on tmRNA

    Directory of Open Access Journals (Sweden)

    Dewey Jonathan D

    2008-06-01

    Full Text Available Abstract Background tmRNA acts first as a tRNA and then as an mRNA to rescue stalled ribosomes in eubacteria. Two unanswered questions about tmRNA function remain: how does tmRNA, lacking an anticodon, bypass the decoding machinery and enter the ribosome? Secondly, how does the ribosome choose the proper codon to resume translation on tmRNA? According to the -1 triplet hypothesis, the answer to both questions lies in the unique properties of the three nucleotides upstream of the first tmRNA codon. These nucleotides assume an A-form conformation that mimics the codon-anticodon interaction, leading to recognition by the decoding center and choice of the reading frame. The -1 triplet hypothesis is important because it is the most credible model in which direct binding and recognition by the ribosome sets the reading frame on tmRNA. Results Conformational analysis predicts that 18 triplets cannot form the correct structure to function as the -1 triplet of tmRNA. We tested the tmRNA activity of all possible -1 triplet mutants using a genetic assay in Escherichia coli. While many mutants displayed reduced activity, our findings do not match the predictions of this model. Additional mutagenesis identified sequences further upstream that are required for tmRNA function. An immunoblot assay for translation of the tmRNA tag revealed that certain mutations in U85, A86, and the -1 triplet sequence result in improper selection of the first codon and translation in the wrong frame (-1 or +1 in vivo. Conclusion Our findings disprove the -1 triplet hypothesis. The -1 triplet is not required for accommodation of tmRNA into the ribosome, although it plays a minor role in frame selection. Our results strongly disfavor direct ribosomal recognition of the upstream sequence, instead supporting a model in which the binding of a separate ligand to A86 is primarily responsible for frame selection.

  16. Entropy of finite random binary sequences with weak long-range correlations.

    Science.gov (United States)

    Melnik, S S; Usatenko, O V

    2014-11-01

    We study the N-step binary stationary ergodic Markov chain and analyze its differential entropy. Supposing that the correlations are weak we express the conditional probability function of the chain through the pair correlation function and represent the entropy as a functional of the pair correlator. Since the model uses the two-point correlators instead of the block probability, it makes it possible to calculate the entropy of strings at much longer distances than using standard methods. A fluctuation contribution to the entropy due to finiteness of random chains is examined. This contribution can be of the same order as its regular part even at the relatively short lengths of subsequences. A self-similar structure of entropy with respect to the decimation transformations is revealed for some specific forms of the pair correlation function. Application of the theory to the DNA sequence of the R3 chromosome of Drosophila melanogaster is presented.

  17. Using ArcMap, Google Earth, and Global Positioning Systems to select and locate random households in rural Haiti.

    Science.gov (United States)

    Wampler, Peter J; Rediske, Richard R; Molla, Azizur R

    2013-01-18

    A remote sensing technique was developed which combines a Geographic Information System (GIS); Google Earth, and Microsoft Excel to identify home locations for a random sample of households in rural Haiti. The method was used to select homes for ethnographic and water quality research in a region of rural Haiti located within 9 km of a local hospital and source of health education in Deschapelles, Haiti. The technique does not require access to governmental records or ground based surveys to collect household location data and can be performed in a rapid, cost-effective manner. The random selection of households and the location of these households during field surveys were accomplished using GIS, Google Earth, Microsoft Excel, and handheld Garmin GPSmap 76CSx GPS units. Homes were identified and mapped in Google Earth, exported to ArcMap 10.0, and a random list of homes was generated using Microsoft Excel which was then loaded onto handheld GPS units for field location. The development and use of a remote sensing method was essential to the selection and location of random households. A total of 537 homes initially were mapped and a randomized subset of 96 was identified as potential survey locations. Over 96% of the homes mapped using Google Earth imagery were correctly identified as occupied dwellings. Only 3.6% of the occupants of mapped homes visited declined to be interviewed. 16.4% of the homes visited were not occupied at the time of the visit due to work away from the home or market days. A total of 55 households were located using this method during the 10 days of fieldwork in May and June of 2012. The method used to generate and field locate random homes for surveys and water sampling was an effective means of selecting random households in a rural environment lacking geolocation infrastructure. The success rate for locating households using a handheld GPS was excellent and only rarely was local knowledge required to identify and locate households. This

  18. Using ArcMap, Google Earth, and Global Positioning Systems to select and locate random households in rural Haiti

    Directory of Open Access Journals (Sweden)

    Wampler Peter J

    2013-01-01

    Full Text Available Abstract Background A remote sensing technique was developed which combines a Geographic Information System (GIS; Google Earth, and Microsoft Excel to identify home locations for a random sample of households in rural Haiti. The method was used to select homes for ethnographic and water quality research in a region of rural Haiti located within 9 km of a local hospital and source of health education in Deschapelles, Haiti. The technique does not require access to governmental records or ground based surveys to collect household location data and can be performed in a rapid, cost-effective manner. Methods The random selection of households and the location of these households during field surveys were accomplished using GIS, Google Earth, Microsoft Excel, and handheld Garmin GPSmap 76CSx GPS units. Homes were identified and mapped in Google Earth, exported to ArcMap 10.0, and a random list of homes was generated using Microsoft Excel which was then loaded onto handheld GPS units for field location. The development and use of a remote sensing method was essential to the selection and location of random households. Results A total of 537 homes initially were mapped and a randomized subset of 96 was identified as potential survey locations. Over 96% of the homes mapped using Google Earth imagery were correctly identified as occupied dwellings. Only 3.6% of the occupants of mapped homes visited declined to be interviewed. 16.4% of the homes visited were not occupied at the time of the visit due to work away from the home or market days. A total of 55 households were located using this method during the 10 days of fieldwork in May and June of 2012. Conclusions The method used to generate and field locate random homes for surveys and water sampling was an effective means of selecting random households in a rural environment lacking geolocation infrastructure. The success rate for locating households using a handheld GPS was excellent and only

  19. Genome Sequencing Reveals Loci under Artificial Selection that Underlie Disease Phenotypes in the Laboratory Rat

    NARCIS (Netherlands)

    Atanur, Santosh S.; Diaz, Ana Garcia; Maratou, Klio; Sarkis, Allison; Rotival, Maxime; Game, Laurence; Tschannen, Michael R.; Kaisaki, Pamela J.; Otto, Georg W.; Ma, Man Chun John; Keane, Thomas M.; Hummel, Oliver; Saar, Kathrin; Chen, Wei; Guryev, Victor; Gopalakrishnan, Kathirvel; Garrett, Michael R.; Joe, Bina; Citterio, Lorena; Bianchi, Giuseppe; McBride, Martin; Dominiczak, Anna; Adams, David J.; Serikawa, Tadao; Flicek, Paul; Cuppen, Edwin; Hubner, Norbert; Petretto, Enrico; Gauguier, Dominique; Kwitek, Anne; Jacob, Howard; Aitman, Timothy J.

    2013-01-01

    Large numbers of inbred laboratory rat strains have been developed for a range of complex disease phenotypes. To gain insights into the evolutionary pressures underlying selection for these phenotypes, we sequenced the genomes of 27 rat strains, including 11 models of hypertension, diabetes, and

  20. Prehospital rapid sequence intubation improves functional outcome for patients with severe traumatic brain injury: a randomized controlled trial.

    Science.gov (United States)

    Bernard, Stephen A; Nguyen, Vina; Cameron, Peter; Masci, Kevin; Fitzgerald, Mark; Cooper, David J; Walker, Tony; Std, B Paramed; Myles, Paul; Murray, Lynne; David; Taylor; Smith, Karen; Patrick, Ian; Edington, John; Bacon, Andrew; Rosenfeld, Jeffrey V; Judson, Rodney

    2010-12-01

    To determine whether paramedic rapid sequence intubation in patients with severe traumatic brain injury (TBI) improves neurologic outcomes at 6 months compared with intubation in the hospital. Severe TBI is associated with a high rate of mortality and long-term morbidity. Comatose patients with TBI routinely undergo endo-tracheal intubation to protect the airway, prevent hypoxia, and control ventilation. In many places, paramedics perform intubation prior to hospital arrival. However, it is unknown whether this approach improves outcomes. In a prospective, randomized, controlled trial, we assigned adults with severe TBI in an urban setting to either prehospital rapid sequence intubation by paramedics or transport to a hospital emergency department for intubation by physicians. The primary outcome measure was the median extended Glasgow Outcome Scale (GOSe) score at 6 months. Secondary end-points were favorable versus unfavorable outcome at 6 months, length of intensive care and hospital stay, and survival to hospital discharge. A total of 312 patients with severe TBI were randomly assigned to paramedic rapid sequence intubation or hospital intubation. The success rate for paramedic intubation was 97%. At 6 months, the median GOSe score was 5 (interquartile range, 1-6) in patients intubated by paramedics compared with 3 (interquartile range, 1-6) in the patients intubated at hospital (P = 0.28).The proportion of patients with favorable outcome (GOSe, 5-8) was 80 of 157 patients (51%) in the paramedic intubation group compared with 56 of 142 patients (39%) in the hospital intubation group (risk ratio, 1.28; 95% confidence interval, 1.00-1.64; P = 0.046). There were no differences in intensive care or hospital length of stay, or in survival to hospital discharge. In adults with severe TBI, prehospital rapid sequence intubation by paramedics increases the rate of favorable neurologic outcome at 6 months compared with intubation in the hospital.

  1. Optimizing Event Selection with the Random Grid Search

    Energy Technology Data Exchange (ETDEWEB)

    Bhat, Pushpalatha C. [Fermilab; Prosper, Harrison B. [Florida State U.; Sekmen, Sezen [Kyungpook Natl. U.; Stewart, Chip [Broad Inst., Cambridge

    2017-06-29

    The random grid search (RGS) is a simple, but efficient, stochastic algorithm to find optimal cuts that was developed in the context of the search for the top quark at Fermilab in the mid-1990s. The algorithm, and associated code, have been enhanced recently with the introduction of two new cut types, one of which has been successfully used in searches for supersymmetry at the Large Hadron Collider. The RGS optimization algorithm is described along with the recent developments, which are illustrated with two examples from particle physics. One explores the optimization of the selection of vector boson fusion events in the four-lepton decay mode of the Higgs boson and the other optimizes SUSY searches using boosted objects and the razor variables.

  2. Non-random mating for selection with restricted rates of inbreeding and overlapping generations

    NARCIS (Netherlands)

    Sonesson, A.K.; Meuwissen, T.H.E.

    2002-01-01

    Minimum coancestry mating with a maximum of one offspring per mating pair (MC1) is compared with random mating schemes for populations with overlapping generations. Optimum contribution selection is used, whereby $\\\\\\\\Delta F$ is restricted. For schemes with $\\\\\\\\Delta F$ restricted to 0.25% per

  3. Purifying selection acts on coding and non-coding sequences of paralogous genes in Arabidopsis thaliana.

    Science.gov (United States)

    Hoffmann, Robert D; Palmgren, Michael

    2016-06-13

    Whole-genome duplications in the ancestors of many diverse species provided the genetic material for evolutionary novelty. Several models explain the retention of paralogous genes. However, how these models are reflected in the evolution of coding and non-coding sequences of paralogous genes is unknown. Here, we analyzed the coding and non-coding sequences of paralogous genes in Arabidopsis thaliana and compared these sequences with those of orthologous genes in Arabidopsis lyrata. Paralogs with lower expression than their duplicate had more nonsynonymous substitutions, were more likely to fractionate, and exhibited less similar expression patterns with their orthologs in the other species. Also, lower-expressed genes had greater tissue specificity. Orthologous conserved non-coding sequences in the promoters, introns, and 3' untranslated regions were less abundant at lower-expressed genes compared to their higher-expressed paralogs. A gene ontology (GO) term enrichment analysis showed that paralogs with similar expression levels were enriched in GO terms related to ribosomes, whereas paralogs with different expression levels were enriched in terms associated with stress responses. Loss of conserved non-coding sequences in one gene of a paralogous gene pair correlates with reduced expression levels that are more tissue specific. Together with increased mutation rates in the coding sequences, this suggests that similar forces of purifying selection act on coding and non-coding sequences. We propose that coding and non-coding sequences evolve concurrently following gene duplication.

  4. Comparative Evaluations of Randomly Selected Four Point-of-Care Glucometer Devices in Addis Ababa, Ethiopia.

    Science.gov (United States)

    Wolde, Mistire; Tarekegn, Getahun; Kebede, Tedla

    2018-05-01

    Point-of-care glucometer (PoCG) devices play a significant role in self-monitoring of the blood sugar level, particularly in the follow-up of high blood sugar therapeutic response. The aim of this study was to evaluate blood glucose test results performed with four randomly selected glucometers on diabetes and control subjects versus standard wet chemistry (hexokinase) methods in Addis Ababa, Ethiopia. A prospective cross-sectional study was conducted on randomly selected 200 study participants (100 participants with diabetes and 100 healthy controls). Four randomly selected PoCG devices (CareSens N, DIAVUE Prudential, On Call Extra, i-QARE DS-W) were evaluated against hexokinase method and ISO 15197:2003 and ISO 15197:2013 standards. The minimum and maximum blood sugar values were recorded by CareSens N (21 mg/dl) and hexokinase method (498.8 mg/dl), respectively. The mean sugar values of all PoCG devices except On Call Extra showed significant differences compared with the reference hexokinase method. Meanwhile, all four PoCG devices had strong positive relationship (>80%) with the reference method (hexokinase). On the other hand, none of the four PoCG devices fulfilled the minimum accuracy measurement set by ISO 15197:2003 and ISO 15197:2013 standards. In addition, the linear regression analysis revealed that all four selected PoCG overestimated the glucose concentrations. The overall evaluation of the selected four PoCG measurements were poorly correlated with standard reference method. Therefore, before introducing PoCG devices to the market, there should be a standardized evaluation platform for validation. Further similar large-scale studies on other PoCG devices also need to be undertaken.

  5. Geography and genography: prediction of continental origin using randomly selected single nucleotide polymorphisms

    Directory of Open Access Journals (Sweden)

    Ramoni Marco F

    2007-03-01

    Full Text Available Abstract Background Recent studies have shown that when individuals are grouped on the basis of genetic similarity, group membership corresponds closely to continental origin. There has been considerable debate about the implications of these findings in the context of larger debates about race and the extent of genetic variation between groups. Some have argued that clustering according to continental origin demonstrates the existence of significant genetic differences between groups and that these differences may have important implications for differences in health and disease. Others argue that clustering according to continental origin requires the use of large amounts of genetic data or specifically chosen markers and is indicative only of very subtle genetic differences that are unlikely to have biomedical significance. Results We used small numbers of randomly selected single nucleotide polymorphisms (SNPs from the International HapMap Project to train naïve Bayes classifiers for prediction of ancestral continent of origin. Predictive accuracy was tested on two independent data sets. Genetically similar groups should be difficult to distinguish, especially if only a small number of genetic markers are used. The genetic differences between continentally defined groups are sufficiently large that one can accurately predict ancestral continent of origin using only a minute, randomly selected fraction of the genetic variation present in the human genome. Genotype data from only 50 random SNPs was sufficient to predict ancestral continent of origin in our primary test data set with an average accuracy of 95%. Genetic variations informative about ancestry were common and widely distributed throughout the genome. Conclusion Accurate characterization of ancestry is possible using small numbers of randomly selected SNPs. The results presented here show how investigators conducting genetic association studies can use small numbers of arbitrarily

  6. Refining the Results of a Classical SELEX Experiment by Expanding the Sequence Data Set of an Aptamer Pool Selected for Protein A

    Directory of Open Access Journals (Sweden)

    Regina Stoltenburg

    2018-02-01

    Full Text Available New, as yet undiscovered aptamers for Protein A were identified by applying next generation sequencing (NGS to a previously selected aptamer pool. This pool was obtained in a classical SELEX (Systematic Evolution of Ligands by EXponential enrichment experiment using the FluMag-SELEX procedure followed by cloning and Sanger sequencing. PA#2/8 was identified as the only Protein A-binding aptamer from the Sanger sequence pool, and was shown to be able to bind intact cells of Staphylococcus aureus. In this study, we show the extension of the SELEX results by re-sequencing of the same aptamer pool using a medium throughput NGS approach and data analysis. Both data pools were compared. They confirm the selection of a highly complex and heterogeneous oligonucleotide pool and show consistently a high content of orphans as well as a similar relative frequency of certain sequence groups. But in contrast to the Sanger data pool, the NGS pool was clearly dominated by one sequence group containing the known Protein A-binding aptamer PA#2/8 as the most frequent sequence in this group. In addition, we found two new sequence groups in the NGS pool represented by PA-C10 and PA-C8, respectively, which also have high specificity for Protein A. Comparative affinity studies reveal differences between the aptamers and confirm that PA#2/8 remains the most potent sequence within the selected aptamer pool reaching affinities in the low nanomolar range of KD = 20 ± 1 nM.

  7. Refining the Results of a Classical SELEX Experiment by Expanding the Sequence Data Set of an Aptamer Pool Selected for Protein A.

    Science.gov (United States)

    Stoltenburg, Regina; Strehlitz, Beate

    2018-02-24

    New, as yet undiscovered aptamers for Protein A were identified by applying next generation sequencing (NGS) to a previously selected aptamer pool. This pool was obtained in a classical SELEX (Systematic Evolution of Ligands by EXponential enrichment) experiment using the FluMag-SELEX procedure followed by cloning and Sanger sequencing. PA#2/8 was identified as the only Protein A-binding aptamer from the Sanger sequence pool, and was shown to be able to bind intact cells of Staphylococcus aureus . In this study, we show the extension of the SELEX results by re-sequencing of the same aptamer pool using a medium throughput NGS approach and data analysis. Both data pools were compared. They confirm the selection of a highly complex and heterogeneous oligonucleotide pool and show consistently a high content of orphans as well as a similar relative frequency of certain sequence groups. But in contrast to the Sanger data pool, the NGS pool was clearly dominated by one sequence group containing the known Protein A-binding aptamer PA#2/8 as the most frequent sequence in this group. In addition, we found two new sequence groups in the NGS pool represented by PA-C10 and PA-C8, respectively, which also have high specificity for Protein A. Comparative affinity studies reveal differences between the aptamers and confirm that PA#2/8 remains the most potent sequence within the selected aptamer pool reaching affinities in the low nanomolar range of K D = 20 ± 1 nM.

  8. Partial Sequencing of 16S rRNA Gene of Selected Staphylococcus aureus Isolates and its Antibiotic Resistance

    Directory of Open Access Journals (Sweden)

    Harsi Dewantari Kusumaningrum

    2016-08-01

    Full Text Available The choice of primer used in 16S rRNA sequencing for identification of Staphylococcus species found in food is important. This study aimed to characterize Staphylococcus aureus isolates by partial sequencing based on 16S rRNA gene employing primers 16sF, 63F or 1387R. The isolates were isolated from milk, egg dishes and chicken dishes and selected based on the presence of sea gene that responsible for formation of enterotoxin-A. Antibiotic susceptibility of the isolates towards six antibiotics was also tested. The use of 16sF resulted generally in higher identity percentage and query coverage compared to the sequencing by 63F or 1387R. BLAST results of all isolates, sequenced by 16sF, showed 99% homology to complete genome of four S. aureus strains, with different characteristics on enterotoxin production and antibiotic resistance. Considering that all isolates were carrying sea gene, indicated by the occurence of 120 bp amplicon after PCR amplification using primer SEA1/SEA2,  the isolates were most in agreeing to S. aureus subsp. aureus ST288. This study indicated that 4 out of 8 selected isolates were resistant towards streptomycin. The 16S rRNA gene sequencing using 16sF is useful for identification of S. aureus. However, additional analysis such as PCR employing specific gene target, should give a valuable supplementary information, when specific characteristic is expected.

  9. Mirnacle: machine learning with SMOTE and random forest for improving selectivity in pre-miRNA ab initio prediction.

    Science.gov (United States)

    Marques, Yuri Bento; de Paiva Oliveira, Alcione; Ribeiro Vasconcelos, Ana Tereza; Cerqueira, Fabio Ribeiro

    2016-12-15

    MicroRNAs (miRNAs) are key gene expression regulators in plants and animals. Therefore, miRNAs are involved in several biological processes, making the study of these molecules one of the most relevant topics of molecular biology nowadays. However, characterizing miRNAs in vivo is still a complex task. As a consequence, in silico methods have been developed to predict miRNA loci. A common ab initio strategy to find miRNAs in genomic data is to search for sequences that can fold into the typical hairpin structure of miRNA precursors (pre-miRNAs). The current ab initio approaches, however, have selectivity issues, i.e., a high number of false positives is reported, which can lead to laborious and costly attempts to provide biological validation. This study presents an extension of the ab initio method miRNAFold, with the aim of improving selectivity through machine learning techniques, namely, random forest combined with the SMOTE procedure that copes with imbalance datasets. By comparing our method, termed Mirnacle, with other important approaches in the literature, we demonstrate that Mirnacle substantially improves selectivity without compromising sensitivity. For the three datasets used in our experiments, our method achieved at least 97% of sensitivity and could deliver a two-fold, 20-fold, and 6-fold increase in selectivity, respectively, compared with the best results of current computational tools. The extension of miRNAFold by the introduction of machine learning techniques, significantly increases selectivity in pre-miRNA ab initio prediction, which optimally contributes to advanced studies on miRNAs, as the need of biological validations is diminished. Hopefully, new research, such as studies of severe diseases caused by miRNA malfunction, will benefit from the proposed computational tool.

  10. Discrete least squares polynomial approximation with random evaluations - application to PDEs with Random parameters

    KAUST Repository

    Nobile, Fabio

    2015-01-07

    We consider a general problem F(u, y) = 0 where u is the unknown solution, possibly Hilbert space valued, and y a set of uncertain parameters. We specifically address the situation in which the parameterto-solution map u(y) is smooth, however y could be very high (or even infinite) dimensional. In particular, we are interested in cases in which F is a differential operator, u a Hilbert space valued function and y a distributed, space and/or time varying, random field. We aim at reconstructing the parameter-to-solution map u(y) from random noise-free or noisy observations in random points by discrete least squares on polynomial spaces. The noise-free case is relevant whenever the technique is used to construct metamodels, based on polynomial expansions, for the output of computer experiments. In the case of PDEs with random parameters, the metamodel is then used to approximate statistics of the output quantity. We discuss the stability of discrete least squares on random points show convergence estimates both in expectation and probability. We also present possible strategies to select, either a-priori or by adaptive algorithms, sequences of approximating polynomial spaces that allow to reduce, and in some cases break, the curse of dimensionality

  11. Fuzzy Random λ-Mean SAD Portfolio Selection Problem: An Ant Colony Optimization Approach

    Science.gov (United States)

    Thakur, Gour Sundar Mitra; Bhattacharyya, Rupak; Mitra, Swapan Kumar

    2010-10-01

    To reach the investment goal, one has to select a combination of securities among different portfolios containing large number of securities. Only the past records of each security do not guarantee the future return. As there are many uncertain factors which directly or indirectly influence the stock market and there are also some newer stock markets which do not have enough historical data, experts' expectation and experience must be combined with the past records to generate an effective portfolio selection model. In this paper the return of security is assumed to be Fuzzy Random Variable Set (FRVS), where returns are set of random numbers which are in turn fuzzy numbers. A new λ-Mean Semi Absolute Deviation (λ-MSAD) portfolio selection model is developed. The subjective opinions of the investors to the rate of returns of each security are taken into consideration by introducing a pessimistic-optimistic parameter vector λ. λ-Mean Semi Absolute Deviation (λ-MSAD) model is preferred as it follows absolute deviation of the rate of returns of a portfolio instead of the variance as the measure of the risk. As this model can be reduced to Linear Programming Problem (LPP) it can be solved much faster than quadratic programming problems. Ant Colony Optimization (ACO) is used for solving the portfolio selection problem. ACO is a paradigm for designing meta-heuristic algorithms for combinatorial optimization problem. Data from BSE is used for illustration.

  12. LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone

    KAUST Repository

    Chen, Peng

    2014-12-03

    Background Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite of the recent advances in computational prediction for protein-ligand binding sites, the state-of-the-art methods search for similar, known structures of the query and predict the binding sites based on the solved structures. However, such structural information is not commonly available. Results In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows in the process of encoding input feature vectors. Moreover, due to the highly imbalanced samples between the ligand-binding sites and non ligand-binding sites, we construct several balanced data sets, for each of which a random forest (RF)-based classifier is trained. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. Conclusions Experimental results on CASP9 and CASP8 data sets demonstrate that our method compares favorably with the state-of-the-art protein-ligand binding site prediction methods.

  13. Quantitative profiling of selective Sox/POU pairing on hundreds of sequences in parallel by Coop-seq.

    Science.gov (United States)

    Chang, Yiming K; Srivastava, Yogesh; Hu, Caizhen; Joyce, Adam; Yang, Xiaoxiao; Zuo, Zheng; Havranek, James J; Stormo, Gary D; Jauch, Ralf

    2017-01-25

    Cooperative binding of transcription factors is known to be important in the regulation of gene expression programs conferring cellular identities. However, current methods to measure cooperativity parameters have been laborious and therefore limited to studying only a few sequence variants at a time. We developed Coop-seq (cooperativity by sequencing) that is capable of efficiently and accurately determining the cooperativity parameters for hundreds of different DNA sequences in a single experiment. We apply Coop-seq to 12 dimer pairs from the Sox and POU families of transcription factors using 324 unique sequences with changed half-site orientation, altered spacing and discrete randomization within the binding elements. The study reveals specific dimerization profiles of different Sox factors with Oct4. By contrast, Oct4 and the three neural class III POU factors Brn2, Brn4 and Oct6 assemble with Sox2 in a surprisingly indistinguishable manner. Two novel half-site configurations can support functional Sox/Oct dimerization in addition to known composite motifs. Moreover, Coop-seq uncovers a nucleotide switch within the POU half-site when spacing is altered, which is mirrored in genomic loci bound by Sox2/Oct4 complexes. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. Randomized clinical trials in dentistry: Risks of bias, risks of random errors, reporting quality, and methodologic quality over the years 1955-2013.

    Directory of Open Access Journals (Sweden)

    Humam Saltaji

    Full Text Available To examine the risks of bias, risks of random errors, reporting quality, and methodological quality of randomized clinical trials of oral health interventions and the development of these aspects over time.We included 540 randomized clinical trials from 64 selected systematic reviews. We extracted, in duplicate, details from each of the selected randomized clinical trials with respect to publication and trial characteristics, reporting and methodologic characteristics, and Cochrane risk of bias domains. We analyzed data using logistic regression and Chi-square statistics.Sequence generation was assessed to be inadequate (at unclear or high risk of bias in 68% (n = 367 of the trials, while allocation concealment was inadequate in the majority of trials (n = 464; 85.9%. Blinding of participants and blinding of the outcome assessment were judged to be inadequate in 28.5% (n = 154 and 40.5% (n = 219 of the trials, respectively. A sample size calculation before the initiation of the study was not performed/reported in 79.1% (n = 427 of the trials, while the sample size was assessed as adequate in only 17.6% (n = 95 of the trials. Two thirds of the trials were not described as double blinded (n = 358; 66.3%, while the method of blinding was appropriate in 53% (n = 286 of the trials. We identified a significant decrease over time (1955-2013 in the proportion of trials assessed as having inadequately addressed methodological quality items (P < 0.05 in 30 out of the 40 quality criteria, or as being inadequate (at high or unclear risk of bias in five domains of the Cochrane risk of bias tool: sequence generation, allocation concealment, incomplete outcome data, other sources of bias, and overall risk of bias.The risks of bias, risks of random errors, reporting quality, and methodological quality of randomized clinical trials of oral health interventions have improved over time; however, further efforts that contribute to the development of more stringent

  15. Pediatric selective mutism therapy: a randomized controlled trial.

    Science.gov (United States)

    Esposito, Maria; Gimigliano, Francesca; Barillari, Maria R; Precenzano, Francesco; Ruberto, Maria; Sepe, Joseph; Barillari, Umberto; Gimigliano, Raffaele; Militerni, Roberto; Messina, Giovanni; Carotenuto, Marco

    2017-10-01

    Selective mutism (SM) is a rare disease in children coded by DSM-5 as an anxiety disorder. Despite the disabling nature of the disease, there is still no specific treatment. The aims of this study were to verify the efficacy of six-month standard psychomotor treatment and the positive changes in lifestyle, in a population of children affected by SM. Randomized controlled trial registered in the European Clinical Trials Registry (EuDract 2015-001161-36). University third level Centre (Child and Adolescent Neuropsychiatry Clinic). Study population was composed by 67 children in group A (psychomotricity treatment) (35 M, mean age 7.84±1.15) and 71 children in group B (behavioral and educational counseling) (37 M, mean age 7.75±1.36). Psychomotor treatment was administered by trained child therapists in residential settings three times per week. Each child was treated for the whole period by the same therapist and all the therapists shared the same protocol. The standard psychomotor session length is of 45 minutes. At T0 and after 6 months (T1) of treatments, patients underwent a behavioral and SM severity assessment. To verify the effects of the psychomotor management, the Child Behavior Checklist questionnaire (CBCL) and Selective Mutism Questionnaire (SMQ) were administered to the parents. After 6 months of psychomotor treatment SM children showed a significant reduction among CBCL scores such as in social relations, anxious/depressed, social problems and total problems (Pselective mutism, even if further studies are needed. The present study identifies in psychomotricity a safe and efficacy therapy for pediatric selective mutism.

  16. Signal sequence and keyword trap in silico for selection of full-length human cDNAs encoding secretion or membrane proteins from oligo-capped cDNA libraries.

    Science.gov (United States)

    Otsuki, Tetsuji; Ota, Toshio; Nishikawa, Tetsuo; Hayashi, Koji; Suzuki, Yutaka; Yamamoto, Jun-ichi; Wakamatsu, Ai; Kimura, Kouichi; Sakamoto, Katsuhiko; Hatano, Naoto; Kawai, Yuri; Ishii, Shizuko; Saito, Kaoru; Kojima, Shin-ichi; Sugiyama, Tomoyasu; Ono, Tetsuyoshi; Okano, Kazunori; Yoshikawa, Yoko; Aotsuka, Satoshi; Sasaki, Naokazu; Hattori, Atsushi; Okumura, Koji; Nagai, Keiichi; Sugano, Sumio; Isogai, Takao

    2005-01-01

    We have developed an in silico method of selection of human full-length cDNAs encoding secretion or membrane proteins from oligo-capped cDNA libraries. Fullness rates were increased to about 80% by combination of the oligo-capping method and ATGpr, software for prediction of translation start point and the coding potential. Then, using 5'-end single-pass sequences, cDNAs having the signal sequence were selected by PSORT ('signal sequence trap'). We also applied 'secretion or membrane protein-related keyword trap' based on the result of BLAST search against the SWISS-PROT database for the cDNAs which could not be selected by PSORT. Using the above procedures, 789 cDNAs were primarily selected and subjected to full-length sequencing, and 334 of these cDNAs were finally selected as novel. Most of the cDNAs (295 cDNAs: 88.3%) were predicted to encode secretion or membrane proteins. In particular, 165(80.5%) of the 205 cDNAs selected by PSORT were predicted to have signal sequences, while 70 (54.2%) of the 129 cDNAs selected by 'keyword trap' preserved the secretion or membrane protein-related keywords. Many important cDNAs were obtained, including transporters, receptors, and ligands, involved in significant cellular functions. Thus, an efficient method of selecting secretion or membrane protein-encoding cDNAs was developed by combining the above four procedures.

  17. A measurement of disorder in binary sequences

    Science.gov (United States)

    Gong, Longyan; Wang, Haihong; Cheng, Weiwen; Zhao, Shengmei

    2015-03-01

    We propose a complex quantity, AL, to characterize the degree of disorder of L-length binary symbolic sequences. As examples, we respectively apply it to typical random and deterministic sequences. One kind of random sequences is generated from a periodic binary sequence and the other is generated from the logistic map. The deterministic sequences are the Fibonacci and Thue-Morse sequences. In these analyzed sequences, we find that the modulus of AL, denoted by |AL | , is a (statistically) equivalent quantity to the Boltzmann entropy, the metric entropy, the conditional block entropy and/or other quantities, so it is a useful quantitative measure of disorder. It can be as a fruitful index to discern which sequence is more disordered. Moreover, there is one and only one value of |AL | for the overall disorder characteristics. It needs extremely low computational costs. It can be easily experimentally realized. From all these mentioned, we believe that the proposed measure of disorder is a valuable complement to existing ones in symbolic sequences.

  18. The simultaneous use of several pseudo-random binary sequences in the identification of linear multivariable dynamic systems

    International Nuclear Information System (INIS)

    Cummins, J.D.

    1965-02-01

    With several white noise sources the various transmission paths of a linear multivariable system may be determined simultaneously. This memorandum considers the restrictions on pseudo-random two state sequences to effect simultaneous identification of several transmission paths and the consequential rejection of cross-coupled signals in linear multivariable systems. The conditions for simultaneous identification are established by an example, which shows that the integration time required is large i.e. tends to infinity, as it does when white noise sources are used. (author)

  19. Analysis of using interpulse intervals to generate 128-bit biometric random binary sequences for securing wireless body sensor networks.

    Science.gov (United States)

    Zhang, Guang-He; Poon, Carmen C Y; Zhang, Yuan-Ting

    2012-01-01

    Wireless body sensor network (WBSN), a key building block for m-Health, demands extremely stringent resource constraints and thus lightweight security methods are preferred. To minimize resource consumption, utilizing information already available to a WBSN, particularly common to different sensor nodes of a WBSN, for security purposes becomes an attractive solution. In this paper, we tested the randomness and distinctiveness of the 128-bit biometric binary sequences (BSs) generated from interpulse intervals (IPIs) of 20 healthy subjects as well as 30 patients suffered from myocardial infarction and 34 subjects with other cardiovascular diseases. The encoding time of a biometric BS on a WBSN node is on average 23 ms and memory occupation is 204 bytes for any given IPI sequence. The results from five U.S. National Institute of Standards and Technology statistical tests suggest that random biometric BSs can be generated from both healthy subjects and cardiovascular patients and can potentially be used as authentication identifiers for securing WBSNs. Ultimately, it is preferred that these biometric BSs can be used as encryption keys such that key distribution over the WBSN can be avoided.

  20. Cryptographic pseudo-random sequences from the chaotic Hénon ...

    Indian Academy of Sciences (India)

    dimensional discrete-time Hénon map is proposed. Properties of the proposed sequences pertaining to linear complexity, linear complexity profile, correlation and auto-correlation are investigated. All these properties of the sequences suggest a ...

  1. The administration sequence of propofol and remifentanil does not affect the ED50 and ED95 of rocuronium in rapid sequence induction of anesthesia: a double-blind randomized controlled trial.

    Science.gov (United States)

    Ozcelik, M; Guclu, C; Bermede, O; Baytas, V; Altay, N; Karahan, M A; Erdogan, B; Can, O

    2016-04-01

    The topic of drug administration sequence in rapid sequence induction (RSI) is still an object of interest in terms of rocuronium effectiveness. The aim of this prospective, randomized trial was to evaluate the effect of administration sequence of propofol and remifentanil on ED50 and ED95 of rocuronium in a RSI model. Eighty-four patients were randomized into Group Remifentanil (Group R, n = 43), where induction of general anesthesia started with remifentanil (2 µg/kg) and followed by propofol (2 mg/kg) and rocuronium administrations; and Group Propofol (Group P, n = 41), where induction of general anesthesia started with propofol and followed by remifentanil and rocuronium. First patients in each group were paralyzed by 0.8 mg/kg rocuronium. In case of acceptable intubation as evaluated according to the criteria described by Viby-Mogensen et al, rocuronium dose was decreased by 0.1 mg/kg for the next patient; otherwise, rocuronium dose was increased by 0.1 mg/kg. After three crossover points, increments or decrements in rocuronium dosage were set to 0.05 mg/kg. The process was repeated until a total of ten crossover points were obtained. The ED50 and ED95 doses of rocuronium were similar in Group R (0.182 mg/kg, and 0.244 mg/kg, respectively) and Group P (0.121 mg/kg, and 0.243 mg/kg, respectively) according to 95% CI of the estimates. There was no statistically significant difference in terms of clinically acceptable intubation conditions between the two groups (56.1% in Group R vs. 59% in Group P, p = 0.795). The choice of administration sequence of propofol and remifentanil does not have an impact on estimated ED50 and ED95 of rocuronium in providing acceptable intubation conditions in the RSI technique.

  2. Exploring pseudo- and chaotic random Monte Carlo simulations

    Science.gov (United States)

    Blais, J. A. Rod; Zhang, Zhan

    2011-07-01

    Computer simulations are an increasingly important area of geoscience research and development. At the core of stochastic or Monte Carlo simulations are the random number sequences that are assumed to be distributed with specific characteristics. Computer-generated random numbers, uniformly distributed on (0, 1), can be very different depending on the selection of pseudo-random number (PRN) or chaotic random number (CRN) generators. In the evaluation of some definite integrals, the resulting error variances can even be of different orders of magnitude. Furthermore, practical techniques for variance reduction such as importance sampling and stratified sampling can be applied in most Monte Carlo simulations and significantly improve the results. A comparative analysis of these strategies has been carried out for computational applications in planar and spatial contexts. Based on these experiments, and on some practical examples of geodetic direct and inverse problems, conclusions and recommendations concerning their performance and general applicability are included.

  3. Non-random alkylation of DNA sequences induced in vivo by chemical mutagens

    Energy Technology Data Exchange (ETDEWEB)

    Durante, M.; Geri, C.; Bonatti, S.; Parenti, R. (Universita di Pisa (Italy))

    1989-08-01

    Previous studies of the interaction of alkylating agents on the eukaryotic genome support the idea that induction of DNA adducts is at specific genomic sites. Here we show molecular and cytological evidence that alkylation is rather specific. Mammalian cell cultures were exposed to different doses of mutagens and the DNA was analyzed by density gradient ultracentrifugation, hydroxylapatite fractionation, and by restriction enzyme analysis. Studies with the labelled mutagens N-ethyl-N-nitrosourea and N-methyl-N'-nitro-N-nitrosoguanidine show that there is a non-random distribution of the adducts. The adducts are found more frequently in A-T, G-C rich satellite DNA and highly repetitive sequences. Analysis with restriction enzymes shows that both methyl and ethyl groups influence the restriction patterns of the enzymes HpaII and MspI that recognize specific endogenous DNA methylation. These data suggest, as a subsequent mechanism, a modification in the pattern of the normal endogenous methylation of 5-methylcytosine.

  4. Translational database selection and multiplexed sequence capture for up front filtering of reliable breast cancer biomarker candidates.

    Directory of Open Access Journals (Sweden)

    Patrik L Ståhl

    Full Text Available Biomarker identification is of utmost importance for the development of novel diagnostics and therapeutics. Here we make use of a translational database selection strategy, utilizing data from the Human Protein Atlas (HPA on differentially expressed protein patterns in healthy and breast cancer tissues as a means to filter out potential biomarkers for underlying genetic causatives of the disease. DNA was isolated from ten breast cancer biopsies, and the protein coding and flanking non-coding genomic regions corresponding to the selected proteins were extracted in a multiplexed format from the samples using a single DNA sequence capture array. Deep sequencing revealed an even enrichment of the multiplexed samples and a great variation of genetic alterations in the tumors of the sampled individuals. Benefiting from the upstream filtering method, the final set of biomarker candidates could be completely verified through bidirectional Sanger sequencing, revealing a 40 percent false positive rate despite high read coverage. Of the variants encountered in translated regions, nine novel non-synonymous variations were identified and verified, two of which were present in more than one of the ten tumor samples.

  5. Randomizing Roaches: Exploring the "Bugs" of Randomization in Experimental Design

    Science.gov (United States)

    Wagler, Amy; Wagler, Ron

    2014-01-01

    Understanding the roles of random selection and random assignment in experimental design is a central learning objective in most introductory statistics courses. This article describes an activity, appropriate for a high school or introductory statistics course, designed to teach the concepts, values and pitfalls of random selection and assignment…

  6. Genetic alterations of hepatocellular carcinoma by random amplified polymorphic DNA analysis and cloning sequencing of tumor differential DNA fragment

    Science.gov (United States)

    Xian, Zhi-Hong; Cong, Wen-Ming; Zhang, Shu-Hui; Wu, Meng-Chao

    2005-01-01

    AIM: To study the genetic alterations and their association with clinicopathological characteristics of hepatocellular carcinoma (HCC), and to find the tumor related DNA fragments. METHODS: DNA isolated from tumors and corresponding noncancerous liver tissues of 56 HCC patients was amplified by random amplified polymorphic DNA (RAPD) with 10 random 10-mer arbitrary primers. The RAPD bands showing obvious differences in tumor tissue DNA corresponding to that of normal tissue were separated, purified, cloned and sequenced. DNA sequences were analyzed and compared with GenBank data. RESULTS: A total of 56 cases of HCC were demonstrated to have genetic alterations, which were detected by at least one primer. The detestability of genetic alterations ranged from 20% to 70% in each case, and 17.9% to 50% in each primer. Serum HBV infection, tumor size, histological grade, tumor capsule, as well as tumor intrahepatic metastasis, might be correlated with genetic alterations on certain primers. A band with a higher intensity of 480 bp or so amplified fragments in tumor DNA relative to normal DNA could be seen in 27 of 56 tumor samples using primer 4. Sequence analysis of these fragments showed 91% homology with Homo sapiens double homeobox protein DUX10 gene. CONCLUSION: Genetic alterations are a frequent event in HCC, and tumor related DNA fragments have been found in this study, which may be associated with hepatocarcin-ogenesis. RAPD is an effective method for the identification and analysis of genetic alterations in HCC, and may provide new information for further evaluating the molecular mechanism of hepatocarcinogenesis. PMID:15996039

  7. Selection of aptamers for Candida albicans by cell-SELEX

    International Nuclear Information System (INIS)

    Miranda, Alessandra Nunes Duarte

    2017-01-01

    The growing concern with invasive fungal infections, responsible for an alarming mortality rate of immunosuppressed patients and in Intensive Care Units, evidences the need for a fast and specific method for the Candida albicans detection, since this species is identified as one of the main causes of septicemia. Commonly, it is a challenge for clinicians to determine the primary infection foci, the dissemination degree, or whether the site of a particular surgery is involved. Although scintigraphic imaging represents a promising tool for infectious foci detection, it still lacks a methodology for C. albicans diagnosis due to the absence of specific radiotracers for this microorganism. Aptamers are molecules that have almost ideal properties for use as diagnostic radiopharmaceuticals, such as high specificity for their molecular targets, lack of immunogenicity and toxicity, high tissue penetration and rapid blood clearance. Aptamers can also be labeled with different radionuclides. This work aims to obtain aptamers for specific binding to C. albicans cells for future application as a radiopharmaceutical. It was used a variation of the SELEX (Systematic Evolution of Ligands by EXponential Enrichment) technique, termed cell-SELEX, in which cells are the targets for selection. A selection protocol was standardized using a random library of single-stranded oligonucleotides, each containing two fixed regions flanking a sequence of 40 random nucleotides. This library was incubated with C. albicans cells in the presence of competitors. Then, the binding sequences were separated by centrifugation, resuspended and amplified by PCR. The amplification was confirmed by agarose gel electrophoresis. After that, the ligands were purified to obtain a new pool of ssDNA, from which a new incubation was carried out. The selection parameters were gradually modified in order to increase stringency. This cycle was repeated 12 times to allow the selection of sequences with the maximum

  8. Materials selection for oxide-based resistive random access memories

    International Nuclear Information System (INIS)

    Guo, Yuzheng; Robertson, John

    2014-01-01

    The energies of atomic processes in resistive random access memories (RRAMs) are calculated for four typical oxides, HfO 2 , TiO 2 , Ta 2 O 5 , and Al 2 O 3 , to define a materials selection process. O vacancies have the lowest defect formation energy in the O-poor limit and dominate the processes. A band diagram defines the operating Fermi energy and O chemical potential range. It is shown how the scavenger metal can be used to vary the O vacancy formation energy, via controlling the O chemical potential, and the mean Fermi energy. The high endurance of Ta 2 O 5 RRAM is related to its more stable amorphous phase and the adaptive lattice rearrangements of its O vacancy

  9. Selective enrichment and sequencing of whole mitochondrial genomes in the presence of nuclear encoded mitochondrial pseudogenes (numts.

    Directory of Open Access Journals (Sweden)

    Jonci N Wolff

    Full Text Available Numts are an integral component of many eukaryote genomes offering a snapshot of the evolutionary process that led from the incorporation of an α-proteobacterium into a larger eukaryotic cell some 1.8 billion years ago. Although numt sequence can be harnessed as molecular marker, these sequences often remain unidentified and are mistaken for genuine mtDNA leading to erroneous interpretation of mtDNA data sets. It is therefore indispensable that during the process of amplifying and sequencing mitochondrial genes, preventive measures are taken to ensure the exclusion of numts to guarantee the recovery of genuine mtDNA. This applies to mtDNA analyses in general but especially to studies where mtDNAs are sequenced de novo as the launch pad for subsequent mtDNA-based research. By using a combination of dilution series and nested rolling circle amplification (RCA, we present a novel strategy to selectively amplify mtDNA and exclude the amplification of numt sequence. We have successfully applied this strategy to de novo sequence the mtDNA of the Black Field Cricket Teleogryllus commodus, a species known to contain numts. Aligning our assembled sequence to the reference genome of Teleogryllus emma (GenBank EU557269.1 led to the identification of a numt sequence in the reference sequence. This unexpected result further highlights the need of a reliable and accessible strategy to eliminate this source of error.

  10. Emergence of multilevel selection in the prisoner's dilemma game on coevolving random networks

    International Nuclear Information System (INIS)

    Szolnoki, Attila; Perc, Matjaz

    2009-01-01

    We study the evolution of cooperation in the prisoner's dilemma game, whereby a coevolutionary rule is introduced that molds the random topology of the interaction network in two ways. First, existing links are deleted whenever a player adopts a new strategy or its degree exceeds a threshold value; second, new links are added randomly after a given number of game iterations. These coevolutionary processes correspond to the generic formation of new links and deletion of existing links that, especially in human societies, appear frequently as a consequence of ongoing socialization, change of lifestyle or death. Due to the counteraction of deletions and additions of links the initial heterogeneity of the interaction network is qualitatively preserved, and thus cannot be held responsible for the observed promotion of cooperation. Indeed, the coevolutionary rule evokes the spontaneous emergence of a powerful multilevel selection mechanism, which despite the sustained random topology of the evolving network, maintains cooperation across the whole span of defection temptation values.

  11. Random-effects linear modeling and sample size tables for two special crossover designs of average bioequivalence studies: the four-period, two-sequence, two-formulation and six-period, three-sequence, three-formulation designs.

    Science.gov (United States)

    Diaz, Francisco J; Berg, Michel J; Krebill, Ron; Welty, Timothy; Gidal, Barry E; Alloway, Rita; Privitera, Michael

    2013-12-01

    Due to concern and debate in the epilepsy medical community and to the current interest of the US Food and Drug Administration (FDA) in revising approaches to the approval of generic drugs, the FDA is currently supporting ongoing bioequivalence studies of antiepileptic drugs, the EQUIGEN studies. During the design of these crossover studies, the researchers could not find commercial or non-commercial statistical software that quickly allowed computation of sample sizes for their designs, particularly software implementing the FDA requirement of using random-effects linear models for the analyses of bioequivalence studies. This article presents tables for sample-size evaluations of average bioequivalence studies based on the two crossover designs used in the EQUIGEN studies: the four-period, two-sequence, two-formulation design, and the six-period, three-sequence, three-formulation design. Sample-size computations assume that random-effects linear models are used in bioequivalence analyses with crossover designs. Random-effects linear models have been traditionally viewed by many pharmacologists and clinical researchers as just mathematical devices to analyze repeated-measures data. In contrast, a modern view of these models attributes an important mathematical role in theoretical formulations in personalized medicine to them, because these models not only have parameters that represent average patients, but also have parameters that represent individual patients. Moreover, the notation and language of random-effects linear models have evolved over the years. Thus, another goal of this article is to provide a presentation of the statistical modeling of data from bioequivalence studies that highlights the modern view of these models, with special emphasis on power analyses and sample-size computations.

  12. Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology

    Science.gov (United States)

    Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological datasets there is limited guidance on variable selection methods for RF modeling. Typically, e...

  13. Optimization of the Dutch Matrix Test by Random Selection of Sentences From a Preselected Subset

    Directory of Open Access Journals (Sweden)

    Rolph Houben

    2015-04-01

    Full Text Available Matrix tests are available for speech recognition testing in many languages. For an accurate measurement, a steep psychometric function of the speech materials is required. For existing tests, it would be beneficial if it were possible to further optimize the available materials by increasing the function’s steepness. The objective is to show if the steepness of the psychometric function of an existing matrix test can be increased by selecting a homogeneous subset of recordings with the steepest sentence-based psychometric functions. We took data from a previous multicenter evaluation of the Dutch matrix test (45 normal-hearing listeners. Based on half of the data set, first the sentences (140 out of 311 with a similar speech reception threshold and with the steepest psychometric function (≥9.7%/dB were selected. Subsequently, the steepness of the psychometric function for this selection was calculated from the remaining (unused second half of the data set. The calculation showed that the slope increased from 10.2%/dB to 13.7%/dB. The resulting subset did not allow the construction of enough balanced test lists. Therefore, the measurement procedure was changed to randomly select the sentences during testing. Random selection may interfere with a representative occurrence of phonemes. However, in our material, the median phonemic occurrence remained close to that of the original test. This finding indicates that phonemic occurrence is not a critical factor. The work highlights the possibility that existing speech tests might be improved by selecting sentences with a steep psychometric function.

  14. Autonomous Byte Stream Randomizer

    Science.gov (United States)

    Paloulian, George K.; Woo, Simon S.; Chow, Edward T.

    2013-01-01

    Net-centric networking environments are often faced with limited resources and must utilize bandwidth as efficiently as possible. In networking environments that span wide areas, the data transmission has to be efficient without any redundant or exuberant metadata. The Autonomous Byte Stream Randomizer software provides an extra level of security on top of existing data encryption methods. Randomizing the data s byte stream adds an extra layer to existing data protection methods, thus making it harder for an attacker to decrypt protected data. Based on a generated crypto-graphically secure random seed, a random sequence of numbers is used to intelligently and efficiently swap the organization of bytes in data using the unbiased and memory-efficient in-place Fisher-Yates shuffle method. Swapping bytes and reorganizing the crucial structure of the byte data renders the data file unreadable and leaves the data in a deconstructed state. This deconstruction adds an extra level of security requiring the byte stream to be reconstructed with the random seed in order to be readable. Once the data byte stream has been randomized, the software enables the data to be distributed to N nodes in an environment. Each piece of the data in randomized and distributed form is a separate entity unreadable on its own right, but when combined with all N pieces, is able to be reconstructed back to one. Reconstruction requires possession of the key used for randomizing the bytes, leading to the generation of the same cryptographically secure random sequence of numbers used to randomize the data. This software is a cornerstone capability possessing the ability to generate the same cryptographically secure sequence on different machines and time intervals, thus allowing this software to be used more heavily in net-centric environments where data transfer bandwidth is limited.

  15. Frequency characteristic measurement of a fiber optic gyroscope using a correlation spectrum analysis method based on a pseudo-random sequence

    International Nuclear Information System (INIS)

    Li, Yang; Chen, Xingfan; Liu, Cheng

    2015-01-01

    The frequency characteristic is an important indicator of a system’s dynamic performance. The identification of a fiber optic gyroscope (FOG)’s frequency characteristic using a correlation spectrum analysis method based on a pseudo-random sequence is proposed. Taking the angle vibrator as the source of the test rotation stimulation and a pseudo-random sequence as the test signal, the frequency characteristic of a FOG is calculated according to the power spectral density of the rotation rate signal and the cross-power spectral density of the FOG’s output signal and rotation rate signal. A theoretical simulation is done to confirm the validity of this method. An experiment system is built and the test results indicate that the measurement error of the normalized amplitude–frequency response is less than 0.01, that the error of the phase–frequency response is less than 0.3 rad, and the overall measurement accuracy is superior to the traditional frequency-sweep method. By using this method, the FOG’s amplitude–frequency response and phase–frequency response can be measured simultaneously, quickly, accurately, and with a high frequency resolution. The described method meets the requirements of engineering applications. (paper)

  16. SNP calling using genotype model selection on high-throughput sequencing data

    KAUST Repository

    You, Na

    2012-01-16

    Motivation: A review of the available single nucleotide polymorphism (SNP) calling procedures for Illumina high-throughput sequencing (HTS) platform data reveals that most rely mainly on base-calling and mapping qualities as sources of error when calling SNPs. Thus, errors not involved in base-calling or alignment, such as those in genomic sample preparation, are not accounted for.Results: A novel method of consensus and SNP calling, Genotype Model Selection (GeMS), is given which accounts for the errors that occur during the preparation of the genomic sample. Simulations and real data analyses indicate that GeMS has the best performance balance of sensitivity and positive predictive value among the tested SNP callers. © The Author 2012. Published by Oxford University Press. All rights reserved.

  17. Random drift versus selection in academic vocabulary: an evolutionary analysis of published keywords.

    Science.gov (United States)

    Bentley, R Alexander

    2008-08-27

    The evolution of vocabulary in academic publishing is characterized via keyword frequencies recorded in the ISI Web of Science citations database. In four distinct case-studies, evolutionary analysis of keyword frequency change through time is compared to a model of random copying used as the null hypothesis, such that selection may be identified against it. The case studies from the physical sciences indicate greater selection in keyword choice than in the social sciences. Similar evolutionary analyses can be applied to a wide range of phenomena; wherever the popularity of multiple items through time has been recorded, as with web searches, or sales of popular music and books, for example.

  18. Selection of optimal pulse sequences for conventional and dynamic MR imaging with Gd-DTPA; A fundamental study

    Energy Technology Data Exchange (ETDEWEB)

    Maeda, Miho; Kita, Keisuke; Maeda, Masayuki (Wakayama Medical Coll. (Japan)) (and others)

    1989-11-01

    Gadolinium-DTPA (Gd-DTPA) enhances contrast between tissues in magnetic resonance (MR) imaging. The enhancement of tissues depends partly upon the pulse sequences, and the optimal pulse sequence is also influenced by the tissue cncentration of Gd-DTPA. We prepared phantoms of 25% albumin solutions with various concentrations of Gd-DTPA, and imaged them using various pulse sequences with 1.5-T MR system. We also performed MR imaging of 16 patients with tumors (10 brain tumors and 6 hepatic tumors) before and after intravenous administration of Gd-DTPA (0.1 mmol/kg); 6 patients with hepatic tumors underwent dynamic MR imaging during suspended respiration. We made a theoretical equation to calculate the concentration of Gd-DTPA and estimated its tissue concentration in tumors at 0{approx}0.2 mmol/kg. Within these tissue concentrations, the enhancement-to-noise (E/N) ratio was larger in FISP (flip angle of 90deg, TR pf 300 msec, minimal TE) and SE (TR of 400 msec, minimal TE) sequences than in other sequences observed. These sequences may be preferable for conventional enhanced-MRI. Among the pulse sequences with TR of less than 100 msec, FISP (flip angle of 90deg, TR of less than 100 msec, minimal TE) had the largest E/N ratio; which may be useful for dynamic MRI during suspended respiration. The importance of selecting the optimal pulse sequences according to the imaging modality used will be discussed. (author).

  19. Materials selection for oxide-based resistive random access memories

    Energy Technology Data Exchange (ETDEWEB)

    Guo, Yuzheng; Robertson, John [Engineering Department, Cambridge University, Cambridge CB2 1PZ (United Kingdom)

    2014-12-01

    The energies of atomic processes in resistive random access memories (RRAMs) are calculated for four typical oxides, HfO{sub 2}, TiO{sub 2}, Ta{sub 2}O{sub 5}, and Al{sub 2}O{sub 3}, to define a materials selection process. O vacancies have the lowest defect formation energy in the O-poor limit and dominate the processes. A band diagram defines the operating Fermi energy and O chemical potential range. It is shown how the scavenger metal can be used to vary the O vacancy formation energy, via controlling the O chemical potential, and the mean Fermi energy. The high endurance of Ta{sub 2}O{sub 5} RRAM is related to its more stable amorphous phase and the adaptive lattice rearrangements of its O vacancy.

  20. The sequence of spacers between the consensus sequences modulates the strength of procaryotic promoters

    DEFF Research Database (Denmark)

    Jensen, Peter Ruhdal; Hammer, Karin

    1998-01-01

    A library of synthetic promoters for Lactococcus lactis was constructed, in which the known consensus sequences were kept constant while the sequences of the separating spacers were randomized. The library consists of 38 promoters which differ in strength from 0.3 relative units, and up to more t......-reactors and cell factories....

  1. ERP Indices of Stimulus Prediction in Letter Sequences

    Directory of Open Access Journals (Sweden)

    Edith Kaan

    2014-10-01

    Full Text Available Given the current focus on anticipation in perception, action and cognition, including language processing, there is a need for a method to tap into predictive processing in situations in which cue and feedback stimuli are not explicitly marked as such. To this aim, event related potentials (ERPs were obtained while participants viewed alphabetic letter sequences (“A”, “B”, “C”, “D”, “E”, …, in which the letters were highly predictable, and random sequences (“S”, “B”, “A”, “I”, “F”, “M”, …, without feedback. Occasionally, the presentation of a letter in a sequence was delayed by 300 ms. During this delay period, an increased negativity was observed for predictive versus random sequences. In addition, the early positivity following the delay was larger for predictive compared with random sequences. These results suggest that expectation-sensitive ERP modulations can be elicited in anticipation of stimuli that are not explicit targets, rewards, feedback or instructions, and that a delay can strengthen the prediction for a particular stimulus. Applications to language processing will be discussed.

  2. A power set-based statistical selection procedure to locate susceptible rare variants associated with complex traits with sequencing data.

    Science.gov (United States)

    Sun, Hokeun; Wang, Shuang

    2014-08-15

    Existing association methods for rare variants from sequencing data have focused on aggregating variants in a gene or a genetic region because of the fact that analysing individual rare variants is underpowered. However, these existing rare variant detection methods are not able to identify which rare variants in a gene or a genetic region of all variants are associated with the complex diseases or traits. Once phenotypic associations of a gene or a genetic region are identified, the natural next step in the association study with sequencing data is to locate the susceptible rare variants within the gene or the genetic region. In this article, we propose a power set-based statistical selection procedure that is able to identify the locations of the potentially susceptible rare variants within a disease-related gene or a genetic region. The selection performance of the proposed selection procedure was evaluated through simulation studies, where we demonstrated the feasibility and superior power over several comparable existing methods. In particular, the proposed method is able to handle the mixed effects when both risk and protective variants are present in a gene or a genetic region. The proposed selection procedure was also applied to the sequence data on the ANGPTL gene family from the Dallas Heart Study to identify potentially susceptible rare variants within the trait-related genes. An R package 'rvsel' can be downloaded from http://www.columbia.edu/∼sw2206/ and http://statsun.pusan.ac.kr. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  3. 40 CFR 761.306 - Sampling 1 meter square surfaces by random selection of halves.

    Science.gov (United States)

    2010-07-01

    ... 40 Protection of Environment 30 2010-07-01 2010-07-01 false Sampling 1 meter square surfaces by...(b)(3) § 761.306 Sampling 1 meter square surfaces by random selection of halves. (a) Divide each 1 meter square portion where it is necessary to collect a surface wipe test sample into two equal (or as...

  4. From Protocols to Publications: A Study in Selective Reporting of Outcomes in Randomized Trials in Oncology

    Science.gov (United States)

    Raghav, Kanwal Pratap Singh; Mahajan, Sminil; Yao, James C.; Hobbs, Brian P.; Berry, Donald A.; Pentz, Rebecca D.; Tam, Alda; Hong, Waun K.; Ellis, Lee M.; Abbruzzese, James; Overman, Michael J.

    2015-01-01

    Purpose The decision by journals to append protocols to published reports of randomized trials was a landmark event in clinical trial reporting. However, limited information is available on how this initiative effected transparency and selective reporting of clinical trial data. Methods We analyzed 74 oncology-based randomized trials published in Journal of Clinical Oncology, the New England Journal of Medicine, and The Lancet in 2012. To ascertain integrity of reporting, we compared published reports with their respective appended protocols with regard to primary end points, nonprimary end points, unplanned end points, and unplanned analyses. Results A total of 86 primary end points were reported in 74 randomized trials; nine trials had greater than one primary end point. Nine trials (12.2%) had some discrepancy between their planned and published primary end points. A total of 579 nonprimary end points (median, seven per trial) were planned, of which 373 (64.4%; median, five per trial) were reported. A significant positive correlation was found between the number of planned and nonreported nonprimary end points (Spearman r = 0.66; P < .001). Twenty-eight studies (37.8%) reported a total of 65 unplanned end points; 52 (80.0%) of which were not identified as unplanned. Thirty-one (41.9%) and 19 (25.7%) of 74 trials reported a total of 52 unplanned analyses involving primary end points and 33 unplanned analyses involving nonprimary end points, respectively. Studies reported positive unplanned end points and unplanned analyses more frequently than negative outcomes in abstracts (unplanned end points odds ratio, 6.8; P = .002; unplanned analyses odd ratio, 8.4; P = .007). Conclusion Despite public and reviewer access to protocols, selective outcome reporting persists and is a major concern in the reporting of randomized clinical trials. To foster credible evidence-based medicine, additional initiatives are needed to minimize selective reporting. PMID:26304898

  5. Joint random beam and spectrum selection for spectrum sharing systems with partial channel state information

    KAUST Repository

    Abdallah, Mohamed M.

    2013-11-01

    In this work, we develop joint interference-aware random beam and spectrum selection scheme that provide enhanced performance for the secondary network under the condition that the interference observed at the primary receiver is below a predetermined acceptable value. We consider a secondary link composed of a transmitter equipped with multiple antennas and a single-antenna receiver sharing the same spectrum with a set of primary links composed of a single-antenna transmitter and a single-antenna receiver. The proposed schemes jointly select a beam, among a set of power-optimized random beams, as well as the primary spectrum that maximizes the signal-to-interference-plus-noise ratio (SINR) of the secondary link while satisfying the primary interference constraint. In particular, we consider the case where the interference level is described by a q-bit description of its magnitude, whereby we propose a technique to find the optimal quantizer thresholds in a mean square error (MSE) sense. © 2013 IEEE.

  6. Joint random beam and spectrum selection for spectrum sharing systems with partial channel state information

    KAUST Repository

    Abdallah, Mohamed M.; Sayed, Mostafa M.; Alouini, Mohamed-Slim; Qaraqe, Khalid A.

    2013-01-01

    In this work, we develop joint interference-aware random beam and spectrum selection scheme that provide enhanced performance for the secondary network under the condition that the interference observed at the primary receiver is below a predetermined acceptable value. We consider a secondary link composed of a transmitter equipped with multiple antennas and a single-antenna receiver sharing the same spectrum with a set of primary links composed of a single-antenna transmitter and a single-antenna receiver. The proposed schemes jointly select a beam, among a set of power-optimized random beams, as well as the primary spectrum that maximizes the signal-to-interference-plus-noise ratio (SINR) of the secondary link while satisfying the primary interference constraint. In particular, we consider the case where the interference level is described by a q-bit description of its magnitude, whereby we propose a technique to find the optimal quantizer thresholds in a mean square error (MSE) sense. © 2013 IEEE.

  7. Analysis and applications of a frequency selective surface via a random distribution method

    International Nuclear Information System (INIS)

    Xie Shao-Yi; Huang Jing-Jian; Yuan Nai-Chang; Liu Li-Guo

    2014-01-01

    A novel frequency selective surface (FSS) for reducing radar cross section (RCS) is proposed in this paper. This FSS is based on the random distribution method, so it can be called random surface. In this paper, the stacked patches serving as periodic elements are employed for RCS reduction. Previous work has demonstrated the efficiency by utilizing the microstrip patches, especially for the reflectarray. First, the relevant theory of the method is described. Then a sample of a three-layer variable-sized stacked patch random surface with a dimension of 260 mm×260 mm is simulated, fabricated, and measured in order to demonstrate the validity of the proposed design. For the normal incidence, the 8-dB RCS reduction can be achieved both by the simulation and the measurement in 8 GHz–13 GHz. The oblique incidence of 30° is also investigated, in which the 7-dB RCS reduction can be obtained in a frequency range of 8 GHz–14 GHz. (condensed matter: electronic structure, electrical, magnetic, and optical properties)

  8. Random drift versus selection in academic vocabulary: an evolutionary analysis of published keywords.

    Directory of Open Access Journals (Sweden)

    R Alexander Bentley

    Full Text Available The evolution of vocabulary in academic publishing is characterized via keyword frequencies recorded in the ISI Web of Science citations database. In four distinct case-studies, evolutionary analysis of keyword frequency change through time is compared to a model of random copying used as the null hypothesis, such that selection may be identified against it. The case studies from the physical sciences indicate greater selection in keyword choice than in the social sciences. Similar evolutionary analyses can be applied to a wide range of phenomena; wherever the popularity of multiple items through time has been recorded, as with web searches, or sales of popular music and books, for example.

  9. Randomness at the root of things 1: Random walks

    Science.gov (United States)

    Ogborn, Jon; Collins, Simon; Brown, Mick

    2003-09-01

    This is the first of a pair of articles about randomness in physics. In this article, we use some variations on the idea of a `random walk' to consider first the path of a particle in Brownian motion, and then the random variation to be expected in radioactive decay. The arguments are set in the context of the general importance of randomness both in physics and in everyday life. We think that the ideas could usefully form part of students' A-level work on random decay and quantum phenomena, as well as being good for their general education. In the second article we offer a novel and simple approach to Poisson sequences.

  10. Theoretical and empirical convergence results for additive congruential random number generators

    Science.gov (United States)

    Wikramaratna, Roy S.

    2010-03-01

    Additive Congruential Random Number (ACORN) generators represent an approach to generating uniformly distributed pseudo-random numbers that is straightforward to implement efficiently for arbitrarily large order and modulus; if it is implemented using integer arithmetic, it becomes possible to generate identical sequences on any machine. This paper briefly reviews existing results concerning ACORN generators and relevant theory concerning sequences that are well distributed mod 1 in k dimensions. It then demonstrates some new theoretical results for ACORN generators implemented in integer arithmetic with modulus M=2[mu] showing that they are a family of generators that converge (in a sense that is defined in the paper) to being well distributed mod 1 in k dimensions, as [mu]=log2M tends to infinity. By increasing k, it is possible to increase without limit the number of dimensions in which the resulting sequences approximate to well distributed. The paper concludes by applying the standard TestU01 test suite to ACORN generators for selected values of the modulus (between 260 and 2150), the order (between 4 and 30) and various odd seed values. On the basis of these and earlier results, it is recommended that an order of at least 9 be used together with an odd seed and modulus equal to 230p, for a small integer value of p. While a choice of p=2 should be adequate for most typical applications, increasing p to 3 or 4 gives a sequence that will consistently pass all the tests in the TestU01 test suite, giving additional confidence in more demanding applications. The results demonstrate that the ACORN generators are a reliable source of uniformly distributed pseudo-random numbers, and that in practice (as suggested by the theoretical convergence results) the quality of the ACORN sequences increases with increasing modulus and order.

  11. Is sequence awareness mandatory for perceptual sequence learning: An assessment using a pure perceptual sequence learning design.

    Science.gov (United States)

    Deroost, Natacha; Coomans, Daphné

    2018-02-01

    We examined the role of sequence awareness in a pure perceptual sequence learning design. Participants had to react to the target's colour that changed according to a perceptual sequence. By varying the mapping of the target's colour onto the response keys, motor responses changed randomly. The effect of sequence awareness on perceptual sequence learning was determined by manipulating the learning instructions (explicit versus implicit) and assessing the amount of sequence awareness after the experiment. In the explicit instruction condition (n = 15), participants were instructed to intentionally search for the colour sequence, whereas in the implicit instruction condition (n = 15), they were left uninformed about the sequenced nature of the task. Sequence awareness after the sequence learning task was tested by means of a questionnaire and the process-dissociation-procedure. The results showed that the instruction manipulation had no effect on the amount of perceptual sequence learning. Based on their report to have actively applied their sequence knowledge during the experiment, participants were subsequently regrouped in a sequence strategy group (n = 14, of which 4 participants from the implicit instruction condition and 10 participants from the explicit instruction condition) and a no-sequence strategy group (n = 16, of which 11 participants from the implicit instruction condition and 5 participants from the explicit instruction condition). Only participants of the sequence strategy group showed reliable perceptual sequence learning and sequence awareness. These results indicate that perceptual sequence learning depends upon the continuous employment of strategic cognitive control processes on sequence knowledge. Sequence awareness is suggested to be a necessary but not sufficient condition for perceptual learning to take place. Copyright © 2018 Elsevier B.V. All rights reserved.

  12. Comparison of single-use and reusable metal laryngoscope blades for orotracheal intubation during rapid sequence induction of anesthesia: a multicenter cluster randomized study.

    Science.gov (United States)

    Amour, Julien; Le Manach, Yannick Le; Borel, Marie; Lenfant, François; Nicolas-Robin, Armelle; Carillion, Aude; Ripart, Jacques; Riou, Bruno; Langeron, Olivier

    2010-02-01

    Single-use metal laryngoscope blades are cheaper and carry a lower risk of infection than reusable metal blades. The authors compared single-use and reusable metal blades during rapid sequence induction of anesthesia in a multicenter cluster randomized trial. One thousand seventy-two adult patients undergoing general anesthesia under emergency conditions and requiring rapid sequence induction were randomly assigned on a weekly basis to either single-use or reusable metal blades (cluster randomization). After induction, a 60-s period was allowed to complete intubation. In the case of failed intubation, a second attempt was performed using the opposite type of blade. The primary endpoint was the rate of failed intubation, and the secondary endpoints were the incidence of complications (oxygen desaturation, lung aspiration, and/or oropharynx trauma) and the Cormack and Lehane score. Both groups were similar in their main characteristics, including the risk factors for difficult intubation. The rate of failed intubation was significantly decreased with single-use metal blades at the first attempt compared with reusable blades (2.8 vs. 5.4%, P < 0.05). In addition, the proportion of grades III and IV in Cormack and Lehane score were also significantly decreased with single-use metal blades (6 vs. 10%, P < 0.05). The global complication rate did not reach statistical significance, although the same trend was noted (6.8% vs. 11.5%, P = not significant). An investigator survey and a measure of illumination pointed that illumination might have been responsible for this result. The single-use metal blade was more efficient than a reusable metal blade in rapid sequence induction of anesthesia.

  13. From Protocols to Publications: A Study in Selective Reporting of Outcomes in Randomized Trials in Oncology.

    Science.gov (United States)

    Raghav, Kanwal Pratap Singh; Mahajan, Sminil; Yao, James C; Hobbs, Brian P; Berry, Donald A; Pentz, Rebecca D; Tam, Alda; Hong, Waun K; Ellis, Lee M; Abbruzzese, James; Overman, Michael J

    2015-11-01

    The decision by journals to append protocols to published reports of randomized trials was a landmark event in clinical trial reporting. However, limited information is available on how this initiative effected transparency and selective reporting of clinical trial data. We analyzed 74 oncology-based randomized trials published in Journal of Clinical Oncology, the New England Journal of Medicine, and The Lancet in 2012. To ascertain integrity of reporting, we compared published reports with their respective appended protocols with regard to primary end points, nonprimary end points, unplanned end points, and unplanned analyses. A total of 86 primary end points were reported in 74 randomized trials; nine trials had greater than one primary end point. Nine trials (12.2%) had some discrepancy between their planned and published primary end points. A total of 579 nonprimary end points (median, seven per trial) were planned, of which 373 (64.4%; median, five per trial) were reported. A significant positive correlation was found between the number of planned and nonreported nonprimary end points (Spearman r = 0.66; P medicine, additional initiatives are needed to minimize selective reporting. © 2015 by American Society of Clinical Oncology.

  14. Subjective randomness as statistical inference.

    Science.gov (United States)

    Griffiths, Thomas L; Daniels, Dylan; Austerweil, Joseph L; Tenenbaum, Joshua B

    2018-06-01

    Some events seem more random than others. For example, when tossing a coin, a sequence of eight heads in a row does not seem very random. Where do these intuitions about randomness come from? We argue that subjective randomness can be understood as the result of a statistical inference assessing the evidence that an event provides for having been produced by a random generating process. We show how this account provides a link to previous work relating randomness to algorithmic complexity, in which random events are those that cannot be described by short computer programs. Algorithmic complexity is both incomputable and too general to capture the regularities that people can recognize, but viewing randomness as statistical inference provides two paths to addressing these problems: considering regularities generated by simpler computing machines, and restricting the set of probability distributions that characterize regularity. Building on previous work exploring these different routes to a more restricted notion of randomness, we define strong quantitative models of human randomness judgments that apply not just to binary sequences - which have been the focus of much of the previous work on subjective randomness - but also to binary matrices and spatial clustering. Copyright © 2018 Elsevier Inc. All rights reserved.

  15. Why the null matters: statistical tests, random walks and evolution.

    Science.gov (United States)

    Sheets, H D; Mitchell, C E

    2001-01-01

    A number of statistical tests have been developed to determine what type of dynamics underlie observed changes in morphology in evolutionary time series, based on the pattern of change within the time series. The theory of the 'scaled maximum', the 'log-rate-interval' (LRI) method, and the Hurst exponent all operate on the same principle of comparing the maximum change, or rate of change, in the observed dataset to the maximum change expected of a random walk. Less change in a dataset than expected of a random walk has been interpreted as indicating stabilizing selection, while more change implies directional selection. The 'runs test' in contrast, operates on the sequencing of steps, rather than on excursion. Applications of these tests to computer generated, simulated time series of known dynamical form and various levels of additive noise indicate that there is a fundamental asymmetry in the rate of type II errors of the tests based on excursion: they are all highly sensitive to noise in models of directional selection that result in a linear trend within a time series, but are largely noise immune in the case of a simple model of stabilizing selection. Additionally, the LRI method has a lower sensitivity than originally claimed, due to the large range of LRI rates produced by random walks. Examination of the published results of these tests show that they have seldom produced a conclusion that an observed evolutionary time series was due to directional selection, a result which needs closer examination in light of the asymmetric response of these tests.

  16. Effect of ionic strength and cationic DNA affinity binders on the DNA sequence selective alkylation of guanine N7-positions by nitrogen mustards

    International Nuclear Information System (INIS)

    Hartley, J.A.; Forrow, S.M.; Souhami, R.L.

    1990-01-01

    Large variations in alkylation intensities exist among guanines in a DNA sequence following treatment with chemotherapeutic alkylating agents such as nitrogen mustards, and the substituent attached to the reactive group can impose a distinct sequence preference for reaction. In order to understand further the structural and electrostatic factors which determine the sequence selectivity of alkylation reactions, the effect of increase ionic strength, the intercalator ethidium bromide, AT-specific minor groove binders distamycin A and netropsin, and the polyamine spermine on guanine N7-alkylation by L-phenylalanine mustard (L-Pam), uracil mustard (UM), and quinacrine mustard (QM) was investigated with a modification of the guanine-specific chemical cleavage technique for DNA sequencing. The result differed with both the nitrogen mustard and the cationic agent used. The effect, which resulted in both enhancement and suppression of alkylation sites, was most striking in the case of netropsin and distamycin A, which differed from each other. DNA footprinting indicated that selective binding to AT sequences in the minor groove of DNA can have long-range effects on the alkylation pattern of DNA in the major groove

  17. Detection of M-Sequences from Spike Sequence in Neuronal Networks

    Directory of Open Access Journals (Sweden)

    Yoshi Nishitani

    2012-01-01

    Full Text Available In circuit theory, it is well known that a linear feedback shift register (LFSR circuit generates pseudorandom bit sequences (PRBS, including an M-sequence with the maximum period of length. In this study, we tried to detect M-sequences known as a pseudorandom sequence generated by the LFSR circuit from time series patterns of stimulated action potentials. Stimulated action potentials were recorded from dissociated cultures of hippocampal neurons grown on a multielectrode array. We could find several M-sequences from a 3-stage LFSR circuit (M3. These results show the possibility of assembling LFSR circuits or its equivalent ones in a neuronal network. However, since the M3 pattern was composed of only four spike intervals, the possibility of an accidental detection was not zero. Then, we detected M-sequences from random spike sequences which were not generated from an LFSR circuit and compare the result with the number of M-sequences from the originally observed raster data. As a result, a significant difference was confirmed: a greater number of “0–1” reversed the 3-stage M-sequences occurred than would have accidentally be detected. This result suggests that some LFSR equivalent circuits are assembled in neuronal networks.

  18. High Entropy Random Selection Protocols

    NARCIS (Netherlands)

    H. Buhrman (Harry); M. Christandl (Matthias); M. Koucky (Michal); Z. Lotker (Zvi); B. Patt-Shamir; M. Charikar; K. Jansen; O. Reingold; J. Rolim

    2007-01-01

    textabstractIn this paper, we construct protocols for two parties that do not trust each other, to generate random variables with high Shannon entropy. We improve known bounds for the trade off between the number of rounds, length of communication and the entropy of the outcome.

  19. SAAS: Short Amino Acid Sequence - A Promising Protein Secondary Structure Prediction Method of Single Sequence

    Directory of Open Access Journals (Sweden)

    Zhou Yuan Wu

    2013-07-01

    Full Text Available In statistical methods of predicting protein secondary structure, many researchers focus on single amino acid frequencies in α-helices, β-sheets, and so on, or the impact near amino acids on an amino acid forming a secondary structure. But the paper considers a short sequence of amino acids (3, 4, 5 or 6 amino acids as integer, and statistics short sequence's probability forming secondary structure. Also, many researchers select low homologous sequences as statistical database. But this paper select whole PDB database. In this paper we propose a strategy to predict protein secondary structure using simple statistical method. Numerical computation shows that, short amino acids sequence as integer to statistics, which can easy see trend of short sequence forming secondary structure, and it will work well to select large statistical database (whole PDB database without considering homologous, and Q3 accuracy is ca. 74% using this paper proposed simple statistical method, but accuracy of others statistical methods is less than 70%.

  20. Detecting consistent patterns of directional adaptation using differential selection codon models.

    Science.gov (United States)

    Parto, Sahar; Lartillot, Nicolas

    2017-06-23

    Phylogenetic codon models are often used to characterize the selective regimes acting on protein-coding sequences. Recent methodological developments have led to models explicitly accounting for the interplay between mutation and selection, by modeling the amino acid fitness landscape along the sequence. However, thus far, most of these models have assumed that the fitness landscape is constant over time. Fluctuations of the fitness landscape may often be random or depend on complex and unknown factors. However, some organisms may be subject to systematic changes in selective pressure, resulting in reproducible molecular adaptations across independent lineages subject to similar conditions. Here, we introduce a codon-based differential selection model, which aims to detect and quantify the fine-grained consistent patterns of adaptation at the protein-coding level, as a function of external conditions experienced by the organism under investigation. The model parameterizes the global mutational pressure, as well as the site- and condition-specific amino acid selective preferences. This phylogenetic model is implemented in a Bayesian MCMC framework. After validation with simulations, we applied our method to a dataset of HIV sequences from patients with known HLA genetic background. Our differential selection model detects and characterizes differentially selected coding positions specifically associated with two different HLA alleles. Our differential selection model is able to identify consistent molecular adaptations as a function of repeated changes in the environment of the organism. These models can be applied to many other problems, ranging from viral adaptation to evolution of life-history strategies in plants or animals.

  1. Integrated Behavior Therapy for Selective Mutism: a randomized controlled pilot study.

    Science.gov (United States)

    Bergman, R Lindsey; Gonzalez, Araceli; Piacentini, John; Keller, Melody L

    2013-10-01

    To evaluate the feasibility, acceptability, and preliminary efficacy of a novel behavioral intervention for reducing symptoms of selective mutism and increasing functional speech. A total of 21 children ages 4 to 8 with primary selective mutism were randomized to 24 weeks of Integrated Behavior Therapy for Selective Mutism (IBTSM) or a 12-week Waitlist control. Clinical outcomes were assessed using blind independent evaluators, parent-, and teacher-report, and an objective behavioral measure. Treatment recipients completed a three-month follow-up to assess durability of treatment gains. Data indicated increased functional speaking behavior post-treatment as rated by parents and teachers, with a high rate of treatment responders as rated by blind independent evaluators (75%). Conversely, children in the Waitlist comparison group did not experience significant improvements in speaking behaviors. Children who received IBTSM also demonstrated significant improvements in number of words spoken at school compared to baseline, however, significant group differences did not emerge. Treatment recipients also experienced significant reductions in social anxiety per parent, but not teacher, report. Clinical gains were maintained over 3 month follow-up. IBTSM appears to be a promising new intervention that is efficacious in increasing functional speaking behaviors, feasible, and acceptable to parents and teachers. Copyright © 2013 Elsevier Ltd. All rights reserved.

  2. Random walks in Euclidean space

    OpenAIRE

    Varjú, Péter Pál

    2012-01-01

    Consider a sequence of independent random isometries of Euclidean space with a previously fixed probability law. Apply these isometries successively to the origin and consider the sequence of random points that we obtain this way. We prove a local limit theorem under a suitable moment condition and a necessary non-degeneracy condition. Under stronger hypothesis, we prove a limit theorem on a wide range of scales: between e^(-cl^(1/4)) and l^(1/2), where l is the number of steps.

  3. Field-based random sampling without a sampling frame: control selection for a case-control study in rural Africa.

    Science.gov (United States)

    Crampin, A C; Mwinuka, V; Malema, S S; Glynn, J R; Fine, P E

    2001-01-01

    Selection bias, particularly of controls, is common in case-control studies and may materially affect the results. Methods of control selection should be tailored both for the risk factors and disease under investigation and for the population being studied. We present here a control selection method devised for a case-control study of tuberculosis in rural Africa (Karonga, northern Malawi) that selects an age/sex frequency-matched random sample of the population, with a geographical distribution in proportion to the population density. We also present an audit of the selection process, and discuss the potential of this method in other settings.

  4. Complete Genome Sequence of the Probiotic Lactic Acid Bacterium Lactobacillus Rhamnosus

    Directory of Open Access Journals (Sweden)

    Samat Kozhakhmetov

    2014-01-01

    Full Text Available Introduction: Lactobacilli are a bacteria commonly found in the gastrointestinal tract. Some species of this genus have probiotic properties. The most common of these is Lactobacillus rhamnosus, a microoganism, generally regarded as safe (GRAS. It is also a homofermentative L-(+-lactic acid producer. The genus Lactobacillus is characterized by an extraordinary degree of the phenotypic and genotypic diversity. However, the studies of the genus were conducted mostly with the unequally distributed, non-random choice of species for sequencing; thus, there is only one representative genome from the Lactobacillus rhamnosus clade available to date. The aim of this study was to characterize the genome sequencing of selected strains of Lactobacilli. Methods: 109 samples were isolated from national domestic dairy products in the laboratory of Center for life sciences. After screaning isolates for probiotic properties, a highly active Lactobacillus spp strain was chosen. Genomic DNA was extracted according to the manufacturing protocol (Wizard® Genomic DNA Purification Kit. The Lactobacillus rhamnosus strain was identified as the highly active Lactobacillus strain accoridng to its morphological, cultural, physiological, and biochemical properties, and a genotypic analysis. Results: The genome of Lactobacillus rhamnosus was sequenced using the Roche 454 GS FLX (454 GS FLX platforms. The initial draft assembly was prepared from 14 large contigs (20 all contigs by the Newbler gsAssembler 2.3 (454 Life Sciences, Branford, CT. Conclusion: A full genome-sequencing of selected strains of lactic acid bacteria was made during the study.

  5. Treatment selection in a randomized clinical trial via covariate-specific treatment effect curves.

    Science.gov (United States)

    Ma, Yunbei; Zhou, Xiao-Hua

    2017-02-01

    For time-to-event data in a randomized clinical trial, we proposed two new methods for selecting an optimal treatment for a patient based on the covariate-specific treatment effect curve, which is used to represent the clinical utility of a predictive biomarker. To select an optimal treatment for a patient with a specific biomarker value, we proposed pointwise confidence intervals for each covariate-specific treatment effect curve and the difference between covariate-specific treatment effect curves of two treatments. Furthermore, to select an optimal treatment for a future biomarker-defined subpopulation of patients, we proposed confidence bands for each covariate-specific treatment effect curve and the difference between each pair of covariate-specific treatment effect curve over a fixed interval of biomarker values. We constructed the confidence bands based on a resampling technique. We also conducted simulation studies to evaluate finite-sample properties of the proposed estimation methods. Finally, we illustrated the application of the proposed method in a real-world data set.

  6. Refining the Results of a Classical SELEX Experiment by Expanding the Sequence Data Set of an Aptamer Pool Selected for Protein A

    OpenAIRE

    Regina Stoltenburg; Beate Strehlitz

    2018-01-01

    New, as yet undiscovered aptamers for Protein A were identified by applying next generation sequencing (NGS) to a previously selected aptamer pool. This pool was obtained in a classical SELEX (Systematic Evolution of Ligands by EXponential enrichment) experiment using the FluMag-SELEX procedure followed by cloning and Sanger sequencing. PA#2/8 was identified as the only Protein A-binding aptamer from the Sanger sequence pool, and was shown to be able to bind intact cells of Staphylococcus aur...

  7. Phage display peptide libraries: deviations from randomness and correctives

    Science.gov (United States)

    Ryvkin, Arie; Ashkenazy, Haim; Weiss-Ottolenghi, Yael; Piller, Chen; Pupko, Tal; Gershoni, Jonathan M

    2018-01-01

    Abstract Peptide-expressing phage display libraries are widely used for the interrogation of antibodies. Affinity selected peptides are then analyzed to discover epitope mimetics, or are subjected to computational algorithms for epitope prediction. A critical assumption for these applications is the random representation of amino acids in the initial naïve peptide library. In a previous study, we implemented next generation sequencing to evaluate a naïve library and discovered severe deviations from randomness in UAG codon over-representation as well as in high G phosphoramidite abundance causing amino acid distribution biases. In this study, we demonstrate that the UAG over-representation can be attributed to the burden imposed on the phage upon the assembly of the recombinant Protein 8 subunits. This was corrected by constructing the libraries using supE44-containing bacteria which suppress the UAG driven abortive termination. We also demonstrate that the overabundance of G stems from variant synthesis-efficiency and can be corrected using compensating oligonucleotide-mixtures calibrated by mass spectroscopy. Construction of libraries implementing these correctives results in markedly improved libraries that display random distribution of amino acids, thus ensuring that enriched peptides obtained in biopanning represent a genuine selection event, a fundamental assumption for phage display applications. PMID:29420788

  8. Two-stage clustering (TSC: a pipeline for selecting operational taxonomic units for the high-throughput sequencing of PCR amplicons.

    Directory of Open Access Journals (Sweden)

    Xiao-Tao Jiang

    Full Text Available Clustering 16S/18S rRNA amplicon sequences into operational taxonomic units (OTUs is a critical step for the bioinformatic analysis of microbial diversity. Here, we report a pipeline for selecting OTUs with a relatively low computational demand and a high degree of accuracy. This pipeline is referred to as two-stage clustering (TSC because it divides tags into two groups according to their abundance and clusters them sequentially. The more abundant group is clustered using a hierarchical algorithm similar to that in ESPRIT, which has a high degree of accuracy but is computationally costly for large datasets. The rarer group, which includes the majority of tags, is then heuristically clustered to improve efficiency. To further improve the computational efficiency and accuracy, two preclustering steps are implemented. To maintain clustering accuracy, all tags are grouped into an OTU depending on their pairwise Needleman-Wunsch distance. This method not only improved the computational efficiency but also mitigated the spurious OTU estimation from 'noise' sequences. In addition, OTUs clustered using TSC showed comparable or improved performance in beta-diversity comparisons compared to existing OTU selection methods. This study suggests that the distribution of sequencing datasets is a useful property for improving the computational efficiency and increasing the clustering accuracy of the high-throughput sequencing of PCR amplicons. The software and user guide are freely available at http://hwzhoulab.smu.edu.cn/paperdata/.

  9. Quantitative measure of randomness and order for complete genomes

    Science.gov (United States)

    Kong, Sing-Guan; Fan, Wen-Lang; Chen, Hong-Da; Wigger, Jan; Torda, Andrew E.; Lee, H. C.

    2009-06-01

    We propose an order index, ϕ , which gives a quantitative measure of randomness and order of complete genomic sequences. It maps genomes to a number from 0 (random and of infinite length) to 1 (fully ordered) and applies regardless of sequence length. The 786 complete genomic sequences in GenBank were found to have ϕ values in a very narrow range, ϕg=0.031-0.015+0.028 . We show this implies that genomes are halfway toward being completely random, or, at the “edge of chaos.” We further show that artificial “genomes” converted from literary classics have ϕ ’s that almost exactly coincide with ϕg , but sequences of low information content do not. We infer that ϕg represents a high information-capacity “fixed point” in sequence space, and that genomes are driven to it by the dynamics of a robust growth and evolution process. We show that a growth process characterized by random segmental duplication can robustly drive genomes to the fixed point.

  10. Least squares deconvolution for leak detection with a pseudo random binary sequence excitation

    Science.gov (United States)

    Nguyen, Si Tran Nguyen; Gong, Jinzhe; Lambert, Martin F.; Zecchin, Aaron C.; Simpson, Angus R.

    2018-01-01

    Leak detection and localisation is critical for water distribution system pipelines. This paper examines the use of the time-domain impulse response function (IRF) for leak detection and localisation in a pressurised water pipeline with a pseudo random binary sequence (PRBS) signal excitation. Compared to the conventional step wave generated using a single fast operation of a valve closure, a PRBS signal offers advantageous correlation properties, in that the signal has very low autocorrelation for lags different from zero and low cross correlation with other signals including noise and other interference. These properties result in a significant improvement in the IRF signal to noise ratio (SNR), leading to more accurate leak localisation. In this paper, the estimation of the system IRF is formulated as an optimisation problem in which the l2 norm of the IRF is minimised to suppress the impact of noise and interference sources. Both numerical and experimental data are used to verify the proposed technique. The resultant estimated IRF provides not only accurate leak location estimation, but also good sensitivity to small leak sizes due to the improved SNR.

  11. Selective decontamination in pediatric liver transplants. A randomized prospective study.

    Science.gov (United States)

    Smith, S D; Jackson, R J; Hannakan, C J; Wadowsky, R M; Tzakis, A G; Rowe, M I

    1993-06-01

    Although it has been suggested that selective decontamination of the digestive tract (SDD) decreases postoperative aerobic Gram-negative and fungal infections in orthotopic liver transplantation (OLT), no controlled trials exist in pediatric patients. This prospective, randomized controlled study of 36 pediatric OLT patients examines the effect of short-term SDD on postoperative infection and digestive tract flora. Patients were randomized into two groups. The control group received perioperative parenteral antibiotics only. The SDD group received in addition polymyxin E, tobramycin, and amphotericin B enterally and by oropharyngeal swab postoperatively until oral intake was tolerated (6 +/- 4 days). Indications for operation, preoperative status, age, and intensive care unit and hospital length of stay were no different in SDD (n = 18) and control (n = 18) groups. A total of 14 Gram-negative infections (intraabdominal abscess 7, septicemia 5, pneumonia 1, urinary tract 1) developed in the 36 patients studied. Mortality was not significantly different in the two groups. However, there were significantly fewer patients with Gram-negative infections in the SDD group: 3/18 patients (11%) vs. 11/18 patients (50%) in the control group, P < 0.001. There was also significant reduction in aerobic Gram-negative flora in the stool and pharynx in patients receiving SDD. Gram-positive and anaerobic organisms were unaffected. We conclude that short-term postoperative SDD significantly reduces Gram-negative infections in pediatric OLT patients.

  12. Thermodynamics of protein folding: a random matrix formulation.

    Science.gov (United States)

    Shukla, Pragya

    2010-10-20

    The process of protein folding from an unfolded state to a biologically active, folded conformation is governed by many parameters, e.g. the sequence of amino acids, intermolecular interactions, the solvent, temperature and chaperon molecules. Our study, based on random matrix modeling of the interactions, shows, however, that the evolution of the statistical measures, e.g. Gibbs free energy, heat capacity, and entropy, is single parametric. The information can explain the selection of specific folding pathways from an infinite number of possible ways as well as other folding characteristics observed in computer simulation studies. © 2010 IOP Publishing Ltd

  13. Day-ahead load forecast using random forest and expert input selection

    International Nuclear Information System (INIS)

    Lahouar, A.; Ben Hadj Slama, J.

    2015-01-01

    Highlights: • A model based on random forests for short term load forecast is proposed. • An expert feature selection is added to refine inputs. • Special attention is paid to customers behavior, load profile and special holidays. • The model is flexible and able to handle complex load signal. • A technical comparison is performed to assess the forecast accuracy. - Abstract: The electrical load forecast is getting more and more important in recent years due to the electricity market deregulation and integration of renewable resources. To overcome the incoming challenges and ensure accurate power prediction for different time horizons, sophisticated intelligent methods are elaborated. Utilization of intelligent forecast algorithms is among main characteristics of smart grids, and is an efficient tool to face uncertainty. Several crucial tasks of power operators such as load dispatch rely on the short term forecast, thus it should be as accurate as possible. To this end, this paper proposes a short term load predictor, able to forecast the next 24 h of load. Using random forest, characterized by immunity to parameter variations and internal cross validation, the model is constructed following an online learning process. The inputs are refined by expert feature selection using a set of if–then rules, in order to include the own user specifications about the country weather or market, and to generalize the forecast ability. The proposed approach is tested through a real historical set from the Tunisian Power Company, and the simulation shows accurate and satisfactory results for one day in advance, with an average error exceeding rarely 2.3%. The model is validated for regular working days and weekends, and special attention is paid to moving holidays, following non Gregorian calendar

  14. Secure self-calibrating quantum random-bit generator

    International Nuclear Information System (INIS)

    Fiorentino, M.; Santori, C.; Spillane, S. M.; Beausoleil, R. G.; Munro, W. J.

    2007-01-01

    Random-bit generators (RBGs) are key components of a variety of information processing applications ranging from simulations to cryptography. In particular, cryptographic systems require 'strong' RBGs that produce high-entropy bit sequences, but traditional software pseudo-RBGs have very low entropy content and therefore are relatively weak for cryptography. Hardware RBGs yield entropy from chaotic or quantum physical systems and therefore are expected to exhibit high entropy, but in current implementations their exact entropy content is unknown. Here we report a quantum random-bit generator (QRBG) that harvests entropy by measuring single-photon and entangled two-photon polarization states. We introduce and implement a quantum tomographic method to measure a lower bound on the 'min-entropy' of the system, and we employ this value to distill a truly random-bit sequence. This approach is secure: even if an attacker takes control of the source of optical states, a secure random sequence can be distilled

  15. Sequence selectivity of azinomycin B in DNA alkylation and cross-linking: a QM/MM study.

    Science.gov (United States)

    Senthilnathan, Dhurairajan; Kalaiselvan, Anbarasan; Venuvanalingam, Ponnambalam

    2013-01-01

    Azinomycin B--a well-known antitumor drug--forms cross-links with DNA through alkylation of purine bases and blocks tumor cell growth. This reaction has been modeled using the ONIOM (B3LYP/6-31+g(d):UFF) method to understand the mechanism and sequence selectivity. ONIOM results have been checked for reliability by comparing them with full quantum mechanics calculations for selected paths. Calculations reveal that, among the purine bases, guanine is more reactive and is alkylated by aziridine ring through the C10 position, followed by alkylation of the epoxide ring through the C21 position of Azinomycin B. While the mono alkylation is controlled kinetically, bis-alkylation is controlled thermodynamically. Solvent effects were included using polarized-continuum-model calculations and no significant change from gas phase results was observed.

  16. AFLP fragment isolation technique as a method to produce random sequences for single nucleotide polymorphism discovery in the green turtle, Chelonia mydas.

    Science.gov (United States)

    Roden, Suzanne E; Dutton, Peter H; Morin, Phillip A

    2009-01-01

    The green sea turtle, Chelonia mydas, was used as a case study for single nucleotide polymorphism (SNP) discovery in a species that has little genetic sequence information available. As green turtles have a complex population structure, additional nuclear markers other than microsatellites could add to our understanding of their complex life history. Amplified fragment length polymorphism technique was used to generate sets of random fragments of genomic DNA, which were then electrophoretically separated with precast gels, stained with SYBR green, excised, and directly sequenced. It was possible to perform this method without the use of polyacrylamide gels, radioactive or fluorescent labeled primers, or hybridization methods, reducing the time, expense, and safety hazards of SNP discovery. Within 13 loci, 2547 base pairs were screened, resulting in the discovery of 35 SNPs. Using this method, it was possible to yield a sufficient number of loci to screen for SNP markers without the availability of prior sequence information.

  17. Genomic Selection in the Era of Next Generation Sequencing for Complex Traits in Plant Breeding.

    Science.gov (United States)

    Bhat, Javaid A; Ali, Sajad; Salgotra, Romesh K; Mir, Zahoor A; Dutta, Sutapa; Jadon, Vasudha; Tyagi, Anshika; Mushtaq, Muntazir; Jain, Neelu; Singh, Pradeep K; Singh, Gyanendra P; Prabhu, K V

    2016-01-01

    Genomic selection (GS) is a promising approach exploiting molecular genetic markers to design novel breeding programs and to develop new markers-based models for genetic evaluation. In plant breeding, it provides opportunities to increase genetic gain of complex traits per unit time and cost. The cost-benefit balance was an important consideration for GS to work in crop plants. Availability of genome-wide high-throughput, cost-effective and flexible markers, having low ascertainment bias, suitable for large population size as well for both model and non-model crop species with or without the reference genome sequence was the most important factor for its successful and effective implementation in crop species. These factors were the major limitations to earlier marker systems viz., SSR and array-based, and was unimaginable before the availability of next-generation sequencing (NGS) technologies which have provided novel SNP genotyping platforms especially the genotyping by sequencing. These marker technologies have changed the entire scenario of marker applications and made the use of GS a routine work for crop improvement in both model and non-model crop species. The NGS-based genotyping have increased genomic-estimated breeding value prediction accuracies over other established marker platform in cereals and other crop species, and made the dream of GS true in crop breeding. But to harness the true benefits from GS, these marker technologies will be combined with high-throughput phenotyping for achieving the valuable genetic gain from complex traits. Moreover, the continuous decline in sequencing cost will make the WGS feasible and cost effective for GS in near future. Till that time matures the targeted sequencing seems to be more cost-effective option for large scale marker discovery and GS, particularly in case of large and un-decoded genomes.

  18. Extreme Kinematics in Selected Hip Hop Dance Sequences.

    Science.gov (United States)

    Bronner, Shaw; Ojofeitimi, Sheyi; Woo, Helen

    2015-09-01

    Hip hop dance has many styles including breakdance (breaking), house, popping and locking, funk, streetdance, krumping, Memphis jookin', and voguing. These movements combine the complexity of dance choreography with the challenges of gymnastics and acrobatic movements. Despite high injury rates in hip hop dance, particularly in breakdance, to date there are no published biomechanical studies in this population. The purpose of this study was to compare representative hip hop steps found in breakdance (toprock and breaking) and house and provide descriptive statistics of the angular displacements that occurred in these sequences. Six expert female hip hop dancers performed three choreographed dance sequences, top rock, breaking, and house, to standardized music-based tempos. Hip, knee, and ankle kinematics were collected during sequences that were 18 to 30 sec long. Hip, knee, and ankle three-dimensional peak joint angles were compared in repeated measures ANOVAs with post hoc tests where appropriate (pHip hop maximal joint angles exceeded reported activities of daily living and high injury sports such as gymnastics. Hip hop dancers work at weight-bearing joint end ranges where muscles are at a functional disadvantage. These results may explain why lower extremity injury rates are high in this population.

  19. Reverse transcriptase sequences from mulberry LTR retrotransposons: characterization analysis

    Directory of Open Access Journals (Sweden)

    Ma Bi

    2017-10-01

    Full Text Available Copia and Gypsy play important roles in structural, functional and evolutionary dynamics of plant genomes. In this study, a total of 106 and 101, Copia and Gypsy reverse transcriptase (rt were amplified respectively in the Morus notabilis genome using degenerate primers. All sequences exhibited high levels of heterogeneity, were rich in AT and possessed higher sequence divergence of Copia rt in comparison to Gypsy rt. Two reasons are likely to account for this phenomenon: a these elements often experience deletions or fragmentation by illegitimate or unequal homologous recombination in the transposition process; b strong purifying selective pressure drives the evolution of these elements through “selective silencing” with random mutation and eventual deletion from the host genome. Interestingly, mulberry rt clustered with other rt from distantly related taxa according to the phylogenetic analysis. This phenomenon did not result from horizontal transposable element transfer. Results obtained from fluorescence in situ hybridization revealed that most of the hybridization signals were preferentially concentrated in pericentromeric and distal regions of chromosomes, and these elements may play important roles in the regions in which they are found. Results of this study support the continued pursuit of further functional studies of Copia and Gypsy in the mulberry genome.

  20. Distribution of orientation selectivity in recurrent networks of spiking neurons with different random topologies.

    Science.gov (United States)

    Sadeh, Sadra; Rotter, Stefan

    2014-01-01

    Neurons in the primary visual cortex are more or less selective for the orientation of a light bar used for stimulation. A broad distribution of individual grades of orientation selectivity has in fact been reported in all species. A possible reason for emergence of broad distributions is the recurrent network within which the stimulus is being processed. Here we compute the distribution of orientation selectivity in randomly connected model networks that are equipped with different spatial patterns of connectivity. We show that, for a wide variety of connectivity patterns, a linear theory based on firing rates accurately approximates the outcome of direct numerical simulations of networks of spiking neurons. Distance dependent connectivity in networks with a more biologically realistic structure does not compromise our linear analysis, as long as the linearized dynamics, and hence the uniform asynchronous irregular activity state, remain stable. We conclude that linear mechanisms of stimulus processing are indeed responsible for the emergence of orientation selectivity and its distribution in recurrent networks with functionally heterogeneous synaptic connectivity.

  1. Sequence diversity and natural selection at domain I of the apical membrane antigen 1 among Indian Plasmodium falciparum populations

    Directory of Open Access Journals (Sweden)

    Kumar Ashwani

    2007-11-01

    Full Text Available Abstract Background The Plasmodium falciparum apical membrane antigen 1 (AMA1 is a leading malaria vaccine candidate antigen. The complete AMA1 protein is comprised of three domains where domain I exhibits high sequence polymorphism and is thus named as the hyper-variable region (HVR. The present study describes the extent of genetic polymorphism and natural selection at domain I of the ama1 gene among Indian P. falciparum isolates. Methods The part of the ama1 gene covering domain I was PCR amplified and sequenced from 157 P. falciparum isolates collected from five different geographical regions of India. Statistical and phylogenetic analyses of the sequences were done using DnaSP ver. 4. 10. 9 and MEGA version 3.0 packages. Results A total of 57 AMA1 haplotypes were observed among 157 isolates sequenced. Forty-six of these 57 haplotypes are being reported here for the first time. The parasites collected from the high malaria transmission areas (Assam, Orissa, and Andaman and Nicobar Islands showed more haplotypes (H and nucleotide diversity π as compared to low malaria transmission areas (Uttar Pradesh and Goa. The comparison of all five Indian P. falciparum subpopulations indicated moderate level of genetic differentiation and limited gene flow (Fixation index ranging from 0.048 to 0.13 between populations. The difference between rates of non-synonymous and synonymous mutations, Tajima's D and McDonald-Kreitman test statistics suggested that the diversity at domain I of the AMA1 antigen is due to positive natural selection. The minimum recombination events were also high indicating the possible role of recombination in generating AMA1 allelic diversity. Conclusion The level of genetic diversity and diversifying selection were higher in Assam, Orissa, and Andaman and Nicobar Islands populations as compared to Uttar Pradesh and Goa. The amounts of gene flow among these populations were moderate. The data reported here will be valuable for the

  2. Direct selection of expressed sequences on a YAC clone revealed proline-rich-like genes and BARE-1 sequences physically linked to the complex ¤Mla¤ powdery mildew resistance locus of barley (¤Hordeum vulgare¤ L.)

    DEFF Research Database (Denmark)

    Schwarz, G.; Michalek, W.; Jahoor, A.

    2002-01-01

    homology to the copia-like retroelement BA REI of barley, putatively involved in evolution of disease resistance loci. The high degree of clones representing barley rRNA sequences or false positives is a major disadvantage of direct selection of cDNAs in barley. (C) 2002 Elsevier Science Ireland Ltd. All...... gene. Of 22 selected cDNA clones, six were re-located on the YAC by southern analysis. Two of these clones are predicted to encode members of the hydroxyproline-rich glycoprotein and proline-rich protein gene families which have been implicated in plant defense response. Four sequences showed high...

  3. Antibiotic selection of Escherichia coli sequence type 131 in a mouse intestinal colonization model

    DEFF Research Database (Denmark)

    Hertz, Frederik Boetius; Løbner-Olesen, Anders; Frimodt-Møller, Niels

    2014-01-01

    The ability of different antibiotics to select for extended-spectrum β-lactamase (ESBL)-producing Escherichia coli remains a topic of discussion. In a mouse intestinal colonization model, we evaluated the selective abilities of nine common antimicrobials (cefotaxime, cefuroxime, dicloxacillin...... day, antibiotic treatment was initiated and given subcutaneously once a day for three consecutive days. CFU of E. coli ST131, Bacteroides, and Gram-positive aerobic bacteria in fecal samples were studied, with intervals, until day 8. Bacteroides was used as an indicator organism for impact on the Gram......, clindamycin, penicillin, ampicillin, meropenem, ciprofloxacin, and amdinocillin) against a CTX-M-15-producing E. coli sequence type 131 (ST131) isolate with a fluoroquinolone resistance phenotype. Mice (8 per group) were orogastrically administered 0.25 ml saline with 10(8) CFU/ml E. coli ST131. On that same...

  4. Multiformity of inherent randomicity and visitation density in n symbolic dynamics

    International Nuclear Information System (INIS)

    Zhang Yagang; Wang Changjiang

    2007-01-01

    The multiformity of inherent randomicity and visitation density in n symbolic dynamics will be clarified in this paper. These stochastic symbolic sequences bear three features. The distribution of frequency, inter-occurrence times and the alignment of two random sequences are amplified in detail. The features of visitation density in surjective maps presents catholicity and the catholicity in n letters randomicity has the same measure foundation. We hope to offer a symbolic platform that satisfies these stochastic properties and to attempt to study certain properties of DNA base sequences, 20 amino acids symbolic sequences of proteid structure, and the time series that can be symbolic in finance market et al

  5. Effects of changing the random number stride in Monte Carlo calculations

    International Nuclear Information System (INIS)

    Hendricks, J.S.

    1991-01-01

    This paper reports on a common practice in Monte Carlo radiation transport codes which is to start each random walk a specified number of steps up the random number sequence from the previous one. This is called the stride in the random number sequence between source particles. It is used for correlated sampling or to provide tree-structured random numbers. A new random number generator algorithm for the major Monte Carlo code MCNP has been written to allow adjustment of the random number stride. This random number generator is machine portable. The effects of varying the stride for several sample problems are examined

  6. Statistical method to compare massive parallel sequencing pipelines.

    Science.gov (United States)

    Elsensohn, M H; Leblay, N; Dimassi, S; Campan-Fournier, A; Labalme, A; Roucher-Boulez, F; Sanlaville, D; Lesca, G; Bardel, C; Roy, P

    2017-03-01

    Today, sequencing is frequently carried out by Massive Parallel Sequencing (MPS) that cuts drastically sequencing time and expenses. Nevertheless, Sanger sequencing remains the main validation method to confirm the presence of variants. The analysis of MPS data involves the development of several bioinformatic tools, academic or commercial. We present here a statistical method to compare MPS pipelines and test it in a comparison between an academic (BWA-GATK) and a commercial pipeline (TMAP-NextGENe®), with and without reference to a gold standard (here, Sanger sequencing), on a panel of 41 genes in 43 epileptic patients. This method used the number of variants to fit log-linear models for pairwise agreements between pipelines. To assess the heterogeneity of the margins and the odds ratios of agreement, four log-linear models were used: a full model, a homogeneous-margin model, a model with single odds ratio for all patients, and a model with single intercept. Then a log-linear mixed model was fitted considering the biological variability as a random effect. Among the 390,339 base-pairs sequenced, TMAP-NextGENe® and BWA-GATK found, on average, 2253.49 and 1857.14 variants (single nucleotide variants and indels), respectively. Against the gold standard, the pipelines had similar sensitivities (63.47% vs. 63.42%) and close but significantly different specificities (99.57% vs. 99.65%; p < 0.001). Same-trend results were obtained when only single nucleotide variants were considered (99.98% specificity and 76.81% sensitivity for both pipelines). The method allows thus pipeline comparison and selection. It is generalizable to all types of MPS data and all pipelines.

  7. Complete genome sequences of six measles virus strains

    NARCIS (Netherlands)

    Phan, M.V.T. (My V.T.); C.M.E. Schapendonk (Claudia); B.B. Oude Munnink (Bas B.); M.P.G. Koopmans D.V.M. (Marion); R.L. de Swart (Rik); Cotten, M. (Matthew)

    2018-01-01

    textabstractGenetic characterization of wild-type measles virus (MV) strains is a critical component of measles surveillance and molecular epidemiology. We have obtained complete genome sequences of six MV strains belonging to different genotypes, using random-primed next generation sequencing.

  8. Chaotic generation of PN sequences : a VLSI implementation

    NARCIS (Netherlands)

    Dornbusch, A.; Pineda de Gyvez, J.

    1999-01-01

    Generation of repeatable pseudo-random sequences with chaotic analog electronics is not feasible using standard circuit topologies. Component variation caused by imperfect fabrication causes the same divergence of output sequences as does varying initial conditions. By quantizing the output of a

  9. Identification and characterization of earthquake clusters: a comparative analysis for selected sequences in Italy

    Science.gov (United States)

    Peresan, Antonella; Gentili, Stefania

    2017-04-01

    Identification and statistical characterization of seismic clusters may provide useful insights about the features of seismic energy release and their relation to physical properties of the crust within a given region. Moreover, a number of studies based on spatio-temporal analysis of main-shocks occurrence require preliminary declustering of the earthquake catalogs. Since various methods, relying on different physical/statistical assumptions, may lead to diverse classifications of earthquakes into main events and related events, we aim to investigate the classification differences among different declustering techniques. Accordingly, a formal selection and comparative analysis of earthquake clusters is carried out for the most relevant earthquakes in North-Eastern Italy, as reported in the local OGS-CRS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics since 1977. The comparison is then extended to selected earthquake sequences associated with a different seismotectonic setting, namely to events that occurred in the region struck by the recent Central Italy destructive earthquakes, making use of INGV data. Various techniques, ranging from classical space-time windows methods to ad hoc manual identification of aftershocks, are applied for detection of earthquake clusters. In particular, a statistical method based on nearest-neighbor distances of events in space-time-energy domain, is considered. Results from clusters identification by the nearest-neighbor method turn out quite robust with respect to the time span of the input catalogue, as well as to minimum magnitude cutoff. The identified clusters for the largest events reported in North-Eastern Italy since 1977 are well consistent with those reported in earlier studies, which were aimed at detailed manual aftershocks identification. The study shows that the data-driven approach, based on the nearest-neighbor distances, can be satisfactorily applied to decompose the seismic

  10. Order and correlations in genomic DNA sequences. The spectral approach

    International Nuclear Information System (INIS)

    Lobzin, Vasilii V; Chechetkin, Vladimir R

    2000-01-01

    The structural analysis of genomic DNA sequences is discussed in the framework of the spectral approach, which is sufficiently universal due to the reciprocal correspondence and mutual complementarity of Fourier transform length scales. The spectral characteristics of random sequences of the same nucleotide composition possess the property of self-averaging for relatively short sequences of length M≥100-300. Comparison with the characteristics of random sequences determines the statistical significance of the structural features observed. Apart from traditional applications to the search for hidden periodicities, spectral methods are also efficient in studying mutual correlations in DNA sequences. By combining spectra for structure factors and correlation functions, not only integral correlations can be estimated but also their origin identified. Using the structural spectral entropy approach, the regularity of a sequence can be quantitatively assessed. A brief introduction to the problem is also presented and other major methods of DNA sequence analysis described. (reviews of topical problems)

  11. Quasirandom geometric networks from low-discrepancy sequences

    Science.gov (United States)

    Estrada, Ernesto

    2017-08-01

    We define quasirandom geometric networks using low-discrepancy sequences, such as Halton, Sobol, and Niederreiter. The networks are built in d dimensions by considering the d -tuples of digits generated by these sequences as the coordinates of the vertices of the networks in a d -dimensional Id unit hypercube. Then, two vertices are connected by an edge if they are at a distance smaller than a connection radius. We investigate computationally 11 network-theoretic properties of two-dimensional quasirandom networks and compare them with analogous random geometric networks. We also study their degree distribution and their spectral density distributions. We conclude from this intensive computational study that in terms of the uniformity of the distribution of the vertices in the unit square, the quasirandom networks look more random than the random geometric networks. We include an analysis of potential strategies for generating higher-dimensional quasirandom networks, where it is know that some of the low-discrepancy sequences are highly correlated. In this respect, we conclude that up to dimension 20, the use of scrambling, skipping and leaping strategies generate quasirandom networks with the desired properties of uniformity. Finally, we consider a diffusive process taking place on the nodes and edges of the quasirandom and random geometric graphs. We show that the diffusion time is shorter in the quasirandom graphs as a consequence of their larger structural homogeneity. In the random geometric graphs the diffusion produces clusters of concentration that make the process more slow. Such clusters are a direct consequence of the heterogeneous and irregular distribution of the nodes in the unit square in which the generation of random geometric graphs is based on.

  12. IDENTIFICATION OF AVIAN-SPECIFIC FECAL METAGENOMIC SEQUENCES USING GENOME FRAGMENT ENRICHMENTS

    Science.gov (United States)

    Sequence analysis of microbial genomes has provided biologists the opportunity to compare genetic differences between closely related microorganisms. While random sequencing has also been used to study natural microbial communities, metagenomic comparisons via sequencing analysis...

  13. Diversity analysis in Cannabis sativa based on large-scale development of expressed sequence tag-derived simple sequence repeat markers.

    Science.gov (United States)

    Gao, Chunsheng; Xin, Pengfei; Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

    2014-01-01

    Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucloetide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis.

  14. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    Science.gov (United States)

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate the DNA sequence or user can upload the sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various application for the sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides the tools for routinely performed sequence analysis by the biological scientists such as DNA replication, reverse compliment generation, transcription, translation, sequence specific information as total number of nucleotide bases, ATGC base contents along with their respective percentages and sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides user friendly environment for sequence analysis. It is freely available. http://www.cemb.edu.pk/sw.html RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.

  15. Evoked potential correlates of selective attention with multi-channel auditory inputs

    Science.gov (United States)

    Schwent, V. L.; Hillyard, S. A.

    1975-01-01

    Ten subjects were presented with random, rapid sequences of four auditory tones which were separated in pitch and apparent spatial position. The N1 component of the auditory vertex evoked potential (EP) measured relative to a baseline was observed to increase with attention. It was concluded that the N1 enhancement reflects a finely tuned selective attention to one stimulus channel among several concurrent, competing channels. This EP enhancement probably increases with increased information load on the subject.

  16. Gene Discovery through Genomic Sequencing of Brucella abortus

    OpenAIRE

    Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.

    2001-01-01

    Brucella abortus is the etiological agent of brucellosis, a disease that affects bovines and human. We generated DNA random sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10−5) to sequences deposit...

  17. Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data.

    Science.gov (United States)

    Nuel, Gregory; Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude

    2010-01-26

    In bioinformatics it is common to search for a pattern of interest in a potentially large set of rather short sequences (upstream gene regions, proteins, exons, etc.). Although many methodological approaches allow practitioners to compute the distribution of a pattern count in a random sequence generated by a Markov source, no specific developments have taken into account the counting of occurrences in a set of independent sequences. We aim to address this problem by deriving efficient approaches and algorithms to perform these computations both for low and high complexity patterns in the framework of homogeneous or heterogeneous Markov models. The latest advances in the field allowed us to use a technique of optimal Markov chain embedding based on deterministic finite automata to introduce three innovative algorithms. Algorithm 1 is the only one able to deal with heterogeneous models. It also permits to avoid any product of convolution of the pattern distribution in individual sequences. When working with homogeneous models, Algorithm 2 yields a dramatic reduction in the complexity by taking advantage of previous computations to obtain moment generating functions efficiently. In the particular case of low or moderate complexity patterns, Algorithm 3 exploits power computation and binary decomposition to further reduce the time complexity to a logarithmic scale. All these algorithms and their relative interest in comparison with existing ones were then tested and discussed on a toy-example and three biological data sets: structural patterns in protein loop structures, PROSITE signatures in a bacterial proteome, and transcription factors in upstream gene regions. On these data sets, we also compared our exact approaches to the tempting approximation that consists in concatenating the sequences in the data set into a single sequence. Our algorithms prove to be effective and able to handle real data sets with multiple sequences, as well as biological patterns of

  18. Using Random Forests to Select Optimal Input Variables for Short-Term Wind Speed Forecasting Models

    Directory of Open Access Journals (Sweden)

    Hui Wang

    2017-10-01

    Full Text Available Achieving relatively high-accuracy short-term wind speed forecasting estimates is a precondition for the construction and grid-connected operation of wind power forecasting systems for wind farms. Currently, most research is focused on the structure of forecasting models and does not consider the selection of input variables, which can have significant impacts on forecasting performance. This paper presents an input variable selection method for wind speed forecasting models. The candidate input variables for various leading periods are selected and random forests (RF is employed to evaluate the importance of all variable as features. The feature subset with the best evaluation performance is selected as the optimal feature set. Then, kernel-based extreme learning machine is constructed to evaluate the performance of input variables selection based on RF. The results of the case study show that by removing the uncorrelated and redundant features, RF effectively extracts the most strongly correlated set of features from the candidate input variables. By finding the optimal feature combination to represent the original information, RF simplifies the structure of the wind speed forecasting model, shortens the training time required, and substantially improves the model’s accuracy and generalization ability, demonstrating that the input variables selected by RF are effective.

  19. Drawing a random number

    DEFF Research Database (Denmark)

    Wanscher, Jørgen Bundgaard; Sørensen, Majken Vildrik

    2006-01-01

    Random numbers are used for a great variety of applications in almost any field of computer and economic sciences today. Examples ranges from stock market forecasting in economics, through stochastic traffic modelling in operations research to photon and ray tracing in graphics. The construction...... distributions into others with most of the required characteristics. In essence, a uniform sequence which is transformed into a new sequence with the required distribution. The subject of this article is to consider the well known highly uniform Halton sequence and modifications to it. The intent is to generate...

  20. Nonlinear analysis of sequence repeats of multi-domain proteins

    Energy Technology Data Exchange (ETDEWEB)

    Huang Yanzhao [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China); Li Mingfeng [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China); Xiao Yi [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China)]. E-mail: lmf_bill@sina.com

    2007-11-15

    Many multi-domain proteins have repetitive three-dimensional structures but nearly-random amino acid sequences. In the present paper, by using a modified recurrence plot proposed by us previously, we show that these amino acid sequences have hidden repetitions in fact. These results indicate that the repetitive domain structures are encoded by the repetitive sequences. This also gives a method to detect the repetitive domain structures directly from amino acid sequences.

  1. Multi-species sequence comparison reveals dynamic evolution of the elastin gene that has involved purifying selection and lineage-specific insertions/deletions

    Directory of Open Access Journals (Sweden)

    Green Eric D

    2004-05-01

    Full Text Available Abstract Background The elastin gene (ELN is implicated as a factor in both supravalvular aortic stenosis (SVAS and Williams Beuren Syndrome (WBS, two diseases involving pronounced complications in mental or physical development. Although the complete spectrum of functional roles of the processed gene product remains to be established, these roles are inferred to be analogous in human and mouse. This view is supported by genomic sequence comparison, in which there are no large-scale differences in the ~1.8 Mb sequence block encompassing the common region deleted in WBS, with the exception of an overall reversed physical orientation between human and mouse. Results Conserved synteny around ELN does not translate to a high level of conservation in the gene itself. In fact, ELN orthologs in mammals show more sequence divergence than expected for a gene with a critical role in development. The pattern of divergence is non-conventional due to an unusually high ratio of gaps to substitutions. Specifically, multi-sequence alignments of eight mammalian sequences reveal numerous non-aligning regions caused by species-specific insertions and deletions, in spite of the fact that the vast majority of aligning sites appear to be conserved and undergoing purifying selection. Conclusions The pattern of lineage-specific, in-frame insertions/deletions in the coding exons of ELN orthologous genes is unusual and has led to unique features of the gene in each lineage. These differences may indicate that the gene has a slightly different functional mechanism in mammalian lineages, or that the corresponding regions are functionally inert. Identified regions that undergo purifying selection reflect a functional importance associated with evolutionary pressure to retain those features.

  2. Inter simple sequence repeats (ISSR) and random amplified ...

    African Journals Online (AJOL)

    21 of 30 random amplified polymorphic DNA (RAPD) primers produced 220 reproducible bands with average of 10.47 bands per primer and 80.12% of polymorphism. OPR02 primer showed the highest number of effective allele (Ne), Shannon index (I) and genetic diversity (H). Some of the cultivars had specific bands, ...

  3. Large-scale Identification of Expressed Sequence Tags (ESTs from Nicotianatabacum by Normalized cDNA Library Sequencing

    Directory of Open Access Journals (Sweden)

    Alvarez S Perez

    2014-12-01

    Full Text Available An expressed sequence tags (EST resource for tobacco plants (Nicotianatabacum was established using high-throughput sequencing of randomly selected clones from one cDNA library representing a range of plant organs (leaf, stem, root and root base. Over 5000 ESTs were generated from the 3’ ends of 8000 clones, analyzed by BLAST searches and categorized functionally. All annotated ESTs were classified into 18 functional categories, unique transcripts involved in energy were the largest group accounting for 831 (32.32% of the annotated ESTs. After excluding 2450 non-significant tentative unique transcripts (TUTs, 100 unique sequences (1.67% of total TUTs were identified from the N. tabacum database. In the array result two genes strongly related to the tobacco mosaic virus (TMV were obtained, one basic form of pathogenesis-related protein 1 precursor (TBT012G08 and ubiquitin (TBT087G01. Both of them were found in the variety Hongda, some other important genes were classified into two groups, one of these implicated in plant development like those genes related to a photosynthetic process (chlorophyll a-b binding protein, photosystem I, ferredoxin I and III, ATP synthase and a further group including genes related to plant stress response (ubiquitin, ubiquitin-like protein SMT3, glycine-rich RNA binding protein, histones and methallothionein. The interesting finding in this study is that two of these genes have never been reported before in N. tabacum (ubiquitin-like protein SMT3 and methallothionein. The array results were confirmed using quantitative PCR.

  4. Brain potentials index executive functions during random number generation.

    Science.gov (United States)

    Joppich, Gregor; Däuper, Jan; Dengler, Reinhard; Johannes, Sönke; Rodriguez-Fornells, Antoni; Münte, Thomas F

    2004-06-01

    The generation of random sequences is considered to tax different executive functions. To explore the involvement of these functions further, brain potentials were recorded in 16 healthy young adults while either engaging in random number generation (RNG) by pressing the number keys on a computer keyboard in a random sequence or in ordered number generation (ONG) necessitating key presses in the canonical order. Key presses were paced by an external auditory stimulus to yield either fast (1 press/800 ms) or slow (1 press/1300 ms) sequences in separate runs. Attentional demands of random and ordered tasks were assessed by the introduction of a secondary task (key-press to a target tone). The P3 amplitude to the target tone of this secondary task was reduced during RNG, reflecting the greater consumption of attentional resources during RNG. Moreover, RNG led to a left frontal negativity peaking 140 ms after the onset of the pacing stimulus, whenever the subjects produced a true random response. This negativity could be attributed to the left dorsolateral prefrontal cortex and was absent when numbers were repeated. This negativity was interpreted as an index for the inhibition of habitual responses. Finally, in response locked ERPs a negative component was apparent peaking about 50 ms after the key-press that was more prominent during RNG. Source localization suggested a medial frontal source. This effect was tentatively interpreted as a reflection of the greater monitoring demands during random sequence generation.

  5. Cryptographic pseudo-random sequences from the chaotic Hénon ...

    Indian Academy of Sciences (India)

    2-dimensional chaotic maps for the generation of pseudorandom sequences. 3. ... map. Consider the bit-stream Bx formed by choosing every Pth bit of Sx, ... Similarly, the probability of the linear complexity C assuming the value c(c < N) when.

  6. Representation of the quantum Fourier transform on multilevel basic elements by a sequence of selective rotation operators

    Science.gov (United States)

    Ermilov, A. S.; Zobov, V. E.

    2007-12-01

    To experimentally realize quantum computations on d-level basic elements (qudits) at d > 2, it is necessary to develop schemes for the technical realization of elementary logical operators. We have found sequences of selective rotation operators that represent the operators of the quantum Fourier transform (Walsh-Hadamard matrices) for d = 3-10. For the prime numbers 3, 5, and 7, the well-known method of linear algebra is applied, whereas, for the factorable numbers 6, 9, and 10, the representation of virtual spins is used (which we previously applied for d = 4, 8). Selective rotations can be realized, for example, by means of pulses of an RF magnetic field for systems of quadrupole nuclei or laser pulses for atoms and ions in traps.

  7. Precision of working memory for visual motion sequences and transparent motion surfaces.

    Science.gov (United States)

    Zokaei, Nahid; Gorgoraptis, Nikos; Bahrami, Bahador; Bays, Paul M; Husain, Masud

    2011-12-01

    Recent studies investigating working memory for location, color, and orientation support a dynamic resource model. We examined whether this might also apply to motion, using random dot kinematograms (RDKs) presented sequentially or simultaneously. Mean precision for motion direction declined as sequence length increased, with precision being lower for earlier RDKs. Two alternative models of working memory were compared specifically to distinguish between the contributions of different sources of error that corrupt memory (W. Zhang & S. J. Luck, 2008 vs. P. M. Bays, R. F. G. Catalao, & M. Husain, 2009). The latter provided a significantly better fit for the data, revealing that decrease in memory precision for earlier items is explained by an increase in interference from other items in a sequence rather than random guessing or a temporal decay of information. Misbinding feature attributes is an important source of error in working memory. Precision of memory for motion direction decreased when two RDKs were presented simultaneously as transparent surfaces, compared to sequential RDKs. However, precision was enhanced when one motion surface was prioritized, demonstrating that selective attention can improve recall precision. These results are consistent with a resource model that can be used as a general conceptual framework for understanding working memory across a range of visual features.

  8. All-optical fast random number generator.

    Science.gov (United States)

    Li, Pu; Wang, Yun-Cai; Zhang, Jian-Zhong

    2010-09-13

    We propose a scheme of all-optical random number generator (RNG), which consists of an ultra-wide bandwidth (UWB) chaotic laser, an all-optical sampler and an all-optical comparator. Free from the electric-device bandwidth, it can generate 10Gbit/s random numbers in our simulation. The high-speed bit sequences can pass standard statistical tests for randomness after all-optical exclusive-or (XOR) operation.

  9. MELCOR 1.8.2 calculations of selected sequences for the ABWR

    International Nuclear Information System (INIS)

    Kmetyk, L.N.

    1994-07-01

    This report summarizes the results from MELCOR calculations of severe accident sequences in the ABWR and presents comparisons with MAAP calculations for the same sequences. MELCOR was run for two low-pressure and three high-pressure sequences to identify the materials which enter containment and are available for release to the environment (source terms), to study the potential effects of core-concrete interaction, and to obtain event timings during each sequence; the source terms include fission products and other materials such as those generated by core-concrete interactions. Sensitivity studies were done on the impact of assuming limestone rather than basaltic concrete and on the effect of quenching core debris in the cavity compared to having hot, unquenched debris present

  10. Quality pseudo-random number generator

    International Nuclear Information System (INIS)

    Tarasiuk, J.

    1996-01-01

    The pseudo-random number generator (RNG) was written to match needs of nuclear and high-energy physics computation which in some cases require very long and independent random number sequences. In this random number generator the repetition period is about 10 36 what should be sufficient for all computers in the world. In this article the test results of RNG correlation, speed and identity of computations for PC, Sun4 and VAX computer tests are presented

  11. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).

    Science.gov (United States)

    Martin, Andrew C R

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/.

  12. Genome-wide SNP identification by high-throughput sequencing and selective mapping allows sequence assembly positioning using a framework genetic linkage map

    Directory of Open Access Journals (Sweden)

    Xu Xiangming

    2010-12-01

    Full Text Available Abstract Background Determining the position and order of contigs and scaffolds from a genome assembly within an organism's genome remains a technical challenge in a majority of sequencing projects. In order to exploit contemporary technologies for DNA sequencing, we developed a strategy for whole genome single nucleotide polymorphism sequencing allowing the positioning of sequence contigs onto a linkage map using the bin mapping method. Results The strategy was tested on a draft genome of the fungal pathogen Venturia inaequalis, the causal agent of apple scab, and further validated using sequence contigs derived from the diploid plant genome Fragaria vesca. Using our novel method we were able to anchor 70% and 92% of sequences assemblies for V. inaequalis and F. vesca, respectively, to genetic linkage maps. Conclusions We demonstrated the utility of this approach by accurately determining the bin map positions of the majority of the large sequence contigs from each genome sequence and validated our method by mapping single sequence repeat markers derived from sequence contigs on a full mapping population.

  13. Simulation of a directed random-walk model: the effect of pseudo-random-number correlations

    OpenAIRE

    Shchur, L. N.; Heringa, J. R.; Blöte, H. W. J.

    1996-01-01

    We investigate the mechanism that leads to systematic deviations in cluster Monte Carlo simulations when correlated pseudo-random numbers are used. We present a simple model, which enables an analysis of the effects due to correlations in several types of pseudo-random-number sequences. This model provides qualitative understanding of the bias mechanism in a class of cluster Monte Carlo algorithms.

  14. Pooled-DNA sequencing identifies genomic regions of selection in Nigerian isolates of Plasmodium falciparum.

    Science.gov (United States)

    Oyebola, Kolapo M; Idowu, Emmanuel T; Olukosi, Yetunde A; Awolola, Taiwo S; Amambua-Ngwa, Alfred

    2017-06-29

    The burden of falciparum malaria is especially high in sub-Saharan Africa. Differences in pressure from host immunity and antimalarial drugs lead to adaptive changes responsible for high level of genetic variations within and between the parasite populations. Population-specific genetic studies to survey for genes under positive or balancing selection resulting from drug pressure or host immunity will allow for refinement of interventions. We performed a pooled sequencing (pool-seq) of the genomes of 100 Plasmodium falciparum isolates from Nigeria. We explored allele-frequency based neutrality test (Tajima's D) and integrated haplotype score (iHS) to identify genes under selection. Fourteen shared iHS regions that had at least 2 SNPs with a score > 2.5 were identified. These regions code for genes that were likely to have been under strong directional selection. Two of these genes were the chloroquine resistance transporter (CRT) on chromosome 7 and the multidrug resistance 1 (MDR1) on chromosome 5. There was a weak signature of selection in the dihydrofolate reductase (DHFR) gene on chromosome 4 and MDR5 genes on chromosome 13, with only 2 and 3 SNPs respectively identified within the iHS window. We observed strong selection pressure attributable to continued chloroquine and sulfadoxine-pyrimethamine use despite their official proscription for the treatment of uncomplicated malaria. There was also a major selective sweep on chromosome 6 which had 32 SNPs within the shared iHS region. Tajima's D of circumsporozoite protein (CSP), erythrocyte-binding antigen (EBA-175), merozoite surface proteins - MSP3 and MSP7, merozoite surface protein duffy binding-like (MSPDBL2) and serine repeat antigen (SERA-5) were 1.38, 1.29, 0.73, 0.84 and 0.21, respectively. We have demonstrated the use of pool-seq to understand genomic patterns of selection and variability in P. falciparum from Nigeria, which bears the highest burden of infections. This investigation identified known

  15. A weak zero-one law for sequences of random distance graphs

    Energy Technology Data Exchange (ETDEWEB)

    Zhukovskii, Maksim E [M. V. Lomonosov Moscow State University, Faculty of Mechanics and Mathematics, Moscow (Russian Federation)

    2012-07-31

    We study zero-one laws for properties of random distance graphs. Properties written in a first-order language are considered. For p(N) such that pN{sup {alpha}}{yields}{infinity} as N{yields}{infinity}, and (1-p)N{sup {alpha}} {yields} {infinity} as N {yields} {infinity} for any {alpha}>0, we succeed in refuting the law. In this connection, we consider a weak zero-one j-law. For this law, we obtain results for random distance graphs which are similar to the assertions concerning the classical zero-one law for random graphs. Bibliography: 18 titles.

  16. Visual evoked potentials and selective attention to points in space

    Science.gov (United States)

    Van Voorhis, S.; Hillyard, S. A.

    1977-01-01

    Visual evoked potentials (VEPs) were recorded to sequences of flashes delivered to the right and left visual fields while subjects responded promptly to designated stimuli in one field at a time (focused attention), in both fields at once (divided attention), or to neither field (passive). Three stimulus schedules were used: the first was a replication of a previous study (Eason, Harter, and White, 1969) where left- and right-field flashes were delivered quasi-independently, while in the other two the flashes were delivered to the two fields in random order (Bernoulli sequence). VEPs to attended-field stimuli were enhanced at both occipital (O2) and central (Cz) recording sites under all stimulus sequences, but different components were affected at the two scalp sites. It was suggested that the VEP at O2 may reflect modality-specific processing events, while the response at Cz, like its auditory homologue, may index more general aspects of selective attention.

  17. An Active RBSE Framework to Generate Optimal Stimulus Sequences in a BCI for Spelling

    Science.gov (United States)

    Moghadamfalahi, Mohammad; Akcakaya, Murat; Nezamfar, Hooman; Sourati, Jamshid; Erdogmus, Deniz

    2017-10-01

    A class of brain computer interfaces (BCIs) employs noninvasive recordings of electroencephalography (EEG) signals to enable users with severe speech and motor impairments to interact with their environment and social network. For example, EEG based BCIs for typing popularly utilize event related potentials (ERPs) for inference. Presentation paradigm design in current ERP-based letter by letter typing BCIs typically query the user with an arbitrary subset characters. However, the typing accuracy and also typing speed can potentially be enhanced with more informed subset selection and flash assignment. In this manuscript, we introduce the active recursive Bayesian state estimation (active-RBSE) framework for inference and sequence optimization. Prior to presentation in each iteration, rather than showing a subset of randomly selected characters, the developed framework optimally selects a subset based on a query function. Selected queries are made adaptively specialized for users during each intent detection. Through a simulation-based study, we assess the effect of active-RBSE on the performance of a language-model assisted typing BCI in terms of typing speed and accuracy. To provide a baseline for comparison, we also utilize standard presentation paradigms namely, row and column matrix presentation paradigm and also random rapid serial visual presentation paradigms. The results show that utilization of active-RBSE can enhance the online performance of the system, both in terms of typing accuracy and speed.

  18. Single-Shell Tank (SST) Retrieval Sequence Fiscal Year 2000 Update

    International Nuclear Information System (INIS)

    GARFIELD, J.S.

    2000-01-01

    This document describes the baseline single-shell tank (SST) waste retrieval sequence for the River Protection Project (RPP) updated for Fiscal Year 2000. The SST retrieval sequence identifies the proposed retrieval order (sequence), the tank selection and prioritization rationale, and planned retrieval dates for Hanford SSTs. In addition, the tank selection criteria and reference retrieval method for this sequence are discussed

  19. PREDICTION OF CHROMATIN STATES USING DNA SEQUENCE PROPERTIES

    KAUST Repository

    Bahabri, Rihab R.

    2013-06-01

    Activities of DNA are to a great extent controlled epigenetically through the internal struc- ture of chromatin. This structure is dynamic and is influenced by different modifications of histone proteins. Various combinations of epigenetic modification of histones pinpoint to different functional regions of the DNA determining the so-called chromatin states. How- ever, the characterization of chromatin states by the DNA sequence properties remains largely unknown. In this study we aim to explore whether DNA sequence patterns in the human genome can characterize different chromatin states. Using DNA sequence motifs we built binary classifiers for each chromatic state to eval- uate whether a given genomic sequence is a good candidate for belonging to a particular chromatin state. Of four classification algorithms (C4.5, Naive Bayes, Random Forest, and SVM) used for this purpose, the decision tree based classifiers (C4.5 and Random Forest) yielded best results among those we evaluated. Our results suggest that in general these models lack sufficient predictive power, although for four chromatin states (insulators, het- erochromatin, and two types of copy number variation) we found that presence of certain motifs in DNA sequences does imply an increased probability that such a sequence is one of these chromatin states.

  20. Application of quasi-random numbers for simulation

    International Nuclear Information System (INIS)

    Kazachenko, O.N.; Takhtamyshev, G.G.

    1985-01-01

    Application of the Monte-Carlo method for multidimensional integration is discussed. The main goal is to check the statement that the application of quasi-random numbers instead of regular pseudo-random numbers provides more rapid convergency. The Sobol, Richtmayer and Halton algorithms of quasi-random sequences are described. Over 50 tests to compare these quasi-random numbers as well as pseudo-random numbers were fulfilled. In all cases quasi-random numbers have clearly demonstrated a more rapid convergency as compared with pseudo-random ones. Positive test results on quasi-random trend in Monte-Carlo method seem very promising

  1. CMOS Compressed Imaging by Random Convolution

    OpenAIRE

    Jacques, Laurent; Vandergheynst, Pierre; Bibet, Alexandre; Majidzadeh, Vahid; Schmid, Alexandre; Leblebici, Yusuf

    2009-01-01

    We present a CMOS imager with built-in capability to perform Compressed Sensing. The adopted sensing strategy is the random Convolution due to J. Romberg. It is achieved by a shift register set in a pseudo-random configuration. It acts as a convolutive filter on the imager focal plane, the current issued from each CMOS pixel undergoing a pseudo-random redirection controlled by each component of the filter sequence. A pseudo-random triggering of the ADC reading is finally applied to comp...

  2. A Comparison of Three Random Number Generators for Aircraft Dynamic Modeling Applications

    Science.gov (United States)

    Grauer, Jared A.

    2017-01-01

    Three random number generators, which produce Gaussian white noise sequences, were compared to assess their suitability in aircraft dynamic modeling applications. The first generator considered was the MATLAB (registered) implementation of the Mersenne-Twister algorithm. The second generator was a website called Random.org, which processes atmospheric noise measured using radios to create the random numbers. The third generator was based on synthesis of the Fourier series, where the random number sequences are constructed from prescribed amplitude and phase spectra. A total of 200 sequences, each having 601 random numbers, for each generator were collected and analyzed in terms of the mean, variance, normality, autocorrelation, and power spectral density. These sequences were then applied to two problems in aircraft dynamic modeling, namely estimating stability and control derivatives from simulated onboard sensor data, and simulating flight in atmospheric turbulence. In general, each random number generator had good performance and is well-suited for aircraft dynamic modeling applications. Specific strengths and weaknesses of each generator are discussed. For Monte Carlo simulation, the Fourier synthesis method is recommended because it most accurately and consistently approximated Gaussian white noise and can be implemented with reasonable computational effort.

  3. PuLSE: Quality control and quantification of peptide sequences explored by phage display libraries.

    Science.gov (United States)

    Shave, Steven; Mann, Stefan; Koszela, Joanna; Kerr, Alastair; Auer, Manfred

    2018-01-01

    The design of highly diverse phage display libraries is based on assumption that DNA bases are incorporated at similar rates within the randomized sequence. As library complexity increases and expected copy numbers of unique sequences decrease, the exploration of library space becomes sparser and the presence of truly random sequences becomes critical. We present the program PuLSE (Phage Library Sequence Evaluation) as a tool for assessing randomness and therefore diversity of phage display libraries. PuLSE runs on a collection of sequence reads in the fastq file format and generates tables profiling the library in terms of unique DNA sequence counts and positions, translated peptide sequences, and normalized 'expected' occurrences from base to residue codon frequencies. The output allows at-a-glance quantitative quality control of a phage library in terms of sequence coverage both at the DNA base and translated protein residue level, which has been missing from toolsets and literature. The open source program PuLSE is available in two formats, a C++ source code package for compilation and integration into existing bioinformatics pipelines and precompiled binaries for ease of use.

  4. Deep sequencing of the Trypanosoma cruzi GP63 surface proteases reveals diversity and diversifying selection among chronic and congenital Chagas disease patients.

    Science.gov (United States)

    Llewellyn, Martin S; Messenger, Louisa A; Luquetti, Alejandro O; Garcia, Lineth; Torrico, Faustino; Tavares, Suelene B N; Cheaib, Bachar; Derome, Nicolas; Delepine, Marc; Baulard, Céline; Deleuze, Jean-Francois; Sauer, Sascha; Miles, Michael A

    2015-04-01

    Chagas disease results from infection with the diploid protozoan parasite Trypanosoma cruzi. T. cruzi is highly genetically diverse, and multiclonal infections in individual hosts are common, but little studied. In this study, we explore T. cruzi infection multiclonality in the context of age, sex and clinical profile among a cohort of chronic patients, as well as paired congenital cases from Cochabamba, Bolivia and Goias, Brazil using amplicon deep sequencing technology. A 450bp fragment of the trypomastigote TcGP63I surface protease gene was amplified and sequenced across 70 chronic and 22 congenital cases on the Illumina MiSeq platform. In addition, a second, mitochondrial target--ND5--was sequenced across the same cohort of cases. Several million reads were generated, and sequencing read depths were normalized within patient cohorts (Goias chronic, n = 43, Goias congenital n = 2, Bolivia chronic, n = 27; Bolivia congenital, n = 20), Among chronic cases, analyses of variance indicated no clear correlation between intra-host sequence diversity and age, sex or symptoms, while principal coordinate analyses showed no clustering by symptoms between patients. Between congenital pairs, we found evidence for the transmission of multiple sequence types from mother to infant, as well as widespread instances of novel genotypes in infants. Finally, non-synonymous to synonymous (dn:ds) nucleotide substitution ratios among sequences of TcGP63Ia and TcGP63Ib subfamilies within each cohort provided powerful evidence of strong diversifying selection at this locus. Our results shed light on the diversity of parasite DTUs within each patient, as well as the extent to which parasite strains pass between mother and foetus in congenital cases. Although we were unable to find any evidence that parasite diversity accumulates with age in our study cohorts, putative diversifying selection within members of the TcGP63I gene family suggests a link between genetic diversity within this gene

  5. Law of Iterated Logarithm for NA Sequences with Non-Identical ...

    Indian Academy of Sciences (India)

    Based on a law of the iterated logarithm for independent random variables sequences, an iterated logarithm theorem for NA sequences with non-identical distributions is obtained. The proof is based on a Kolmogrov-type exponential inequality.

  6. SSRscanner: a program for reporting distribution and exact location of simple sequence repeats.

    Science.gov (United States)

    Anwar, Tamanna; Khan, Asad U

    2006-02-20

    Simple sequence repeats (SSRs) have become important molecular markers for a broad range of applications, such as genome mapping and characterization, phenotype mapping, marker assisted selection of crop plants and a range of molecular ecology and diversity studies. These repeated DNA sequences are found in both prokaryotes and eukaryotes. They are distributed almost at random throughout the genome, ranging from mononucleotide to trinucleotide repeats. They are also found at longer lengths (> 6 repeating units) of tracts. Most of the computer programs that find SSRs do not report its exact position. A computer program SSRscanner was written to find out distribution, frequency and exact location of each SSR in the genome. SSRscanner is user friendly. It can search repeats of any length and produce outputs with their exact position on chromosome and their frequency of occurrence in the sequence. This program has been written in PERL and is freely available for non-commercial users by request from the authors. Please contact the authors by E-mail: huzzi99@hotmail.com.

  7. Parallel random number generator for inexpensive configurable hardware cells

    Science.gov (United States)

    Ackermann, J.; Tangen, U.; Bödekker, B.; Breyer, J.; Stoll, E.; McCaskill, J. S.

    2001-11-01

    A new random number generator ( RNG) adapted to parallel processors has been created. This RNG can be implemented with inexpensive hardware cells. The correlation between neighboring cells is suppressed with smart connections. With such connection structures, sequences of pseudo-random numbers are produced. Numerical tests including a self-avoiding random walk test and the simulation of the order parameter and energy of the 2D Ising model give no evidence for correlation in the pseudo-random sequences. Because the new random number generator has suppressed the correlation between neighboring cells which is usually observed in cellular automaton implementations, it is applicable for extended time simulations. It gives an immense speed-up factor if implemented directly in configurable hardware, and has recently been used for long time simulations of spatially resolved molecular evolution.

  8. Substrate sequence selectivity of APOBEC3A implicates intra-DNA interactions.

    Science.gov (United States)

    Silvas, Tania V; Hou, Shurong; Myint, Wazo; Nalivaika, Ellen; Somasundaran, Mohan; Kelch, Brian A; Matsuo, Hiroshi; Kurt Yilmaz, Nese; Schiffer, Celia A

    2018-05-14

    The APOBEC3 (A3) family of human cytidine deaminases is renowned for providing a first line of defense against many exogenous and endogenous retroviruses. However, the ability of these proteins to deaminate deoxycytidines in ssDNA makes A3s a double-edged sword. When overexpressed, A3s can mutate endogenous genomic DNA resulting in a variety of cancers. Although the sequence context for mutating DNA varies among A3s, the mechanism for substrate sequence specificity is not well understood. To characterize substrate specificity of A3A, a systematic approach was used to quantify the affinity for substrate as a function of sequence context, length, secondary structure, and solution pH. We identified the A3A ssDNA binding motif as (T/C)TC(A/G), which correlated with enzymatic activity. We also validated that A3A binds RNA in a sequence specific manner. A3A bound tighter to substrate binding motif within a hairpin loop compared to linear oligonucleotide, suggesting A3A affinity is modulated by substrate structure. Based on these findings and previously published A3A-ssDNA co-crystal structures, we propose a new model with intra-DNA interactions for the molecular mechanism underlying A3A sequence preference. Overall, the sequence and structural preferences identified for A3A leads to a new paradigm for identifying A3A's involvement in mutation of endogenous or exogenous DNA.

  9. Fast physical random bit generation with chaotic semiconductor lasers

    Science.gov (United States)

    Uchida, Atsushi; Amano, Kazuya; Inoue, Masaki; Hirano, Kunihito; Naito, Sunao; Someya, Hiroyuki; Oowada, Isao; Kurashige, Takayuki; Shiki, Masaru; Yoshimori, Shigeru; Yoshimura, Kazuyuki; Davis, Peter

    2008-12-01

    Random number generators in digital information systems make use of physical entropy sources such as electronic and photonic noise to add unpredictability to deterministically generated pseudo-random sequences. However, there is a large gap between the generation rates achieved with existing physical sources and the high data rates of many computation and communication systems; this is a fundamental weakness of these systems. Here we show that good quality random bit sequences can be generated at very fast bit rates using physical chaos in semiconductor lasers. Streams of bits that pass standard statistical tests for randomness have been generated at rates of up to 1.7 Gbps by sampling the fluctuating optical output of two chaotic lasers. This rate is an order of magnitude faster than that of previously reported devices for physical random bit generators with verified randomness. This means that the performance of random number generators can be greatly improved by using chaotic laser devices as physical entropy sources.

  10. Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.

    Science.gov (United States)

    Nguyen, Thanh-Tung; Huang, Joshua; Wu, Qingyao; Nguyen, Thuy; Li, Mark

    2015-01-01

    Single-nucleotide polymorphisms (SNPs) selection and identification are the most important tasks in Genome-wide association data analysis. The problem is difficult because genome-wide association data is very high dimensional and a large portion of SNPs in the data is irrelevant to the disease. Advanced machine learning methods have been successfully used in Genome-wide association studies (GWAS) for identification of genetic variants that have relatively big effects in some common, complex diseases. Among them, the most successful one is Random Forests (RF). Despite of performing well in terms of prediction accuracy in some data sets with moderate size, RF still suffers from working in GWAS for selecting informative SNPs and building accurate prediction models. In this paper, we propose to use a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection for GWAS. The method first applies p-value assessment to find a cut-off point that separates informative and irrelevant SNPs in two groups. The informative SNPs group is further divided into two sub-groups: highly informative and weak informative SNPs. When sampling the SNP subspace for building trees for the forest, only those SNPs from the two sub-groups are taken into account. The feature subspaces always contain highly informative SNPs when used to split a node at a tree. This approach enables one to generate more accurate trees with a lower prediction error, meanwhile possibly avoiding overfitting. It allows one to detect interactions of multiple SNPs with the diseases, and to reduce the dimensionality and the amount of Genome-wide association data needed for learning the RF model. Extensive experiments on two genome-wide SNP data sets (Parkinson case-control data comprised of 408,803 SNPs and Alzheimer case-control data comprised of 380,157 SNPs) and 10 gene data sets have demonstrated that the proposed model significantly reduced prediction errors and outperformed

  11. Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data

    Directory of Open Access Journals (Sweden)

    Regad Leslie

    2010-01-01

    Full Text Available Abstract Background In bioinformatics it is common to search for a pattern of interest in a potentially large set of rather short sequences (upstream gene regions, proteins, exons, etc.. Although many methodological approaches allow practitioners to compute the distribution of a pattern count in a random sequence generated by a Markov source, no specific developments have taken into account the counting of occurrences in a set of independent sequences. We aim to address this problem by deriving efficient approaches and algorithms to perform these computations both for low and high complexity patterns in the framework of homogeneous or heterogeneous Markov models. Results The latest advances in the field allowed us to use a technique of optimal Markov chain embedding based on deterministic finite automata to introduce three innovative algorithms. Algorithm 1 is the only one able to deal with heterogeneous models. It also permits to avoid any product of convolution of the pattern distribution in individual sequences. When working with homogeneous models, Algorithm 2 yields a dramatic reduction in the complexity by taking advantage of previous computations to obtain moment generating functions efficiently. In the particular case of low or moderate complexity patterns, Algorithm 3 exploits power computation and binary decomposition to further reduce the time complexity to a logarithmic scale. All these algorithms and their relative interest in comparison with existing ones were then tested and discussed on a toy-example and three biological data sets: structural patterns in protein loop structures, PROSITE signatures in a bacterial proteome, and transcription factors in upstream gene regions. On these data sets, we also compared our exact approaches to the tempting approximation that consists in concatenating the sequences in the data set into a single sequence. Conclusions Our algorithms prove to be effective and able to handle real data sets with

  12. Taxonomic evaluation of selected Ganoderma species and database sequence validation

    Directory of Open Access Journals (Sweden)

    Suldbold Jargalmaa

    2017-07-01

    Full Text Available Species in the genus Ganoderma include several ecologically important and pathogenic fungal species whose medicinal and economic value is substantial. Due to the highly similar morphological features within the Ganoderma, identification of species has relied heavily on DNA sequencing using BLAST searches, which are only reliable if the GenBank submissions are accurately labeled. In this study, we examined 113 specimens collected from 1969 to 2016 from various regions in Korea using morphological features and multigene analysis (internal transcribed spacer, translation elongation factor 1-α, and the second largest subunit of RNA polymerase II. These specimens were identified as four Ganoderma species: G. sichuanense, G. cf. adspersum, G. cf. applanatum, and G. cf. gibbosum. With the exception of G. sichuanense, these species were difficult to distinguish based solely on morphological features. However, phylogenetic analysis at three different loci yielded concordant phylogenetic information, and supported the four species distinctions with high bootstrap support. A survey of over 600 Ganoderma sequences available on GenBank revealed that 65% of sequences were either misidentified or ambiguously labeled. Here, we suggest corrected annotations for GenBank sequences based on our phylogenetic validation and provide updated global distribution patterns for these Ganoderma species.

  13. Taxonomic evaluation of selected Ganoderma species and database sequence validation

    Science.gov (United States)

    Jargalmaa, Suldbold; Eimes, John A.; Park, Myung Soo; Park, Jae Young; Oh, Seung-Yoon

    2017-01-01

    Species in the genus Ganoderma include several ecologically important and pathogenic fungal species whose medicinal and economic value is substantial. Due to the highly similar morphological features within the Ganoderma, identification of species has relied heavily on DNA sequencing using BLAST searches, which are only reliable if the GenBank submissions are accurately labeled. In this study, we examined 113 specimens collected from 1969 to 2016 from various regions in Korea using morphological features and multigene analysis (internal transcribed spacer, translation elongation factor 1-α, and the second largest subunit of RNA polymerase II). These specimens were identified as four Ganoderma species: G. sichuanense, G. cf. adspersum, G. cf. applanatum, and G. cf. gibbosum. With the exception of G. sichuanense, these species were difficult to distinguish based solely on morphological features. However, phylogenetic analysis at three different loci yielded concordant phylogenetic information, and supported the four species distinctions with high bootstrap support. A survey of over 600 Ganoderma sequences available on GenBank revealed that 65% of sequences were either misidentified or ambiguously labeled. Here, we suggest corrected annotations for GenBank sequences based on our phylogenetic validation and provide updated global distribution patterns for these Ganoderma species. PMID:28761785

  14. Mining olive genome through library sequencing and bioinformatics ...

    African Journals Online (AJOL)

    As one of the initial steps of olive (Olea europaea L.) genome analysis, a small insert genomic DNA library was constructed (digesting olive genomic DNA with SmaI and cloning the digestion products into pUC19 vector) and randomly picked 83 colonies were sequenced. Analysis of the insert sequences revealed 12 clones ...

  15. Undesirable Choice Biases with Small Differences in the Spatial Structure of Chance Stimulus Sequences.

    Directory of Open Access Journals (Sweden)

    David Herrera

    Full Text Available In two-alternative discrimination tasks, experimenters usually randomize the location of the rewarded stimulus so that systematic behavior with respect to irrelevant stimuli can only produce chance performance on the learning curves. One way to achieve this is to use random numbers derived from a discrete binomial distribution to create a 'full random training schedule' (FRS. When using FRS, however, sporadic but long laterally-biased training sequences occur by chance and such 'input biases' are thought to promote the generation of laterally-biased choices (i.e., 'output biases'. As an alternative, a 'Gellerman-like training schedule' (GLS can be used. It removes most input biases by prohibiting the reward from appearing on the same location for more than three consecutive trials. The sequence of past rewards obtained from choosing a particular discriminative stimulus influences the probability of choosing that same stimulus on subsequent trials. Assuming that the long-term average ratio of choices matches the long-term average ratio of reinforcers, we hypothesized that a reduced amount of input biases in GLS compared to FRS should lead to a reduced production of output biases. We compared the choice patterns produced by a 'Rational Decision Maker' (RDM in response to computer-generated FRS and GLS training sequences. To create a virtual RDM, we implemented an algorithm that generated choices based on past rewards. Our simulations revealed that, although the GLS presented fewer input biases than the FRS, the virtual RDM produced more output biases with GLS than with FRS under a variety of test conditions. Our results reveal that the statistical and temporal properties of training sequences interacted with the RDM to influence the production of output biases. Thus, discrete changes in the training paradigms did not translate linearly into modifications in the pattern of choices generated by a RDM. Virtual RDMs could be further employed to guide

  16. Undesirable Choice Biases with Small Differences in the Spatial Structure of Chance Stimulus Sequences.

    Science.gov (United States)

    Herrera, David; Treviño, Mario

    2015-01-01

    In two-alternative discrimination tasks, experimenters usually randomize the location of the rewarded stimulus so that systematic behavior with respect to irrelevant stimuli can only produce chance performance on the learning curves. One way to achieve this is to use random numbers derived from a discrete binomial distribution to create a 'full random training schedule' (FRS). When using FRS, however, sporadic but long laterally-biased training sequences occur by chance and such 'input biases' are thought to promote the generation of laterally-biased choices (i.e., 'output biases'). As an alternative, a 'Gellerman-like training schedule' (GLS) can be used. It removes most input biases by prohibiting the reward from appearing on the same location for more than three consecutive trials. The sequence of past rewards obtained from choosing a particular discriminative stimulus influences the probability of choosing that same stimulus on subsequent trials. Assuming that the long-term average ratio of choices matches the long-term average ratio of reinforcers, we hypothesized that a reduced amount of input biases in GLS compared to FRS should lead to a reduced production of output biases. We compared the choice patterns produced by a 'Rational Decision Maker' (RDM) in response to computer-generated FRS and GLS training sequences. To create a virtual RDM, we implemented an algorithm that generated choices based on past rewards. Our simulations revealed that, although the GLS presented fewer input biases than the FRS, the virtual RDM produced more output biases with GLS than with FRS under a variety of test conditions. Our results reveal that the statistical and temporal properties of training sequences interacted with the RDM to influence the production of output biases. Thus, discrete changes in the training paradigms did not translate linearly into modifications in the pattern of choices generated by a RDM. Virtual RDMs could be further employed to guide the selection of

  17. Comparison of Muscle Onset Activation Sequences between a Golf or Tennis Swing and Common Training Exercises Using Surface Electromyography: A Pilot Study

    Directory of Open Access Journals (Sweden)

    John M. Vasudevan

    2016-01-01

    Full Text Available Aim. The purpose of this pilot study is to use surface electromyography to determine an individual athlete’s typical muscle onset activation sequence when performing a golf or tennis forward swing and to use the method to assess to what degree the sequence is reproduced with common conditioning exercises and a machine designed for this purpose. Methods. Data for 18 healthy male subjects were collected for 15 muscles of the trunk and lower extremities. Data were filtered and processed to determine the average onset of muscle activation for each motion. A Spearman correlation estimated congruence of activation order between the swing and each exercise. Correlations of each group were pooled with 95% confidence intervals using a random effects meta-analytic strategy. Results. The averaged sequences differed among each athlete tested, but pooled correlations demonstrated a positive association between each exercise and the participants’ natural muscle onset activation sequence. Conclusion. The selected training exercises and Turning Point™ device all partially reproduced our athletes’ averaged muscle onset activation sequences for both sports. The results support consideration of a larger, adequately powered study using this method to quantify to what degree each of the selected exercises is appropriate for use in both golf and tennis.

  18. Participant-selected music and physical activity in older adults following cardiac rehabilitation: a randomized controlled trial.

    Science.gov (United States)

    Clark, Imogen N; Baker, Felicity A; Peiris, Casey L; Shoebridge, Georgie; Taylor, Nicholas F

    2017-03-01

    To evaluate effects of participant-selected music on older adults' achievement of activity levels recommended in the physical activity guidelines following cardiac rehabilitation. A parallel group randomized controlled trial with measurements at Weeks 0, 6 and 26. A multisite outpatient rehabilitation programme of a publicly funded metropolitan health service. Adults aged 60 years and older who had completed a cardiac rehabilitation programme. Experimental participants selected music to support walking with guidance from a music therapist. Control participants received usual care only. The primary outcome was the proportion of participants achieving activity levels recommended in physical activity guidelines. Secondary outcomes compared amounts of physical activity, exercise capacity, cardiac risk factors, and exercise self-efficacy. A total of 56 participants, mean age 68.2 years (SD = 6.5), were randomized to the experimental ( n = 28) and control groups ( n = 28). There were no differences between groups in proportions of participants achieving activity recommended in physical activity guidelines at Week 6 or 26. Secondary outcomes demonstrated between-group differences in male waist circumference at both measurements (Week 6 difference -2.0 cm, 95% CI -4.0 to 0; Week 26 difference -2.8 cm, 95% CI -5.4 to -0.1), and observed effect sizes favoured the experimental group for amounts of physical activity (d = 0.30), exercise capacity (d = 0.48), and blood pressure (d = -0.32). Participant-selected music did not increase the proportion of participants achieving recommended amounts of physical activity, but may have contributed to exercise-related benefits.

  19. Transformed composite sequences for improved qubit addressing

    Science.gov (United States)

    Merrill, J. True; Doret, S. Charles; Vittorini, Grahame; Addison, J. P.; Brown, Kenneth R.

    2014-10-01

    Selective laser addressing of a single atom or atomic ion qubit can be improved using narrow-band composite pulse sequences. We describe a Lie-algebraic technique to generalize known narrow-band sequences and introduce sequences related by dilation and rotation of sequence generators. Our method improves known narrow-band sequences by decreasing both the pulse time and the residual error. Finally, we experimentally demonstrate these composite sequences using 40Ca+ ions trapped in a surface-electrode ion trap.

  20. r2VIM: A new variable selection method for random forests in genome-wide association studies.

    Science.gov (United States)

    Szymczak, Silke; Holzinger, Emily; Dasgupta, Abhijit; Malley, James D; Molloy, Anne M; Mills, James L; Brody, Lawrence C; Stambolian, Dwight; Bailey-Wilson, Joan E

    2016-01-01

    Machine learning methods and in particular random forests (RFs) are a promising alternative to standard single SNP analyses in genome-wide association studies (GWAS). RFs provide variable importance measures (VIMs) to rank SNPs according to their predictive power. However, in contrast to the established genome-wide significance threshold, no clear criteria exist to determine how many SNPs should be selected for downstream analyses. We propose a new variable selection approach, recurrent relative variable importance measure (r2VIM). Importance values are calculated relative to an observed minimal importance score for several runs of RF and only SNPs with large relative VIMs in all of the runs are selected as important. Evaluations on simulated GWAS data show that the new method controls the number of false-positives under the null hypothesis. Under a simple alternative hypothesis with several independent main effects it is only slightly less powerful than logistic regression. In an experimental GWAS data set, the same strong signal is identified while the approach selects none of the SNPs in an underpowered GWAS. The novel variable selection method r2VIM is a promising extension to standard RF for objectively selecting relevant SNPs in GWAS while controlling the number of false-positive results.

  1. MR pulse sequences for selective relaxation time measurements: a phantom study

    DEFF Research Database (Denmark)

    Thomsen, C; Jensen, K E; Jensen, M

    1990-01-01

    a Siemens Magnetom wholebody magnetic resonance scanner operating at 1.5 Tesla was used. For comparison six imaging pulse sequences for relaxation time measurements were tested on the same phantom. The spectroscopic pulse sequences all had an accuracy better than 10% of the reference values....

  2. Epitope selection from an uncensored peptide library displayed on avian leukosis virus

    International Nuclear Information System (INIS)

    Khare, Pranay D.; Rosales, Ana G.; Bailey, Kent R.; Russell, Stephen J.; Federspiel, Mark J.

    2003-01-01

    Phage display libraries have provided an extraordinarily versatile technology to facilitate the isolation of peptides, growth factors, single chain antibodies, and enzymes with desired binding specificities or enzymatic activities. The overall diversity of peptides in phage display libraries can be significantly limited by Escherichia coli protein folding and processing machinery, which result in sequence censorship. To achieve an optimal diversity of displayed eukaryotic peptides, the library should be produced in the endoplasmic reticulum of eukaryotic cells using a eukaryotic display platform. In the accompanying article, we presented experiments that demonstrate that polypeptides of various sizes could be efficiently displayed on the envelope glycoproteins of a eukaryotic virus, avian leukosis virus (ALV), and the displayed polypeptides could efficiently attach to cognate receptors without interfering with viral attachment and entry into susceptible cells. In this study, methods were developed to construct a model library of randomized eight amino acid peptides using the ALV eukaryotic display platform and screen the library for specific epitopes using immobilized antibodies. A virus library with approximately 2 x 10 6 different members was generated from a plasmid library of approximately 5 x 10 6 diversity. The sequences of the randomized 24 nucleotide/eight amino acid regions of representatives of the plasmid and virus libraries were analyzed. No significant sequence censorship was observed in producing the virus display library from the plasmid library. Different populations of peptide epitopes were selected from the virus library when different monoclonal antibodies were used as the target. The results of these two studies clearly demonstrate the potential of ALV as a eukaryotic platform for the display and selection of eukaryotic polypeptides libraries

  3. Robustness analysis of chiller sequencing control

    International Nuclear Information System (INIS)

    Liao, Yundan; Sun, Yongjun; Huang, Gongsheng

    2015-01-01

    Highlights: • Uncertainties with chiller sequencing control were systematically quantified. • Robustness of chiller sequencing control was systematically analyzed. • Different sequencing control strategies were sensitive to different uncertainties. • A numerical method was developed for easy selection of chiller sequencing control. - Abstract: Multiple-chiller plant is commonly employed in the heating, ventilating and air-conditioning system to increase operational feasibility and energy-efficiency under part load condition. In a multiple-chiller plant, chiller sequencing control plays a key role in achieving overall energy efficiency while not sacrifices the cooling sufficiency for indoor thermal comfort. Various sequencing control strategies have been developed and implemented in practice. Based on the observation that (i) uncertainty, which cannot be avoided in chiller sequencing control, has a significant impact on the control performance and may cause the control fail to achieve the expected control and/or energy performance; and (ii) in current literature few studies have systematically addressed this issue, this paper therefore presents a study on robustness analysis of chiller sequencing control in order to understand the robustness of various chiller sequencing control strategies under different types of uncertainty. Based on the robustness analysis, a simple and applicable method is developed to select the most robust control strategy for a given chiller plant in the presence of uncertainties, which will be verified using case studies

  4. How edge-reinforced random walk arises naturally

    NARCIS (Netherlands)

    Rolles, S.W.W.

    2003-01-01

    We give a characterization of a modified edge-reinforced random walk in terms of certain partially exchangeable sequences. In particular, we obtain a characterization of an edge-reinforced random walk (introduced by Coppersmith and Diaconis) on a 2-edge-connected graph. Modifying the notion of

  5. Analisis Teoritis dan Empiris Uji Craps dari Diehard Battery of Randomness Test untuk Pengujian Pembangkit Bilangan Acaksemu

    Directory of Open Access Journals (Sweden)

    Sari Agustini Hafman

    2013-05-01

    Full Text Available According to Kerchoffs (1883, the security system should only rely on cryptographic keys which is used in that system. Generally, the key sequences are generated by a Pseudo Random Number Generator (PRNG or Random Number Generator (RNG. There are three types of randomness sequences that generated by the RNG and PRNG i.e. pseudorandom sequence, cryptographically secure pseudorandom sequences, and real random sequences. Several statistical tests, including diehard battery of tests of randomness, is used to check the type of randomness sequences that generated by PRNG or RNG. Due to its purpose, the principle on taking the testing parameters and the test statistic are associated with the validity of the conclusion produced by a statistical test, then the theoretical analysis is performed by applying a variety of statistical theory to evaluate craps test, one of the test included in the diehard battery of randomness tests. Craps test, inspired by craps game, aims to examine whether a PRNG produces an independent and identically distributed (iid pseudorandom sequences. To demonstrate the process to produce a test statistics equation and to show how craps games applied on that test, will be carried out theoretical analysis by applying a variety of statistical theory. Furthermore, empirical observations will be done by applying craps test on a PRNG in order to check the test effectiveness in detecting the distribution and independency of sequences which produced by PRNG

  6. Improving the chances of successful protein structure determination with a random forest classifier

    Energy Technology Data Exchange (ETDEWEB)

    Jahandideh, Samad [Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA 92307 (United States); Joint Center for Structural Genomics, (United States); Jaroszewski, Lukasz; Godzik, Adam, E-mail: adam@burnham.org [Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA 92307 (United States); Joint Center for Structural Genomics, (United States); University of California, San Diego, La Jolla, California (United States)

    2014-03-01

    Using an extended set of protein features calculated separately for protein surface and interior, a new version of XtalPred based on a random forest classifier achieves a significant improvement in predicting the success of structure determination from the primary amino-acid sequence. Obtaining diffraction quality crystals remains one of the major bottlenecks in structural biology. The ability to predict the chances of crystallization from the amino-acid sequence of the protein can, at least partly, address this problem by allowing a crystallographer to select homologs that are more likely to succeed and/or to modify the sequence of the target to avoid features that are detrimental to successful crystallization. In 2007, the now widely used XtalPred algorithm [Slabinski et al. (2007 ▶), Protein Sci.16, 2472–2482] was developed. XtalPred classifies proteins into five ‘crystallization classes’ based on a simple statistical analysis of the physicochemical features of a protein. Here, towards the same goal, advanced machine-learning methods are applied and, in addition, the predictive potential of additional protein features such as predicted surface ruggedness, hydrophobicity, side-chain entropy of surface residues and amino-acid composition of the predicted protein surface are tested. The new XtalPred-RF (random forest) achieves significant improvement of the prediction of crystallization success over the original XtalPred. To illustrate this, XtalPred-RF was tested by revisiting target selection from 271 Pfam families targeted by the Joint Center for Structural Genomics (JCSG) in PSI-2, and it was estimated that the number of targets entered into the protein-production and crystallization pipeline could have been reduced by 30% without lowering the number of families for which the first structures were solved. The prediction improvement depends on the subset of targets used as a testing set and reaches 100% (i.e. twofold) for the top class of predicted

  7. Phage display of the serpin alpha-1 proteinase inhibitor randomized at consecutive residues in the reactive centre loop and biopanned with or without thrombin.

    Directory of Open Access Journals (Sweden)

    Benjamin M Scott

    Full Text Available In spite of the power of phage display technology to identify variant proteins with novel properties in large libraries, it has only been previously applied to one member of the serpin superfamily. Here we describe phage display of human alpha-1 proteinase inhibitor (API in a T7 bacteriophage system. API M358R fused to the C-terminus of T7 capsid protein 10B was directly shown to form denaturation-resistant complexes with thrombin by electrophoresis and immunoblotting following exposure of intact phages to thrombin. We therefore developed a biopanning protocol in which thrombin-reactive phages were selected using biotinylated anti-thrombin antibodies and streptavidin-coated magnetic beads. A library consisting of displayed API randomized at residues 357 and 358 (P2-P1 yielded predominantly Pro-Arg at these positions after five rounds of thrombin selection; in contrast the same degree of mock selection yielded only non-functional variants. A more diverse library of API M358R randomized at residues 352-356 (P7-P3 was also probed, yielding numerous variants fitting a loose consensus of DLTVS as judged by sequencing of the inserts of plaque-purified phages. The thrombin-selected sequences were transferred en masse into bacterial expression plasmids, and lysates from individual colonies were screening for API-thrombin complexing. The most active candidates from this sixth round of screening contained DITMA and AAFVS at P7-P3 and inhibited thrombin 2.1-fold more rapidly than API M358R with no change in reaction stoichiometry. Deep sequencing using the Ion Torrent platform confirmed that over 800 sequences were significantly enriched in the thrombin-panned versus naïve phage display library, including some detected using the combined phage display/bacterial lysate screening approach. Our results show that API joins Plasminogen Activator Inhibitor-1 (PAI-1 as a serpin amenable to phage display and suggest the utility of this approach for the selection

  8. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Full Text Available Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated. We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is

  9. Network clustering coefficient approach to DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gerhardt, Guenther J.L. [Universidade Federal do Rio Grande do Sul-Hospital de Clinicas de Porto Alegre, Rua Ramiro Barcelos 2350/sala 2040/90035-003 Porto Alegre (Brazil); Departamento de Fisica e Quimica da Universidade de Caxias do Sul, Rua Francisco Getulio Vargas 1130, 95001-970 Caxias do Sul (Brazil); Lemke, Ney [Programa Interdisciplinar em Computacao Aplicada, Unisinos, Av. Unisinos, 950, 93022-000 Sao Leopoldo, RS (Brazil); Corso, Gilberto [Departamento de Biofisica e Farmacologia, Centro de Biociencias, Universidade Federal do Rio Grande do Norte, Campus Universitario, 59072 970 Natal, RN (Brazil)]. E-mail: corso@dfte.ufrn.br

    2006-05-15

    In this work we propose an alternative DNA sequence analysis tool based on graph theoretical concepts. The methodology investigates the path topology of an organism genome through a triplet network. In this network, triplets in DNA sequence are vertices and two vertices are connected if they occur juxtaposed on the genome. We characterize this network topology by measuring the clustering coefficient. We test our methodology against two main bias: the guanine-cytosine (GC) content and 3-bp (base pairs) periodicity of DNA sequence. We perform the test constructing random networks with variable GC content and imposed 3-bp periodicity. A test group of some organisms is constructed and we investigate the methodology in the light of the constructed random networks. We conclude that the clustering coefficient is a valuable tool since it gives information that is not trivially contained in 3-bp periodicity neither in the variable GC content.

  10. Aspects of insertion in random trees

    NARCIS (Netherlands)

    Bagchi, Arunabha; Reingold, E.M.

    1982-01-01

    A method formulated by Yao and used by Brown has yielded bounds on the fraction of nodes with specified properties in trees bult by a sequence of random internal nodes in a random tree built by binary search and insertion, and show that in such a tree about bounds better than those now known. We

  11. The genetic consequences of selection in natural populations.

    Science.gov (United States)

    Thurman, Timothy J; Barrett, Rowan D H

    2016-04-01

    The selection coefficient, s, quantifies the strength of selection acting on a genetic variant. Despite this parameter's central importance to population genetic models, until recently we have known relatively little about the value of s in natural populations. With the development of molecular genetic techniques in the late 20th century and the sequencing technologies that followed, biologists are now able to identify genetic variants and directly relate them to organismal fitness. We reviewed the literature for published estimates of natural selection acting at the genetic level and found over 3000 estimates of selection coefficients from 79 studies. Selection coefficients were roughly exponentially distributed, suggesting that the impact of selection at the genetic level is generally weak but can occasionally be quite strong. We used both nonparametric statistics and formal random-effects meta-analysis to determine how selection varies across biological and methodological categories. Selection was stronger when measured over shorter timescales, with the mean magnitude of s greatest for studies that measured selection within a single generation. Our analyses found conflicting trends when considering how selection varies with the genetic scale (e.g., SNPs or haplotypes) at which it is measured, suggesting a need for further research. Besides these quantitative conclusions, we highlight key issues in the calculation, interpretation, and reporting of selection coefficients and provide recommendations for future research. © 2016 John Wiley & Sons Ltd.

  12. Locomotor sequence learning in visually guided walking

    DEFF Research Database (Denmark)

    Choi, Julia T; Jensen, Peter; Nielsen, Jens Bo

    2016-01-01

    walking. In addition, we determined how age (i.e., healthy young adults vs. children) and biomechanical factors (i.e., walking speed) affected the rate and magnitude of locomotor sequence learning. The results showed that healthy young adults (age 24 ± 5 years, N = 20) could learn a specific sequence...... of step lengths over 300 training steps. Younger children (age 6-10 years, N = 8) have lower baseline performance, but their magnitude and rate of sequence learning was the same compared to older children (11-16 years, N = 10) and healthy adults. In addition, learning capacity may be more limited...... to modify step length from one trial to the next. Our sequence learning paradigm is derived from the serial reaction-time (SRT) task that has been used in upper limb studies. Both random and ordered sequences of step lengths were used to measure sequence-specific and sequence non-specific learning during...

  13. A Robust and Versatile Method of Combinatorial Chemical Synthesis of Gene Libraries via Hierarchical Assembly of Partially Randomized Modules

    Science.gov (United States)

    Popova, Blagovesta; Schubert, Steffen; Bulla, Ingo; Buchwald, Daniela; Kramer, Wilfried

    2015-01-01

    A major challenge in gene library generation is to guarantee a large functional size and diversity that significantly increases the chances of selecting different functional protein variants. The use of trinucleotides mixtures for controlled randomization results in superior library diversity and offers the ability to specify the type and distribution of the amino acids at each position. Here we describe the generation of a high diversity gene library using tHisF of the hyperthermophile Thermotoga maritima as a scaffold. Combining various rational criteria with contingency, we targeted 26 selected codons of the thisF gene sequence for randomization at a controlled level. We have developed a novel method of creating full-length gene libraries by combinatorial assembly of smaller sub-libraries. Full-length libraries of high diversity can easily be assembled on demand from smaller and much less diverse sub-libraries, which circumvent the notoriously troublesome long-term archivation and repeated proliferation of high diversity ensembles of phages or plasmids. We developed a generally applicable software tool for sequence analysis of mutated gene sequences that provides efficient assistance for analysis of library diversity. Finally, practical utility of the library was demonstrated in principle by assessment of the conformational stability of library members and isolating protein variants with HisF activity from it. Our approach integrates a number of features of nucleic acids synthetic chemistry, biochemistry and molecular genetics to a coherent, flexible and robust method of combinatorial gene synthesis. PMID:26355961

  14. A New Dynamic Multicriteria Decision-Making Approach for Green Supplier Selection in Construction Projects under Time Sequence

    Directory of Open Access Journals (Sweden)

    Shi Yin

    2017-01-01

    Full Text Available Nowadays, due to the lack of natural resources and environment problems which have been appearing increasingly, green building is more and more involved in the construction industry. The evaluation and selection of green supplier are a significant part of the development of green building. In this paper, we propose a new dynamic multicriteria decision-making approach in construction projects under time sequence to deal with these problems. First, the paper establishes 4 main criteria and 17 subcriteria for green supplier selection in construction projects. Then, a method considering interaction between criteria and the influence of constructors subjective preference and objective criteria information is proposed. It uses the interval-valued intuitionistic fuzzy geometric weighted Heronian means (IVIFGWHM operator and multitarget nonlinear programming model to calculate the comprehensive evaluation results of potential green suppliers. The proposed method is much easier for constructors to select green supplier and make the localization of green supplier more practical and accurate in construction projects. Finally, a case study about a green building project is given to verify practicality and effectiveness of the proposed approach.

  15. Transcriptome sequencing and positive selected genes analysis of Bombyx mandarina.

    Directory of Open Access Journals (Sweden)

    Tingcai Cheng

    Full Text Available The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG and posterior silk gland (PSG. Three sericin genes (sericin 1, sericin 2, and sericin 3 were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25 were expressed specifically in the PSG. In addition, 32,297 Single-nucleotide polymorphisms (SNPs and 361 insertion-deletions (INDELs were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced or to be experiencing positive selection by Ka/Ks analysis. These data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research.

  16. Sequence Based Prediction of Antioxidant Proteins Using a Classifier Selection Strategy.

    Directory of Open Access Journals (Sweden)

    Lina Zhang

    Full Text Available Antioxidant proteins perform significant functions in maintaining oxidation/antioxidation balance and have potential therapies for some diseases. Accurate identification of antioxidant proteins could contribute to revealing physiological processes of oxidation/antioxidation balance and developing novel antioxidation-based drugs. In this study, an ensemble method is presented to predict antioxidant proteins with hybrid features, incorporating SSI (Secondary Structure Information, PSSM (Position Specific Scoring Matrix, RSA (Relative Solvent Accessibility, and CTD (Composition, Transition, Distribution. The prediction results of the ensemble predictor are determined by an average of prediction results of multiple base classifiers. Based on a classifier selection strategy, we obtain an optimal ensemble classifier composed of RF (Random Forest, SMO (Sequential Minimal Optimization, NNA (Nearest Neighbor Algorithm, and J48 with an accuracy of 0.925. A Relief combined with IFS (Incremental Feature Selection method is adopted to obtain optimal features from hybrid features. With the optimal features, the ensemble method achieves improved performance with a sensitivity of 0.95, a specificity of 0.93, an accuracy of 0.94, and an MCC (Matthew's Correlation Coefficient of 0.880, far better than the existing method. To evaluate the prediction performance objectively, the proposed method is compared with existing methods on the same independent testing dataset. Encouragingly, our method performs better than previous studies. In addition, our method achieves more balanced performance with a sensitivity of 0.878 and a specificity of 0.860. These results suggest that the proposed ensemble method can be a potential candidate for antioxidant protein prediction. For public access, we develop a user-friendly web server for antioxidant protein identification that is freely accessible at http://antioxidant.weka.cc.

  17. Shaping the spectrum of random-phase radar waveforms

    Science.gov (United States)

    Doerry, Armin W.; Marquette, Brandeis

    2017-05-09

    The various technologies presented herein relate to generation of a desired waveform profile in the form of a spectrum of apparently random noise (e.g., white noise or colored noise), but with precise spectral characteristics. Hence, a waveform profile that could be readily determined (e.g., by a spoofing system) is effectively obscured. Obscuration is achieved by dividing the waveform into a series of chips, each with an assigned frequency, wherein the sequence of chips are subsequently randomized. Randomization can be a function of the application of a key to the chip sequence. During processing of the echo pulse, a copy of the randomized transmitted pulse is recovered or regenerated against which the received echo is correlated. Hence, with the echo energy range-compressed in this manner, it is possible to generate a radar image with precise impulse response.

  18. The mechanism of selecting priority options and sequence of solutions to ensure the safety of towing caravan navigation

    Directory of Open Access Journals (Sweden)

    Brazhnyy A. I.

    2016-12-01

    Full Text Available The paper considers navigational safety movement of the towed object and tugboat. Some methods of alternatives choice of decision-maker persons (DMP to manage the safety of towing a caravan have been presented. Examples of navigation safety conditions, their operational relationship, critical and emergency character with respect to the predetermined position of the band have been given; the sequence of forming actions of the algorithm based on an assessment of possible losses for the sustainability of the towing operation has been proposed. The aim is to research and develop a sequence of logical conclusions for forming some mechanism to take corrective action when navigating the composition of the output of the towing caravan of navigation stable security situation for its return to the original settings of sustainable movement. To assess the risks of navigation functions the risk analysis method has been used, with further selection of management synthesized for towing process, taking into account the possible reaction of the "human element". For estimations well-known in the probability theory and mathematical statistics methods have been used. An iterative mathematical model of the tow caravan considering the posterior distribution of parameters caravan phase coordinates for the given cases has been built. The possibility of resource losses when choosing the optimal alternative has been investigated. It has been established that while selecting and adopting a single decision under risk, including alternatives to the losses, one can always find such an alternative which is able to ensure the management of the caravan navigational safety decreasing the probability of large losses. When operating the navigation state of towing caravan one should choose the optimal alternative that minimizes the possibility of the maximum cost, and then based on it to choose (with incomplete awareness of decision-makers optimal sequence of controls which in turn

  19. Two-year Randomized Clinical Trial of Self-etching Adhesives and Selective Enamel Etching.

    Science.gov (United States)

    Pena, C E; Rodrigues, J A; Ely, C; Giannini, M; Reis, A F

    2016-01-01

    The aim of this randomized, controlled prospective clinical trial was to evaluate the clinical effectiveness of restoring noncarious cervical lesions with two self-etching adhesive systems applied with or without selective enamel etching. A one-step self-etching adhesive (Xeno V(+)) and a two-step self-etching system (Clearfil SE Bond) were used. The effectiveness of phosphoric acid selective etching of enamel margins was also evaluated. Fifty-six cavities were restored with each adhesive system and divided into two subgroups (n=28; etch and non-etch). All 112 cavities were restored with the nanohybrid composite Esthet.X HD. The clinical effectiveness of restorations was recorded in terms of retention, marginal integrity, marginal staining, caries recurrence, and postoperative sensitivity after 3, 6, 12, 18, and 24 months (modified United States Public Health Service). The Friedman test detected significant differences only after 18 months for marginal staining in the groups Clearfil SE non-etch (p=0.009) and Xeno V(+) etch (p=0.004). One restoration was lost during the trial (Xeno V(+) etch; p>0.05). Although an increase in marginal staining was recorded for groups Clearfil SE non-etch and Xeno V(+) etch, the clinical effectiveness of restorations was considered acceptable for the single-step and two-step self-etching systems with or without selective enamel etching in this 24-month clinical trial.

  20. Rapid Polymer Sequencer

    Science.gov (United States)

    Stolc, Viktor (Inventor); Brock, Matthew W (Inventor)

    2013-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal direction, or in a transverse direction, in the tip region, a polymer sequence is passed through the tip region, and a change in an electrical current signal is measured as each polymer component passes through the tip region. Each of the measured changes in electrical current signals is compared with a database of reference electrical change signals, with each reference signal corresponding to an identified polymer component, to identify the unknown polymer component with a reference polymer component. The nanopore preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.

  1. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data

    Directory of Open Access Journals (Sweden)

    Himmelreich Uwe

    2009-07-01

    Full Text Available Abstract Background Regularized regression methods such as principal component or partial least squares regression perform well in learning tasks on high dimensional spectral data, but cannot explicitly eliminate irrelevant features. The random forest classifier with its associated Gini feature importance, on the other hand, allows for an explicit feature elimination, but may not be optimally adapted to spectral data due to the topology of its constituent classification trees which are based on orthogonal splits in feature space. Results We propose to combine the best of both approaches, and evaluated the joint use of a feature selection based on a recursive feature elimination using the Gini importance of random forests' together with regularized classification methods on spectral data sets from medical diagnostics, chemotaxonomy, biomedical analytics, food science, and synthetically modified spectral data. Here, a feature selection using the Gini feature importance with a regularized classification by discriminant partial least squares regression performed as well as or better than a filtering according to different univariate statistical tests, or using regression coefficients in a backward feature elimination. It outperformed the direct application of the random forest classifier, or the direct application of the regularized classifiers on the full set of features. Conclusion The Gini importance of the random forest provided superior means for measuring feature relevance on spectral data, but – on an optimal subset of features – the regularized classifiers might be preferable over the random forest classifier, in spite of their limitation to model linear dependencies only. A feature selection based on Gini importance, however, may precede a regularized linear classification to identify this optimal subset of features, and to earn a double benefit of both dimensionality reduction and the elimination of noise from the classification task.

  2. Pure-Phase Selective Excitation in Fast-Relaxing Systems

    Science.gov (United States)

    Zangger, Klaus; Oberer, Monika; Sterk, Heinz

    2001-09-01

    Selective pulses have been used frequently for small molecules. However, their application to proteins and other macromolecules has been limited. The long duration of shaped-selective pulses and the short T2 relaxation times in proteins often prohibited the use of highly selective pulses especially on larger biomolecules. A very selective excitation can be obtained within a short time by using the selective excitation sequence presented in this paper. Instead of using a shaped low-intensity radiofrequency pulse, a cluster of hard 90° pulses, delays of free precession, and pulsed field gradients can be used to selectively excite a narrow chemical shift range within a relatively short time. Thereby, off-resonance magnetization, which is allowed to evolve freely during the free precession intervals, is destroyed by the gradient pulses. Off-resonance excitation artifacts can be removed by random variation of the interpulse delays. This leads to an excitation profile with selectivity as well as phase and relaxation behavior superior to that of commonly used shaped-selective pulses. Since the evolution of scalar coupling is inherently suppressed during the double-selective excitation of two different scalar-coupled nuclei, the presented pulse cluster is especially suited for simultaneous highly selective excitation of N-H and C-H fragments. Experimental examples are demonstrated on hen egg white lysozyme (14 kD) and the bacterial antidote ParD (19 kD).

  3. Random effect selection in generalised linear models

    DEFF Research Database (Denmark)

    Denwood, Matt; Houe, Hans; Forkman, Björn

    We analysed abattoir recordings of meat inspection codes with possible relevance to onfarm animal welfare in cattle. Random effects logistic regression models were used to describe individual-level data obtained from 461,406 cattle slaughtered in Denmark. Our results demonstrate that the largest...

  4. Molecular study and nucleotide sequencing of Chlamydia abortus isolated from aborted sheep fetuses ewes of Alborz province

    Directory of Open Access Journals (Sweden)

    amirreza ebadi

    2015-02-01

    Full Text Available Chlamydia is an obligate intracellular and gram negative coccobacilli and one of the most important causes of abortion in ruminants especially in ewes. This investigation was performed with the purpose of molecular study and sequencing of Chlamydia abortus isolated from aborted sheep fetuses of Alborz Province. In this study, DNA extraction was performed on 100 samples from aborted fetuses of 32 sheep flocks from different areas of Alborz province. Then using specific primers of gene IGS-Sr- RNA, polymerase chain reaction was conducted and 10 samples were selected randomly from the positive cases were sent to Macrogene company in Korea for sequencing. In this study, 37 samples from a total of 100 aborted fetuses were positive for Chlamydia abortus. After sequencing, more than 99 percent of the positive samples were similar with sequences in gene bank. The sequencing results indicated that the samples were very similar to isolates LN554882/1, AF051935/1 and CR848038/1 of the gene bank and were in the same cluster. Also, this investigation indicated that Chlamydia abortus is one of the main reasons of ewe abortion in Alborz province.

  5. DNA Sequences of RAPD Fragments in the Egyptian cotton ...

    African Journals Online (AJOL)

    Random Amplified Polymorphic DNAs (RAPDs) is a DNA polymorphism assay based on the amplification of random DNA segments with single primers of arbitrary nucleotide sequence. Despite the fact that the RAPD technique has become a very powerful tool and has found use in numerous applications, yet, the nature of ...

  6. Selection of aptamers for Candida albicans by cell-SELEX; Selecao de aptameros para Candida albicans por cell-SELEX

    Energy Technology Data Exchange (ETDEWEB)

    Miranda, Alessandra Nunes Duarte

    2017-07-01

    The growing concern with invasive fungal infections, responsible for an alarming mortality rate of immunosuppressed patients and in Intensive Care Units, evidences the need for a fast and specific method for the Candida albicans detection, since this species is identified as one of the main causes of septicemia. Commonly, it is a challenge for clinicians to determine the primary infection foci, the dissemination degree, or whether the site of a particular surgery is involved. Although scintigraphic imaging represents a promising tool for infectious foci detection, it still lacks a methodology for C. albicans diagnosis due to the absence of specific radiotracers for this microorganism. Aptamers are molecules that have almost ideal properties for use as diagnostic radiopharmaceuticals, such as high specificity for their molecular targets, lack of immunogenicity and toxicity, high tissue penetration and rapid blood clearance. Aptamers can also be labeled with different radionuclides. This work aims to obtain aptamers for specific binding to C. albicans cells for future application as a radiopharmaceutical. It was used a variation of the SELEX (Systematic Evolution of Ligands by EXponential Enrichment) technique, termed cell-SELEX, in which cells are the targets for selection. A selection protocol was standardized using a random library of single-stranded oligonucleotides, each containing two fixed regions flanking a sequence of 40 random nucleotides. This library was incubated with C. albicans cells in the presence of competitors. Then, the binding sequences were separated by centrifugation, resuspended and amplified by PCR. The amplification was confirmed by agarose gel electrophoresis. After that, the ligands were purified to obtain a new pool of ssDNA, from which a new incubation was carried out. The selection parameters were gradually modified in order to increase stringency. This cycle was repeated 12 times to allow the selection of sequences with the maximum

  7. Comparison of three human papillomavirus DNA detection methods: Next generation sequencing, multiplex-PCR and nested-PCR followed by Sanger based sequencing.

    Science.gov (United States)

    da Fonseca, Allex Jardim; Galvão, Renata Silva; Miranda, Angelica Espinosa; Ferreira, Luiz Carlos de Lima; Chen, Zigui

    2016-05-01

    To compare the diagnostic performance for HPV infection using three laboratorial techniques. Ninty-five cervicovaginal samples were randomly selected; each was tested for HPV DNA and genotypes using 3 methods in parallel: Multiplex-PCR, the Nested PCR followed by Sanger sequencing, and the Next_Gen Sequencing (NGS) with two assays (NGS-A1, NGS-A2). The study was approved by the Brazilian National IRB (CONEP protocol 16,800). The prevalence of HPV by the NGS assays was higher than that using the Multiplex-PCR (64.2% vs. 45.2%, respectively; P = 0.001) and the Nested-PCR (64.2% vs. 49.5%, respectively; P = 0.003). NGS also showed better performance in detecting high-risk HPV (HR-HPV) and HPV16. There was a weak interobservers agreement between the results of Multiplex-PCR and Nested-PCR in relation to NGS for the diagnosis of HPV infection, and a moderate correlation for HR-HPV detection. Both NGS assays showed a strong correlation for detection of HPVs (k = 0.86), HR-HPVs (k = 0.91), HPV16 (k = 0.92) and HPV18 (k = 0.91). NGS is more sensitive than the traditional Sanger sequencing and the Multiplex PCR to genotype HPVs, with promising ability to detect multiple infections, and may have the potential to establish an alternative method for the diagnosis and genotyping of HPV. © 2015 Wiley Periodicals, Inc.

  8. Modelling estimation and analysis of dynamic processes from image sequences using temporal random closed sets and point processes with application to the cell exocytosis and endocytosis

    OpenAIRE

    Díaz Fernández, Ester

    2010-01-01

    In this thesis, new models and methodologies are introduced for the analysis of dynamic processes characterized by image sequences with spatial temporal overlapping. The spatial temporal overlapping exists in many natural phenomena and should be addressed properly in several Science disciplines such as Microscopy, Material Sciences, Biology, Geostatistics or Communication Networks. This work is related to the Point Process and Random Closed Set theories, within Stochastic Ge...

  9. A large-scale study of the random variability of a coding sequence: a study on the CFTR gene.

    Science.gov (United States)

    Modiano, Guido; Bombieri, Cristina; Ciminelli, Bianca Maria; Belpinati, Francesca; Giorgi, Silvia; Georges, Marie des; Scotet, Virginie; Pompei, Fiorenza; Ciccacci, Cinzia; Guittard, Caroline; Audrézet, Marie Pierre; Begnini, Angela; Toepfer, Michael; Macek, Milan; Ferec, Claude; Claustres, Mireille; Pignatti, Pier Franco

    2005-02-01

    Coding single nucleotide substitutions (cSNSs) have been studied on hundreds of genes using small samples (n(g) approximately 100-150 genes). In the present investigation, a large random European population sample (average n(g) approximately 1500) was studied for a single gene, the CFTR (Cystic Fibrosis Transmembrane conductance Regulator). The nonsynonymous (NS) substitutions exhibited, in accordance with previous reports, a mean probability of being polymorphic (q > 0.005), much lower than that of the synonymous (S) substitutions, but they showed a similar rate of subpolymorphic (q < 0.005) variability. This indicates that, in autosomal genes that may have harmful recessive alleles (nonduplicated genes with important functions), genetic drift overwhelms selection in the subpolymorphic range of variability, making disadvantageous alleles behave as neutral. These results imply that the majority of the subpolymorphic nonsynonymous alleles of these genes are selectively negative or even pathogenic.

  10. RAPD and Internal Transcribed Spacer Sequence Analyses Reveal Zea nicaraguensis as a Section Luxuriantes Species Close to Zea luxurians

    Science.gov (United States)

    Wang, Pei; Lu, Yanli; Zheng, Mingmin; Rong, Tingzhao; Tang, Qilin

    2011-01-01

    Genetic relationship of a newly discovered teosinte from Nicaragua, Zea nicaraguensis with waterlogging tolerance, was determined based on randomly amplified polymorphic DNA (RAPD) markers and the internal transcribed spacer (ITS) sequences of nuclear ribosomal DNA using 14 accessions from Zea species. RAPD analysis showed that a total of 5,303 fragments were produced by 136 random decamer primers, of which 84.86% bands were polymorphic. RAPD-based UPGMA analysis demonstrated that the genus Zea can be divided into section Luxuriantes including Zea diploperennis, Zea luxurians, Zea perennis and Zea nicaraguensis, and section Zea including Zea mays ssp. mexicana, Zea mays ssp. parviglumis, Zea mays ssp. huehuetenangensis and Zea mays ssp. mays. ITS sequence analysis showed the lengths of the entire ITS region of the 14 taxa in Zea varied from 597 to 605 bp. The average GC content was 67.8%. In addition to the insertion/deletions, 78 variable sites were recorded in the total ITS region with 47 in ITS1, 5 in 5.8S, and 26 in ITS2. Sequences of these taxa were analyzed with neighbor-joining (NJ) and maximum parsimony (MP) methods to construct the phylogenetic trees, selecting Tripsacum dactyloides L. as the outgroup. The phylogenetic relationships of Zea species inferred from the ITS sequences are highly concordant with the RAPD evidence that resolved two major subgenus clades. Both RAPD and ITS sequence analyses indicate that Zea nicaraguensis is more closely related to Zea luxurians than the other teosintes and cultivated maize, which should be regarded as a section Luxuriantes species. PMID:21525982

  11. Random Item Generation Is Affected by Age

    Science.gov (United States)

    Multani, Namita; Rudzicz, Frank; Wong, Wing Yiu Stephanie; Namasivayam, Aravind Kumar; van Lieshout, Pascal

    2016-01-01

    Purpose: Random item generation (RIG) involves central executive functioning. Measuring aspects of random sequences can therefore provide a simple method to complement other tools for cognitive assessment. We examine the extent to which RIG relates to specific measures of cognitive function, and whether those measures can be estimated using RIG…

  12. Reconstruction of DNA sequences using genetic algorithms and cellular automata: towards mutation prediction?

    Science.gov (United States)

    Mizas, Ch; Sirakoulis, G Ch; Mardiris, V; Karafyllidis, I; Glykos, N; Sandaltzopoulos, R

    2008-04-01

    Change of DNA sequence that fuels evolution is, to a certain extent, a deterministic process because mutagenesis does not occur in an absolutely random manner. So far, it has not been possible to decipher the rules that govern DNA sequence evolution due to the extreme complexity of the entire process. In our attempt to approach this issue we focus solely on the mechanisms of mutagenesis and deliberately disregard the role of natural selection. Hence, in this analysis, evolution refers to the accumulation of genetic alterations that originate from mutations and are transmitted through generations without being subjected to natural selection. We have developed a software tool that allows modelling of a DNA sequence as a one-dimensional cellular automaton (CA) with four states per cell which correspond to the four DNA bases, i.e. A, C, T and G. The four states are represented by numbers of the quaternary number system. Moreover, we have developed genetic algorithms (GAs) in order to determine the rules of CA evolution that simulate the DNA evolution process. Linear evolution rules were considered and square matrices were used to represent them. If DNA sequences of different evolution steps are available, our approach allows the determination of the underlying evolution rule(s). Conversely, once the evolution rules are deciphered, our tool may reconstruct the DNA sequence in any previous evolution step for which the exact sequence information was unknown. The developed tool may be used to test various parameters that could influence evolution. We describe a paradigm relying on the assumption that mutagenesis is governed by a near-neighbour-dependent mechanism. Based on the satisfactory performance of our system in the deliberately simplified example, we propose that our approach could offer a starting point for future attempts to understand the mechanisms that govern evolution. The developed software is open-source and has a user-friendly graphical input interface.

  13. Differentiating Visual from Response Sequencing during Long-term Skill Learning.

    Science.gov (United States)

    Lynch, Brighid; Beukema, Patrick; Verstynen, Timothy

    2017-01-01

    The dual-system model of sequence learning posits that during early learning there is an advantage for encoding sequences in sensory frames; however, it remains unclear whether this advantage extends to long-term consolidation. Using the serial RT task, we set out to distinguish the dynamics of learning sequential orders of visual cues from learning sequential responses. On each day, most participants learned a new mapping between a set of symbolic cues and responses made with one of four fingers, after which they were exposed to trial blocks of either randomly ordered cues or deterministic ordered cues (12-item sequence). Participants were randomly assigned to one of four groups (n = 15 per group): Visual sequences (same sequence of visual cues across training days), Response sequences (same order of key presses across training days), Combined (same serial order of cues and responses on all training days), and a Control group (a novel sequence each training day). Across 5 days of training, sequence-specific measures of response speed and accuracy improved faster in the Visual group than any of the other three groups, despite no group differences in explicit awareness of the sequence. The two groups that were exposed to the same visual sequence across days showed a marginal improvement in response binding that was not found in the other groups. These results indicate that there is an advantage, in terms of rate of consolidation across multiple days of training, for learning sequences of actions in a sensory representational space, rather than as motoric representations.

  14. Method for priming and DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Mugasimangalam, R.C.; Ulanovsky, L.E.

    1997-12-01

    A method is presented for improving the priming specificity of an oligonucleotide primer that is non-unique in a nucleic acid template which includes selecting a continuous stretch of several nucleotides in the template DNA where one of the four bases does not occur in the stretch. This also includes bringing the template DNA in contract with a non-unique primer partially or fully complimentary to the sequence immediately upstream of the selected sequence stretch. This results in polymerase-mediated differential extension of the primer in the presence of a subset of deoxyribonucleotide triphosphates that does not contain the base complementary to the base absent in the selected sequence stretch. These reactions occur at a temperature sufficiently low for allowing the extension of the non-unique primer. The method causes polymerase-mediated extension reactions in the presence of all four natural deoxyribonucleotide triphosphates or modifications. At this high temperature discrimination occurs against priming sites of the non-unique primer where the differential extension has not made the primer sufficiently stable to prime. However, the primer extended at the selected stretch is sufficiently stable to prime.

  15. Frequency characteristics of coordinate sequences of linear recurrences over Galois rings

    Science.gov (United States)

    Kamlovskii, O. V.

    2013-12-01

    We consider some properties of the coordinate sequences of linear recurrences over Galois rings which characterize the possibility of regarding them as pseudo-random sequences. We study the periodicity properties, linear complexity and frequency characteristics of these sequences. Up to now, these parameters have been studied mainly in the case when the linear recurring sequence has maximal possible period. We investigate the coordinate sequences of linear recurrences of not necessarily maximal period. We obtain sharpened and generalized estimates for the number of elements and r-patterns on the cycles and intervals of these sequences.

  16. Frequency characteristics of coordinate sequences of linear recurrences over Galois rings

    International Nuclear Information System (INIS)

    Certification Research Center, Moscow (Russian Federation))" data-affiliation=" (LLC Certification Research Center, Moscow (Russian Federation))" >Kamlovskii, O V

    2013-01-01

    We consider some properties of the coordinate sequences of linear recurrences over Galois rings which characterize the possibility of regarding them as pseudo-random sequences. We study the periodicity properties, linear complexity and frequency characteristics of these sequences. Up to now, these parameters have been studied mainly in the case when the linear recurring sequence has maximal possible period. We investigate the coordinate sequences of linear recurrences of not necessarily maximal period. We obtain sharpened and generalized estimates for the number of elements and r-patterns on the cycles and intervals of these sequences

  17. Random forest variable selection in spatial malaria transmission modelling in Mpumalanga Province, South Africa

    Directory of Open Access Journals (Sweden)

    Thandi Kapwata

    2016-11-01

    Full Text Available Malaria is an environmentally driven disease. In order to quantify the spatial variability of malaria transmission, it is imperative to understand the interactions between environmental variables and malaria epidemiology at a micro-geographic level using a novel statistical approach. The random forest (RF statistical learning method, a relatively new variable-importance ranking method, measures the variable importance of potentially influential parameters through the percent increase of the mean squared error. As this value increases, so does the relative importance of the associated variable. The principal aim of this study was to create predictive malaria maps generated using the selected variables based on the RF algorithm in the Ehlanzeni District of Mpumalanga Province, South Africa. From the seven environmental variables used [temperature, lag temperature, rainfall, lag rainfall, humidity, altitude, and the normalized difference vegetation index (NDVI], altitude was identified as the most influential predictor variable due its high selection frequency. It was selected as the top predictor for 4 out of 12 months of the year, followed by NDVI, temperature and lag rainfall, which were each selected twice. The combination of climatic variables that produced the highest prediction accuracy was altitude, NDVI, and temperature. This suggests that these three variables have high predictive capabilities in relation to malaria transmission. Furthermore, it is anticipated that the predictive maps generated from predictions made by the RF algorithm could be used to monitor the progression of malaria and assist in intervention and prevention efforts with respect to malaria.

  18. Selecting Optimal Parameters of Random Linear Network Coding for Wireless Sensor Networks

    DEFF Research Database (Denmark)

    Heide, J; Zhang, Qi; Fitzek, F H P

    2013-01-01

    This work studies how to select optimal code parameters of Random Linear Network Coding (RLNC) in Wireless Sensor Networks (WSNs). With Rateless Deluge [1] the authors proposed to apply Network Coding (NC) for Over-the-Air Programming (OAP) in WSNs, and demonstrated that with NC a significant...... reduction in the number of transmitted packets can be achieved. However, NC introduces additional computations and potentially a non-negligible transmission overhead, both of which depend on the chosen coding parameters. Therefore it is necessary to consider the trade-off that these coding parameters...... present in order to obtain the lowest energy consumption per transmitted bit. This problem is analyzed and suitable coding parameters are determined for the popular Tmote Sky platform. Compared to the use of traditional RLNC, these parameters enable a reduction in the energy spent per bit which grows...

  19. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing.

    Science.gov (United States)

    Aflitos, Saulo; Schijlen, Elio; de Jong, Hans; de Ridder, Dick; Smit, Sandra; Finkers, Richard; Wang, Jun; Zhang, Gengyun; Li, Ning; Mao, Likai; Bakker, Freek; Dirks, Rob; Breit, Timo; Gravendeel, Barbara; Huits, Henk; Struss, Darush; Swanson-Wagner, Ruth; van Leeuwen, Hans; van Ham, Roeland C H J; Fito, Laia; Guignier, Laëtitia; Sevilla, Myrna; Ellul, Philippe; Ganko, Eric; Kapur, Arvind; Reclus, Emannuel; de Geus, Bernard; van de Geest, Henri; Te Lintel Hekkert, Bas; van Haarst, Jan; Smits, Lars; Koops, Andries; Sanchez-Perez, Gabino; van Heusden, Adriaan W; Visser, Richard; Quan, Zhiwu; Min, Jiumeng; Liao, Li; Wang, Xiaoli; Wang, Guangbiao; Yue, Zhen; Yang, Xinhua; Xu, Na; Schranz, Eric; Smets, Erik; Vos, Rutger; Rauwerda, Johan; Ursem, Remco; Schuit, Cees; Kerns, Mike; van den Berg, Jan; Vriezen, Wim; Janssen, Antoine; Datema, Erwin; Jahrman, Torben; Moquet, Frederic; Bonnet, Julien; Peters, Sander

    2014-10-01

    We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups, which has yielded a huge amount of precious data on sequence diversity in the tomato clade. Three new reference genomes were reconstructed to support our comparative genome analyses. Comparative sequence alignment revealed group-, species- and accession-specific polymorphisms, explaining characteristic fruit traits and growth habits in the various cultivars. Using gene models from the annotated Heinz 1706 reference genome, we observed differences in the ratio between non-synonymous and synonymous SNPs (dN/dS) in fruit diversification and plant growth genes compared to a random set of genes, indicating positive selection and differences in selection pressure between crop accessions and wild species. In wild species, the number of single-nucleotide polymorphisms (SNPs) exceeds 10 million, i.e. 20-fold higher than found in most of the crop accessions, indicating dramatic genetic erosion of crop and heirloom tomatoes. In addition, the highest levels of heterozygosity were found for allogamous self-incompatible wild species, while facultative and autogamous self-compatible species display a lower heterozygosity level. Using whole-genome SNP information for maximum-likelihood analysis, we achieved complete tree resolution, whereas maximum-likelihood trees based on SNPs from ten fruit and growth genes show incomplete resolution for the crop accessions, partly due to the effect of heterozygous SNPs. Finally, results suggest that phylogenetic relationships are correlated with habitat, indicating the occurrence of geographical races within these groups, which is of practical importance for Solanum genome evolution studies. © 2014 The Authors The Plant Journal © 2014 John Wiley & Sons Ltd.

  20. Feature Selection and the Class Imbalance Problem in Predicting Protein Function from Sequence

    NARCIS (Netherlands)

    Al-Shahib, A.; Breitling, R.; Gilbert, D.

    2005-01-01

    Abstract: When the standard approach to predict protein function by sequence homology fails, other alternative methods can be used that require only the amino acid sequence for predicting function. One such approach uses machine learning to predict protein function directly from amino acid sequence

  1. A Novel Indirect Sequence Readout Component in the E. coli Cyclic AMP Receptor Protein Operator

    DEFF Research Database (Denmark)

    Lindemose, Søren; Nielsen, Peter Eigil; Valentin-Hansen, Poul

    2014-01-01

    binding sites in the E. coli genome, but the exact role of the N6 region in CRP interaction has not previously been systematic examined. Here we employ an in vitro selection system based on a randomized N6 spacer region to demonstrate that CRP binding to the lacP1 site may be enhanced up to 14-fold......The cyclic AMP receptor protein (CRP) from Escherichia coli has been extensively studied for several decades. In particular, a detailed characterization of CRP interaction with DNA has been obtained. The CRP dimer recognizes a consensus sequence AANTGTGANNNNNNTCACANTT through direct amino acid...

  2. A Pareto-Based Adaptive Variable Neighborhood Search for Biobjective Hybrid Flow Shop Scheduling Problem with Sequence-Dependent Setup Time

    Directory of Open Access Journals (Sweden)

    Huixin Tian

    2016-01-01

    Full Text Available Different from most researches focused on the single objective hybrid flowshop scheduling (HFS problem, this paper investigates a biobjective HFS problem with sequence dependent setup time. The two objectives are the minimization of total weighted tardiness and the total setup time. To efficiently solve this problem, a Pareto-based adaptive biobjective variable neighborhood search (PABOVNS is developed. In the proposed PABOVNS, a solution is denoted as a sequence of all jobs and a decoding procedure is presented to obtain the corresponding complete schedule. In addition, the proposed PABOVNS has three major features that can guarantee a good balance of exploration and exploitation. First, an adaptive selection strategy of neighborhoods is proposed to automatically select the most promising neighborhood instead of the sequential selection strategy of canonical VNS. Second, a two phase multiobjective local search based on neighborhood search and path relinking is designed for each selected neighborhood. Third, an external archive with diversity maintenance is adopted to store the nondominated solutions and at the same time provide initial solutions for the local search. Computational results based on randomly generated instances show that the PABOVNS is efficient and even superior to some other powerful multiobjective algorithms in the literature.

  3. Probabilistic Motor Sequence Yields Greater Offline and Less Online Learning than Fixed Sequence.

    Science.gov (United States)

    Du, Yue; Prashad, Shikha; Schoenbrun, Ilana; Clark, Jane E

    2016-01-01

    It is well acknowledged that motor sequences can be learned quickly through online learning. Subsequently, the initial acquisition of a motor sequence is boosted or consolidated by offline learning. However, little is known whether offline learning can drive the fast learning of motor sequences (i.e., initial sequence learning in the first training session). To examine offline learning in the fast learning stage, we asked four groups of young adults to perform the serial reaction time (SRT) task with either a fixed or probabilistic sequence and with or without preliminary knowledge (PK) of the presence of a sequence. The sequence and PK were manipulated to emphasize either procedural (probabilistic sequence; no preliminary knowledge (NPK)) or declarative (fixed sequence; with PK) memory that were found to either facilitate or inhibit offline learning. In the SRT task, there were six learning blocks with a 2 min break between each consecutive block. Throughout the session, stimuli followed the same fixed or probabilistic pattern except in Block 5, in which stimuli appeared in a random order. We found that PK facilitated the learning of a fixed sequence, but not a probabilistic sequence. In addition to overall learning measured by the mean reaction time (RT), we examined the progressive changes in RT within and between blocks (i.e., online and offline learning, respectively). It was found that the two groups who performed the fixed sequence, regardless of PK, showed greater online learning than the other two groups who performed the probabilistic sequence. The groups who performed the probabilistic sequence, regardless of PK, did not display online learning, as indicated by a decline in performance within the learning blocks. However, they did demonstrate remarkably greater offline improvement in RT, which suggests that they are learning the probabilistic sequence offline. These results suggest that in the SRT task, the fast acquisition of a motor sequence is driven

  4. Application of random coherence order selection in gradient-enhanced multidimensional NMR

    International Nuclear Information System (INIS)

    Bostock, Mark J.; Nietlispach, Daniel

    2016-01-01

    Development of multidimensional NMR is essential to many applications, for example in high resolution structural studies of biomolecules. Multidimensional techniques enable separation of NMR signals over several dimensions, improving signal resolution, whilst also allowing identification of new connectivities. However, these advantages come at a significant cost. The Fourier transform theorem requires acquisition of a grid of regularly spaced points to satisfy the Nyquist criterion, while frequency discrimination and acquisition of a pure phase spectrum require acquisition of both quadrature components for each time point in every indirect (non-acquisition) dimension, adding a factor of 2 N -1 to the number of free- induction decays which must be acquired, where N is the number of dimensions. Compressed sensing (CS) ℓ 1 -norm minimisation in combination with non-uniform sampling (NUS) has been shown to be extremely successful in overcoming the Nyquist criterion. Previously, maximum entropy reconstruction has also been used to overcome the limitation of frequency discrimination, processing data acquired with only one quadrature component at a given time interval, known as random phase detection (RPD), allowing a factor of two reduction in the number of points for each indirect dimension (Maciejewski et al. 2011 PNAS 108 16640). However, whilst this approach can be easily applied in situations where the quadrature components are acquired as amplitude modulated data, the same principle is not easily extended to phase modulated (P-/N-type) experiments where data is acquired in the form exp (iωt) or exp (-iωt), and which make up many of the multidimensional experiments used in modern NMR. Here we demonstrate a modification of the CS ℓ 1 -norm approach to allow random coherence order selection (RCS) for phase modulated experiments; we generalise the nomenclature for RCS and RPD as random quadrature detection (RQD). With this method, the power of RQD can be extended

  5. Rhipicephalus (Boophilus) microplus strain Deutsch, whole genome shotgun sequencing project first submission of genome sequence

    Science.gov (United States)

    The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...

  6. Transcriptome sequencing of mung bean (Vigna radiate L.) genes and the identification of EST-SSR markers.

    Science.gov (United States)

    Chen, Honglin; Wang, Lixia; Wang, Suhua; Liu, Chunji; Blair, Matthew Wohlgemuth; Cheng, Xuzhen

    2015-01-01

    Mung bean (Vigna radiate (L.) Wilczek) is an important traditional food legume crop, with high economic and nutritional value. It is widely grown in China and other Asian countries. Despite its importance, genomic information is currently unavailable for this crop plant species or some of its close relatives in the Vigna genus. In this study, more than 103 million high quality cDNA sequence reads were obtained from mung bean using Illumina paired-end sequencing technology. The processed reads were assembled into 48,693 unigenes with an average length of 874 bp. Of these unigenes, 25,820 (53.0%) and 23,235 (47.7%) showed significant similarity to proteins in the NCBI non-redundant protein and nucleotide sequence databases, respectively. Furthermore, 19,242 (39.5%) could be classified into gene ontology categories, 18,316 (37.6%) into Swiss-Prot categories and 10,918 (22.4%) into KOG database categories (E-value SSR), and 2,303 sequences contained more than one SSR together in the same expressed sequence tag (EST). A total of 13,134 EST-SSRs were identified as potential molecular markers, with mono-nucleotide A/T repeats being the most abundant motif class and G/C repeats being rare. In this SSR analysis, we found five main repeat motifs: AG/CT (30.8%), GAA/TTC (12.6%), AAAT/ATTT (6.8%), AAAAT/ATTTT (6.2%) and AAAAAT/ATTTTT (1.9%). A total of 200 SSR loci were randomly selected for validation by PCR amplification as EST-SSR markers. Of these, 66 marker primer pairs produced reproducible amplicons that were polymorphic among 31 mung bean accessions selected from diverse geographical locations. The large number of SSR-containing sequences found in this study will be valuable for the construction of a high-resolution genetic linkage maps, association or comparative mapping and genetic analyses of various Vigna species.

  7. Selective amplification and sequencing of cyclic phosphate-containing RNAs by the cP-RNA-seq method.

    Science.gov (United States)

    Honda, Shozo; Morichika, Keisuke; Kirino, Yohei

    2016-03-01

    RNA digestions catalyzed by many ribonucleases generate RNA fragments that contain a 2',3'-cyclic phosphate (cP) at their 3' termini. However, standard RNA-seq methods are unable to accurately capture cP-containing RNAs because the cP inhibits the adapter ligation reaction. We recently developed a method named cP-RNA-seq that is able to selectively amplify and sequence cP-containing RNAs. Here we describe the cP-RNA-seq protocol in which the 3' termini of all RNAs, except those containing a cP, are cleaved through a periodate treatment after phosphatase treatment; hence, subsequent adapter ligation and cDNA amplification steps are exclusively applied to cP-containing RNAs. cP-RNA-seq takes ∼6 d, excluding the time required for sequencing and bioinformatics analyses, which are not covered in detail in this protocol. Biochemical validation of the existence of cP in the identified RNAs takes ∼3 d. Even though the cP-RNA-seq method was developed to identify angiogenin-generating 5'-tRNA halves as a proof of principle, the method should be applicable to global identification of cP-containing RNA repertoires in various transcriptomes.

  8. Targeted reduction of highly abundant transcripts using pseudo-random primers.

    Science.gov (United States)

    Arnaud, Ophélie; Kato, Sachi; Poulain, Stéphane; Plessy, Charles

    2016-04-01

    Transcriptome studies based on quantitative sequencing can estimate levels of gene expression by measuring target RNA abundance in sequencing libraries. Sequencing costs are proportional to the total number of sequenced reads, and in order to cover rare RNAs, considerable quantities of abundant and identical reads are needed. This major limitation can be addressed by depleting a proportion of the most abundant sequences from the library. However, such depletion strategies involve either extra handling of the input RNA sample or use of a large number of reverse transcription primers, termed not-so-random (NSR) primers, which are costly to synthesize. Taking advantage of the high tolerance of reverse transcriptase to mis-prime, we found that it is possible to use as few as 40 pseudo-random (PS) reverse transcription primers to decrease the rate of undesirable abundant sequences within a library without affecting the overall transcriptome diversity. PS primers are simple to design and can be used to deplete several undesirable RNAs simultaneously, thus creating a flexible tool for enriching transcriptome libraries for rare transcript sequences.

  9. Study of RNA structures with a connection to random matrix theory

    International Nuclear Information System (INIS)

    Bhadola, Pradeep; Deo, Nivedita

    2015-01-01

    This manuscript investigates the level of complexity and thermodynamic properties of the real RNA structures and compares the properties with the random RNA sequences. A discussion on the similarities of thermodynamical properties of the real structures with the non linear random matrix model of RNA folding is presented. The structural information contained in the PDB file is exploited to get the base pairing information. The complexity of an RNA structure is defined by a topological quantity called genus which is calculated from the base pairing information. Thermodynamic analysis of the real structures is done numerically. The real structures have a minimum free energy which is very small compared to the randomly generated sequences of the same length. This analysis suggests that there are specific patterns in the structures which are preserved during the evolution of the sequences and certain sequences are discarded by the evolutionary process. Further analyzing the sequences of a fixed length reveal that the RNA structures exist in ensembles i.e. although all the sequences in the ensemble have different series of nucleotides (sequence) they fold into structures that have the same pairs of hydrogen bonding as well as the same minimum free energy. The specific heat of the RNA molecule is numerically estimated at different lengths. The specific heat curve with temperature shows a bump and for some RNA, a double peak behavior is observed. The same behavior is seen in the study of the random matrix model with non linear interaction of RNA folding. The bump in the non linear matrix model can be controlled by the change in the interaction strength.

  10. On Random Numbers and Design

    Science.gov (United States)

    Ben-Ari, Morechai

    2004-01-01

    The term "random" is frequently used in discussion of the theory of evolution, even though the mathematical concept of randomness is problematic and of little relevance in the theory. Therefore, since the core concept of the theory of evolution is the non-random process of natural selection, the term random should not be used in teaching the…

  11. Chimera: construction of chimeric sequences for phylogenetic analysis

    NARCIS (Netherlands)

    Leunissen, J.A.M.

    2003-01-01

    Chimera allows the construction of chimeric protein or nucleic acid sequence files by concatenating sequences from two or more sequence files in PHYLIP formats. It allows the user to interactively select genes and species from the input files. The concatenated result is stored to one single output

  12. Characterization of the transcriptome, nucleotide sequence polymorphism, and natural selection in the desert adapted mouse Peromyscus eremicus

    Directory of Open Access Journals (Sweden)

    Matthew D. MacManes

    2014-10-01

    Full Text Available As a direct result of intense heat and aridity, deserts are thought to be among the most harsh of environments, particularly for their mammalian inhabitants. Given that osmoregulation can be challenging for these animals, with failure resulting in death, strong selection should be observed on genes related to the maintenance of water and solute balance. One such animal, Peromyscus eremicus, is native to the desert regions of the southwest United States and may live its entire life without oral fluid intake. As a first step toward understanding the genetics that underlie this phenotype, we present a characterization of the P. eremicus transcriptome. We assay four tissues (kidney, liver, brain, testes from a single individual and supplement this with population level renal transcriptome sequencing from 15 additional animals. We identified a set of transcripts undergoing both purifying and balancing selection based on estimates of Tajima’s D. In addition, we used the branch-site test to identify a transcript—Slc2a9, likely related to desert osmoregulation—undergoing enhanced selection in P. eremicus relative to a set of related non-desert rodents.

  13. Selecting for Fast Protein-Protein Association As Demonstrated on a Random TEM1 Yeast Library Binding BLIP.

    Science.gov (United States)

    Cohen-Khait, Ruth; Schreiber, Gideon

    2018-04-27

    Protein-protein interactions mediate the vast majority of cellular processes. Though protein interactions obey basic chemical principles also within the cell, the in vivo physiological environment may not allow for equilibrium to be reached. Thus, in vitro measured thermodynamic affinity may not provide a complete picture of protein interactions in the biological context. Binding kinetics composed of the association and dissociation rate constants are relevant and important in the cell. Therefore, changes in protein-protein interaction kinetics have a significant impact on the in vivo activity of the proteins. The common protocol for the selection of tighter binders from a mutant library selects for protein complexes with slower dissociation rate constants. Here we describe a method to specifically select for variants with faster association rate constants by using pre-equilibrium selection, starting from a large random library. Toward this end, we refine the selection conditions of a TEM1-β-lactamase library against its natural nanomolar affinity binder β-lactamase inhibitor protein (BLIP). The optimal selection conditions depend on the ligand concentration and on the incubation time. In addition, we show that a second sort of the library helps to separate signal from noise, resulting in a higher percent of faster binders in the selected library. Fast associating protein variants are of particular interest for drug development and other biotechnological applications.

  14. Cooperation of deterministic dynamics and random noise in production of complex syntactical avian song sequences: a neural network model

    Directory of Open Access Journals (Sweden)

    Yuichi eYamashita

    2011-04-01

    Full Text Available How the brain learns and generates temporal sequences is a fundamental issue in neuroscience. The production of birdsongs, a process which involves complex learned sequences, provides researchers with an excellent biological model for this topic. The Bengalese finch in particular learns a highly complex song with syntactical structure. The nucleus HVC (HVC, a premotor nucleus within the avian song system, plays a key role in generating the temporal structures of their songs. From lesion studies, the nucleus interfacialis (NIf projecting to the HVC is considered one of the essential regions that contribute to the complexity of their songs. However, the types of interaction between the HVC and the NIf that can produce complex syntactical songs remain unclear. In order to investigate the function of interactions between the HVC and NIf, we have proposed a neural network model based on previous biological evidence. The HVC is modeled by a recurrent neural network (RNN that learns to generate temporal patterns of songs. The NIf is modeled as a mechanism that provides auditory feedback to the HVC and generates random noise that feeds into the HVC. The model showed that complex syntactical songs can be replicated by simple interactions between deterministic dynamics of the RNN and random noise. In the current study, the plausibility of the model is tested by the comparison between the changes in the songs of actual birds induced by pharmacological inhibition of the NIf and the changes in the songs produced by the model resulting from modification of parameters representing NIf functions. The efficacy of the model demonstrates that the changes of songs induced by pharmacological inhibition of the NIf can be interpreted as a trade-off between the effects of noise and the effects of feedback on the dynamics of the RNN of the HVC. These facts suggest that the current model provides a convincing hypothesis for the functional role of NIf-HVC interaction.

  15. Gift from statistical learning: Visual statistical learning enhances memory for sequence elements and impairs memory for items that disrupt regularities.

    Science.gov (United States)

    Otsuka, Sachio; Saiki, Jun

    2016-02-01

    Prior studies have shown that visual statistical learning (VSL) enhances familiarity (a type of memory) of sequences. How do statistical regularities influence the processing of each triplet element and inserted distractors that disrupt the regularity? Given that increased attention to triplets induced by VSL and inhibition of unattended triplets, we predicted that VSL would promote memory for each triplet constituent, and degrade memory for inserted stimuli. Across the first two experiments, we found that objects from structured sequences were more likely to be remembered than objects from random sequences, and that letters (Experiment 1) or objects (Experiment 2) inserted into structured sequences were less likely to be remembered than those inserted into random sequences. In the subsequent two experiments, we examined an alternative account for our results, whereby the difference in memory for inserted items between structured and random conditions is due to individuation of items within random sequences. Our findings replicated even when control letters (Experiment 3A) or objects (Experiment 3B) were presented before or after, rather than inserted into, random sequences. Our findings suggest that statistical learning enhances memory for each item in a regular set and impairs memory for items that disrupt the regularity. Copyright © 2015 Elsevier B.V. All rights reserved.

  16. Nonlinear analysis of sequence symmetry of beta-trefoil family proteins

    Energy Technology Data Exchange (ETDEWEB)

    Li Mingfeng [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China); Huang Yanzhao [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China); Xu Ruizhen [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China); Xiao Yi [Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei (China)]. E-mail: yxiao@mail.hust.edu.cn

    2005-07-01

    The tertiary structures of proteins of beta-trefoil family have three-fold quasi-symmetry while their amino acid sequences appear almost at random. In the present paper we show that these amino acid sequences have hidden symmetries in fact and furthermore the degrees of these hidden symmetries are the same as those of their tertiary structures. We shall present a modified recurrence plot to reveal hidden symmetries in protein sequences. Our results can explain the contradiction in sequence-structure relations of proteins of beta-trefoil family.

  17. Cations form sequence selective motifs within DNA grooves via a combination of cation-pi and ion-dipole/hydrogen bond interactions.

    Science.gov (United States)

    Stewart, Mikaela; Dunlap, Tori; Dourlain, Elizabeth; Grant, Bryce; McFail-Isom, Lori

    2013-01-01

    The fine conformational subtleties of DNA structure modulate many fundamental cellular processes including gene activation/repression, cellular division, and DNA repair. Most of these cellular processes rely on the conformational heterogeneity of specific DNA sequences. Factors including those structural characteristics inherent in the particular base sequence as well as those induced through interaction with solvent components combine to produce fine DNA structural variation including helical flexibility and conformation. Cation-pi interactions between solvent cations or their first hydration shell waters and the faces of DNA bases form sequence selectively and contribute to DNA structural heterogeneity. In this paper, we detect and characterize the binding patterns found in cation-pi interactions between solvent cations and DNA bases in a set of high resolution x-ray crystal structures. Specifically, we found that monovalent cations (Tl⁺) and the polarized first hydration shell waters of divalent cations (Mg²⁺, Ca²⁺) form cation-pi interactions with DNA bases stabilizing unstacked conformations. When these cation-pi interactions are combined with electrostatic interactions a pattern of specific binding motifs is formed within the grooves.

  18. On the design of henon and logistic map-based random number generator

    Science.gov (United States)

    Magfirawaty; Suryadi, M. T.; Ramli, Kalamullah

    2017-10-01

    The key sequence is one of the main elements in the cryptosystem. True Random Number Generators (TRNG) method is one of the approaches to generating the key sequence. The randomness source of the TRNG divided into three main groups, i.e. electrical noise based, jitter based and chaos based. The chaos based utilizes a non-linear dynamic system (continuous time or discrete time) as an entropy source. In this study, a new design of TRNG based on discrete time chaotic system is proposed, which is then simulated in LabVIEW. The principle of the design consists of combining 2D and 1D chaotic systems. A mathematical model is implemented for numerical simulations. We used comparator process as a harvester method to obtain the series of random bits. Without any post processing, the proposed design generated random bit sequence with high entropy value and passed all NIST 800.22 statistical tests.

  19. Multiple sequence alignment accuracy and phylogenetic inference.

    Science.gov (United States)

    Ogden, T Heath; Rosenberg, Michael S

    2006-04-01

    Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment rather than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the length of the branch and of the neighboring branches increase, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple-sequence alignment can be an important factor in downstream effects on topological reconstruction.

  20. Selection in spatial working memory is independent of perceptual selective attention, but they interact in a shared spatial priority map.

    Science.gov (United States)

    Hedge, Craig; Oberauer, Klaus; Leonards, Ute

    2015-11-01

    We examined the relationship between the attentional selection of perceptual information and of information in working memory (WM) through four experiments, using a spatial WM-updating task. Participants remembered the locations of two objects in a matrix and worked through a sequence of updating operations, each mentally shifting one dot to a new location according to an arrow cue. Repeatedly updating the same object in two successive steps is typically faster than switching to the other object; this object switch cost reflects the shifting of attention in WM. In Experiment 1, the arrows were presented in random peripheral locations, drawing perceptual attention away from the selected object in WM. This manipulation did not eliminate the object switch cost, indicating that the mechanisms of perceptual selection do not underlie selection in WM. Experiments 2a and 2b corroborated the independence of selection observed in Experiment 1, but showed a benefit to reaction times when the placement of the arrow cue was aligned with the locations of relevant objects in WM. Experiment 2c showed that the same benefit also occurs when participants are not able to mark an updating location through eye fixations. Together, these data can be accounted for by a framework in which perceptual selection and selection in WM are separate mechanisms that interact through a shared spatial priority map.

  1. Alpha-gamma phase amplitude coupling subserves information transfer during perceptual sequence learning.

    Science.gov (United States)

    Tzvi, Elinor; Bauhaus, Leon J; Kessler, Till U; Liebrand, Matthias; Wöstmann, Malte; Krämer, Ulrike M

    2018-03-01

    Cross-frequency coupling is suggested to serve transfer of information between wide-spread neuronal assemblies and has been shown to underlie many cognitive functions including learning and memory. In previous work, we found that alpha (8-13 Hz) - gamma (30-48 Hz) phase amplitude coupling (αγPAC) is decreased during sequence learning in bilateral frontal cortex and right parietal cortex. We interpreted this to reflect decreased demands for visuo-motor mapping once the sequence has been encoded. In the present study, we put this hypothesis to the test by adding a "simple" condition to the standard serial reaction time task (SRTT) with minimal needs for visuo-motor mapping. The standard SRTT in our paradigm entailed a perceptual sequence allowing for implicit learning of a sequence of colors with randomly assigned motor responses. Sequence learning in this case was thus not associated with reduced demands for visuo-motor mapping. Analysis of oscillatory power revealed a learning-related alpha decrease pointing to a stronger recruitment of occipito-parietal areas when encoding the perceptual sequence. Replicating our previous findings but in contrast to our hypothesis, αγPAC was decreased in sequence compared to random trials over right frontal and parietal cortex. It also tended to be smaller compared to trials requiring a simple motor sequence. We additionally analyzed αγPAC in resting-state data of a separate cohort. PAC in electrodes over right parietal cortex was significantly stronger compared to sequence trials and tended to be higher compared to simple and random trials of the SRTT data. We suggest that αγPAC in right parietal cortex reflects a "default-mode" brain state, which gets perturbed to allow for encoding of visual regularities into memory. Copyright © 2018 Elsevier Inc. All rights reserved.

  2. Analysis of selected genes associated with cardiomyopathy by next-generation sequencing.

    Science.gov (United States)

    Szabadosova, Viktoria; Boronova, Iveta; Ferenc, Peter; Tothova, Iveta; Bernasovska, Jarmila; Zigova, Michaela; Kmec, Jan; Bernasovsky, Ivan

    2018-02-01

    As the leading cause of congestive heart failure, cardiomyopathy represents a heterogenous group of heart muscle disorders. Despite considerable progress being made in the genetic diagnosis of cardiomyopathy by detection of the mutations in the most prevalent cardiomyopathy genes, the cause remains unsolved in many patients. High-throughput mutation screening in the disease genes for cardiomyopathy is now possible because of using target enrichment followed by next-generation sequencing. The aim of the study was to analyze a panel of genes associated with dilated or hypertrophic cardiomyopathy based on previously published results in order to identify the subjects at risk. The method of next-generation sequencing by IlluminaHiSeq 2500 platform was used to detect sequence variants in 16 individuals diagnosed with dilated or hypertrophic cardiomyopathy. Detected variants were filtered and the functional impact of amino acid changes was predicted by computational programs. DNA samples of the 16 patients were analyzed by whole exome sequencing. We identified six nonsynonymous variants that were shown to be pathogenic in all used prediction softwares: rs3744998 (EPG5), rs11551768 (MGME1), rs148374985 (MURC), rs78461695 (PLEC), rs17158558 (RET) and rs2295190 (SYNE1). Two of the analyzed sequence variants had minor allele frequency (MAF)MURC), rs34580776 (MYBPC3). Our data support the potential role of the detected variants in pathogenesis of dilated or hypertrophic cardiomyopathy; however, the possibility that these variants might not be true disease-causing variants but are susceptibility alleles that require additional mutations or injury to cause the clinical phenotype of disease must be considered. © 2017 Wiley Periodicals, Inc.

  3. Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs

    Science.gov (United States)

    Choi, Woo-Yong; Chatterjee, Mainak

    2015-03-01

    With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, we still need to enhance the performance of the multipolling methods based on random sequencing. To that extent, we propose a real-time cluster-based multipolling sequencing algorithm that drastically eliminates more than 90% of the polling overhead, particularly so when the dynamic search algorithm fails to derive the multipolling sequence in real time.

  4. Central limit theorems for sequences with m(n)-dependent main part

    NARCIS (Netherlands)

    Nieuwenhuis, G.

    1992-01-01

    Let (Xi(n); n ϵ N, 1⩽i⩽h(n)) be a double sequence of random variables with h(n)→∞ as n→∞. Suppose that the sequence can be split into two parts: an m(n)-dependent sequence (Xi,m(n); n ϵ N, 1⩽i⩽h(n)) of main terms and a sequence (Xi,m(n); n ϵ N, 1⩽i⩽h(n)) of residual terms. Here (m(n)) may be

  5. A Permutation Importance-Based Feature Selection Method for Short-Term Electricity Load Forecasting Using Random Forest

    Directory of Open Access Journals (Sweden)

    Nantian Huang

    2016-09-01

    Full Text Available The prediction accuracy of short-term load forecast (STLF depends on prediction model choice and feature selection result. In this paper, a novel random forest (RF-based feature selection method for STLF is proposed. First, 243 related features were extracted from historical load data and the time information of prediction points to form the original feature set. Subsequently, the original feature set was used to train an RF as the original model. After the training process, the prediction error of the original model on the test set was recorded and the permutation importance (PI value of each feature was obtained. Then, an improved sequential backward search method was used to select the optimal forecasting feature subset based on the PI value of each feature. Finally, the optimal forecasting feature subset was used to train a new RF model as the final prediction model. Experiments showed that the prediction accuracy of RF trained by the optimal forecasting feature subset was higher than that of the original model and comparative models based on support vector regression and artificial neural network.

  6. Facile construction of a random protein domain insertion library using an engineered transposon.

    Science.gov (United States)

    Shah, Vandan; Pierre, Brennal; Kim, Jin Ryoun

    2013-01-15

    Insertional fusion between multiple protein domains represents a novel means of creating integrated functionalities. Currently, there is no robust guideline for selection of insertion sites ensuring the desired functional outcome of insertional fusion. Therefore, construction and testing of random domain insertion libraries, in which a host protein domain is randomly inserted into a guest protein domain, significantly benefit extensive exploration of sequence spaces for insertion sites. Short peptide residues are usually introduced between protein domains to alleviate structural conflicts, and the interdomain linker residues may affect the functional outcome of protein insertion complexes. Unfortunately, optimal control of interdomain linker residues is not always available in conventional methods used to construct random domain insertion libraries. Moreover, most conventional methods employ blunt-end rather than sticky-end ligation between host and guest DNA fragments, thus lowering library construction efficiency. Here, we report the facile construction of random domain insertion libraries using an engineered transposon. We show that random domain insertion with optimal control of interdomain linker residues was possible with our engineered transposon-based method. In addition, our method employs sticky-end rather than blunt-end ligation between host and guest DNA fragments, thus allowing for facile construction of relatively large sized libraries. Copyright © 2012 Elsevier Inc. All rights reserved.

  7. A P-N Sequence Generator Using LFSR with Dual Edge Trigger Technique

    Directory of Open Access Journals (Sweden)

    Naghwal Nitin Kumar

    2016-01-01

    Full Text Available This paper represents the design and implementation of a low power 4-bit LFSR using Dual edge triggered flip flop. A linear feedback shift register (LFSR is assembled by N number of flip flops connected in series and a combinational logic generally xor gate. An LFSR can generate random number sequence which acts as cipher in cryptography. A known text encrypted over long PN sequence, in order to improve security sequence made longer ie 128 bit; require long chain of flip flop leads to more power consumption. In this paper a novel circuit of random sequence generator using dual edge triggered flip flop has been proposed. Data has been generated on every edge of flip flop instead of single edge. A DETFF-LFSR can generate random number require with less number of clock cycle, it minimizes the number of flip flop result in power saving. In this paper we concentrates on the designing of power competent Test Pattern Generator (TPG using four dual edge triggered flip-flops as the basic building block, overall there is reduction of power around 25% by using these techniques.

  8. [Influence of "prehistory" of sequential movements of the right and the left hand on reproduction: coding of positions, movements and sequence structure].

    Science.gov (United States)

    Bobrova, E V; Liakhovetskiĭ, V A; Borshchevskaia, E R

    2011-01-01

    The dependence of errors during reproduction of a sequence of hand movements without visual feedback on the previous right- and left-hand performance ("prehistory") and on positions in space of sequence elements (random or ordered by the explicit rule) was analyzed. It was shown that the preceding information about the ordered positions of the sequence elements was used during right-hand movements, whereas left-hand movements were performed with involvement of the information about the random sequence. The data testify to a central mechanism of the analysis of spatial structure of sequence elements. This mechanism activates movement coding specific for the left hemisphere (vector coding) in case of an ordered sequence structure and positional coding specific for the right hemisphere in case of a random sequence structure.

  9. Expressed Sequence Tag-Simple Sequence Repeat (EST-SSR Marker Resources for Diversity Analysis of Mango (Mangifera indica L.

    Directory of Open Access Journals (Sweden)

    Natalie L. Dillon

    2014-01-01

    Full Text Available In this study, a collection of 24,840 expressed sequence tags (ESTs generated from five mango (Mangifera indica L. cDNA libraries was mined for EST-based simple sequence repeat (SSR markers. Over 1,000 ESTs with SSR motifs were detected from more than 24,000 EST sequences with di- and tri-nucleotide repeat motifs the most abundant. Of these, 25 EST-SSRs in genes involved in plant development, stress response, and fruit color and flavor development pathways were selected, developed into PCR markers and characterized in a population of 32 mango selections including M. indica varieties, and related Mangifera species. Twenty-four of the 25 EST-SSR markers exhibited polymorphisms, identifying a total of 86 alleles with an average of 5.38 alleles per locus, and distinguished between all Mangifera selections. Private alleles were identified for Mangifera species. These newly developed EST-SSR markers enhance the current 11 SSR mango genetic identity panel utilized by the Australian Mango Breeding Program. The current panel has been used to identify progeny and parents for selection and the application of this extended panel will further improve and help to design mango hybridization strategies for increased breeding efficiency.

  10. Direct random insertion mutagenesis of Helicobacter pylori

    NARCIS (Netherlands)

    de Jonge, Ramon; Bakker, Dennis; van Vliet, Arnoud H. M.; Kuipers, Ernst J.; Vandenbroucke-Grauls, Christina M. J. E.; Kusters, Johannes G.

    2003-01-01

    Random insertion mutagenesis is a widely used technique for the identification of bacterial virulence genes. Most strategies for random mutagenesis involve cloning in Escherichia coli for passage of plasmids or for phenotypic selection. This can result in biased selection due to restriction or

  11. Direct random insertion mutagenesis of Helicobacter pylori.

    NARCIS (Netherlands)

    Jonge, de R.; Bakker, D.; Vliet, van AH; Kuipers, E.J.; Vandenbroucke-Grauls, C.M.J.E.; Kusters, J.G.

    2003-01-01

    Random insertion mutagenesis is a widely used technique for the identification of bacterial virulence genes. Most strategies for random mutagenesis involve cloning in Escherichia coli for passage of plasmids or for phenotypic selection. This can result in biased selection due to restriction or

  12. Influence of Layup Sequence on the Surface Accuracy of Carbon Fiber Composite Space Mirrors

    Science.gov (United States)

    Yang, Zhiyong; Liu, Qingnian; Zhang, Boming; Xu, Liang; Tang, Zhanwen; Xie, Yongjie

    2018-04-01

    Layup sequence is directly related to stiffness and deformation resistance of the composite space mirror, and error caused by layup sequence can affect the surface precision of composite mirrors evidently. Variation of layup sequence with the same total thickness of composite space mirror changes surface form of the composite mirror, which is the focus of our study. In our research, the influence of varied quasi-isotropic stacking sequences and random angular deviation on the surface accuracy of composite space mirrors was investigated through finite element analyses (FEA). We established a simulation model for the studied concave mirror with 500 mm diameter, essential factors of layup sequences and random angular deviations on different plies were discussed. Five guiding findings were described in this study. Increasing total plies, optimizing stacking sequence and keeping consistency of ply alignment in ply placement are effective to improve surface accuracy of composite mirror.

  13. Statistical inference of the generation probability of T-cell receptors from sequence repertoires.

    Science.gov (United States)

    Murugan, Anand; Mora, Thierry; Walczak, Aleksandra M; Callan, Curtis G

    2012-10-02

    Stochastic rearrangement of germline V-, D-, and J-genes to create variable coding sequence for certain cell surface receptors is at the origin of immune system diversity. This process, known as "VDJ recombination", is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Because any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on nonproductive CDR3 sequences in T-cell DNA. We infer the joint distribution of the various generative events that occur when a new T-cell receptor gene is created. We find a rich picture of correlation (and absence thereof), providing insight into the molecular mechanisms involved. The generative event statistics are consistent between individuals, suggesting a universal biochemical process. Our probabilistic model predicts the generation probability of any specific CDR3 sequence by the primitive recombination process, allowing us to quantify the potential diversity of the T-cell repertoire and to understand why some sequences are shared between individuals. We argue that the use of formal statistical inference methods, of the kind presented in this paper, will be essential for quantitative understanding of the generation and evolution of diversity in the adaptive immune system.

  14. Translational selection is ubiquitous in prokaryotes.

    Directory of Open Access Journals (Sweden)

    Fran Supek

    2010-06-01

    Full Text Available Codon usage bias in prokaryotic genomes is largely a consequence of background substitution patterns in DNA, but highly expressed genes may show a preference towards codons that enable more efficient and/or accurate translation. We introduce a novel approach based on supervised machine learning that detects effects of translational selection on genes, while controlling for local variation in nucleotide substitution patterns represented as sequence composition of intergenic DNA. A cornerstone of our method is a Random Forest classifier that outperformed previous distance measure-based approaches, such as the codon adaptation index, in the task of discerning the (highly expressed ribosomal protein genes by their codon frequencies. Unlike previous reports, we show evidence that translational selection in prokaryotes is practically universal: in 460 of 461 examined microbial genomes, we find that a subset of genes shows a higher codon usage similarity to the ribosomal proteins than would be expected from the local sequence composition. These genes constitute a substantial part of the genome--between 5% and 33%, depending on genome size--while also exhibiting higher experimentally measured mRNA abundances and tending toward codons that match tRNA anticodons by canonical base pairing. Certain gene functional categories are generally enriched with, or depleted of codon-optimized genes, the trends of enrichment/depletion being conserved between Archaea and Bacteria. Prominent exceptions from these trends might indicate genes with alternative physiological roles; we speculate on specific examples related to detoxication of oxygen radicals and ammonia and to possible misannotations of asparaginyl-tRNA synthetases. Since the presence of codon optimizations on genes is a valid proxy for expression levels in fully sequenced genomes, we provide an example of an "adaptome" by highlighting gene functions with expression levels elevated specifically in

  15. Violation of an evolutionarily conserved immunoglobulin diversity gene sequence preference promotes production of dsDNA-specific IgG antibodies.

    Directory of Open Access Journals (Sweden)

    Aaron Silva-Sanchez

    Full Text Available Variability in the developing antibody repertoire is focused on the third complementarity determining region of the H chain (CDR-H3, which lies at the center of the antigen binding site where it often plays a decisive role in antigen binding. The power of VDJ recombination and N nucleotide addition has led to the common conception that the sequence of CDR-H3 is unrestricted in its variability and random in its composition. Under this view, the immune response is solely controlled by somatic positive and negative clonal selection mechanisms that act on individual B cells to promote production of protective antibodies and prevent the production of self-reactive antibodies. This concept of a repertoire of random antigen binding sites is inconsistent with the observation that diversity (DH gene segment sequence content by reading frame (RF is evolutionarily conserved, creating biases in the prevalence and distribution of individual amino acids in CDR-H3. For example, arginine, which is often found in the CDR-H3 of dsDNA binding autoantibodies, is under-represented in the commonly used DH RFs rearranged by deletion, but is a frequent component of rarely used inverted RF1 (iRF1, which is rearranged by inversion. To determine the effect of altering this germline bias in DH gene segment sequence on autoantibody production, we generated mice that by genetic manipulation are forced to utilize an iRF1 sequence encoding two arginines. Over a one year period we collected serial serum samples from these unimmunized, specific pathogen-free mice and found that more than one-fifth of them contained elevated levels of dsDNA-binding IgG, but not IgM; whereas mice with a wild type DH sequence did not. Thus, germline bias against the use of arginine enriched DH sequence helps to reduce the likelihood of producing self-reactive antibodies.

  16. Optimization of short amino acid sequences classifier

    Science.gov (United States)

    Barcz, Aleksy; Szymański, Zbigniew

    This article describes processing methods used for short amino acid sequences classification. The data processed are 9-symbols string representations of amino acid sequences, divided into 49 data sets - each one containing samples labeled as reacting or not with given enzyme. The goal of the classification is to determine for a single enzyme, whether an amino acid sequence would react with it or not. Each data set is processed separately. Feature selection is performed to reduce the number of dimensions for each data set. The method used for feature selection consists of two phases. During the first phase, significant positions are selected using Classification and Regression Trees. Afterwards, symbols appearing at the selected positions are substituted with numeric values of amino acid properties taken from the AAindex database. In the second phase the new set of features is reduced using a correlation-based ranking formula and Gram-Schmidt orthogonalization. Finally, the preprocessed data is used for training LS-SVM classifiers. SPDE, an evolutionary algorithm, is used to obtain optimal hyperparameters for the LS-SVM classifier, such as error penalty parameter C and kernel-specific hyperparameters. A simple score penalty is used to adapt the SPDE algorithm to the task of selecting classifiers with best performance measures values.

  17. Elucidation of the sequence selective binding mode of the DNA minor groove binder adozelesin, by high-field 1H NMR and restrained molecular dynamics

    International Nuclear Information System (INIS)

    Cameron, L.

    1999-01-01

    Adozelesin (formerly U73-975, The Upjohn Co.) is a covalent, minor-groove binding analogue of the antitumour antibiotic (+)CC-1065. Adozelesin consists of a cyclopropapyrroloindole alkylating sub-unit identical to (+)CC-1065, plus indole and benzofuran sub-units which replace the more complex pyrroloindole B and C sub-units, respectively, of (+)CC-1065. Adozelesin is a clinically important drug candidate, since it does not contain the ethylene bridge moieties on the B and C sub-units which are thought to be responsible for the unusual delayed hepatotoxicity exhibited by (+)CC-1065. Sequencing techniques identified two consensus sequences for adozelesin binding as p(dA) and 5'(T/A)(T/A)T-A*(C/G)G. This suggests that adozelesin spans a total of five base-pairs and shows a preference for A=T base-pair rich sequences, thus avoiding steric crowding around the exocyclic NH 2 of guanine and a wide minor groove. In this project, the covalent modification of two DNA sequences, i.e. 5'd(CGTAAGCGCTTA*CG) 2 and 5'-d(CGAAAAA*CGG)· 5'-d(CCGTTTTTCG), by adozelesin was examined by high-field NMR and restrained molecular mechanics and dynamics. Previous studies of minor groove binding drugs, using techniques as diverse as NMR, X-ray crystallography and molecular modelling, indicate that the incorporation of a guanine into the consensus sequence sterically hinders binding and, more importantly, produces a wider minor groove which is a 'slack' fit for the ligand. The aim of this investigation was to provide an insight into the sequence selective binding of adozelesin to 5'-AAAAA*CG and 5'-GCTTA*CG. The 1 H NMR data revealed that, in both cases, β-helical structure and Watson-Crick base-pairing was maintained on adduct formation. The 5'-GCTTA*CG adduct displayed significant distortion of the guanine base on the non-covalently modified strand. This distortion resulted from an amalgamation of two factors. Firstly, the presence of a strong hydrogen-bond between the amide linker of the

  18. The span of correlations in dolphin whistle sequences

    International Nuclear Information System (INIS)

    Ferrer-i-Cancho, Ramon; McCowan, Brenda

    2012-01-01

    Long-range correlations are found in symbolic sequences from human language, music and DNA. Determining the span of correlations in dolphin whistle sequences is crucial for shedding light on their communicative complexity. Dolphin whistles share various statistical properties with human words, i.e. Zipf's law for word frequencies (namely that the probability of the ith most frequent word of a text is about i −α ) and a parallel of the tendency of more frequent words to have more meanings. The finding of Zipf's law for word frequencies in dolphin whistles has been the topic of an intense debate on its implications. One of the major arguments against the relevance of Zipf's law in dolphin whistles is that it is not possible to distinguish the outcome of a die-rolling experiment from that of a linguistic or communicative source producing Zipf's law for word frequencies. Here we show that statistically significant whistle–whistle correlations extend back to the second previous whistle in the sequence, using a global randomization test, and to the fourth previous whistle, using a local randomization test. None of these correlations are expected by a die-rolling experiment and other simple explanations of Zipf's law for word frequencies, such as Simon's model, that produce sequences of unpredictable elements

  19. Disentangling the effects of alternation rate and maximum run length on judgments of randomness

    Directory of Open Access Journals (Sweden)

    Sabine G. Scholl

    2011-08-01

    Full Text Available Binary sequences are characterized by various features. Two of these characteristics---alternation rate and run length---have repeatedly been shown to influence judgments of randomness. The two characteristics, however, have usually been investigated separately, without controlling for the other feature. Because the two features are correlated but not identical, it seems critical to analyze their unique impact, as well as their interaction, so as to understand more clearly what influences judgments of randomness. To this end, two experiments on the perception of binary sequences orthogonally manipulated alternation rate and maximum run length (i.e., length of the longest run within the sequence. Results show that alternation rate consistently exerts a unique effect on judgments of randomness, but that the effect of alternation rate is contingent on the length of the longest run within the sequence. The effect of maximum run length was found to be small and less consistent. Together, these findings extend prior randomness research by integrating literature from the realms of perception, categorization, and prediction, as well as by showing the unique and joint effects of alternation rate and maximum run length on judgments of randomness.

  20. Time Correlations of Lightning Flash Sequences in Thunderstorms Revealed by Fractal Analysis

    Science.gov (United States)

    Gou, Xueqiang; Chen, Mingli; Zhang, Guangshu

    2018-01-01

    By using the data of lightning detection and ranging system at the Kennedy Space Center, the temporal fractal and correlation of interevent time series of lightning flash sequences in thunderstorms have been investigated with Allan factor (AF), Fano factor (FF), and detrended fluctuation analysis (DFA) methods. AF, FF, and DFA methods are powerful tools to detect the time-scaling structures and correlations in point processes. Totally 40 thunderstorms with distinguishing features of a single-cell storm and apparent increase and decrease in the total flash rate were selected for the analysis. It is found that the time-scaling exponents for AF (αAF) and FF (αFF) analyses are 1.62 and 0.95 in average, respectively, indicating a strong time correlation of the lightning flash sequences. DFA analysis shows that there is a crossover phenomenon—a crossover timescale (τc) ranging from 54 to 195 s with an average of 114 s. The occurrence of a lightning flash in a thunderstorm behaves randomly at timescales τc but shows strong time correlation at scales >τc. Physically, these may imply that the establishment of an extensive strong electric field necessary for the occurrence of a lightning flash needs a timescale >τc, which behaves strongly time correlated. But the initiation of a lightning flash within a well-established extensive strong electric field may involve the heterogeneities of the electric field at a timescale τc, which behave randomly.

  1. Constraint Satisfaction Inference : Non-probabilistic Global Inference for Sequence Labelling

    NARCIS (Netherlands)

    Canisius, S.V.M.; van den Bosch, A.; Daelemans, W.; Basili, R.; Moschitti, A.

    2006-01-01

    We present a new method for performing sequence labelling based on the idea of using a machine-learning classifier to generate several possible output sequences, and then applying an inference procedure to select the best sequence among those. Most sequence labelling methods following a similar

  2. Utilization of deletion bins to anchor and order sequences along the wheat 7B chromosome.

    Science.gov (United States)

    Belova, Tatiana; Grønvold, Lars; Kumar, Ajay; Kianian, Shahryar; He, Xinyao; Lillemo, Morten; Springer, Nathan M; Lien, Sigbjørn; Olsen, Odd-Arne; Sandve, Simen R

    2014-09-01

    A total of 3,671 sequence contigs and scaffolds were mapped to deletion bins on wheat chromosome 7B providing a foundation for developing high-resolution integrated physical map for this chromosome. Bread wheat (Triticum aestivum L.) has a large, complex and highly repetitive genome which is challenging to assemble into high quality pseudo-chromosomes. As part of the international effort to sequence the hexaploid bread wheat genome by the international wheat genome sequencing consortium (IWGSC) we are focused on assembling a reference sequence for chromosome 7B. The successful completion of the reference chromosome sequence is highly dependent on the integration of genetic and physical maps. To aid the integration of these two types of maps, we have constructed a high-density deletion bin map of chromosome 7B. Using the 270 K Nimblegen comparative genomic hybridization (CGH) array on a set of cv. Chinese spring deletion lines, a total of 3,671 sequence contigs and scaffolds (~7.8 % of chromosome 7B physical length) were mapped into nine deletion bins. Our method of genotyping deletions on chromosome 7B relied on a model-based clustering algorithm (Mclust) to accurately predict the presence or absence of a given genomic sequence in a deletion line. The bin mapping results were validated using three different approaches, viz. (a) PCR-based amplification of randomly selected bin mapped sequences (b) comparison with previously mapped ESTs and (c) comparison with a 7B genetic map developed in the present study. Validation of the bin mapping results suggested a high accuracy of the assignment of 7B sequence contigs and scaffolds to the 7B deletion bins.

  3. Spectral sum rules and search for periodicities in DNA sequences

    International Nuclear Information System (INIS)

    Chechetkin, V.R.

    2011-01-01

    Periodic patterns play the important regulatory and structural roles in genomic DNA sequences. Commonly, the underlying periodicities should be understood in a broad statistical sense, since the corresponding periodic patterns have been strongly distorted by the random point mutations and insertions/deletions during molecular evolution. The latent periodicities in DNA sequences can be efficiently displayed by Fourier transform. The criteria of significance for observed periodicities are obtained via the comparison versus the counterpart characteristics of the reference random sequences. We show that the restrictions imposed on the significance criteria by the rigorous spectral sum rules can be rationally described with De Finetti distribution. This distribution provides the convenient intermediate asymptotic form between Rayleigh distribution and exact combinatoric theory. - Highlights: → We study the significance criteria for latent periodicities in DNA sequences. → The constraints imposed by sum rules can be described with De Finetti distribution. → It is intermediate between Rayleigh distribution and exact combinatoric theory. → Theory is applicable to the study of correlations between different periodicities. → The approach can be generalized to the arbitrary discrete Fourier transform.

  4. Exploiting BAC-end sequences for the mining, characterization and utility of new short sequences repeat (SSR) markers in Citrus.

    Science.gov (United States)

    Biswas, Manosh Kumar; Chai, Lijun; Mayer, Christoph; Xu, Qiang; Guo, Wenwu; Deng, Xiuxin

    2012-05-01

    The aim of this study was to develop a large set of microsatellite markers based on publicly available BAC-end sequences (BESs), and to evaluate their transferability, discriminating capacity of genotypes and mapping ability in Citrus. A set of 1,281 simple sequence repeat (SSR) markers were developed from the 46,339 Citrus clementina BAC-end sequences (BES), of them 20.67% contained SSR longer than 20 bp, corresponding to roughly one perfect SSR per 2.04 kb. The most abundant motifs were di-nucleotide (16.82%) repeats. Among all repeat motifs (TA/AT)n is the most abundant (8.38%), followed by (AG/CT)n (4.51%). Most of the BES-SSR are located in the non-coding region, but 1.3% of BES-SSRs were found to be associated with transposable element (TE). A total of 400 novel SSR primer pairs were synthesized and their transferability and polymorphism tested on a set of 16 Citrus and Citrus relative's species. Among these 333 (83.25%) were successfully amplified and 260 (65.00%) showed cross-species transferability with Poncirus trifoliata and Fortunella sp. These cross-species transferable markers could be useful for cultivar identification, for genomic study of Citrus, Poncirus and Fortunella sp. Utility of the developed SSR marker was demonstrated by identifying a set of 118 markers each for construction of linkage map of Citrus reticulata and Poncirus trifoliata. Genetic diversity and phylogenetic relationship among 40 Citrus and its related species were conducted with the aid of 25 randomly selected SSR primer pairs and results revealed that citrus genomic SSRs are superior to genic SSR for genetic diversity and germplasm characterization of Citrus spp.

  5. Mitochondrial DNA sequencing of cat hair: an informative forensic tool.

    Science.gov (United States)

    Tarditi, Christy R; Grahn, Robert A; Evans, Jeffrey J; Kurushima, Jennifer D; Lyons, Leslie A

    2011-01-01

    Approximately 81.7 million cats are in 37.5 million U.S. households. Shed fur can be criminal evidence because of transfer to victims, suspects, and/or their belongings. To improve cat hairs as forensic evidence, the mtDNA control region from single hairs, with and without root tags, was sequenced. A dataset of a 402-bp control region segment from 174 random-bred cats representing four U.S. geographic areas was generated to determine the informativeness of the mtDNA region. Thirty-two mtDNA mitotypes were observed ranging in frequencies from 0.6-27%. Four common types occurred in all populations. Low heteroplasmy, 1.7%, was determined. Unique mitotypes were found in 18 individuals, 10.3% of the population studied. The calculated discrimination power implied that 8.3 of 10 randomly selected individuals can be excluded by this region. The genetic characteristics of the region and the generated dataset support the use of this cat mtDNA region in forensic applications. 2010 American Academy of Forensic Sciences. Published 2010. This article is a U.S. Government work and is in the public domain in the U.S.A.

  6. Selective oropharyngeal decontamination versus selective digestive decontamination in critically ill patients: a meta-analysis of randomized controlled trials

    Directory of Open Access Journals (Sweden)

    Zhao D

    2015-07-01

    Full Text Available Di Zhao,1,* Jian Song,2,* Xuan Gao,3 Fei Gao,4 Yupeng Wu,2 Yingying Lu,5 Kai Hou1 1Department of Neurosurgery, The First Hospital of Hebei Medical University, 2Department of Neurosurgery, 3Department of Neurology, The Second Hospital of Hebei Medical University, 4Hebei Provincial Procurement Centers for Medical Drugs and Devices, 5Department of Neurosurgery, The Second Hospital of Hebei Medical University, Shijiazhuang People’s Republic of China *These authors contributed equally to this work Background: Selective digestive decontamination (SDD and selective oropharyngeal decontamination (SOD are associated with reduced mortality and infection rates among patients in intensive care units (ICUs; however, whether SOD has a superior effect than SDD remains uncertain. Hence, we conducted a meta-analysis of randomized controlled trials (RCTs to compare SOD with SDD in terms of clinical outcomes and antimicrobial resistance rates in patients who were critically ill. Methods: RCTs published in PubMed, Embase, and Web of Science were systematically reviewed to compare the effects of SOD and SDD in patients who were critically ill. Outcomes included day-28 mortality, length of ICU stay, length of hospital stay, duration of mechanical ventilation, ICU-acquired bacteremia, and prevalence of antibiotic-resistant Gram-negative bacteria. Results were expressed as risk ratio (RR with 95% confidence intervals (CIs, and weighted mean differences (WMDs with 95% CIs. Pooled estimates were performed using a fixed-effects model or random-effects model, depending on the heterogeneity among studies. Results: A total of four RCTs involving 23,822 patients met the inclusion criteria and were included in this meta-analysis. Among patients whose admitting specialty was surgery, cardiothoracic surgery (57.3% and neurosurgery (29.7% were the two main types of surgery being performed. Pooled results showed that SOD had similar effects as SDD in day-28 mortality (RR =1

  7. Impact of Sequencing Radiation Therapy and Chemotherapy on Long-Term Local Toxicity for Early Breast Cancer: Results of a Randomized Study at 15-Year Follow-Up

    International Nuclear Information System (INIS)

    Pinnarò, Paola; Giordano, Carolina; Farneti, Alessia; Strigari, Lidia; Landoni, Valeria; Marucci, Laura; Petrongari, Maria Grazia; Sanguineti, Giuseppe

    2016-01-01

    Purpose: To compare long-term late local toxicity after either concomitant or sequential chemoradiation therapy after breast-conserving surgery. Methods and Materials: From 1997 to 2002, women aged 18 to 75 years who underwent breast-conserving surgery and axillary dissection for early breast cancer and in whom CMF (cyclophosphamide, methotrexate, and 5-fluorouracil) chemotherapy was planned were randomized between concomitant and sequential radiation therapy. Radiation therapy was delivered to the whole breast through tangential fields to 50 Gy in 20 fractions over a period of 4 weeks, followed by an electron boost. Surviving patients were tentatively contacted and examined between March and September 2014. Patients in whom progressive disease had developed or who had undergone further breast surgery were excluded. Local toxicity (fibrosis, telangiectasia, and breast atrophy or retraction) was scored blindly to the treatment received. A logistic regression was run to investigate the effect of treatment sequence after correction for several patient-, treatment-, and tumor-related covariates on selected endpoints. The median time to cross-sectional analysis was 15.7 years (range, 12.0-17.8 years). Results: Of 206 patients randomized, 154 (74.8%) were potentially eligible. Of these, 43 (27.9%) refused participation and 4 (2.6%) had been lost to follow-up, and for 5 (3.2%), we could not restore planning data; thus, the final number of analyzed patients was 102. No grade 4 toxicity had been observed, whereas the number of grade 3 toxicity events was low (<8%) for each item, allowing pooling of grade 2 and 3 events for further analysis. Treatment sequence (concomitant vs sequential) was an independent predictor of grade 2 or 3 fibrosis according to both the National Cancer Institute Common Terminology Criteria for Adverse Events (odds ratio [OR], 4.05; 95% confidence interval [CI], 1.34-12.2; P=.013) and the SOMA (Subjective, Objective, Management and Analytic

  8. Impact of Sequencing Radiation Therapy and Chemotherapy on Long-Term Local Toxicity for Early Breast Cancer: Results of a Randomized Study at 15-Year Follow-Up

    Energy Technology Data Exchange (ETDEWEB)

    Pinnarò, Paola; Giordano, Carolina; Farneti, Alessia [Department of Radiation Oncology, Regina Elena National Cancer Institute, Rome (Italy); Strigari, Lidia; Landoni, Valeria [Department of Physics, Regina Elena National Cancer Institute, Rome (Italy); Marucci, Laura; Petrongari, Maria Grazia [Department of Radiation Oncology, Regina Elena National Cancer Institute, Rome (Italy); Sanguineti, Giuseppe, E-mail: sanguineti@ifo.it [Department of Radiation Oncology, Regina Elena National Cancer Institute, Rome (Italy)

    2016-07-15

    Purpose: To compare long-term late local toxicity after either concomitant or sequential chemoradiation therapy after breast-conserving surgery. Methods and Materials: From 1997 to 2002, women aged 18 to 75 years who underwent breast-conserving surgery and axillary dissection for early breast cancer and in whom CMF (cyclophosphamide, methotrexate, and 5-fluorouracil) chemotherapy was planned were randomized between concomitant and sequential radiation therapy. Radiation therapy was delivered to the whole breast through tangential fields to 50 Gy in 20 fractions over a period of 4 weeks, followed by an electron boost. Surviving patients were tentatively contacted and examined between March and September 2014. Patients in whom progressive disease had developed or who had undergone further breast surgery were excluded. Local toxicity (fibrosis, telangiectasia, and breast atrophy or retraction) was scored blindly to the treatment received. A logistic regression was run to investigate the effect of treatment sequence after correction for several patient-, treatment-, and tumor-related covariates on selected endpoints. The median time to cross-sectional analysis was 15.7 years (range, 12.0-17.8 years). Results: Of 206 patients randomized, 154 (74.8%) were potentially eligible. Of these, 43 (27.9%) refused participation and 4 (2.6%) had been lost to follow-up, and for 5 (3.2%), we could not restore planning data; thus, the final number of analyzed patients was 102. No grade 4 toxicity had been observed, whereas the number of grade 3 toxicity events was low (<8%) for each item, allowing pooling of grade 2 and 3 events for further analysis. Treatment sequence (concomitant vs sequential) was an independent predictor of grade 2 or 3 fibrosis according to both the National Cancer Institute Common Terminology Criteria for Adverse Events (odds ratio [OR], 4.05; 95% confidence interval [CI], 1.34-12.2; P=.013) and the SOMA (Subjective, Objective, Management and Analytic

  9. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol

    2010-01-01

    preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement the data have been released into public sequence repositories (Genbank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication. CONCLUSIONS...

  10. Quantifying biodiversity and asymptotics for a sequence of random strings.

    Science.gov (United States)

    Koyano, Hitoshi; Kishino, Hirohisa

    2010-06-01

    We present a methodology for quantifying biodiversity at the sequence level by developing the probability theory on a set of strings. Further, we apply our methodology to the problem of quantifying the population diversity of microorganisms in several extreme environments and digestive organs and reveal the relation between microbial diversity and various environmental parameters.

  11. Real-time definition of non-randomness in the distribution of genomic events.

    Directory of Open Access Journals (Sweden)

    Ulrich Abel

    Full Text Available Features such as mutations or structural characteristics can be non-randomly or non-uniformly distributed within a genome. So far, computer simulations were required for statistical inferences on the distribution of sequence motifs. Here, we show that these analyses are possible using an analytical, mathematical approach. For the assessment of non-randomness, our calculations only require information including genome size, number of (sampled sequence motifs and distance parameters. We have developed computer programs evaluating our analytical formulas for the real-time determination of expected values and p-values. This approach permits a flexible cluster definition that can be applied to most effectively identify non-random or non-uniform sequence motif distribution. As an example, we show the effectivity and reliability of our mathematical approach in clinical retroviral vector integration site distribution.

  12. Strategyproof Peer Selection using Randomization, Partitioning, and Apportionment

    OpenAIRE

    Aziz, Haris; Lev, Omer; Mattei, Nicholas; Rosenschein, Jeffrey S.; Walsh, Toby

    2016-01-01

    Peer review, evaluation, and selection is a fundamental aspect of modern science. Funding bodies the world over employ experts to review and select the best proposals of those submitted for funding. The problem of peer selection, however, is much more general: a professional society may want to give a subset of its members awards based on the opinions of all members; an instructor for a MOOC or online course may want to crowdsource grading; or a marketing company may select ideas from group b...

  13. Selective enhancement of orientation tuning before saccades.

    Science.gov (United States)

    Ohl, Sven; Kuper, Clara; Rolfs, Martin

    2017-11-01

    Saccadic eye movements cause a rapid sweep of the visual image across the retina and bring the saccade's target into high-acuity foveal vision. Even before saccade onset, visual processing is selectively prioritized at the saccade target. To determine how this presaccadic attention shift exerts its influence on visual selection, we compare the dynamics of perceptual tuning curves before movement onset at the saccade target and in the opposite hemifield. Participants monitored a 30-Hz sequence of randomly oriented gratings for a target orientation. Combining a reverse correlation technique previously used to study orientation tuning in neurons and general additive mixed modeling, we found that perceptual reports were tuned to the target orientation. The gain of orientation tuning increased markedly within the last 100 ms before saccade onset. In addition, we observed finer orientation tuning right before saccade onset. This increase in gain and tuning occurred at the saccade target location and was not observed at the incongruent location in the opposite hemifield. The present findings suggest, therefore, that presaccadic attention exerts its influence on vision in a spatially and feature-selective manner, enhancing performance and sharpening feature tuning at the future gaze location before the eyes start moving.

  14. Detecting change in stochastic sound sequences.

    Directory of Open Access Journals (Sweden)

    Benjamin Skerritt-Davis

    2018-05-01

    Full Text Available Our ability to parse our acoustic environment relies on the brain's capacity to extract statistical regularities from surrounding sounds. Previous work in regularity extraction has predominantly focused on the brain's sensitivity to predictable patterns in sound sequences. However, natural sound environments are rarely completely predictable, often containing some level of randomness, yet the brain is able to effectively interpret its surroundings by extracting useful information from stochastic sounds. It has been previously shown that the brain is sensitive to the marginal lower-order statistics of sound sequences (i.e., mean and variance. In this work, we investigate the brain's sensitivity to higher-order statistics describing temporal dependencies between sound events through a series of change detection experiments, where listeners are asked to detect changes in randomness in the pitch of tone sequences. Behavioral data indicate listeners collect statistical estimates to process incoming sounds, and a perceptual model based on Bayesian inference shows a capacity in the brain to track higher-order statistics. Further analysis of individual subjects' behavior indicates an important role of perceptual constraints in listeners' ability to track these sensory statistics with high fidelity. In addition, the inference model facilitates analysis of neural electroencephalography (EEG responses, anchoring the analysis relative to the statistics of each stochastic stimulus. This reveals both a deviance response and a change-related disruption in phase of the stimulus-locked response that follow the higher-order statistics. These results shed light on the brain's ability to process stochastic sound sequences.

  15. A universal algorithm to generate pseudo-random numbers based on uniform mapping as homeomorphism

    International Nuclear Information System (INIS)

    Fu-Lai, Wang

    2010-01-01

    A specific uniform map is constructed as a homeomorphism mapping chaotic time series into [0,1] to obtain sequences of standard uniform distribution. With the uniform map, a chaotic orbit and a sequence orbit obtained are topologically equivalent to each other so the map can preserve the most dynamic properties of chaotic systems such as permutation entropy. Based on the uniform map, a universal algorithm to generate pseudo random numbers is proposed and the pseudo random series is tested to follow the standard 0–1 random distribution both theoretically and experimentally. The algorithm is not complex, which does not impose high requirement on computer hard ware and thus computation speed is fast. The method not only extends the parameter spaces but also avoids the drawback of small function space caused by constraints on chaotic maps used to generate pseudo random numbers. The algorithm can be applied to any chaotic system and can produce pseudo random sequence of high quality, thus can be a good universal pseudo random number generator. (general)

  16. A universal algorithm to generate pseudo-random numbers based on uniform mapping as homeomorphism

    Science.gov (United States)

    Wang, Fu-Lai

    2010-09-01

    A specific uniform map is constructed as a homeomorphism mapping chaotic time series into [0,1] to obtain sequences of standard uniform distribution. With the uniform map, a chaotic orbit and a sequence orbit obtained are topologically equivalent to each other so the map can preserve the most dynamic properties of chaotic systems such as permutation entropy. Based on the uniform map, a universal algorithm to generate pseudo random numbers is proposed and the pseudo random series is tested to follow the standard 0-1 random distribution both theoretically and experimentally. The algorithm is not complex, which does not impose high requirement on computer hard ware and thus computation speed is fast. The method not only extends the parameter spaces but also avoids the drawback of small function space caused by constraints on chaotic maps used to generate pseudo random numbers. The algorithm can be applied to any chaotic system and can produce pseudo random sequence of high quality, thus can be a good universal pseudo random number generator.

  17. Bottom-up driven involuntary auditory evoked field change: constant sound sequencing amplifies but does not sharpen neural activity.

    Science.gov (United States)

    Okamoto, Hidehiko; Stracke, Henning; Lagemann, Lothar; Pantev, Christo

    2010-01-01

    The capability of involuntarily tracking certain sound signals during the simultaneous presence of noise is essential in human daily life. Previous studies have demonstrated that top-down auditory focused attention can enhance excitatory and inhibitory neural activity, resulting in sharpening of frequency tuning of auditory neurons. In the present study, we investigated bottom-up driven involuntary neural processing of sound signals in noisy environments by means of magnetoencephalography. We contrasted two sound signal sequencing conditions: "constant sequencing" versus "random sequencing." Based on a pool of 16 different frequencies, either identical (constant sequencing) or pseudorandomly chosen (random sequencing) test frequencies were presented blockwise together with band-eliminated noises to nonattending subjects. The results demonstrated that the auditory evoked fields elicited in the constant sequencing condition were significantly enhanced compared with the random sequencing condition. However, the enhancement was not significantly different between different band-eliminated noise conditions. Thus the present study confirms that by constant sound signal sequencing under nonattentive listening the neural activity in human auditory cortex can be enhanced, but not sharpened. Our results indicate that bottom-up driven involuntary neural processing may mainly amplify excitatory neural networks, but may not effectively enhance inhibitory neural circuits.

  18. Direct generation of all-optical random numbers from optical pulse amplitude chaos.

    Science.gov (United States)

    Li, Pu; Wang, Yun-Cai; Wang, An-Bang; Yang, Ling-Zhen; Zhang, Ming-Jiang; Zhang, Jian-Zhong

    2012-02-13

    We propose and theoretically demonstrate an all-optical method for directly generating all-optical random numbers from pulse amplitude chaos produced by a mode-locked fiber ring laser. Under an appropriate pump intensity, the mode-locked laser can experience a quasi-periodic route to chaos. Such a chaos consists of a stream of pulses with a fixed repetition frequency but random intensities. In this method, we do not require sampling procedure and external triggered clocks but directly quantize the chaotic pulses stream into random number sequence via an all-optical flip-flop. Moreover, our simulation results show that the pulse amplitude chaos has no periodicity and possesses a highly symmetric distribution of amplitude. Thus, in theory, the obtained random number sequence without post-processing has a high-quality randomness verified by industry-standard statistical tests.

  19. Random mutagenesis of the hyperthermophilic archaeon Pyrococcus furiosus using in vitro mariner transposition and natural transformation.

    Science.gov (United States)

    Guschinskaya, Natalia; Brunel, Romain; Tourte, Maxime; Lipscomb, Gina L; Adams, Michael W W; Oger, Philippe; Charpentier, Xavier

    2016-11-08

    Transposition mutagenesis is a powerful tool to identify the function of genes, reveal essential genes and generally to unravel the genetic basis of living organisms. However, transposon-mediated mutagenesis has only been successfully applied to a limited number of archaeal species and has never been reported in Thermococcales. Here, we report random insertion mutagenesis in the hyperthermophilic archaeon Pyrococcus furiosus. The strategy takes advantage of the natural transformability of derivatives of the P. furiosus COM1 strain and of in vitro Mariner-based transposition. A transposon bearing a genetic marker is randomly transposed in vitro in genomic DNA that is then used for natural transformation of P. furiosus. A small-scale transposition reaction routinely generates several hundred and up to two thousands transformants. Southern analysis and sequencing showed that the obtained mutants contain a single and random genomic insertion. Polyploidy has been reported in Thermococcales and P. furiosus is suspected of being polyploid. Yet, about half of the mutants obtained on the first selection are homozygous for the transposon insertion. Two rounds of isolation on selective medium were sufficient to obtain gene conversion in initially heterozygous mutants. This transposition mutagenesis strategy will greatly facilitate functional exploration of the Thermococcales genomes.

  20. A programmable method for massively parallel targeted sequencing

    Science.gov (United States)

    Hopmans, Erik S.; Natsoulis, Georges; Bell, John M.; Grimes, Susan M.; Sieh, Weiva; Ji, Hanlee P.

    2014-01-01

    We have developed a targeted resequencing approach referred to as Oligonucleotide-Selective Sequencing. In this study, we report a series of significant improvements and novel applications of this method whereby the surface of a sequencing flow cell is modified in situ to capture specific genomic regions of interest from a sample and then sequenced. These improvements include a fully automated targeted sequencing platform through the use of a standard Illumina cBot fluidics station. Targeting optimization increased the yield of total on-target sequencing data 2-fold compared to the previous iteration, while simultaneously increasing the percentage of reads that could be mapped to the human genome. The described assays cover up to 1421 genes with a total coverage of 5.5 Megabases (Mb). We demonstrate a 10-fold abundance uniformity of greater than 90% in 1 log distance from the median and a targeting rate of up to 95%. We also sequenced continuous genomic loci up to 1.5 Mb while simultaneously genotyping SNPs and genes. Variants with low minor allele fraction were sensitively detected at levels of 5%. Finally, we determined the exact breakpoint sequence of cancer rearrangements. Overall, this approach has high performance for selective sequencing of genome targets, configuration flexibility and variant calling accuracy. PMID:24782526

  1. Sequence-specific DNA alkylation by tandem Py-Im polyamide conjugates.

    Science.gov (United States)

    Taylor, Rhys Dylan; Kawamoto, Yusuke; Hashiya, Kaori; Bando, Toshikazu; Sugiyama, Hiroshi

    2014-09-01

    Tandem N-methylpyrrole-N-methylimidazole (Py-Im) polyamides with good sequence-specific DNA-alkylating activities have been designed and synthesized. Three alkylating tandem Py-Im polyamides with different linkers, which each contained the same moiety for the recognition of a 10 bp DNA sequence, were evaluated for their reactivity and selectivity by DNA alkylation, using high-resolution denaturing gel electrophoresis. All three conjugates displayed high reactivities for the target sequence. In particular, polyamide 1, which contained a β-alanine linker, displayed the most-selective sequence-specific alkylation towards the target 10 bp DNA sequence. The tandem Py-Im polyamide conjugates displayed greater sequence-specific DNA alkylation than conventional hairpin Py-Im polyamide conjugates (4 and 5). For further research, the design of tandem Py-Im polyamide conjugates could play an important role in targeting specific gene sequences. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. Random walk generated by random permutations of {1, 2, 3, ..., n + 1}

    International Nuclear Information System (INIS)

    Oshanin, G; Voituriez, R

    2004-01-01

    We study properties of a non-Markovian random walk X (n) l , l = 0, 1, 2, ..., n, evolving in discrete time l on a one-dimensional lattice of integers, whose moves to the right or to the left are prescribed by the rise-and-descent sequences characterizing random permutations π of [n + 1] = {1, 2, 3, ..., n + 1}. We determine exactly the probability of finding the end-point X n = X (n) n of the trajectory of such a permutation-generated random walk (PGRW) at site X, and show that in the limit n → ∞ it converges to a normal distribution with a smaller, compared to the conventional Polya random walk, diffusion coefficient. We formulate, as well, an auxiliary stochastic process whose distribution is identical to the distribution of the intermediate points X (n) l , l < n, which enables us to obtain the probability measure of different excursions and to define the asymptotic distribution of the number of 'turns' of the PGRW trajectories

  3. Exploring evidence of positive selection reveals genetic basis of meat quality traits in Berkshire pigs through whole genome sequencing.

    Science.gov (United States)

    Jeong, Hyeonsoo; Song, Ki-Duk; Seo, Minseok; Caetano-Anollés, Kelsey; Kim, Jaemin; Kwak, Woori; Oh, Jae-Don; Kim, EuiSoo; Jeong, Dong Kee; Cho, Seoae; Kim, Heebal; Lee, Hak-Kyo

    2015-08-20

    Natural and artificial selection following domestication has led to the existence of more than a hundred pig breeds, as well as incredible variation in phenotypic traits. Berkshire pigs are regarded as having superior meat quality compared to other breeds. As the meat production industry seeks selective breeding approaches to improve profitable traits such as meat quality, information about genetic determinants of these traits is in high demand. However, most of the studies have been performed using trained sensory panel analysis without investigating the underlying genetic factors. Here we investigate the relationship between genomic composition and this phenotypic trait by scanning for signatures of positive selection in whole-genome sequencing data. We generated genomes of 10 Berkshire pigs at a total of 100.6 coverage depth, using the Illumina Hiseq2000 platform. Along with the genomes of 11 Landrace and 13 Yorkshire pigs, we identified genomic variants of 18.9 million SNVs and 3.4 million Indels in the mapped regions. We identified several associated genes related to lipid metabolism, intramuscular fatty acid deposition, and muscle fiber type which attribute to pork quality (TG, FABP1, AKIRIN2, GLP2R, TGFBR3, JPH3, ICAM2, and ERN1) by applying between population statistical tests (XP-EHH and XP-CLR). A statistical enrichment test was also conducted to detect breed specific genetic variation. In addition, de novo short sequence read assembly strategy identified several candidate genes (SLC25A14, IGF1, PI4KA, CACNA1A) as also contributing to lipid metabolism. Results revealed several candidate genes involved in Berkshire meat quality; most of these genes are involved in lipid metabolism and intramuscular fat deposition. These results can provide a basis for future research on the genomic characteristics of Berkshire pigs.

  4. Elastic wave localization in two-dimensional phononic crystals with one-dimensional random disorder and aperiodicity

    International Nuclear Information System (INIS)

    Yan Zhizhong; Zhang Chuanzeng; Wang Yuesheng

    2011-01-01

    The band structures of in-plane elastic waves propagating in two-dimensional phononic crystals with one-dimensional random disorder and aperiodicity are analyzed in this paper. The localization of wave propagation is discussed by introducing the concept of the localization factor, which is calculated by the plane-wave-based transfer-matrix method. By treating the random disorder and aperiodicity as the deviation from the periodicity in a special way, three kinds of aperiodic phononic crystals that have normally distributed random disorder, Thue-Morse and Rudin-Shapiro sequence in one direction and translational symmetry in the other direction are considered and the band structures are characterized using localization factors. Besides, as a special case, we analyze the band gap properties of a periodic planar layered composite containing a periodic array of square inclusions. The transmission coefficients based on eigen-mode matching theory are also calculated and the results show the same behaviors as the localization factor does. In the case of random disorders, the localization degree of the normally distributed random disorder is larger than that of the uniformly distributed random disorder although the eigenstates are both localized no matter what types of random disorders, whereas, for the case of Thue-Morse and Rudin-Shapiro structures, the band structures of Thue-Morse sequence exhibit similarities with the quasi-periodic (Fibonacci) sequence not present in the results of the Rudin-Shapiro sequence.

  5. A Bidirectional Generalized Synchronization Theorem-Based Chaotic Pseudo-random Number Generator

    Directory of Open Access Journals (Sweden)

    Han Shuangshuang

    2013-07-01

    Full Text Available Based on a bidirectional generalized synchronization theorem for discrete chaos system, this paper introduces a new 5-dimensional bidirectional generalized chaos synchronization system (BGCSDS, whose prototype is a novel chaotic system introduced in [12]. Numerical simulation showed that two pair variables of the BGCSDS achieve generalized chaos synchronization via a transform H.A chaos-based pseudo-random number generator (CPNG was designed by the new BGCSDS. Using the FIPS-140-2 tests issued by the National Institute of Standard and Technology (NIST verified the randomness of the 1000 binary number sequences generated via the CPNG and the RC4 algorithm respectively. The results showed that all the tested sequences passed the FIPS-140-2 tests. The confidence interval analysis showed the statistical properties of the randomness of the sequences generated via the CPNG and the RC4 algorithm do not have significant differences.

  6. Fat suppression in MR imaging with binomial pulse sequences

    International Nuclear Information System (INIS)

    Baudovin, C.J.; Bryant, D.J.; Bydder, G.M.; Young, I.R.

    1989-01-01

    This paper reports on a study to develop pulse sequences allowing suppression of fat signal on MR images without eliminating signal from other tissues with short T1. They have developed such a technique involving selective excitation of protons in water, based on a binomial pulse sequence. Imaging is performed at 0.15 T. Careful shimming is performed to maximize separation of fat and water peaks. A spin-echo 1,500/80 sequence is used, employing 90 degrees pulse with transit frequency optimized for water with null excitation of 20 H offset, followed by a section-selective 180 degrees pulse. With use of the binomial sequence for imagining, reduction in fat signal is seen on images of the pelvis and legs of volunteers. Patient studies show dramatic improvement in visualization of prostatic carcinoma compared with standard sequences

  7. 47 CFR 1.1604 - Post-selection hearings.

    Science.gov (United States)

    2010-10-01

    ... 47 Telecommunication 1 2010-10-01 2010-10-01 false Post-selection hearings. 1.1604 Section 1.1604 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL PRACTICE AND PROCEDURE Random Selection Procedures for Mass Media Services General Procedures § 1.1604 Post-selection hearings. (a) Following the random...

  8. Random genetic drift, natural selection, and noise in human cranial evolution.

    Science.gov (United States)

    Roseman, Charles C

    2016-08-01

    This study assesses the extent to which relationships among groups complicate comparative studies of adaptation in recent human cranial variation and the extent to which departures from neutral additive models of evolution hinder the reconstruction of population relationships among groups using cranial morphology. Using a maximum likelihood evolutionary model fitting approach and a mixed population genomic and cranial data set, I evaluate the relative fits of several widely used models of human cranial evolution. Moreover, I compare the goodness of fit of models of cranial evolution constrained by genomic variation to test hypotheses about population specific departures from neutrality. Models from population genomics are much better fits to cranial variation than are traditional models from comparative human biology. There is not enough evolutionary information in the cranium to reconstruct much of recent human evolution but the influence of population history on cranial variation is strong enough to cause comparative studies of adaptation serious difficulties. Deviations from a model of random genetic drift along a tree-like population history show the importance of environmental effects, gene flow, and/or natural selection on human cranial variation. Moreover, there is a strong signal of the effect of natural selection or an environmental factor on a group of humans from Siberia. The evolution of the human cranium is complex and no one evolutionary process has prevailed at the expense of all others. A holistic unification of phenome, genome, and environmental context, gives us a strong point of purchase on these problems, which is unavailable to any one traditional approach alone. Am J Phys Anthropol 160:582-592, 2016. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  9. Waiting Time Distributions for Pattern Occurrence in a Constrained Sequence

    Directory of Open Access Journals (Sweden)

    Valeri Stefanov

    2007-01-01

    Full Text Available A binary sequence of zeros and ones is called a (d,k-sequence if it does not contain runs of zeros of length either less than d or greater than k, where d and k are arbitrary, but fixed, non-negative integers and d < k. Such sequences find an abundance of applications in communications, in particular for magnetic and optical recording. Occasionally, one requires that (d,k-sequences do not contain a specific pattern w. Therefore, distribution results concerning pattern occurrence in (d,k-sequences are of interest. In this paper we study the distribution of the waiting time until the r th occurrence of a pattern w in a random (d,k-sequence generated by a Markov source. Numerical examples are also provided.

  10. Transcriptome Sequencing and Development of Genic SSR Markers of an Endangered Chinese Endemic Genus Dipteronia Oliver (Aceraceae).

    Science.gov (United States)

    Zhou, Tao; Li, Zhong-Hu; Bai, Guo-Qing; Feng, Li; Chen, Chen; Wei, Yue; Chang, Yong-Xia; Zhao, Gui-Fang

    2016-02-23

    Dipteronia Oliver (Aceraceae) is an endangered Chinese endemic genus consisting of two living species, Dipteronia sinensis and Dipteronia dyeriana. However, studies on the population genetics and evolutionary analyses of Dipteronia have been hindered by limited genomic resources and genetic markers. Here, the generation, de novo assembly and annotation of transcriptome datasets, and a large set of microsatellite or simple sequence repeat (SSR) markers derived from Dipteronia have been described. After Illumina pair-end sequencing, approximately 93.2 million reads were generated and assembled to yield a total of 99,358 unigenes. A majority of these unigenes (53%, 52,789) had at least one blast hit against the public protein databases. Further, 12,377 SSR loci were detected and 4179 primer pairs were designed for experimental validation. Of these 4179 primer pairs, 435 primer pairs were randomly selected to test polymorphism. Our results show that products from 132 primer pairs were polymorphic, in which 97 polymorphic SSR markers were further selected to analyze the genetic diversity of 10 natural populations of Dipteronia. The identification of SSR markers during our research will provide the much valuable data for population genetic analyses and evolutionary studies in Dipteronia.

  11. Random and externally controlled occurrences of Dansgaard–Oeschger events

    Directory of Open Access Journals (Sweden)

    J. Lohmann

    2018-05-01

    Full Text Available Dansgaard–Oeschger (DO events constitute the most pronounced mode of centennial to millennial climate variability of the last glacial period. Since their discovery, many decades of research have been devoted to understand the origin and nature of these rapid climate shifts. In recent years, a number of studies have appeared that report emergence of DO-type variability in fully coupled general circulation models via different mechanisms. These mechanisms result in the occurrence of DO events at varying degrees of regularity, ranging from periodic to random. When examining the full sequence of DO events as captured in the North Greenland Ice Core Project (NGRIP ice core record, one can observe high irregularity in the timing of individual events at any stage within the last glacial period. In addition to the prevailing irregularity, certain properties of the DO event sequence, such as the average event frequency or the relative distribution of cold versus warm periods, appear to be changing throughout the glacial. By using statistical hypothesis tests on simple event models, we investigate whether the observed event sequence may have been generated by stationary random processes or rather was strongly modulated by external factors. We find that the sequence of DO warming events is consistent with a stationary random process, whereas dividing the event sequence into warming and cooling events leads to inconsistency with two independent event processes. As we include external forcing, we find a particularly good fit to the observed DO sequence in a model where the average residence time in warm periods are controlled by global ice volume and cold periods by boreal summer insolation.

  12. Random and externally controlled occurrences of Dansgaard-Oeschger events

    Science.gov (United States)

    Lohmann, Johannes; Ditlevsen, Peter D.

    2018-05-01

    Dansgaard-Oeschger (DO) events constitute the most pronounced mode of centennial to millennial climate variability of the last glacial period. Since their discovery, many decades of research have been devoted to understand the origin and nature of these rapid climate shifts. In recent years, a number of studies have appeared that report emergence of DO-type variability in fully coupled general circulation models via different mechanisms. These mechanisms result in the occurrence of DO events at varying degrees of regularity, ranging from periodic to random. When examining the full sequence of DO events as captured in the North Greenland Ice Core Project (NGRIP) ice core record, one can observe high irregularity in the timing of individual events at any stage within the last glacial period. In addition to the prevailing irregularity, certain properties of the DO event sequence, such as the average event frequency or the relative distribution of cold versus warm periods, appear to be changing throughout the glacial. By using statistical hypothesis tests on simple event models, we investigate whether the observed event sequence may have been generated by stationary random processes or rather was strongly modulated by external factors. We find that the sequence of DO warming events is consistent with a stationary random process, whereas dividing the event sequence into warming and cooling events leads to inconsistency with two independent event processes. As we include external forcing, we find a particularly good fit to the observed DO sequence in a model where the average residence time in warm periods are controlled by global ice volume and cold periods by boreal summer insolation.

  13. Dominant accident sequences in Oconee-1 pressurized water reactor

    International Nuclear Information System (INIS)

    Dearing, J.F.; Henninger, R.J.; Nassersharif, B.

    1985-04-01

    A set of dominant accident sequences in the Oconee-1 pressurized water reactor was selected using probabilistic risk analysis methods. Because some accident scenarios were similar, a subset of four accident sequences was selected to be analyzed with the Transient Reactor Analysis Code (TRAC) to further our insights into similar types of accidents. The sequences selected were loss-of-feedwater, small-small break loss-of-coolant, loss-of-feedwater-initiated transient without scram, and interfacing systems loss-of-coolant accidents. The normal plant response and the impact of equipment availability and potential operator actions were also examined. Strategies were developed for operator actions not covered in existing emergency operator guidelines and were tested using TRAC simulations to evaluate their effectiveness in preventing core uncovery and maintaining core cooling

  14. A practical comparison of de novo genome assembly software tools for next-generation sequencing technologies.

    Directory of Open Access Journals (Sweden)

    Wenyu Zhang

    Full Text Available The advent of next-generation sequencing technologies is accompanied with the development of many whole-genome sequence assembly methods and software, especially for de novo fragment assembly. Due to the poor knowledge about the applicability and performance of these software tools, choosing a befitting assembler becomes a tough task. Here, we provide the information of adaptivity for each program, then above all, compare the performance of eight distinct tools against eight groups of simulated datasets from Solexa sequencing platform. Considering the computational time, maximum random access memory (RAM occupancy, assembly accuracy and integrity, our study indicate that string-based assemblers, overlap-layout-consensus (OLC assemblers are well-suited for very short reads and longer reads of small genomes respectively. For large datasets of more than hundred millions of short reads, De Bruijn graph-based assemblers would be more appropriate. In terms of software implementation, string-based assemblers are superior to graph-based ones, of which SOAPdenovo is complex for the creation of configuration file. Our comparison study will assist researchers in selecting a well-suited assembler and offer essential information for the improvement of existing assemblers or the developing of novel assemblers.

  15. Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology.

    Science.gov (United States)

    Fox, Eric W; Hill, Ryan A; Leibowitz, Scott G; Olsen, Anthony R; Thornbrugh, Darren J; Weber, Marc H

    2017-07-01

    Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological data sets, there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables are used or stepwise procedures are employed which iteratively remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating data set consists of the good/poor condition of n = 1365 stream survey sites from the 2008/2009 National Rivers and Stream Assessment, and a large set (p = 212) of landscape features from the StreamCat data set as potential predictors. We compare two types of RF models: a full variable set model with all 212 predictors and a reduced variable set model selected using a backward elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substantial improvement in cross-validated accuracy as a result of variable reduction. Moreover, the backward elimination procedure tended to select too few variables and exhibited numerous issues such as upwardly biased out-of-bag accuracy estimates and instabilities in the spatial predictions. We use simulations to further support and generalize results from the analysis of real data. A main purpose of this work is to elucidate issues of model selection bias and instability to ecologists interested in

  16. Failure Probability Estimation Using Asymptotic Sampling and Its Dependence upon the Selected Sampling Scheme

    Directory of Open Access Journals (Sweden)

    Martinásková Magdalena

    2017-12-01

    Full Text Available The article examines the use of Asymptotic Sampling (AS for the estimation of failure probability. The AS algorithm requires samples of multidimensional Gaussian random vectors, which may be obtained by many alternative means that influence the performance of the AS method. Several reliability problems (test functions have been selected in order to test AS with various sampling schemes: (i Monte Carlo designs; (ii LHS designs optimized using the Periodic Audze-Eglājs (PAE criterion; (iii designs prepared using Sobol’ sequences. All results are compared with the exact failure probability value.

  17. Metaheuristic approaches to order sequencing on a unidirectional picking line

    Directory of Open Access Journals (Sweden)

    AP de Villiers

    2013-06-01

    Full Text Available In this paper the sequencing of orders on a unidirectional picking line is considered. The aim of the order sequencing is to minimise the number of cycles travelled by a picker within the picking line to complete all orders. A tabu search, simulated annealing, genetic algorithm, generalised extremal optimisation and a random local search are presented as possible solution approaches. Computational results based on real life data instances are presented for these metaheuristics and compared to the performance of a lower bound and the solutions used in practise. The random local search exhibits the best overall solution quality, however, the generalised extremal optimisation approach delivers comparable results in considerably shorter computational times.

  18. PILOT DECONTAMINATION THROUGH PILOT SEQUENCE HOPPING IN MASSIVE MIMO SYSTEMS

    DEFF Research Database (Denmark)

    2015-01-01

    path between one of the users and one of the base stations define one of the channels. The system comprises a pilot generation unit configured to assign pilot sequences randomly among the users and a pilot processing unit configured to filter the pilot sequences received from a user of interest so...... that the channel coefficient of the channel of the user of interest is determined. The pilot sequences received from the user of interest are contaminated by other non-orthogonal or identical pilot sequences from other users of the cell of interest or other cells. The filter is configured so that the contamination...... caused by the other non-orthogonal or identical pilot sequences from the other users is reduced....

  19. Universal sequence replication, reversible polymerization and early functional biopolymers: a model for the initiation of prebiotic sequence evolution.

    Directory of Open Access Journals (Sweden)

    Sara Imari Walker

    Full Text Available Many models for the origin of life have focused on understanding how evolution can drive the refinement of a preexisting enzyme, such as the evolution of efficient replicase activity. Here we present a model for what was, arguably, an even earlier stage of chemical evolution, when polymer sequence diversity was generated and sustained before, and during, the onset of functional selection. The model includes regular environmental cycles (e.g. hydration-dehydration cycles that drive polymers between times of replication and functional activity, which coincide with times of different monomer and polymer diffusivity. Template-directed replication of informational polymers, which takes place during the dehydration stage of each cycle, is considered to be sequence-independent. New sequences are generated by spontaneous polymer formation, and all sequences compete for a finite monomer resource that is recycled via reversible polymerization. Kinetic Monte Carlo simulations demonstrate that this proposed prebiotic scenario provides a robust mechanism for the exploration of sequence space. Introduction of a polymer sequence with monomer synthetase activity illustrates that functional sequences can become established in a preexisting pool of otherwise non-functional sequences. Functional selection does not dominate system dynamics and sequence diversity remains high, permitting the emergence and spread of more than one functional sequence. It is also observed that polymers spontaneously form clusters in simulations where polymers diffuse more slowly than monomers, a feature that is reminiscent of a previous proposal that the earliest stages of life could have been defined by the collective evolution of a system-wide cooperation of polymer aggregates. Overall, the results presented demonstrate the merits of considering plausible prebiotic polymer chemistries and environments that would have allowed for the rapid turnover of monomer resources and for

  20. Optimization of sequence alignment for simple sequence repeat regions

    Directory of Open Access Journals (Sweden)

    Ogbonnaya Francis C

    2011-07-01

    Full Text Available Abstract Background Microsatellites, or simple sequence repeats (SSRs, are tandemly repeated DNA sequences, including tandem copies of specific sequences no longer than six bases, that are distributed in the genome. SSR has been used as a molecular marker because it is easy to detect and is used in a range of applications, including genetic diversity, genome mapping, and marker assisted selection. It is also very mutable because of slipping in the DNA polymerase during DNA replication. This unique mutation increases the insertion/deletion (INDELs mutation frequency to a high ratio - more than other types of molecular markers such as single nucleotide polymorphism (SNPs. SNPs are more frequent than INDELs. Therefore, all designed algorithms for sequence alignment fit the vast majority of the genomic sequence without considering microsatellite regions, as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units which result in false evolutionary relationships. Findings To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using PERL script with a Tk graphical interface. This program is based on aligning sequences after determining the repeated units first, and the last SSR nucleotides positions. This results in a shifting process according to the inserted repeated unit type. When studying the phylogenic relations before and after applying the new algorithm, many differences in the trees were obtained by increasing the SSR length and complexity. However, less distance between different linage had been observed after applying the new algorithm. Conclusions The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different linages. It reduces overlapping during SSR alignment, which results in a more realistic

  1. An efficient binomial model-based measure for sequence comparison and its application.

    Science.gov (United States)

    Liu, Xiaoqing; Dai, Qi; Li, Lihua; He, Zerong

    2011-04-01

    Sequence comparison is one of the major tasks in bioinformatics, which could serve as evidence of structural and functional conservation, as well as of evolutionary relations. There are several similarity/dissimilarity measures for sequence comparison, but challenges remains. This paper presented a binomial model-based measure to analyze biological sequences. With help of a random indicator, the occurrence of a word at any position of sequence can be regarded as a random Bernoulli variable, and the distribution of a sum of the word occurrence is well known to be a binomial one. By using a recursive formula, we computed the binomial probability of the word count and proposed a binomial model-based measure based on the relative entropy. The proposed measure was tested by extensive experiments including classification of HEV genotypes and phylogenetic analysis, and further compared with alignment-based and alignment-free measures. The results demonstrate that the proposed measure based on binomial model is more efficient.

  2. An Improved Particle Swarm Optimization for Selective Single Machine Scheduling with Sequence Dependent Setup Costs and Downstream Demands

    Directory of Open Access Journals (Sweden)

    Kun Li

    2015-01-01

    Full Text Available This paper investigates a special single machine scheduling problem derived from practical industries, namely, the selective single machine scheduling with sequence dependent setup costs and downstream demands. Different from traditional single machine scheduling, this problem further takes into account the selection of jobs and the demands of downstream lines. This problem is formulated as a mixed integer linear programming model and an improved particle swarm optimization (PSO is proposed to solve it. To enhance the exploitation ability of the PSO, an adaptive neighborhood search with different search depth is developed based on the decision characteristics of the problem. To improve the search diversity and make the proposed PSO algorithm capable of getting out of local optimum, an elite solution pool is introduced into the PSO. Computational results based on extensive test instances show that the proposed PSO can obtain optimal solutions for small size problems and outperform the CPLEX and some other powerful algorithms for large size problems.

  3. Automated Selection Of Pictures In Sequences

    Science.gov (United States)

    Rorvig, Mark E.; Shelton, Robert O.

    1995-01-01

    Method of automated selection of film or video motion-picture frames for storage or examination developed. Beneficial in situations in which quantity of visual information available exceeds amount stored or examined by humans in reasonable amount of time, and/or necessary to reduce large number of motion-picture frames to few conveying significantly different information in manner intermediate between movie and comic book or storyboard. For example, computerized vision system monitoring industrial process programmed to sound alarm when changes in scene exceed normal limits.

  4. An expressed sequence tag (EST) library for Drosophila serrata, a model system for sexual selection and climatic adaptation studies.

    Science.gov (United States)

    Frentiu, Francesca D; Adamski, Marcin; McGraw, Elizabeth A; Blows, Mark W; Chenoweth, Stephen F

    2009-01-21

    The native Australian fly Drosophila serrata belongs to the highly speciose montium subgroup of the melanogaster species group. It has recently emerged as an excellent model system with which to address a number of important questions, including the evolution of traits under sexual selection and traits involved in climatic adaptation along latitudinal gradients. Understanding the molecular genetic basis of such traits has been limited by a lack of genomic resources for this species. Here, we present the first expressed sequence tag (EST) collection for D. serrata that will enable the identification of genes underlying sexually-selected phenotypes and physiological responses to environmental change and may help resolve controversial phylogenetic relationships within the montium subgroup. A normalized cDNA library was constructed from whole fly bodies at several developmental stages, including larvae and adults. Assembly of 11,616 clones sequenced from the 3' end allowed us to identify 6,607 unique contigs, of which at least 90% encoded peptides. Partial transcripts were discovered from a variety of genes of evolutionary interest by BLASTing contigs against the 12 Drosophila genomes currently sequenced. By incorporating into the cDNA library multiple individuals from populations spanning a large portion of the geographical range of D. serrata, we were able to identify 11,057 putative single nucleotide polymorphisms (SNPs), with 278 different contigs having at least one "double hit" SNP that is highly likely to be a real polymorphism. At least 394 EST-associated microsatellite markers, representing 355 different contigs, were also found, providing an additional set of genetic markers. The assembled EST library is available online at http://www.chenowethlab.org/serrata/index.cgi. We have provided the first gene collection and largest set of polymorphic genetic markers, to date, for the fly D. serrata. The EST collection will provide much needed genomic resources for

  5. Selectivity in multiple quantum nuclear magnetic resonance

    International Nuclear Information System (INIS)

    Warren, W.S.

    1980-11-01

    The observation of multiple-quantum nuclear magnetic resonance transitions in isotropic or anisotropic liquids is shown to give readily interpretable information on molecular configurations, rates of motional processes, and intramolecular interactions. However, the observed intensity of high multiple-quantum transitions falls off dramatically as the number of coupled spins increases. The theory of multiple-quantum NMR is developed through the density matrix formalism, and exact intensities are derived for several cases (isotropic first-order systems and anisotropic systems with high symmetry) to shown that this intensity decrease is expected if standard multiple-quantum pulse sequences are used. New pulse sequences are developed which excite coherences and produce population inversions only between selected states, even though other transitions are simultaneously resonant. One type of selective excitation presented only allows molecules to absorb and emit photons in groups of n. Coherent averaging theory is extended to describe these selective sequences, and to design sequences which are selective to arbitrarily high order in the Magnus expansion. This theory and computer calculations both show that extremely good selectivity and large signal enhancements are possible

  6. Selectivity in multiple quantum nuclear magnetic resonance

    Energy Technology Data Exchange (ETDEWEB)

    Warren, Warren Sloan [Univ. of California, Berkeley, CA (United States). Dept. of Chemistry; Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Materials Sciences Division

    1980-11-01

    The observation of multiple-quantum nuclear magnetic resonance transitions in isotropic or anisotropic liquids is shown to give readily interpretable information on molecular configurations, rates of motional processes, and intramolecular interactions. However, the observed intensity of high multiple-quantum transitions falls off dramatically as the number of coupled spins increases. The theory of multiple-quantum NMR is developed through the density matrix formalism, and exact intensities are derived for several cases (isotropic first-order systems and anisotropic systems with high symmetry) to shown that this intensity decrease is expected if standard multiple-quantum pulse sequences are used. New pulse sequences are developed which excite coherences and produce population inversions only between selected states, even though other transitions are simultaneously resonant. One type of selective excitation presented only allows molecules to absorb and emit photons in groups of n. Coherent averaging theory is extended to describe these selective sequences, and to design sequences which are selective to arbitrarily high order in the Magnus expansion. This theory and computer calculations both show that extremely good selectivity and large signal enhancements are possible.

  7. Comparison and evaluation of two exome capture kits and sequencing platforms for variant calling.

    Science.gov (United States)

    Zhang, Guoqiang; Wang, Jianfeng; Yang, Jin; Li, Wenjie; Deng, Yutian; Li, Jing; Huang, Jun; Hu, Songnian; Zhang, Bing

    2015-08-05

    To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest updated semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing that achieves a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between Ion Proton and Illumina sequencing platforms such as HiSeq 2000, which is the most popular sequencing platform for the human genome. The Ion Proton sequencer combined with the Ion TargetSeq Exome Enrichment Kit together make up TargetSeq-Proton, whereas SureSelect-Hiseq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer. Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rates of shared variant loci called by two sequencing platforms were from 68.0 to 75.3% in four samples, whereas the concordance of co-detected variant loci reached 99%. Sanger sequencing validation revealed that the validated rate of concordant single nucleotide polymorphisms (SNPs) (91.5%) was higher than the SNPs specific to TargetSeq-Proton (60.0%) or specific to SureSelect-HiSeq (88.3%). With regard to 1-bp small insertions and deletions (InDels), the Sanger sequencing validated rates of concordant variants (100.0%) and SureSelect-HiSeq-specific (89.6%) were higher than those of TargetSeq-Proton-specific (15.8%). In the sequencing of exonic regions, a combination of using of two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant calling specificity for concordant variant loci and the sensitivity for variant loci called by any one platform. However, for the

  8. A Note on Sequence Prediction over Large Alphabets

    Directory of Open Access Journals (Sweden)

    Travis Gagie

    2012-02-01

    Full Text Available Building on results from data compression, we prove nearly tight bounds on how well sequences of length n can be predicted in terms of the size σ of the alphabet and the length k of the context considered when making predictions. We compare the performance achievable by an adaptive predictor with no advance knowledge of the sequence, to the performance achievable by the optimal static predictor using a table listing the frequency of each (k + 1-tuple in the sequence. We show that, if the elements of the sequence are chosen uniformly at random, then an adaptive predictor can compete in the expected case if k ≤ logσ n – 3 – ε, for a constant ε > 0, but not if k ≥ logσ n.

  9. Diagnostic accuracy of unenhanced, contrast-enhanced perfusion and angiographic MRI sequences for pulmonary embolism diagnosis: results of independent sequence readings

    Energy Technology Data Exchange (ETDEWEB)

    Revel, Marie Pierre [Hopital Europeen Georges Pompidou, APHP, Departments of Radiology, Paris (France); Universite Paris Descartes Sorbonne Paris Cite, Paris (France); Hotel-Dieu, Service de Radiologie, Paris (France); Sanchez, Olivier; Meyer, Guy [Hopital Europeen Georges Pompidou, APHP, Respiratory and intensive care and, Paris (France); Universite Paris Descartes Sorbonne Paris Cite, Paris (France); INSERM Unite 765, Paris (France); Lefort, Catherine; Couchon, Sophie; Hernigou, Anne; Frija, Guy [Hopital Europeen Georges Pompidou, APHP, Departments of Radiology, Paris (France); Niarra, Ralph [Hopital Europeen Georges Pompidou, APHP, Clinical Epidemiology, Paris (France); Universite Paris Descartes Sorbonne Paris Cite, Paris (France); Chatellier, Gilles [Hopital Europeen Georges Pompidou, APHP, Clinical Epidemiology, Paris (France); Universite Paris Descartes Sorbonne Paris Cite, Paris (France); INSERM CIC-EC E4, Paris (France)

    2013-09-15

    To independently evaluate unenhanced, contrast-enhanced perfusion and angiographic MR sequences for pulmonary embolism (PE) diagnosis. Prospective investigation, including 274 patients who underwent perfusion, unenhanced 2D steady-state-free-precession (SSFP) and contrast-enhanced 3D angiographic MR sequences on a 1.5-T unit, in addition to CTA (CT angiography). Two independent readers evaluated each sequence independently in random order. Sensitivity, specificity, predictive values and inter-reader agreement were calculated for each sequence, excluding sequences judged inconclusive. Sensitivity was also calculated according to PE location. Contrast-enhanced angiographic sequences showed the highest sensitivity (82.9 and 89.7 %, reader 1 and reader 2, respectively), specificity (98.5 and 100 %) and agreement (kappa value 0.77). Unenhanced angiographic sequences, although less sensitive overall (68.7 and 76.4 %), were sensitive for the detection of proximal PE (92.7 and 100 %) and showed high specificity (96.1 and 99.1 %) and good agreement (kappa value 0.62). Perfusion sequences showed lower sensitivity (75.0 and 79.3 %), specificity (84.8 and 89.7 %) and agreement (kappa value 0.51), and a negative predictive value of 84.8 % at best. Compared with contrast-enhanced angiographic sequences, unenhanced sequences demonstrate lower sensitivity, except for proximal PE, but high specificity and agreement. The negative predictive value of perfusion sequences was insufficient to safely rule out PE. (orig.)

  10. Absence of zero-temperature transmission rate of a double-chain tight-binding model for DNA with random sequence of nucleotides in thermodynamic limit

    International Nuclear Information System (INIS)

    Xiong Gang; Wang, X.R.

    2005-01-01

    The zero-temperature transmission rate spectrum of a double-chain tight-binding model for real DNA is calculated. It is shown that a band of extended-like states exists only for finite chain length with strong inter-chain coupling. While the whole spectrum tends to zero in thermodynamic limit, regardless of the strength of inter-chain coupling. It is also shown that a more faithful model for real DNA with periodic sugar-phosphate chains in backbone structures can be mapped into the above simple double-chain tight-binding model. Combined with above results, the transmission rate of real DNA with long random sequence of nucleotides is expected to be poor

  11. Structure and Sequence Search on Aptamer-Protein Docking

    Science.gov (United States)

    Xiao, Jiajie; Bonin, Keith; Guthold, Martin; Salsbury, Freddie

    2015-03-01

    Interactions between proteins and deoxyribonucleic acid (DNA) play a significant role in the living systems, especially through gene regulation. However, short nucleic acids sequences (aptamers) with specific binding affinity to specific proteins exhibit clinical potential as therapeutics. Our capillary and gel electrophoresis selection experiments show that specific sequences of aptamers can be selected that bind specific proteins. Computationally, given the experimentally-determined structure and sequence of a thrombin-binding aptamer, we can successfully dock the aptamer onto thrombin in agreement with experimental structures of the complex. In order to further study the conformational flexibility of this thrombin-binding aptamer and to potentially develop a predictive computational model of aptamer-binding, we use GPU-enabled molecular dynamics simulations to both examine the conformational flexibility of the aptamer in the absence of binding to thrombin, and to determine our ability to fold an aptamer. This study should help further de-novo predictions of aptamer sequences by enabling the study of structural and sequence-dependent effects on aptamer-protein docking specificity.

  12. Multiplex Amplification Refractory Mutation System PCR (ARMS-PCR) provides sequencing independent typing of canine parvovirus.

    Science.gov (United States)

    Chander, Vishal; Chakravarti, Soumendu; Gupta, Vikas; Nandi, Sukdeb; Singh, Mithilesh; Badasara, Surendra Kumar; Sharma, Chhavi; Mittal, Mitesh; Dandapat, S; Gupta, V K

    2016-12-01

    Canine parvovirus-2 antigenic variants (CPV-2a, CPV-2b and CPV-2c) ubiquitously distributed worldwide in canine population causes severe fatal gastroenteritis. Antigenic typing of CPV-2 remains a prime focus of research groups worldwide in understanding the disease epidemiology and virus evolution. The present study was thus envisioned to provide a simple sequencing independent, rapid, robust, specific, user-friendly technique for detecting and typing of presently circulating CPV-2 antigenic variants. ARMS-PCR strategy was employed using specific primers for CPV-2a, CPV-2b and CPV-2c to differentiate these antigenic types. ARMS-PCR was initially optimized with reference positive controls in two steps; where first reaction was used to differentiate CPV-2a from CPV-2b/CPV-2c. The second reaction was carried out with CPV-2c specific primers to confirm the presence of CPV-2c. Initial validation of the ARMS-PCR was carried out with 24 sequenced samples and the results were matched with the sequencing results. ARMS-PCR technique was further used to screen and type 90 suspected clinical samples. Randomly selected 15 suspected clinical samples that were typed with this technique were sequenced. The results of ARMS-PCR and the sequencing matched exactly with each other. The developed technique has a potential to become a sequencing independent method for simultaneous detection and typing of CPV-2 antigenic variants in veterinary disease diagnostic laboratories globally. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing

    Directory of Open Access Journals (Sweden)

    Li Kelvin

    2012-11-01

    Full Text Available Abstract Background In a high-throughput environment, to PCR amplify and sequence a large set of viral isolates from populations that are potentially heterogeneous and continuously evolving, the use of degenerate PCR primers is an important strategy. Degenerate primers allow for the PCR amplification of a wider range of viral isolates with only one set of pre-mixed primers, thus increasing amplification success rates and minimizing the necessity for genome finishing activities. To successfully select a large set of degenerate PCR primers necessary to tile across an entire viral genome and maximize their success, this process is best performed computationally. Results We have developed a fully automated degenerate PCR primer design system that plays a key role in the J. Craig Venter Institute’s (JCVI high-throughput viral sequencing pipeline. A consensus viral genome, or a set of consensus segment sequences in the case of a segmented virus, is specified using IUPAC ambiguity codes in the consensus template sequence to represent the allelic diversity of the target population. PCR primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the full length of the specified target region. As part of the tiling process, primer pairs are computationally screened to meet the criteria for successful PCR with one of two described amplification protocols. The actual sequencing success rates for designed primers for measles virus, mumps virus, human parainfluenza virus 1 and 3, human respiratory syncytial virus A and B and human metapneumovirus are described, where >90% of designed primer pairs were able to consistently successfully amplify >75% of the isolates. Conclusions Augmenting our previously developed and published JCVI Primer Design Pipeline, we achieved similarly high sequencing success rates with only minor software modifications. The recommended methodology for the construction of the consensus

  14. Unusual loop-sequence flexibility of the proximal RNA replication element in EMCV.

    Directory of Open Access Journals (Sweden)

    Jan Zoll

    Full Text Available Picornaviruses contain stable RNA structures at the 5' and 3' ends of the RNA genome, OriL and OriR involved in viral RNA replication. The OriL RNA element found at the 5' end of the enterovirus genome folds into a cloverleaf-like configuration. In vivo SELEX experiments revealed that functioning of the poliovirus cloverleaf depends on a specific structure in this RNA element. Little is known about the OriL of cardioviruses. Here, we investigated structural aspects and requirements of the apical loop of proximal stem-loop SL-A of mengovirus, a strain of EMCV. Using NMR spectroscopy, we showed that the mengovirus SL-A apical loop consists of an octaloop. In vivo SELEX experiments demonstrated that a large number of random sequences are tolerated in the apical octaloop that support virus replication. Mutants in which the SL-A loop size and the length of the upper part of the stem were varied showed that both stem-length and stability of the octaloop are important determinants for viral RNA replication and virus reproduction. Together, these data show that stem-loop A plays an important role in virus replication. The high degree of sequence flexibility and the lack of selective pressure on the octaloop argue against a role in sequence specific RNA-protein or RNA-RNA interactions in which octaloop nucleotides are involved.

  15. Characterization of the Kenaf (Hibiscus cannabinus) Global Transcriptome Using Illumina Paired-End Sequencing and Development of EST-SSR Markers

    Science.gov (United States)

    Li, Hui; Li, Defang; Chen, Anguo; Tang, Huijuan; Li, Jianjun; Huang, Siqi

    2016-01-01

    Kenaf (Hibiscus cannabinus L.) is an economically important natural fiber crop grown worldwide. However, only 20 expressed tag sequences (ESTs) for kenaf are available in public databases. The aim of this study was to develop large-scale simple sequence repeat (SSR) markers to lay a solid foundation for the construction of genetic linkage maps and marker-assisted breeding in kenaf. We used Illumina paired-end sequencing technology to generate new EST-simple sequences and MISA software to mine SSR markers. We identified 71,318 unigenes with an average length of 1143 nt and annotated these unigenes using four different protein databases. Overall, 9324 complementary pairs were designated as EST-SSR markers, and their quality was validated using 100 randomly selected SSR markers. In total, 72 primer pairs reproducibly amplified target amplicons, and 61 of these primer pairs detected significant polymorphism among 28 kenaf accessions. Thus, in this study, we have developed large-scale SSR markers for kenaf, and this new resource will facilitate construction of genetic linkage maps, investigation of fiber growth and development in kenaf, and also be of value to novel gene discovery and functional genomic studies. PMID:26960153

  16. Characterization of the Kenaf (Hibiscus cannabinus) Global Transcriptome Using Illumina Paired-End Sequencing and Development of EST-SSR Markers.

    Science.gov (United States)

    Li, Hui; Li, Defang; Chen, Anguo; Tang, Huijuan; Li, Jianjun; Huang, Siqi

    2016-01-01

    Kenaf (Hibiscus cannabinus L.) is an economically important natural fiber crop grown worldwide. However, only 20 expressed tag sequences (ESTs) for kenaf are available in public databases. The aim of this study was to develop large-scale simple sequence repeat (SSR) markers to lay a solid foundation for the construction of genetic linkage maps and marker-assisted breeding in kenaf. We used Illumina paired-end sequencing technology to generate new EST-simple sequences and MISA software to mine SSR markers. We identified 71,318 unigenes with an average length of 1143 nt and annotated these unigenes using four different protein databases. Overall, 9324 complementary pairs were designated as EST-SSR markers, and their quality was validated using 100 randomly selected SSR markers. In total, 72 primer pairs reproducibly amplified target amplicons, and 61 of these primer pairs detected significant polymorphism among 28 kenaf accessions. Thus, in this study, we have developed large-scale SSR markers for kenaf, and this new resource will facilitate construction of genetic linkage maps, investigation of fiber growth and development in kenaf, and also be of value to novel gene discovery and functional genomic studies.

  17. HIV Sequence Compendium 2015

    Energy Technology Data Exchange (ETDEWEB)

    Foley, Brian Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas Kenneth [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Cristian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Pennsylvania, Philadelphia, PA (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette Tina Marie [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/ content/sequence/NEWALIGN/align.html As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  18. Distribution and sequence homogeneity of an abundant satellite DNA in the beetle, Tenebrio molitor.

    Science.gov (United States)

    Davis, C A; Wyatt, G R

    1989-01-01

    The mealworm beetle, Tenebrio molitor, contains an unusually abundant and homogeneous satellite DNA which constitutes up to 60% of its genome. The satellite DNA is shown to be present in all of the chromosomes by in situ hybridization. 18 dimers of the repeat unit were cloned and sequenced. The consensus sequence is 142 nt long and lacks any internal repeat structure. Monomers of the sequence are very similar, showing on average a 2% divergence from the calculated consensus. Variant nucleotides are scattered randomly throughout the sequence although some variants are more common than others. Neighboring repeat units are no more alike than randomly chosen ones. The results suggest that some mechanism, perhaps gene conversion, is acting to maintain the homogeneity of the satellite DNA despite its abundance and distribution on all of the chromosomes. Images PMID:2762148

  19. Blind Measurement Selection: A Random Matrix Theory Approach

    KAUST Repository

    Elkhalil, Khalil; Kammoun, Abla; Al-Naffouri, Tareq Y.; Alouini, Mohamed-Slim

    2016-01-01

    -aware fashions. We present two potential applications where the proposed algorithms can be used, namely antenna selection for uplink transmissions in large scale multi-user systems and sensor selection for wireless sensor networks. Numerical results are also

  20. Filovirus RefSeq Entries: Evaluation and Selection of Filovirus Type Variants, Type Sequences, and Names

    Directory of Open Access Journals (Sweden)

    Jens H. Kuhn

    2014-09-01

    Full Text Available Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s RefSeq is a non-redundant, curated database for reference (or type nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [ (////-], we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences.

  1. Preparing and Analyzing Expressed Sequence Tags (ESTs Library for the Mammary Tissue of Local Turkish Kivircik Sheep

    Directory of Open Access Journals (Sweden)

    Nehir Ozdemir Ozgenturk

    2017-01-01

    Full Text Available Kivircik sheep is an important local Turkish sheep according to its meat quality and milk productivity. The aim of this study was to analyze gene expression profiles of both prenatal and postnatal stages for the Kivircik sheep. Therefore, two different cDNA libraries, which were taken from the same Kivircik sheep mammary gland tissue at prenatal and postnatal stages, were constructed. Total 3072 colonies which were randomly selected from the two libraries were sequenced for developing a sheep ESTs collection. We used Phred/Phrap computer programs for analysis of the raw EST and readable EST sequences were assembled with the CAP3 software. Putative functions of all unique sequences and statistical analysis were determined by Geneious software. Total 422 ESTs have over 80% similarity to known sequences of other organisms in NCBI classified by Panther database for the Gene Ontology (GO category. By comparing gene expression profiles, we observed some putative genes that may be relative to reproductive performance or play important roles in milk synthesis and secretion. A total of 2414 ESTs have been deposited to the NCBI GenBank database (GW996847–GW999260. EST data in this study have provided a new source of information to functional genome studies of sheep.

  2. Effects of choice architecture and chef-enhanced meals on the selection and consumption of healthier school foods: a randomized clinical trial.

    Science.gov (United States)

    Cohen, Juliana F W; Richardson, Scott A; Cluggish, Sarah A; Parker, Ellen; Catalano, Paul J; Rimm, Eric B

    2015-05-01

    Little is known about the long-term effect of a chef-enhanced menu on healthier food selection and consumption in school lunchrooms. In addition, it remains unclear if extended exposure to other strategies to promote healthier foods (eg, choice architecture) also improves food selection or consumption. To evaluate the short- and long-term effects of chef-enhanced meals and extended exposure to choice architecture on healthier school food selection and consumption. A school-based randomized clinical trial was conducted during the 2011-2012 school year among 14 elementary and middle schools in 2 urban, low-income school districts (intent-to-treat analysis). Included in the study were 2638 students in grades 3 through 8 attending participating schools (38.4% of eligible participants). Schools were first randomized to receive a professional chef to improve school meal palatability (chef schools) or to a delayed intervention (control group). To assess the effect of choice architecture (smart café), all schools after 3 months were then randomized to the smart café intervention or to the control group. School food selection was recorded, and consumption was measured using plate waste methods. After 3 months, vegetable selection increased in chef vs control schools (odds ratio [OR], 1.75; 95% CI, 1.36-2.24), but there was no effect on the selection of other components or on meal consumption. After long-term or extended exposure to the chef or smart café intervention, fruit selection increased in the chef (OR, 3.08; 95% CI, 2.23-4.25), smart café (OR, 1.45; 95% CI, 1.13-1.87), and chef plus smart café (OR, 3.10; 95% CI, 2.26-4.25) schools compared with the control schools, and consumption increased in the chef schools (OR, 0.17; 95% CI, 0.03-0.30 cups/d). Vegetable selection increased in the chef (OR, 2.54; 95% CI, 1.83-3.54), smart café (OR, 1.91; 95% CI, 1.46-2.50), and chef plus smart café schools (OR, 7.38, 95% CI, 5.26-10.35) compared with the control schools

  3. Modified random hinge transport mechanics and multiple scattering step-size selection in EGS5

    International Nuclear Information System (INIS)

    Wilderman, S.J.; Bielajew, A.F.

    2005-01-01

    The new transport mechanics in EGS5 allows for significantly longer electron transport step sizes and hence shorter computation times than required for identical problems in EGS4. But as with all Monte Carlo electron transport algorithms, certain classes of problems exhibit step-size dependencies even when operating within recommended ranges, sometimes making selection of step-sizes a daunting task for novice users. Further contributing to this problem, because of the decoupling of multiple scattering and continuous energy loss in the dual random hinge transport mechanics of EGS5, there are two independent step sizes in EGS5, one for multiple scattering and one for continuous energy loss, each of which influences speed and accuracy in a different manner. Further, whereas EGS4 used a single value of fractional energy loss (ESTEPE) to determine step sizes at all energies, to increase performance by decreasing the amount of effort expended simulating lower energy particles, EGS5 permits the fractional energy loss values which are used to determine both the multiple scattering and continuous energy loss step sizes to vary with energy. This results in requiring the user to specify four fractional energy loss values when optimizing computations for speed. Thus, in order to simplify step-size selection and to mitigate step-size dependencies, a method has been devised to automatically optimize step-size selection based on a single material dependent input related to the size of problem tally region. In this paper we discuss the new transport mechanics in EGS5 and describe the automatic step-size optimization algorithm. (author)

  4. Entropy and long-range memory in random symbolic additive Markov chains.

    Science.gov (United States)

    Melnik, S S; Usatenko, O V

    2016-06-01

    The goal of this paper is to develop an estimate for the entropy of random symbolic sequences with elements belonging to a finite alphabet. As a plausible model, we use the high-order additive stationary ergodic Markov chain with long-range memory. Supposing that the correlations between random elements of the chain are weak, we express the conditional entropy of the sequence by means of the symbolic pair correlation function. We also examine an algorithm for estimating the conditional entropy of finite symbolic sequences. We show that the entropy contains two contributions, i.e., the correlation and the fluctuation. The obtained analytical results are used for numerical evaluation of the entropy of written English texts and DNA nucleotide sequences. The developed theory opens the way for constructing a more consistent and sophisticated approach to describe the systems with strong short-range and weak long-range memory.

  5. Cluster Tails for Critical Power-Law Inhomogeneous Random Graphs

    Science.gov (United States)

    van der Hofstad, Remco; Kliem, Sandra; van Leeuwaarden, Johan S. H.

    2018-04-01

    Recently, the scaling limit of cluster sizes for critical inhomogeneous random graphs of rank-1 type having finite variance but infinite third moment degrees was obtained in Bhamidi et al. (Ann Probab 40:2299-2361, 2012). It was proved that when the degrees obey a power law with exponent τ \\in (3,4), the sequence of clusters ordered in decreasing size and multiplied through by n^{-(τ -2)/(τ -1)} converges as n→ ∞ to a sequence of decreasing non-degenerate random variables. Here, we study the tails of the limit of the rescaled largest cluster, i.e., the probability that the scaling limit of the largest cluster takes a large value u, as a function of u. This extends a related result of Pittel (J Combin Theory Ser B 82(2):237-269, 2001) for the Erdős-Rényi random graph to the setting of rank-1 inhomogeneous random graphs with infinite third moment degrees. We make use of delicate large deviations and weak convergence arguments.

  6. Development and Evaluation of a Novel Set of EST-SSR Markers Based on Transcriptome Sequences of Black Locust (Robinia pseudoacacia L.).

    Science.gov (United States)

    Guo, Qi; Wang, Jin-Xing; Su, Li-Zhuo; Lv, Wei; Sun, Yu-Han; Li, Yun

    2017-07-07

    Black locust ( Robinia pseudoacacia L. of the family Fabaceae) is an ecologically and economically important deciduous tree. However, few genomic resources are available for this forest species, and few effective expressed sequence tag-derived simple sequence repeat (EST-SSR) markers have been developed to date. In this study, paired-end sequencing was used to sequence transcriptomes of R. pseudoacacia by the Illumina HiSeq TM2000 platform, and EST-SSR loci were identified by de novo assembly. Furthermore, a total of 1697 primer pairs were successfully designed, from which 286 primers met the selection screening criteria; 94 pairs were randomly selected and tested for validation using polymerase chain reaction amplification. Forty-five primers were verified as polymorphic, with clear bands. The polymorphism information content values were 0.033-0.765, the number of alleles per locus ranged from 2 to 10, and the observed and expected heterozygosities were 0.000-0.931 and 0.035-0.810, respectively, indicating a high level of informativeness. Subsequently, 45 polymorphic EST-SSR loci were tested for amplification efficiency, using the verified primers, in an additional nine species of Leguminosae, 23 loci were amplified in more than three species, of which two loci were amplified successfully in all species. These EST-SSR markers provide a valuable tool for investigating the genetic diversity and population structure of R . pseudoacacia , constructing a DNA fingerprint database, performing quantitative trait locus mapping, and preserving genetic information.

  7. Multifractal analysis of 2001 Mw 7 . 7 Bhuj earthquake sequence in Gujarat, Western India

    Science.gov (United States)

    Aggarwal, Sandeep Kumar; Pastén, Denisse; Khan, Prosanta Kumar

    2017-12-01

    The 2001 Mw 7 . 7 Bhuj mainshock seismic sequence in the Kachchh area, occurring during 2001 to 2012, has been analyzed using mono-fractal and multi-fractal dimension spectrum analysis technique. This region was characterized by frequent moderate shocks of Mw ≥ 5 . 0 for more than a decade since the occurrence of 2001 Bhuj earthquake. The present study is therefore important for precursory analysis using this sequence. The selected long-sequence has been investigated first time for completeness magnitude Mc 3.0 using the maximum curvature method. Multi-fractal Dq spectrum (Dq ∼ q) analysis was carried out using effective window-length of 200 earthquakes with a moving window of 20 events overlapped by 180 events. The robustness of the analysis has been tested by considering the magnitude completeness correction term of 0.2 to Mc 3.0 as Mc 3.2 and we have tested the error in the calculus of Dq for each magnitude threshold. On the other hand, the stability of the analysis has been investigated down to the minimum magnitude of Mw ≥ 2 . 6 in the sequence. The analysis shows the multi-fractal dimension spectrum Dq decreases with increasing of clustering of events with time before a moderate magnitude earthquake in the sequence, which alternatively accounts for non-randomness in the spatial distribution of epicenters and its self-organized criticality. Similar behavior is ubiquitous elsewhere around the globe, and warns for proximity of a damaging seismic event in an area. OS: Please confirm math roman or italics in abs.

  8. Comparative analysis of response to selection with three insecticides in the dengue mosquito Aedes aegypti using mRNA sequencing.

    Science.gov (United States)

    David, Jean-Philippe; Faucon, Frédéric; Chandor-Proust, Alexia; Poupardin, Rodolphe; Riaz, Muhammad Asam; Bonin, Aurélie; Navratil, Vincent; Reynaud, Stéphane

    2014-03-05

    Mosquito control programmes using chemical insecticides are increasingly threatened by the development of resistance. Such resistance can be the consequence of changes in proteins targeted by insecticides (target site mediated resistance), increased insecticide biodegradation (metabolic resistance), altered transport, sequestration or other mechanisms. As opposed to target site resistance, other mechanisms are far from being fully understood. Indeed, insecticide selection often affects a large number of genes and various biological processes can hypothetically confer resistance. In this context, the aim of the present study was to use RNA sequencing (RNA-seq) for comparing transcription level and polymorphism variations associated with adaptation to chemical insecticides in the mosquito Aedes aegypti. Biological materials consisted of a parental susceptible strain together with three child strains selected across multiple generations with three insecticides from different classes: the pyrethroid permethrin, the neonicotinoid imidacloprid and the carbamate propoxur. After ten generations, insecticide-selected strains showed elevated resistance levels to the insecticides used for selection. RNA-seq data allowed detecting over 13,000 transcripts, of which 413 were differentially transcribed in insecticide-selected strains as compared to the susceptible strain. Among them, a significant enrichment of transcripts encoding cuticle proteins, transporters and enzymes was observed. Polymorphism analysis revealed over 2500 SNPs showing > 50% allele frequency variations in insecticide-selected strains as compared to the susceptible strain, affecting over 1000 transcripts. Comparing gene transcription and polymorphism patterns revealed marked differences among strains. While imidacloprid selection was linked to the over transcription of many genes, permethrin selection was rather linked to polymorphism variations. Focusing on detoxification enzymes revealed that permethrin

  9. Simulating efficiently the evolution of DNA sequences.

    Science.gov (United States)

    Schöniger, M; von Haeseler, A

    1995-02-01

    Two menu-driven FORTRAN programs are described that simulate the evolution of DNA sequences in accordance with a user-specified model. This general stochastic model allows for an arbitrary stationary nucleotide composition and any transition-transversion bias during the process of base substitution. In addition, the user may define any hypothetical model tree according to which a family of sequences evolves. The programs suggest the computationally most inexpensive approach to generate nucleotide substitutions. Either reproducible or non-repeatable simulations, depending on the method of initializing the pseudo-random number generator, can be performed. The corresponding options are offered by the interface menu.

  10. Phage display selection of efficient glutamine-donor substrate peptides for transglutaminase 2.

    Science.gov (United States)

    Keresztessy, Zsolt; Csosz, Eva; Hársfalvi, Jolán; Csomós, Krisztián; Gray, Joe; Lightowlers, Robert N; Lakey, Jeremy H; Balajthy, Zoltán; Fésüs, László

    2006-11-01

    Understanding substrate specificity and identification of natural targets of transglutaminase 2 (TG2), the ubiquitous multifunctional cross-linking enzyme, which forms isopeptide bonds between protein-linked glutamine and lysine residues, is crucial in the elucidation of its physiological role. As a novel means of specificity analysis, we adapted the phage display technique to select glutamine-donor substrates from a random heptapeptide library via binding to recombinant TG2 and elution with a synthetic amine-donor substrate. Twenty-six Gln-containing sequences from the second and third biopanning rounds were susceptible for TG2-mediated incorporation of 5-(biotinamido)penthylamine, and the peptides GQQQTPY, GLQQASV, and WQTPMNS were modified most efficiently. A consensus around glutamines was established as pQX(P,T,S)l, which is consistent with identified substrates listed in the TRANSDAB database. Database searches showed that several proteins contain peptides similar to the phage-selected sequences, and the N-terminal glutamine-rich domain of SWI1/SNF1-related chromatin remodeling proteins was chosen for detailed analysis. MALDI/TOF and tandem mass spectrometry-based studies of a representative part of the domain, SGYGQQGQTPYYNQQSPHPQQQQPPYS (SnQ1), revealed that Q(6), Q(8), and Q(22) are modified by TG2. Kinetic parameters of SnQ1 transamidation (K(M)(app) = 250 microM, k(cat) = 18.3 sec(-1), and k(cat)/K(M)(app) = 73,200) classify it as an efficient TG2 substrate. Circular dichroism spectra indicated that SnQ1 has a random coil conformation, supporting its accessibility in the full-length parental protein. Added together, here we report a novel use of the phage display technology with great potential in transglutaminase research.

  11. In silico evidence for sequence-dependent nucleosome sliding

    Energy Technology Data Exchange (ETDEWEB)

    Lequieu, Joshua; Schwartz, David C.; de Pablo, Juan J.

    2017-10-18

    Nucleosomes represent the basic building block of chromatin and provide an important mechanism by which cellular processes are controlled. The locations of nucleosomes across the genome are not random but instead depend on both the underlying DNA sequence and the dynamic action of other proteins within the nucleus. These processes are central to cellular function, and the molecular details of the interplay between DNA sequence and nudeosome dynamics remain poorly understood. In this work, we investigate this interplay in detail by relying on a molecular model, which permits development of a comprehensive picture of the underlying free energy surfaces and the corresponding dynamics of nudeosome repositioning. The mechanism of nudeosome repositioning is shown to be strongly linked to DNA sequence and directly related to the binding energy of a given DNA sequence to the histone core. It is also demonstrated that chromatin remodelers can override DNA-sequence preferences by exerting torque, and the histone H4 tail is then identified as a key component by which DNA-sequence, histone modifications, and chromatin remodelers could in fact be coupled.

  12. Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores.

    Science.gov (United States)

    Bastien, Olivier; Maréchal, Eric

    2008-08-07

    Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z-value2) following the TULIP theorem. Simulations of Z-value distribution is known to fit with a Gumbel law. This remarkable property was not demonstrated and had no obvious biological support. We built a model of evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the systems longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information reflected by the magnitude of their alignment scores, 2) which components are the amino acids that can independently be damaged by random DNA mutations. From these assumptions, we deduced that information shared at each amino acid position evolved with a constant rate, corresponding to the

  13. Complete plastid genome sequences suggest strong selection for retention of photosynthetic genes in the parasitic plant genus Cuscuta.

    Science.gov (United States)

    McNeal, Joel R; Kuehl, Jennifer V; Boore, Jeffrey L; de Pamphilis, Claude W

    2007-10-24

    Plastid genome content and protein sequence are highly conserved across land plants and their closest algal relatives. Parasitic plants, which obtain some or all of their nutrition through an attachment to a host plant, are often a striking exception. Heterotrophy can lead to relaxed constraint on some plastid genes or even total gene loss. We sequenced plastid genomes of two species in the parasitic genus Cuscuta along with a non-parasitic relative, Ipomoea purpurea, to investigate changes in the plastid genome that may result from transition to the parasitic lifestyle. Aside from loss of all ndh genes, Cuscuta exaltata retains photosynthetic and photorespiratory genes that evolve under strong selective constraint. Cuscuta obtusiflora has incurred substantially more change to its plastid genome, including loss of all genes for the plastid-encoded RNA polymerase. Despite extensive change in gene content and greatly increased rate of overall nucleotide substitution, C. obtusiflora also retains all photosynthetic and photorespiratory genes with only one minor exception. Although Epifagus virginiana, the only other parasitic plant with its plastid genome sequenced to date, has lost a largely overlapping set of transfer-RNA and ribosomal genes as Cuscuta, it has lost all genes related to photosynthesis and maintains a set of genes which are among the most divergent in Cuscuta. Analyses demonstrate photosynthetic genes are under the highest constraint of any genes within the plastid genomes of Cuscuta, indicating a function involving RuBisCo and electron transport through photosystems is still the primary reason for retention of the plastid genome in these species.

  14. Complete plastid genome sequences suggest strong selection for retention of photosynthetic genes in the parasitic plant genus Cuscuta

    Directory of Open Access Journals (Sweden)

    Kuehl Jennifer V

    2007-10-01

    Full Text Available Abstract Background Plastid genome content and protein sequence are highly conserved across land plants and their closest algal relatives. Parasitic plants, which obtain some or all of their nutrition through an attachment to a host plant, are often a striking exception. Heterotrophy can lead to relaxed constraint on some plastid genes or even total gene loss. We sequenced plastid genomes of two species in the parasitic genus Cuscuta along with a non-parasitic relative, Ipomoea purpurea, to investigate changes in the plastid genome that may result from transition to the parasitic lifestyle. Results Aside from loss of all ndh genes, Cuscuta exaltata retains photosynthetic and photorespiratory genes that evolve under strong selective constraint. Cuscuta obtusiflora has incurred substantially more change to its plastid genome, including loss of all genes for the plastid-encoded RNA polymerase. Despite extensive change in gene content and greatly increased rate of overall nucleotide substitution, C. obtusiflora also retains all photosynthetic and photorespiratory genes with only one minor exception. Conclusion Although Epifagus virginiana, the only other parasitic plant with its plastid genome sequenced to date, has lost a largely overlapping set of transfer-RNA and ribosomal genes as Cuscuta, it has lost all genes related to photosynthesis and maintains a set of genes which are among the most divergent in Cuscuta. Analyses demonstrate photosynthetic genes are under the highest constraint of any genes within the plastid genomes of Cuscuta, indicating a function involving RuBisCo and electron transport through photosystems is still the primary reason for retention of the plastid genome in these species.

  15. Evolving artificial metalloenzymes via random mutagenesis

    Science.gov (United States)

    Yang, Hao; Swartz, Alan M.; Park, Hyun June; Srivastava, Poonam; Ellis-Guardiola, Ken; Upp, David M.; Lee, Gihoon; Belsare, Ketaki; Gu, Yifan; Zhang, Chen; Moellering, Raymond E.; Lewis, Jared C.

    2018-03-01

    Random mutagenesis has the potential to optimize the efficiency and selectivity of protein catalysts without requiring detailed knowledge of protein structure; however, introducing synthetic metal cofactors complicates the expression and screening of enzyme libraries, and activity arising from free cofactor must be eliminated. Here we report an efficient platform to create and screen libraries of artificial metalloenzymes (ArMs) via random mutagenesis, which we use to evolve highly selective dirhodium cyclopropanases. Error-prone PCR and combinatorial codon mutagenesis enabled multiplexed analysis of random mutations, including at sites distal to the putative ArM active site that are difficult to identify using targeted mutagenesis approaches. Variants that exhibited significantly improved selectivity for each of the cyclopropane product enantiomers were identified, and higher activity than previously reported ArM cyclopropanases obtained via targeted mutagenesis was also observed. This improved selectivity carried over to other dirhodium-catalysed transformations, including N-H, S-H and Si-H insertion, demonstrating that ArMs evolved for one reaction can serve as starting points to evolve catalysts for others.

  16. Comparative analysis of sequences from PT 2013

    DEFF Research Database (Denmark)

    Mikkelsen, Susie Sommer

    Sheatfish and not EHNV. Generally, mistakes occurred at the ends of the sequences. This can be due to several factors. One is that the sequence has not been trimmed of the sequence primer sites. Another is the lack of quality control of the chromatogram. Finally, sequencing in just one direction can result...... diseases in Europe. As part of the EURL proficiency test for fish diseases it is required to sequence any RANA virus isolates found in any of the samples. It is also highly recommended to sequence the ISA virus to determine whether it be HPRΔ or HPR0. Furthermore, it is recommended that any VHSV and IHNV...... isolates be genotyped. As part of the evaluation of the proficiency results it was decided this year to look into the quality and similarity of the sequence results for selected viruses. Ampoule III in the proficiency test 2013 contained an EHNV isolate. The EURL received 43 sequences from 41 laboratories...

  17. Rapid development of microsatellite markers for Callosobruchus chinensis using Illumina paired-end sequencing.

    Directory of Open Access Journals (Sweden)

    Can-Xing Duan

    Full Text Available BACKGROUND: The adzuki bean weevil, Callosobruchus chinensis L., is one of the most destructive pests of stored legume seeds such as mungbean, cowpea, and adzuki bean, which usually cause considerable loss in the quantity and quality of stored seeds during transportation and storage. However, a lack of genetic information of this pest results in a series of genetic questions remain largely unknown, including population genetic structure, kinship, biotype abundance, and so on. Co-dominant microsatellite markers offer a great resolving power to determine these events. Here, we report rapid microsatellite isolation from C. chinensis via high-throughput sequencing. PRINCIPAL FINDINGS: In this study, 94,560,852 quality-filtered and trimmed reads were obtained for the assembly of genome using Illumina paired-end sequencing technology. In total, the genome with total length of 497,124,785 bp, comprising 403,113 high quality contigs was generated with de novo assembly. More than 6800 SSR loci were detected and a suit of 6303 primer pair sequences were designed and 500 of them were randomly selected for validation. Of these, 196 pair of primers, i.e. 39.2%, produced reproducible amplicons that were polymorphic among 8 C. chinensis genotypes collected from different geographical regions. Twenty out of 196 polymorphic SSR markers were used to analyze the genetic diversity of 18 C. chinensis populations. The results showed the twenty SSR loci were highly polymorphic among these populations. CONCLUSIONS: This study presents a first report of genome sequencing and de novo assembly for C. chinensis and demonstrates the feasibility of generating a large scale of sequence information and SSR loci isolation by Illumina paired-end sequencing. Our results provide a valuable resource for C. chinensis research. These novel markers are valuable for future genetic mapping, trait association, genetic structure and kinship among C. chinensis.

  18. Free Energy Self-Averaging in Protein-Sized Random Heteropolymers

    International Nuclear Information System (INIS)

    Chuang, Jeffrey; Grosberg, Alexander Yu.; Kardar, Mehran

    2001-01-01

    Current theories of heteropolymers are inherently macroscopic, but are applied to mesoscopic proteins. To compute the free energy over sequences, one assumes self-averaging -- a property established only in the macroscopic limit. By enumerating the states and energies of compact 18, 27, and 36mers on a lattice with an ensemble of random sequences, we test the self-averaging approximation. We find that fluctuations in the free energy between sequences are weak, and that self-averaging is valid at the scale of real proteins. The results validate sequence design methods which exponentially speed up computational design and simplify experimental realizations

  19. Analysis of the overdispersed clock in the short-term evolution of hepatitis C virus: Using the E1/E2 gene sequences to infer infection dates in a single source outbreak.

    Science.gov (United States)

    Wróbel, Borys; Torres-Puente, Manuela; Jiménez, Nuria; Bracho, María Alma; García-Robles, Inmaculada; Moya, Andrés; González-Candelas, Fernando

    2006-06-01

    The assumption of a molecular clock for dating events from sequence information is often frustrated by the presence of heterogeneity among evolutionary rates due, among other factors, to positively selected sites. In this work, our goal is to explore methods to estimate infection dates from sequence analysis. One such method, based on site stripping for clock detection, was proposed to unravel the clocklike molecular evolution in sequences showing high variability of evolutionary rates and in the presence of positive selection. Other alternatives imply accommodating heterogeneity in evolutionary rates at various levels, without eliminating any information from the data. Here we present the analysis of a data set of hepatitis C virus (HCV) sequences from 24 patients infected by a single individual with known dates of infection. We first used a simple criterion of relative substitution rate for site removal prior to a regression analysis. Time was regressed on maximum likelihood pairwise evolutionary distances between the sequences sampled from the source individual and infected patients. We show that it is indeed the fastest evolving sites that disturb the molecular clock and that these sites correspond to positively selected codons. The high computational efficiency of the regression analysis allowed us to compare the site-stripping scheme with random removal of sites. We demonstrate that removing the fast-evolving sites significantly increases the accuracy of estimation of infection times based on a single substitution rate. However, the time-of-infection estimations improved substantially when a more sophisticated and computationally demanding Bayesian method was used. This method was used with the same data set but keeping all the sequence positions in the analysis. Consequently, despite the distortion introduced by positive selection on evolutionary rates, it is possible to obtain quite accurate estimates of infection dates, a result of especial relevance for

  20. Single-Genome Sequencing of Hepatitis C Virus in Donor-Recipient Pairs Distinguishes Modes and Models of Virus Transmission and Early Diversification.

    Science.gov (United States)

    Li, Hui; Stoddard, Mark B; Wang, Shuyi; Giorgi, Elena E; Blair, Lily M; Learn, Gerald H; Hahn, Beatrice H; Alter, Harvey J; Busch, Michael P; Fierer, Daniel S; Ribeiro, Ruy M; Perelson, Alan S; Bhattacharya, Tanmoy; Shaw, George M

    2016-01-01

    Despite the recent development of highly effective anti-hepatitis C virus (HCV) drugs, the global burden of this pathogen remains immense. Control or eradication of HCV will likely require the broad application of antiviral drugs and development of an effective vaccine. A precise molecular identification of transmitted/founder (T/F) HCV genomes that lead to productive clinical infection could play a critical role in vaccine research, as it has for HIV-1. However, the replication schema of these two RNA viruses differ substantially, as do viral responses to innate and adaptive host defenses. These differences raise questions as to the certainty of T/F HCV genome inferences, particularly in cases where multiple closely related sequence lineages have been observed. To clarify these issues and distinguish between competing models of early HCV diversification, we examined seven cases of acute HCV infection in humans and chimpanzees, including three examples of virus transmission between linked donors and recipients. Using single-genome sequencing (SGS) of plasma vRNA, we found that inferred T/F sequences in recipients were identical to viral sequences in their respective donors. Early in infection, HCV genomes generally evolved according to a simple model of random evolution where the coalescent corresponded to the T/F sequence. Closely related sequence lineages could be explained by high multiplicity infection from a donor whose viral sequences had undergone a pretransmission bottleneck due to treatment, immune selection, or recent infection. These findings validate SGS, together with mathematical modeling and phylogenetic analysis, as a novel strategy to infer T/F HCV genome sequences. Despite the recent development of highly effective, interferon-sparing anti-hepatitis C virus (HCV) drugs, the global burden of this pathogen remains immense. Control or eradication of HCV will likely require the broad application of antiviral drugs and the development of an effective